Large Language Models
We use various Large Language Models (LLMs) from OpenAI and Cohere. We follow a Retrieval-Augmented Generation (RAG) approach: relevant information is retrieved and provided to the LLM as part of its context, and the LLM is instructed to base its answer on that information. Cohere models are used to rerank the retrieved text passages within the RAG pipeline. This approach improves answer quality and reduces the risk of hallucination.
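As an illustration, the sketch below shows a minimal rerank-then-generate flow in Python. The model names, API keys, and the list of candidate passages are illustrative assumptions, not ZOA's actual configuration.

```python
# Minimal sketch of a rerank-then-generate RAG step.
# Model names and credentials below are assumptions for illustration.
import cohere
from openai import OpenAI

co = cohere.Client("COHERE_API_KEY")    # assumed credentials
llm = OpenAI(api_key="OPENAI_API_KEY")  # assumed credentials

def answer(prompt: str, candidates: list[str]) -> str:
    # Step 1: rerank the retrieved knowledge base passages with Cohere,
    # keeping only the most relevant ones for the LLM context.
    reranked = co.rerank(
        model="rerank-multilingual-v3.0",  # assumed model name
        query=prompt,
        documents=candidates,
        top_n=3,
    )
    context = "\n\n".join(candidates[r.index] for r in reranked.results)

    # Step 2: instruct the LLM to answer based only on the provided context.
    response = llm.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context is insufficient, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {prompt}"},
        ],
    )
    return response.choices[0].message.content
```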
Hosting
All LLMs are hosted by Microsoft: the OpenAI models in the Switzerland North data center and the Cohere models in Sweden. No user data (e.g. knowledge base content, user prompts) is used for LLM training or fine-tuning. Microsoft is an approved sub-processor for ZOA.
Embeddings
To find the relevant knowledge base items, we apply the RAG approach mentioned above. When a knowledge base item is saved, we generate an embedding from it and store the embedding vector in the database. When a user submits a prompt, we first generate an embedding vector from the prompt so that we can find the semantically most relevant knowledge base entries. The content of the most similar and relevant knowledge base items is then used as context for the LLM to answer the user prompt.
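The sketch below illustrates this lookup, assuming a hypothetical in-memory store of item embeddings; the embedding model name and credentials are placeholders, not ZOA's actual setup.

```python
# Minimal sketch of embedding generation and similarity search.
# Model name, credentials, and the in-memory store are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI(api_key="OPENAI_API_KEY")  # assumed credentials

def embed(text: str) -> np.ndarray:
    # Generate an embedding vector for a knowledge base item or a user prompt.
    result = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model name
        input=text,
    )
    return np.array(result.data[0].embedding)

def most_relevant(prompt: str, items: dict[str, np.ndarray], k: int = 3) -> list[str]:
    # Compare the prompt embedding against the stored item embeddings by
    # cosine similarity and return the k most similar knowledge base items.
    q = embed(prompt)

    def cosine(v: np.ndarray) -> float:
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(items, key=lambda name: cosine(items[name]), reverse=True)[:k]
```

In practice the vectors would be stored in the database alongside the items, but the similarity logic is the same.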
Storage
All AI conversations are stored on our server. Only the user can access his or her personal conversations; the company administrator of a ZOA account has no access to the AI conversations of team members.
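As a hypothetical illustration of this access rule (not ZOA's actual code), the sketch below grants access strictly by ownership:

```python
# Hypothetical sketch of the ownership check described above: a conversation
# is returned only to the user who owns it, never to an account administrator.
from dataclasses import dataclass

@dataclass
class Conversation:
    id: str
    owner_id: str
    messages: list[str]

def get_conversation(requesting_user_id: str, conversation: Conversation) -> Conversation:
    # Access is granted strictly on ownership; administrator roles are
    # deliberately not consulted here.
    if conversation.owner_id != requesting_user_id:
        raise PermissionError("Only the owner may access this conversation.")
    return conversation
```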