Mavue Chatbot Assistant
To assist our customers through the application and let them access and manipulate their data, we have built an AI-based service. The first version of the UI was a chat interface, letting users ask questions about the application, their own data, or even general ESRS and sustainability topics.
Technical Implementation
We have created a separate backend service built on FastAPI with Python. It exposes a RESTful API and streams answers live to clients. All LLM calls go to OpenAI's GPT-4, and we rely heavily on LangChain.
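To give a feel for the setup, here is a minimal sketch of such a streaming endpoint. The path, request schema, and helper names are illustrative, not our actual code:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")  # hypothetical endpoint path
def chat(req: ChatRequest):
    # Stream tokens back to the web client as they arrive from the model,
    # instead of waiting for the full answer.
    def token_stream():
        stream = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": req.message}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    return StreamingResponse(token_stream(), media_type="text/plain")
```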
When a user asks a question, the web client hits the API. An initial LLM call determines whether the request is a simple question or an action.
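Conceptually, this router is a single, tightly constrained LLM call. A minimal sketch, where the prompt wording and labels are assumptions:

```python
from openai import OpenAI

client = OpenAI()

ROUTER_PROMPT = (
    "Classify the user's request as either 'question' (an informational "
    "question about the app, the user's data, or ESRS/sustainability) or "
    "'action' (a request to change data or perform an operation). "
    "Answer with exactly one word: question or action."
)


def route(message: str) -> str:
    # One cheap, deterministic LLM call decides which agent takes over.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": message},
        ],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip().lower()
    return "action" if "action" in label else "question"
```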
If it's a Simple Question
The documentation agent takes over. This agent is essentially a RAG (Retrieval-Augmented Generation) system backed by Pinecone. The vector database is fed with documentation from Intercom and the company's Google Drive, created and maintained by the product and sustainability teams. To improve the accuracy of the results, we use techniques like query expansion and cross-encoder re-ranking, as well as embedding adapters trained with PyTorch.
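A rough sketch of the retrieval path with query expansion and cross-encoder re-ranking, assuming an OpenAI embedding model, a Pinecone index named "docs", and an off-the-shelf cross-encoder (all illustrative choices):

```python
from openai import OpenAI
from pinecone import Pinecone
from sentence_transformers import CrossEncoder

oai = OpenAI()
index = Pinecone(api_key="...").Index("docs")  # hypothetical index name
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def embed(text: str) -> list[float]:
    resp = oai.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding


def retrieve(question: str, top_k: int = 5) -> list[str]:
    # Query expansion: ask the LLM for paraphrases to widen the vector search.
    expansion = oai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": f"Rephrase this question two ways:\n{question}"}
        ],
    ).choices[0].message.content
    queries = [question] + [q for q in expansion.splitlines() if q.strip()]

    # Collect candidate chunks from Pinecone for every query variant.
    candidates: dict[str, str] = {}
    for q in queries:
        result = index.query(vector=embed(q), top_k=10, include_metadata=True)
        for match in result.matches:
            candidates[match.id] = match.metadata["text"]

    # Cross-encoder re-ranking: score each (question, chunk) pair jointly,
    # which is slower but far more accurate than raw vector similarity.
    docs = list(candidates.values())
    scores = reranker.predict([(question, d) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    return [d for _, d in ranked[:top_k]]
```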
If it's an Action
The action agent takes over. It starts with a RAG lookup to find the closest matching instruction. Instructions are written and maintained by developers to guide the agent through specific workflows. After retrieving an instruction, the agent follows it step by step, where each step is an LLM call with a precise prompt. Some steps require a call to the main backend service.
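The step loop might look something like the sketch below. The `Step` structure, the backend endpoint, and the helper names are assumptions made for illustration:

```python
from dataclasses import dataclass

import httpx
from openai import OpenAI

client = OpenAI()


def call_backend(graphql_query: str, token: str) -> str:
    # Hypothetical helper: forward a generated GraphQL query to the main
    # backend with the user's own token, so the existing authentication
    # and authorization layers still apply.
    resp = httpx.post(
        "https://api.example.com/graphql",  # placeholder endpoint
        json={"query": graphql_query},
        headers={"Authorization": f"Bearer {token}"},
    )
    return resp.text


@dataclass
class Step:
    prompt: str          # the precise instruction for this step's LLM call
    calls_backend: bool  # whether this step needs a server round-trip


def run_action(user_request: str, steps: list[Step], token: str) -> str:
    # Each step is its own LLM call; every step's output is appended to the
    # context so later steps can build on earlier results.
    context = f"User request: {user_request}"
    for step in steps:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": step.prompt},
                {"role": "user", "content": context},
            ],
            temperature=0,
        )
        output = resp.choices[0].message.content
        if step.calls_backend:
            output = call_backend(output, token)
        context += f"\n\nStep result: {output}"
    return context  # a final LLM pass can summarize this for the client
```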
A Security Note: At this stage we preferred API calls over direct database access. The project is very dynamic, and we wanted to ensure the LLM never has full access to the database. The API already enforces multiple layers of authentication and authorization and guarantees that users can only reach their own data, so routing through it saves us from rebuilding those layers.
The agent can construct GraphQL queries, call the server, fetch the results, and parse and analyze them. At the end, the agent reviews the whole process and streams the result to the web client. Some real-world examples that our users have asked for include (a sketch of a generated call follows this list):
- Invite [a teammate] as a normal user to the app.
- Make an ESRS report for 2023.
- Add 6000 liters of water consumption for our headquarters in 2023.
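To make the last example concrete, here is a hypothetical GraphQL call the agent might generate for the water-consumption request. The endpoint and every field name are invented for illustration and do not reflect our actual schema:

```python
import httpx

# Hypothetical mutation for "Add 6000 liters of water consumption for our
# headquarters in 2023" — the schema shown here is illustrative only.
MUTATION = """
mutation {
  addConsumption(
    site: "Headquarters"
    metric: WATER
    amount: 6000
    unit: LITERS
    year: 2023
  ) {
    id
  }
}
"""

response = httpx.post(
    "https://api.example.com/graphql",                  # placeholder endpoint
    json={"query": MUTATION},
    headers={"Authorization": "Bearer <user-token>"},   # the user's own credentials
)
print(response.json())
```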
The Next Steps
- Switch to an open-source LLM, fine-tuned so it needs less precise instructions. This would make the agent more flexible and dynamic, and would spare us from writing and maintaining new instructions, saving a lot of time.
- Gather user feedback and fine-tune our LLMs and adapters.