IceRock Assistant GPT

How We Taught AI to Answer Employee Questions and Reduced Internal Distractions

This time, IceRock acted as its own customer. The goal of the project was to solve an internal business problem: reduce distractions for the developers and improve access to our knowledge base.

We developed an intelligent chatbot integrated into the corporate Slack. This bot accesses our internal knowledge base in Confluence in real time, finds relevant information, and leverages GPT capabilities to provide employees with accurate and detailed answers to their questions.

The project started as an R&D initiative by one of our top developers, Alexey, aimed at exploring the emerging Retrieval-Augmented Generation (RAG) technology. It has since evolved into a full-fledged internal tool that saves time for dozens of our colleagues.

Task

In any fast-growing IT company, the volume of internal documentation, protocols, and guides grows exponentially. At IceRock, Confluence serves as a centralized repository for this knowledge. However, our experience shows that simply having a knowledge base does not guarantee its usage.

We encountered a classic problem:

  1. Repeated questions in Slack: Employees, especially new ones, regularly used general channels to ask the same questions: “How do I request time off?”, “Where can I find a report template?”, “What is our code review protocol?”
  2. Team distractions: All these questions had to be answered by other employees, most often senior developers or team leads. This pulled them out of their workflow, reduced productivity, and resulted in a loss of valuable working time.
  3. Passive nature of the knowledge base: While information was available in Confluence, it was often easier and faster for people to ask their questions in chat rather than search for the appropriate document themselves.

Our task was twofold:

  • Business task: Create a tool that could handle routine responses to questions, allowing developers to focus on priority tasks. We needed to “revive” our knowledge base by making it a proactive participant in communication.
  • Technical task (R&D): The project also served as our “sandbox” for studying and applying RAG technology. It was crucial for us to understand how to make large language models (LLMs) respond using private, non-public data that is not included in their basic training sets.

Solution

We developed the IceRock Assistant GPT system, consisting of a backend service and Slack integration.

The solution is a chatbot that can be summoned in any Slack thread by mentioning it (@IceRock Assistant).

Key user scenario:

  1. An employee has a question. They create a thread in the relevant Slack channel, mention the bot, and ask their question.
  2. The bot responds instantly with: “Please wait, searching for information...”
  3. The backend service receives the question, analyzes it, searches for relevant documents in our Confluence knowledge base, and then uses OpenAI GPT to generate a detailed, user-friendly response.
  4. Finally, the bot posts the generated response in the thread.

Key features of the solution:

  • Contextual “memory”: The bot works within Slack threads. It “remembers” the entire comment history in the thread and takes it into account when generating subsequent responses. This allows you to communicate with the bot and ask follow-up questions (for example: “What if I’m a manager?”), and it will understand what you're talking about.
  • Hybrid response mode: If the bot finds relevant information in Confluence, it responds strictly based on that information. If there is no answer in our knowledge base (for example, to the question “What is the weather like in Lisbon?”), the bot responds like a standard ChatGPT, using its general knowledge base.
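
To make the hybrid mode concrete, here is a minimal Kotlin sketch. The systemInstruction helper and the prompt wording are illustrative assumptions, not the exact production logic:

```kotlin
// Hypothetical helper (not the actual production code): chooses the system
// instruction depending on whether retrieval found anything in Confluence.
fun systemInstruction(retrievedChunks: List<String>): String =
    if (retrievedChunks.isNotEmpty()) {
        "You are IceRock Assistant. Answer strictly based on the context below. " +
            "If the context does not contain the answer, say so.\n\nContext:\n" +
            retrievedChunks.joinToString("\n---\n")
    } else {
        // Nothing relevant in the knowledge base: fall back to general ChatGPT behaviour.
        "You are IceRock Assistant. Answer using your general knowledge."
    }
```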

Development Process

The entire process was built around the RAG (Retrieval-Augmented Generation) architecture, which allows LLM queries to be “supplemented” with relevant data from external sources. The process can be divided into two independent pipelines: Indexing and Query Processing.

Knowledge Base Indexing Pipeline

Before the bot can find anything, the information first needs to be prepared and “fed” to it. This process runs in the background on a schedule (every hour); a simplified code sketch of it follows the list:

  1. Document search: Our Kotlin backend service calls the Confluence API and searches for all pages marked with a special ira (IceRock Assistant) label. This gives us flexible control over the bot’s access to specific knowledge.
  2. Version checks: The system checks whether a page has changed since the last indexing.
  3. Chunking: If a page is new or has been updated, its content is downloaded and divided into small logical pieces called “chunks.” This ensures effective vector search.
  4. Creating embeddings: Each chunk of text is sent to the OpenAI API (text-embedding model), which returns its vector representation, known as an “embedding.” An embedding is essentially a long array of numbers that mathematically describes the meaning of a particular piece of text.
  5. Saving to databases:
    • The obtained vector (array of numbers) is saved to a specialized vector database called Qdrant.
    • The source text of the chunk, along with the metadata (document ID, version number), is saved to a traditional relational database — PostgreSQL.

Message Processing Pipeline (Response to the User)

This process is triggered every time a user mentions the bot in Slack:

  1. Receiving a query: Slack sends a webhook to our Kotlin backend with the message text and thread history.
  2. Feedback: The backend immediately responds to Slack with a “Please wait...” message so that the user knows their request has been accepted.
  3. Vectorization of the request: The backend takes the text of the user's question (along with the thread history for context) and sends it to the OpenAI API to obtain a “query vector.”
  4. Vector search (Retrieval): This query vector is used to search the Qdrant database. Qdrant quickly finds the most semantically similar document vectors in its collection. For example, if a user asks, “How do I go about taking a vacation?”, Qdrant will find vectors that correspond to chunks of text about “taking time off,” “submitting a vacation request,” etc.
  5. Contextual retrieval: Qdrant returns the IDs of the most relevant vectors. Using these IDs, our backend accesses PostgreSQL and retrieves the corresponding source text fragments from there.
  6. Prompt augmentation: The backend forms the final, large prompt for the main LLM (GPT). This prompt contains:
    • A system instruction (“You are an assistant...”).
    • The comment history from the Slack thread (for “memory”).
    • The text fragments found in PostgreSQL (the “augmented” context).
    • The user's original question.
  7. Response generation: This prompt is sent to the OpenAI API (GPT model). The model generates a response based primarily on the context provided from our knowledge base.
  8. Sending the response: The generated response is sent to the Slack thread, replacing the “Please wait...” message.
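
As a rough illustration of steps 3–8, here is a hedged Kotlin sketch. All interfaces (EmbeddingsApi, VectorStore, ChunkRepository, ChatApi, SlackClient) and their method names are assumptions made for readability, not the real API of our service or of the libraries involved:

```kotlin
// Illustrative placeholder abstractions over OpenAI, Qdrant, PostgreSQL and Slack.
interface EmbeddingsApi { fun embed(text: String): FloatArray }
interface VectorStore { fun search(vector: FloatArray, limit: Int): List<String> }    // returns chunk IDs
interface ChunkRepository { fun textsByIds(ids: List<String>): List<String> }
interface ChatApi { fun complete(messages: List<Pair<String, String>>): String }      // (role, content) pairs
interface SlackClient { fun update(channel: String, messageTs: String, text: String) }

class QuestionHandler(
    private val embeddings: EmbeddingsApi,
    private val vectors: VectorStore,
    private val chunks: ChunkRepository,
    private val chat: ChatApi,
    private val slack: SlackClient,
) {
    fun handle(question: String, threadHistory: List<String>, channel: String, placeholderTs: String) {
        // 3. Vectorize the question together with the thread history for context.
        val queryVector = embeddings.embed((threadHistory + question).joinToString("\n"))

        // 4-5. Find the most similar chunks in Qdrant, then load their source text from PostgreSQL.
        val context = chunks.textsByIds(vectors.search(queryVector, limit = 5))

        // 6. Augment the prompt: system instruction + retrieved context + thread history + question.
        val messages = buildList {
            add("system" to ("You are IceRock Assistant. Answer based on the context below.\n\n" +
                context.joinToString("\n---\n")))
            threadHistory.forEach { add("user" to it) }
            add("user" to question)
        }

        // 7-8. Generate the answer and replace the "Please wait..." placeholder in the Slack thread.
        slack.update(channel, placeholderTs, chat.complete(messages))
    }
}
```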

The Hardest Part

Just like with any R&D project, the main challenges were the concepts rather than the code.

Challenge 1: Ensuring data relevance. A knowledge base is a living entity, with documents constantly being updated. If the bot responds with outdated information, it could do more harm than good.

  • How we solved it: We implemented an automatic background indexer. Our Kotlin backend polls Confluence on a schedule (once an hour). It does not just download everything indiscriminately; instead, it checks the version numbers of documents it has already indexed. If the version in Confluence is newer than the one in our database, the backend automatically downloads the new version, recreates the embeddings, and updates them in the vector database. This ensures that the bot responds based on the latest information with minimal delay.
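
A minimal sketch of that hourly check, reusing the Page and PageIndexer types from the indexing sketch above; the ConfluenceClient and IndexStateRepository interfaces are again assumptions:

```kotlin
import kotlinx.coroutines.delay
import kotlin.time.Duration.Companion.hours

// Illustrative placeholders; Page and PageIndexer are from the indexing sketch above.
interface ConfluenceClient { fun pagesWithLabel(label: String): List<Page> }
interface IndexStateRepository { fun indexedVersion(pageId: String): Int? }

class BackgroundIndexer(
    private val confluence: ConfluenceClient,
    private val state: IndexStateRepository,
    private val pageIndexer: PageIndexer,
) {
    // Polls Confluence once an hour and re-indexes only pages whose version has changed.
    suspend fun run() {
        while (true) {
            for (page in confluence.pagesWithLabel("ira")) {
                val indexed = state.indexedVersion(page.id)
                if (indexed == null || indexed < page.version) {
                    pageIndexer.indexPage(page) // recreate embeddings and update Qdrant/PostgreSQL
                }
            }
            delay(1.hours)
        }
    }
}
```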

Challenge 2: Contextual “amnesia.” The first version of the bot only responded to single requests, which is not how users typically communicate. Users ask follow-up questions, such as “What if I'm a team lead?” or “Is it the same for the Android department?” Without the dialog history, the bot did not understand what these questions were referring to.

  • How we solved it: In the end, we tied the bot's logic to Slack threads. With each new query, our backend asks the Slack API for the entire message history in the current thread. This history is given to the LLM as part of the prompt, allowing the model to maintain the context of the exchange and provide relevant answers to follow-up questions.
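
A small sketch of that step, assuming a hypothetical wrapper around Slack's conversations.replies method (the types and names below are illustrative):

```kotlin
// Hypothetical wrapper over the Slack conversations.replies API; names are illustrative.
data class SlackMessage(val fromBot: Boolean, val text: String)
interface SlackThreadClient { fun threadReplies(channel: String, threadTs: String): List<SlackMessage> }

// Turns the whole thread into (role, content) chat messages so the LLM keeps the dialogue context.
fun threadAsChatHistory(slack: SlackThreadClient, channel: String, threadTs: String): List<Pair<String, String>> =
    slack.threadReplies(channel, threadTs).map { msg ->
        (if (msg.fromBot) "assistant" else "user") to msg.text
    }
```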

Challenge 3: Architecture visualization. The system turned out to be multi-component (Slack, Backend, Confluence, Qdrant, PostgreSQL, OpenAI), making it challenging to explain how everything fits together.

  • How we solved it: During development, we actively used Mermaid, a code-based diagramming tool. This allowed us to describe and maintain two key diagrams directly in the project's README file in GitLab: “Knowledge Base Indexing” and “Message Processing.” This simplified both the development and the subsequent transfer of knowledge about the project.

Technology Stack

  • Platform: Slack
  • Backend: Kotlin
  • Data source (knowledge base): Confluence
  • LLM and embeddings: OpenAI
  • Vector database: Qdrant
  • Relational database: PostgreSQL
  • Documentation and diagrams: Mermaid

Results

Creation of a working internal product: “IceRock Assistant GPT” has been successfully implemented and is being used by employees.

Reduced team workload: The bot handles most of the typical questions, allowing developers to stay focused on their main tasks.

“Revitalized” knowledge base: Our documentation in Confluence has evolved from a passive repository into an active tool directly integrated into the communication workflow.

Invaluable R&D experience: The IceRock team has gained first-hand experience of working with RAG, one of today's most sought-after AI technologies. We have learned all the intricacies of working with vector databases (Qdrant), the OpenAI API, and the logic behind building complex AI assistants.

Foundation for future products: This internal project has become the basis for future commercial offerings related to creating custom GPT assistants for our clients, trained on their own corporate data.

Let’s discuss your project!

The consultation is free. We will tell you how such an application can solve your problems.