Iskra Telecom is a major Internet service provider serving thousands of residential and corporate customers.
IceRock Development, in collaboration with Iskra Telecom, developed and integrated a solution based on a Large Language Model (LLM) directly into the customer’s custom task tracker. A key feature of the project is its 100% on-premise deployment within the company's closed network.
The system analyzes long comment threads in tasks (often 50 to 100+ comments per task) and, with a single button press, generates a summary of the current status, completed work, and next steps. This has significantly reduced the time managers and operators spend getting up to speed on any particular task while completely eliminating the risk of confidential customer data leaking to external AI services.
Iskra Telecom's business processes are built around a self-developed task tracker. The company's management introduced a mandatory rule: all work, especially anything related to customer requests (e.g., “lost Internet connection,” “technical connection,” etc.), must be documented in detail in the comments on each task.
While this solved the problem of missing history, it created an equally serious new one: information overload.
When a new employee joined a task or a manager wanted to gauge its status, they had to manually sift through dozens, sometimes hundreds, of comments. The essence of the problem would get buried in a barrage of technical details, customer clarifications, internal discussions, and general noise.
We were given a clear business task: make it possible for any employee to understand the current status of a task in a matter of seconds, without reading the entire comment history.
We proposed a solution that combined the power of modern LLMs with strict security requirements — a local AI summarizer.
We integrated a “Generate Summary” button into the task tracker interface. Clicking this button performs the following actions:
1. The backend collects all comments from the task in chronological order.
2. The comments are sent to the AI server running inside the company's network.
3. The model returns a summary of the current status, completed work, and next steps.
4. The summary is displayed right in the task card.
This way, any employee who opens a task can immediately grasp the issue without needing to read through a “wall” of comments, significantly boosting operational speed.
The process was divided into several key stages:
Architecture and technology stack selection
The main challenge was the requirement to avoid cloud-based AI. The solution had to operate within a “closed loop.” We immediately abandoned the idea of external APIs and focused on delivering an on-premise solution.
This required three components: hardware powerful enough to run a large model locally, software to serve that model through an API, and the model itself.
Hardware configuration
Following our recommendation, the customer purchased a dedicated workstation: a Mac Studio with 96 GB of unified memory. It was chosen as the best price-to-performance option for local inference (running the model on-site) and was installed in the company's server room.
AI server deployment
We used LM Studio. This tool lets you download and run almost any open-weight model in just a few clicks and, most importantly, automatically starts a local API server. The server is OpenAI-compatible, so our backend developers could keep their familiar libraries and tools and simply change the URL to the internal address of the Mac Studio.
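For illustration, here is a minimal sketch of what that URL swap looks like from the backend's side, using the standard OpenAI Python client. The hostname, API key, and model identifier are placeholder assumptions, not the customer's real configuration; LM Studio's local server listens on port 1234 by default.

```python
# Illustrative sketch only: hostname, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://mac-studio.internal:1234/v1",  # internal address of the Mac Studio
    api_key="lm-studio",  # LM Studio does not check the key, but the client requires a value
)

response = client.chat.completions.create(
    model="local-model",  # identifier of whichever model is loaded in LM Studio
    messages=[
        {"role": "system", "content": "You summarize support-task comment threads."},
        {"role": "user", "content": "Summarize the following comments: ..."},
    ],
)
print(response.choices[0].message.content)
```

Everything else in the request and response handling stays exactly as it would with a cloud provider; only the base URL points inside the network.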
Model selection and testing
We tested several models, including different versions of LLaMA and Mistral. Ultimately, the Qwen model (developed by Alibaba) showed the best results for summarizing technical texts in Russian. We experimented with different-sized versions to find the optimal balance between response speed and summary quality.
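As a rough illustration of what the model is asked to produce (the production prompt is in Russian and tuned to the tracker's data, so the wording below is an assumption rather than the real prompt), a summarization request can be assembled along these lines:

```python
def build_summary_prompt(comments: list[str]) -> str:
    """Assemble a summarization prompt from a task's comment thread.

    Illustrative sketch: the requested structure mirrors the three sections
    of the summary (current status, completed work, next steps).
    """
    thread = "\n\n".join(f"Comment {i + 1}:\n{text}" for i, text in enumerate(comments))
    return (
        "Below is the full comment thread of a support task.\n"
        "Summarize it in three short sections:\n"
        "1. Current status\n"
        "2. Work completed so far\n"
        "3. Next steps\n\n"
        + thread
    )
```

The resulting string is sent as the user message to the local server described above.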
Integration
The integration process was the fastest stage: we added the “Generate Summary” button to the task card, created a backend endpoint that collects a task's comments and sends them to the AI server, and, since LM Studio's API is OpenAI-compatible, simply pointed the existing client libraries at the Mac Studio's internal address.
The entire process, from purchasing equipment to putting it into operation, took about a week, demonstrating how efficient and quick deployment can be when you have the necessary expertise.
The hardest and trickiest part was working within the security restrictions.
The problem was that the comments from the tasks contained a vast array of confidential information, including customers' personal data (such as names and addresses), technical details of the network, and internal team discussions.
Transferring such information to any external service, be it ChatGPT, Claude, or their equivalents, was out of the question: it would pose a direct risk of violating Federal Law No. 152-FZ “On Personal Data” and cause reputational damage for the telecom operator.
Off-the-shelf, easy-to-integrate cloud AI services were strictly off-limits. We needed to figure out how to harness the full power of modern LLMs without sending a single byte of data outside the company.
We solved this problem by building a completely isolated (“air-gapped”) AI loop within the customer's infrastructure.
Thus, the entire lifecycle of a summarization request looks like this:
Employee's browser (internal network) ➔ Backend task tracker (internal network) ➔ Mac Studio with LM Studio (internal network) ➔ ... and back.
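To make the loop concrete, here is a hedged end-to-end sketch of what one “Generate Summary” click looks like from the backend's point of view. Everything is illustrative: fetch_task_comments stands in for the tracker's own data layer, and the host and model names are the same placeholder assumptions as above.

```python
from openai import OpenAI

# Placeholder for the tracker's own data-access layer (not part of this case study).
def fetch_task_comments(task_id: int) -> list[str]:
    raise NotImplementedError("query the task tracker's database here")

# The client talks only to the Mac Studio on the internal network.
client = OpenAI(base_url="http://mac-studio.internal:1234/v1", api_key="lm-studio")

def summarize_task(task_id: int) -> str:
    """Handle one 'Generate Summary' click: comments in, summary out, all in-network."""
    comments = fetch_task_comments(task_id)
    response = client.chat.completions.create(
        model="local-model",  # identifier of the model loaded in LM Studio (e.g. a Qwen build)
        messages=[
            {
                "role": "system",
                "content": "Summarize the task's comment thread: current status, "
                           "completed work, and next steps.",
            },
            {"role": "user", "content": "\n\n".join(comments)},
        ],
        temperature=0.2,  # keep summaries stable and factual
    )
    return response.choices[0].message.content
```

At no point does this code reach a host outside the internal network; the “AI provider” is simply another machine in the server room.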
No packets containing confidential data ever leave the company's environment. In doing so, we demonstrated that cutting-edge AI technologies can be implemented even under the most rigorous corporate security requirements.
The results exceeded all expectations. We did not just add “another feature”; we fundamentally changed the speed of information processing within the company.
This case clearly demonstrates that you do not need to sacrifice security to implement AI. The right architecture and use of on-premise solutions allow you to take full advantage of the technology while maintaining complete control over your data.