Team News
May 18, 2025
Part 2: Giving our RAG Pipeline Superpowers: Memory, Management, and Ninja Mode
Integrating Memory into our LLM
Over the past two weeks, we've significantly upgraded our Retrieval-Augmented Generation (RAG) pipeline by integrating memory, ensuring our model remembers and utilizes previous interactions to enrich future responses.
By nature, Large Language Models (LLMs) are stateless. Each interaction stands alone without memory of previous conversations. To overcome this limitation, we've integrated a powerful framework offering multiple memory strategies:
Conversation Buffer Memory: Retains the entire conversation history, providing full context in subsequent interactions.
Conversation Buffer Window: Stores only a specific number of recent interactions, managing memory size efficiently.
Conversation Summary: Generates summaries of past conversations, allowing the model to retain context without overwhelming it with details.
For our specific use case, we opted for Conversation Buffer Window Memory, retaining only the last five conversation exchanges, which balances detailed context with performance efficiency.
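The post doesn't name the framework we integrated, but here is a minimal sketch of this setup assuming a LangChain-style memory API, where the buffer window size `k` corresponds to the five-exchange limit described above:

```python
# Hedged sketch: assumes LangChain's ConversationBufferWindowMemory, which keeps
# only the last k user/AI exchanges and drops older ones from the window.
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# Each save_context() call records one exchange; once more than five exist,
# the oldest exchange falls out of the window automatically.
memory.save_context(
    {"input": "The panel shows error E4."},
    {"output": "Error E4 usually indicates a low backup battery."},
)

# The retained window is injected into the next prompt as context.
print(memory.load_memory_variables({}))
```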

Pinecone Integration and Multi-user Management
As we refined our RAG pipeline, we incorporated Pinecone, a vector database, to systematically store conversation histories. Each interaction (user query and AI response) is indexed in Pinecone, effectively serving as a persistent memory store.
We advanced our setup further by assigning a unique index within Pinecone to each user. Each user's conversation records are stored independently within their allotted index under a specific namespace, which enhances user data management and security.
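As an illustration, here is a hedged sketch of per-user storage using the Pinecone and OpenAI Python clients; the index name, embedding model, and helper function are assumptions for the example, not details from our actual setup:

```python
# Hypothetical sketch: embed each exchange and upsert it into the user's own
# Pinecone namespace so conversation histories stay isolated per user.
from pinecone import Pinecone
from openai import OpenAI

pc = Pinecone(api_key="YOUR_PINECONE_KEY")   # placeholder credentials
index = pc.Index("conversation-memory")      # illustrative index name
client = OpenAI()

def store_exchange(user_id: str, turn_id: str, query: str, answer: str) -> None:
    """Embed a query/answer pair and store it under the user's namespace."""
    text = f"User: {query}\nAssistant: {answer}"
    emb = client.embeddings.create(model="text-embedding-3-small", input=text)
    index.upsert(
        vectors=[{
            "id": turn_id,
            "values": emb.data[0].embedding,
            "metadata": {"user": user_id, "text": text},
        }],
        namespace=f"user-{user_id}",  # per-user namespace isolation
    )
```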
Tackling Cross-contamination in Document Retrieval
A critical issue we addressed was potential cross-contamination, where the LLM could inadvertently mix documents from unrelated contexts when generating answers. Our solutions, sketched after the list below, involved:
User-Specific Concatenation: Identifying the user's specific system type (e.g., the type of alarm system) and appending it directly to their queries. This ensured retrieval of only relevant documents tailored to the user's context.
Namespace Segregation by Manufacturer: Creating separate namespaces within Pinecone for each alarm system manufacturer, further refining document retrieval and eliminating cross-contamination.
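A short sketch of how the two mitigations combine at retrieval time; the function, field names, and `embed` helper are hypothetical stand-ins for our actual pipeline:

```python
# Hedged sketch: restrict retrieval by (1) anchoring the query to the user's
# system type and (2) searching only the manufacturer's Pinecone namespace.
def retrieve_docs(index, embed, user_profile: dict, query: str, k: int = 4):
    # 1) User-specific concatenation: append the system type to the query.
    contextual_query = f"{query} (system: {user_profile['system_type']})"
    vec = embed(contextual_query)
    # 2) Namespace segregation: scope the search to one manufacturer only,
    #    so documents from other manufacturers can never be retrieved.
    return index.query(
        vector=vec,
        top_k=k,
        include_metadata=True,
        namespace=user_profile["manufacturer"],
    )
```

Together, these two constraints mean the retriever can only ever surface documents that match both the user's system type and their manufacturer.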
We're continuing extensive research to further optimize this aspect in the coming weeks.
Ninja Mode: Advanced Reasoning Activated
We introduced an exciting new feature, Ninja Mode, which transforms our AI assistant into an advanced troubleshooting expert. When activated, Ninja Mode leverages OpenAI's state-of-the-art reasoning model, o1.
This advanced model enables the assistant to intelligently gather comprehensive contextual information, analyse complex situations methodically, and deliver highly precise troubleshooting insights.
Ninja Mode essentially provides our users with an expert-level AI technician capable of handling intricate queries with advanced reasoning capabilities.
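Conceptually, the toggle amounts to routing a request to a different model. A minimal, hypothetical sketch using the OpenAI Python client follows; the default model choice and prompt shape are assumptions for illustration:

```python
# Hypothetical sketch of the Ninja Mode toggle: when enabled, the request is
# routed to OpenAI's o1 reasoning model instead of the default chat model.
from openai import OpenAI

client = OpenAI()

def answer(question: str, context: str, ninja_mode: bool = False) -> str:
    model = "o1" if ninja_mode else "gpt-4o"  # default model is an assumption
    prompt = f"Context:\n{context}\n\nTroubleshooting question: {question}"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```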

Stay tuned as we continue to push the boundaries of AI capabilities, optimising our RAG pipeline further and exploring new frontiers in memory management and advanced AI troubleshooting!