The RAG Stack: Unlocking the Power of Knowledge Graphs

The adoption of Generative AI by enterprises is accelerating, with CIOs eager to move use cases into production despite challenges with model accuracy and hallucinations. Enterprises are becoming more sophisticated in deploying large models, using fine-tuning and RAG.

RAG will remain crucial in ‘compound systems,’ applied alongside fine-tuning for various use cases. Significant labor cost savings from Generative AI will rely heavily on RAG, especially for information retrieval tasks. As focus shifts to the ‘RAG stack,’ knowledge graphs will be key for more complex RAG processes and better performance.

Enabling Enterprise Adoption

We’re still in the early stages of true enterprise adoption of generative AI, with only a small percentage of CIOs having moved LLM projects into production so far.

This slow adoption persists despite significant boardroom pressure to accelerate AI initiatives. Early adopters are starting to shift from experimental innovation budgets to software and labor budgets. In Q1 2024, 73% of CIOs reported that AI/ML directly impacted their investment priorities, driven by the potential to save trillions in labor costs through Generative AI.

Challenges in Production

Model output accuracy and hallucinations are the main obstacles preventing enterprises from moving LLM use cases into production. This has led to a bifurcation in adoption cycles between internal and external use cases, with a much lower tolerance for hallucinations in external applications.

Advanced Deployment Strategies

Enterprises are becoming more sophisticated in their deployment strategies, such as combining small language models (SLMs) with LLMs, optimizing inference, routing across models, and employing agentic design patterns. However, the fundamental requirement for deploying even simple applications remains the retrieval systems around off-the-shelf or fine-tuned models.

Fine-tuning and Retrieval-Augmented Generation (RAG) are the two primary methods companies use to customize LLMs, and they can often be used together.

There is growing agreement that out-of-the-box LLMs need augmentation:

“As more developers begin to build using LLMs, we believe that this focus is rapidly changing: state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.”

In enterprises, Databricks found that 60% of LLM applications utilize some form of RAG, while 30% employ multi-step chains.

However, from an economic perspective, RAG is often the better option for most enterprises:

  • An enterprise’s knowledge base is dynamic and constantly updating; repeated fine-tuning runs can become prohibitively expensive, whereas retrieval data sources can be updated cheaply, supporting more real-time factuality in critical use cases.
  • Context windows will keep growing and in-context recall will keep improving, but retrieving only the relevant context will always be cheaper than stuffing an entire knowledge base into an ever-larger window.
  • Each step of the RAG process (pre-processing, chunking strategy, post-processing, generation) can be optimized for performance gains (a chunking sketch follows this list).
  • RAG can incorporate citations to provide an audit trail and clear traceability of answers to sources, increasing trust in model outputs.
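
As a toy illustration of one of these tunable steps, here is a minimal fixed-size chunking function in Python; the sizes and the name `chunk` are arbitrary choices for the sketch, not a recommended strategy:

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    # Fixed-size character chunking with overlap, so a fact that straddles
    # a boundary still appears intact in at least one chunk. Real pipelines
    # often chunk by tokens, sentences, or document structure instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("lorem ipsum dolor sit amet " * 100)
print(len(pieces), "chunks; first chunk:", pieces[0][:40], "...")
```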

For public companies, every basis point of margin compression is heavily scrutinized by analysts. RAG will be crucial for enterprises to deploy LLMs at scale while maintaining cost efficiency.

Understanding The RAG Stack

A simple RAG stack

A basic Retrieval-Augmented Generation (RAG) stack works as follows: documents are converted into embeddings (mathematical representations) and stored in a vector database. When a query is made, it is also converted into embeddings, which are then compared with the stored vectors. The most semantically similar information is retrieved and sent to the LLM to construct an answer.
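
To make that flow concrete, here is a minimal sketch in Python. The bag-of-words `embed` function and the in-memory `index` are toy stand-ins for a real embedding model and vector database; no names here reflect any particular vendor's API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. A real stack would call an
    # embedding model and store dense float vectors instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The "vector database": documents stored alongside their embeddings.
documents = [
    "RAG retrieves relevant context before the LLM generates an answer.",
    "Fine-tuning adjusts model weights on domain-specific data.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

context = retrieve("How does RAG use retrieved context?")
prompt = f"Answer using this context: {context}\n\nQuestion: ..."
# `prompt` (retrieved context + question) is what gets sent to the LLM.
```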

Advanced vs. Simple RAG Stacks

Advanced RAG stacks differ from simple ones by offering more refined manipulation of:

  •  Data as it enters the system
  •  Data representation and storage
  •  Data processing by the LLM

Complexity in RAG Systems

Complex RAG systems handle intricate queries by using multi-hop retrieval, which extracts and combines information from multiple sources. This approach is crucial for answering complex questions that require linking information from various documents.
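
A toy two-hop example, where the question can only be answered by chaining facts from two documents; the string-splitting "entity extraction" is a stand-in for what a real system would do with an LLM or entity recognizer:

```python
# Toy two-hop retrieval: answering "Where did the founder of Acme Corp
# work before?" requires linking facts from two separate documents.
docs_by_subject = {
    "Acme Corp": "Acme Corp was founded by Jane Smith.",
    "Jane Smith": "Jane Smith previously led research at Initech.",
}

def retrieve_fact(subject: str) -> str:
    return docs_by_subject.get(subject, "")

hop1 = retrieve_fact("Acme Corp")                   # mentions the founder
founder = hop1.split("founded by ")[1].rstrip(".")  # naive entity extraction
hop2 = retrieve_fact(founder)                       # follow the link
print([hop1, hop2])  # both hops are passed to the LLM as combined context
```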

Importance of Knowledge Graphs

A key but often overlooked component of RAG stacks is knowledge graphs. These graphs enhance RAG performance by adding context specific to a company or domain, thereby unlocking the full potential of Generative AI. Given the high proportion of white-collar tasks involving information retrieval, improvements in RAG performance can yield significant economic benefits.

ServiceNow’s AI Moment

ServiceNow, with approximately $10 billion in ARR, is fully embracing AI. The workflow automation platform’s AI SKU has seen its fastest adoption rate ever. During their Analyst Day, management highlighted notable productivity gains for enterprises:

  •  30% reduction in mean time to resolution
  •  Over 80% improvement in self-service deflection
  •  25% increase in developer velocity

ServiceNow integrates Knowledge Graphs to enhance their AI capabilities. These graphs make LLMs more deterministic by providing structured data on employee relationships, services, and integrations. By using knowledge graphs, ServiceNow’s enterprise customers can achieve more reliable and production-grade use cases through improved context retrieval in baseline RAG workloads.

Advanced RAG Stack

The emerging advanced RAG stack

The emerging RAG stack includes several key components:

Data Pipes/Extraction:

Unstructured data is sourced from wherever it lives, and extraction itself is a major challenge (e.g., pulling clean text out of PowerPoint or PDF documents).

Vector Databases:

Store mathematical embeddings or vectors representing the text.

Vector Ops:

Tooling and processes for vector creation, optimization, and analytics.

Graph Databases:

Store structures representing semantic connections and relationships within the text.

Graph Ops:

Tooling and processes for graph creation, orchestration, and analytics; these require more workflow tooling than Vector Ops because graphs admit many hierarchical and networked representations.

Graph/Vector Databases:

Graphs can store knowledge or manage data retrieval from vector databases, either replacing or working alongside them.

LLM Orchestration:

Abstraction layers that simplify managing and routing information, including across multi-agent systems.

LLM:

Foundation models that RAG systems use to generate relevant answers.
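
Taken together, the components might compose roughly as follows. This is a hedged structural sketch: every class, method, and scoring rule below is an illustrative stand-in, not any specific framework's API:

```python
class VectorStore:  # vector database + vector ops
    def __init__(self):
        self.chunks: list[str] = []

    def add(self, chunks: list[str]) -> None:
        self.chunks += chunks

    def search(self, query: str, k: int = 3) -> list[str]:
        # Stand-in for embedding similarity: count query words in each chunk.
        def score(c: str) -> int:
            return sum(w in c.lower() for w in query.lower().split())
        return sorted(self.chunks, key=score, reverse=True)[:k]

class GraphStore:  # graph database + graph ops
    def __init__(self):
        self.edges: dict[str, list[tuple[str, str]]] = {}

    def add_edge(self, a: str, rel: str, b: str) -> None:
        self.edges.setdefault(a, []).append((rel, b))

    def related(self, entity: str) -> list[str]:
        return [f"{entity} {rel} {b}" for rel, b in self.edges.get(entity, [])]

def answer(query: str, entity: str, vectors, graph, llm) -> str:
    # LLM orchestration: merge vector hits and graph facts into one prompt.
    context = vectors.search(query) + graph.related(entity)
    return llm(f"Context: {context}\nQuestion: {query}")

vectors, graph = VectorStore(), GraphStore()
vectors.add(["Acme's SLA guarantees 99.9% uptime.", "Acme support runs 24/7."])
graph.add_edge("Acme", "provides", "24/7 support")
print(answer("What uptime does Acme guarantee?", "Acme", vectors, graph,
             llm=lambda p: p))  # echo LLM for demonstration
```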

Knowledge Graphs as an unlock

Knowledge graphs consist of nodes and relationships (edges), modeling complex connections in data much the way humans associate concepts.

Traditional Use: Historically, knowledge graphs linked terms across data silos, aiding big data analytics through a manual, labor-intensive process used mainly by large companies.

Modern Use in LLM RAG Systems: Now, they enhance LLM RAG systems by providing explicit word connections, reducing hallucinations, adding context, acting as memory, personalizing responses, and supplementing probabilistic LLMs. LLMs also automate knowledge graph creation, increasing their utility.
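
At their simplest, knowledge graphs can be modeled as subject-relation-object triples, as in this toy Python sketch (the veterinary facts are invented, foreshadowing the case study below); explicit edges let the system walk from one concept to the next deterministically:

```python
# A knowledge graph as (subject, relation, object) triples.
triples = [
    ("Labrador", "is_a", "Dog"),
    ("Dog", "susceptible_to", "Hip Dysplasia"),
    ("Hip Dysplasia", "treated_with", "Physical Therapy"),
]

def outgoing(node: str) -> list[tuple[str, str]]:
    return [(rel, obj) for subj, rel, obj in triples if subj == node]

# Deterministic traversal: breed -> species -> condition -> treatment.
node = "Labrador"
while edges := outgoing(node):
    rel, node = edges[0]
    print(f"--{rel}--> {node}")
```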

Roles in RAG Systems:

  •  As a Data Store: Retrieving information, either in parallel with or replacing vector databases.
  •  As a Semantic Structure Store: Storing semantic structures to retrieve vector chunks alongside vector databases.

Knowledge graphs improve RAG systems’ performance and reliability.
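
A toy sketch of the second role, in which graph nodes hold pointers to chunk IDs in a vector database rather than the knowledge itself; both stores are plain dicts and every name is illustrative:

```python
# Graph as a semantic structure store: nodes point at chunk IDs held in
# the vector database, so the graph steers retrieval without duplicating it.
vector_chunks = {
    "c1": "Labradors often develop hip dysplasia after age five.",
    "c2": "Hip dysplasia can be managed with physical therapy.",
}
graph = {
    "Labrador": {"susceptible_to": ["Hip Dysplasia"]},
    "Hip Dysplasia": {"mentioned_in": ["c1", "c2"]},
}

def chunks_for(entity: str) -> list[str]:
    # One relation hop through the graph, then dereference chunk IDs.
    concepts = graph.get(entity, {}).get("susceptible_to", [entity])
    ids = [i for c in concepts for i in graph.get(c, {}).get("mentioned_in", [])]
    return [vector_chunks[i] for i in ids]

print(chunks_for("Labrador"))  # both chunks, retrieved via the graph
```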

Knowledge graphs are crucial for optimizing RAG systems by structuring data effectively to enhance retrieval accuracy and context relevance. They play pivotal roles in various scenarios:

Data Structuring and Management: Integrating knowledge graphs involves understanding data importance, choosing optimal knowledge representation types (e.g., document structure or concept-based relationships), and managing interactions within RAG systems using workflow tools and discrete data pipelines.

Key Scenarios:

  1. Conceptual Aggregation: Combining information from diverse sources, such as contact lists and industry data in venture capital management, enhances multi-hop reasoning capabilities for comprehensive insights.
  2. Conceptual Alignment: Integrating new information with existing knowledge bases, like linking clinical care practices with patient records in healthcare, ensures seamless adoption of updated best practices for improved patient care.
  3. Hierarchical Retrieval: Using hierarchical steps to narrow symptoms and diseases by specific criteria, such as dog breed in veterinary healthcare, enhances precision in complex decision-making processes (see the sketch after this list).
  4. Hierarchical Recommendation: Enhancing recommendation systems with hierarchical categorization, as seen in platforms like Airbnb and DoorDash, ensures contextually relevant suggestions aligned with user preferences.
  5. Personalization and Memory: Personalizing user interactions by storing and retrieving user-specific preferences over time, such as hobbies in personal finance AI systems, ensures tailored recommendations that match individual interests.
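
To illustrate scenario 3, here is a toy Python sketch of hierarchical retrieval in the veterinary setting; each hop prunes the search space before any similarity search runs, and all data is invented for the example:

```python
# Hierarchical retrieval: narrow level by level (breed, then symptom
# category) before doing any similarity search over documents.
knowledge = {
    "Labrador": {
        "joint": ["Hip dysplasia is common in Labradors; weight control helps."],
        "skin": ["Labradors are prone to hot spots in humid climates."],
    },
    "Poodle": {
        "joint": ["Patellar luxation occurs in toy and miniature Poodles."],
    },
}

def hierarchical_retrieve(breed: str, category: str) -> list[str]:
    # Each hop prunes whole branches, so the final (vector) search would
    # only ever run over documents matching both breed and category.
    return knowledge.get(breed, {}).get(category, [])

print(hierarchical_retrieve("Labrador", "joint"))
```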

Non-Relevant Scenarios

Knowledge graphs may not be necessary in scenarios with minimal errors that can be managed manually or in consumer-facing applications with higher error tolerance. Additionally, non-hierarchical FAQ-style knowledge bases or time-series data may not significantly benefit from knowledge graph implementations.

Framework for Assessment

Evaluate whether underlying data structures benefit from hierarchical or networked representations to determine the suitability of knowledge graphs for optimizing data management and decision-making processes within tailored RAG systems.

Case Study: Knowledge Graphs in Veterinary Healthcare

A veterinary healthcare startup sought to improve diagnostic accuracy across animal breeds using AI tools for radiologists. Initially, integrating animal history, expert opinions, and medical data from a vector database into LLM-based reports yielded inaccurate results due to generic disease information.

Partnering with WhyHow.AI, they implemented:

  • A focused knowledge graph on animal breeds, diseases, and treatments.
  • Continuous updates without manual rebuilding.

Impact:

  • Tailored retrieval per breed ensured precise diagnoses and treatments.
  • “WhyHow.AI’s knowledge graph transformed radiology reporting, optimizing data for specific workflows.” — WhyHow.AI design partner.

Knowledge Graphs: Strategic Long-Term Advantage

In the AI era, companies leverage knowledge graphs as a strategic asset, structuring proprietary data for actionable insights. Unlike training specific models, knowledge graphs provide scalable representations of industry expertise, enhancing RAG systems and future model refinement.

Key Insights:
“Structuring knowledge offers lasting advantages without model training costs.” — WhyHow.AI design partner (large listed decacorn).

Future Trends: Small Graphs & Multi-Agent Systems

Emerging RAG systems favor small, independently managed graphs over comprehensive ones. This approach supports multi-agent setups, improving data retrieval accuracy and efficiency in complex workflows.

Illustration:

Multi-agent systems streamline hospital onboarding by managing scoped data graphs, minimizing errors in dynamic information retrieval.

By adopting multi-graph systems, enterprises enhance information retrieval capabilities in evolving AI landscapes.
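
A toy sketch of that pattern, routing queries across small scoped graphs rather than one monolithic graph; the keyword router stands in for what a production system might implement with an LLM-based router, and the onboarding facts are invented:

```python
# Each small graph is owned by one agent with a narrow scope.
scoped_graphs = {
    "benefits": {"401k": "Enrollment opens in the first 30 days."},
    "it": {"laptop": "Laptops are issued by IT on day one."},
    "compliance": {"training": "HIPAA training is due within two weeks."},
}

def route(query: str) -> str:
    # Naive keyword routing; a real router might be an LLM classifier.
    for topic, graph in scoped_graphs.items():
        for key, fact in graph.items():
            if key in query.lower():
                return f"[{topic} agent] {fact}"
    return "No scoped graph covers this query."

print(route("When do I get my laptop?"))      # -> [it agent] ...
print(route("How do I enroll in the 401k?"))  # -> [benefits agent] ...
```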

Conclusion

As RAG gains traction in enterprise AI adoption, knowledge graphs are pivotal for enhancing deterministic outcomes in large probabilistic models. They are crucial infrastructure for future AI innovations like multi-agent systems. With AI’s potential to save trillions in labor costs, especially in information retrieval, the RAG stack is essential for deploying reliable internal and external use cases effectively.