RAG — Retrieval-Augmented Generation

Ani
7 min read · Dec 18, 2023

“Questions are never indiscreet, answers sometimes are.” — Oscar Wilde


Retrieval-Augmented Generation (RAG) enhances the output of a large language model by incorporating references from an authoritative knowledge base outside the model's training data before producing a response. Large Language Models (LLMs) are trained on extensive datasets and employ billions of parameters to generate original content for tasks such as answering questions, translating languages, and completing sentences. RAG extends the already formidable capabilities of LLMs to specific domains or an organization's internal knowledge base, all without retraining the model. This makes it a cost-effective way to refine LLM output and keep it relevant, accurate, and useful across diverse contexts.

Why is Retrieval-Augmented Generation important?

Large Language Models (LLMs) stand as a pivotal artificial intelligence (AI) technology fueling intelligent chatbots and various natural language processing (NLP) applications. The objective is to develop bots capable of addressing user queries across diverse contexts by cross-referencing authoritative knowledge sources. Nevertheless, the inherent nature of LLM technology introduces an element of unpredictability in its responses. Furthermore, the static nature of LLM training data imposes a knowledge cut-off date, limiting its awareness of recent developments.

Recognized challenges associated with LLMs encompass:

  1. Presenting inaccurate information when lacking a suitable answer.
  2. Offering outdated or generic information instead of a specific, current response expected by the user.
  3. Formulating responses sourced from non-authoritative origins.
  4. Generating inaccuracies due to terminology confusion, where disparate training sources use identical terms to describe different concepts.

A fitting analogy for the large language model is an excessively enthusiastic new employee who confidently answers every question despite being uninformed about current events. Regrettably, that attitude can erode user trust, and it is not one you want your chatbots to emulate.

Retrieval-Augmented Generation (RAG) emerges as a solution to address some of these challenges. RAG directs the LLM to retrieve pertinent information from pre-established, authoritative knowledge sources. This strategy provides organizations with increased control over the generated text output, offering users transparency into the LLM’s response-generation process.

Benefits of RAG

Retrieval-Augmented Generation (RAG) technology offers several key benefits for organizations engaged in generative AI development. Firstly, RAG facilitates cost-effective implementation by mitigating the high computational and financial expenses associated with retraining foundation models (FMs) for organization-specific data. This approach enhances the accessibility and usability of generative artificial intelligence (generative AI) technology.

Secondly, RAG ensures the currency of information by enabling developers to integrate the latest research, statistics, or news into generative models. By connecting the Large Language Model (LLM) directly to live social media feeds, news sites, or other frequently updated sources, RAG allows the LLM to deliver the most recent information to users, addressing the challenge of maintaining relevance.

Moreover, RAG contributes to enhanced user trust by allowing the LLM to present accurate information with transparent source attribution. The inclusion of citations or references in the output enables users to verify information independently, fostering confidence in the reliability of the generative AI solution.

Lastly, RAG provides developers with increased control over chat applications. Developers can efficiently test and improve applications, adapt the LLM’s information sources to changing requirements, and manage sensitive information retrieval at different authorization levels. This level of control ensures that the LLM generates appropriate responses and allows for troubleshooting and fixes, enabling organizations to implement generative AI technology confidently across diverse applications.

How RAG Works

Retrieval-Augmented Generation (RAG) is a technology that enhances the capabilities of Large Language Models (LLMs) by incorporating retrieval mechanisms to access external knowledge sources during the generation of responses. Here's a simplified explanation of how Retrieval-Augmented Generation works, with a minimal code sketch after the list:

  1. Large Language Models (LLMs): RAG builds upon existing LLMs, which are powerful models trained on vast datasets to generate human-like text. These models are proficient in various natural language processing (NLP) tasks, such as answering questions and completing sentences.
  2. Knowledge Base Integration: RAG introduces a knowledge base, which is a repository of external information, into the generation process. This knowledge base typically contains authoritative and domain-specific data relevant to the tasks the LLM is designed for.
  3. Retrieval Mechanism: When faced with a task, such as generating a response to a user query, RAG employs a retrieval mechanism. Instead of relying solely on its pre-existing knowledge from training data, the model searches the external knowledge base for relevant information.
  4. Contextualized Information Retrieval: The retrieval process is context-aware, meaning that it considers the specific context of the user query. This ensures that the information retrieved is not only relevant but also aligned with the user’s intent.
  5. Combination of Retrieval and Generation: The retrieved information is then combined with the LLM’s generative capabilities to produce a coherent and contextually appropriate response. This hybrid approach leverages both the pre-existing knowledge of the model and the real-time, external information from the knowledge base.
  6. Source Attribution: RAG often includes mechanisms for source attribution, allowing the model to indicate the origin of the information in the generated response. This transparency can enhance user trust by providing visibility into the model’s decision-making process.
  7. Adaptability and Fine-Tuning: RAG allows for adaptability and fine-tuning by enabling developers to adjust the knowledge base or refine the retrieval mechanism based on changing requirements, emerging data, or evolving contexts. This ensures that the model remains up-to-date and aligned with the organization’s goals.
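
To make those steps concrete, here is a minimal, self-contained sketch of the RAG loop. Everything in it is an illustrative stand-in: the toy knowledge base, the word-overlap scoring (a real system would use embeddings), and the final print, which is where an actual LLM call would go.

# A toy RAG loop mirroring steps 1-7 above.
def score(query: str, passage: str) -> int:
    # Naive relevance: count words shared between query and passage.
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Steps 3-4: context-aware retrieval from the external knowledge base.
    return sorted(knowledge_base, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Step 5: combine retrieved context with the user question.
    # Numbering the passages also supports step 6, source attribution.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

knowledge_base = [
    "RAG retrieves external documents before the LLM generates an answer.",
    "Semantic search ranks documents by the meaning of the query.",
    "Chroma is an open-source vector store.",
]
query = "What does RAG do before generating?"
print(build_prompt(query, retrieve(query, knowledge_base)))  # sent to the LLM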

Retrieval-Augmented Generation vs Semantic Search

Retrieval-Augmented Generation (RAG) and semantic search are related concepts but have distinct focuses and functionalities in the field of natural language processing. Here are the key differences between Retrieval-Augmented Generation and semantic search:

Primary Objective

  • Retrieval-Augmented Generation (RAG): The primary goal of RAG is to enhance the generation capabilities of Large Language Models (LLMs) by integrating retrieval mechanisms that access external knowledge sources. RAG combines the generative power of LLMs with the ability to retrieve and incorporate information from a predefined knowledge base during the response generation process.
  • Semantic Search: Semantic search, on the other hand, is primarily concerned with improving the accuracy and relevance of search results by understanding the meaning (semantics) behind user queries. It focuses on matching the intent and context of the query with the content of documents or data in a database.

Application and Use Cases

  • RAG: RAG is commonly applied in tasks such as question-answering systems, chatbots, and content generation where the model needs to generate coherent and contextually appropriate responses by retrieving and incorporating information from external sources.
  • Semantic Search: Semantic search is commonly used in search engines and information retrieval systems to provide more accurate and contextually relevant results. It is widely applied in web search, document retrieval, and other information retrieval applications.

Generation vs. Retrieval

  • RAG: RAG emphasizes the generation aspect, combining the generative capabilities of LLMs with the retrieval of external information to produce a comprehensive and contextually informed response.
  • Semantic Search: Semantic search is focused on retrieving information based on the meaning of the query, without the emphasis on generating new content. It aims to find existing documents or data that match the user’s intent.

User Interaction

  • RAG: RAG is often used in interactive systems where the model generates responses to user queries. The model can incorporate real-time information from external sources to provide up-to-date and relevant answers.
  • Semantic Search: Semantic search is typically used in scenarios where users submit queries and expect relevant documents or information in return. It is less focused on generating responses and more on retrieving existing content.
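
The difference is easiest to see in code. Below is a hedged sketch using the same stack as the walkthrough in the next section (LangChain plus Chroma); the directory name and query are illustrative, and a configured OpenAI API key is assumed.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Semantic search: the output is a ranked list of existing documents.
db = Chroma(persist_directory="vector_db", embedding_function=OpenAIEmbeddings())
docs = db.similarity_search("What is RAG?", k=3)
for doc in docs:
    print(doc.page_content)

# RAG goes one step further: it feeds these same documents, together with
# the question, into an LLM so that a new answer is generated.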

Creating a Question Answering Bot on Medium Blogs

I used LangChain with RAG here, with Chroma as the vector store.

Since I fetched the data from the webpage as JSON, I had to use a text splitter that suits my data pattern.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter_rag = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=20,
    keep_separator=False,
    length_function=len,
    separators=["\n\n", ".", ","],  # paragraph breaks, then sentences, then clauses
)
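
The splitter is then applied to the documents loaded from the fetched JSON; raw_docs below is an assumed name for those loaded documents, not a variable from the original code.

# Split the loaded documents into overlapping 1000-character chunks.
json_docs = text_splitter_rag.split_documents(raw_docs)
print(f"{len(raw_docs)} documents split into {len(json_docs)} chunks")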

I used the text-embedding-ada-002 embedding model and loaded the chunks into Chroma.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

_embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
# vector_database is the directory where the index is persisted
Chroma.from_documents(json_docs, _embedding, persist_directory=vector_database)

Here is my prompt template:

template = """Use the following pieces of context to answer the question at the end. If you don't get the answer from
the context, just say that you don't know; don't try to make up an answer. Use one sentence maximum. Keep the
answer as concise as possible. Do not add anything extra; be specific.
{context}
Question: {question}
Helpful Answer:"""
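
Putting the pieces together, here is a hedged sketch of the final question-answering chain. The LLM choice, the k value, and the sample query are my assumptions, not necessarily the exact setup used here; json_docs, _embedding, vector_database, and template come from the snippets above.

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

# Reopen the persisted store and expose it as a retriever.
db = Chroma(persist_directory=vector_database, embedding_function=_embedding)
retriever = db.as_retriever(search_kwargs={"k": 4})  # top-4 chunks per query

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=retriever,
    chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
    return_source_documents=True,  # keep the sources for attribution
)

result = qa_chain({"query": "What is RAG, in one sentence?"})
print(result["result"])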


For any help regarding career counselling, resume building, design discussions, or to learn more about the latest data engineering trends and technologies, reach out to me at anigos.

P.S.: I don't charge money.


Ani

Big Data Architect — Passionate about designing robust distributed systems