
What is RAG, as quickly as possible?
LLM
Large Language Models (LLMs) are the next big thing in the field of Data Science. These models are trained on enormous datasets and can perform a wide range of tasks, such as answering questions, summarizing documents, translating languages, and completing sentences.
The underlying architecture of most LLMs is the transformer, a neural network built from encoder and decoder blocks with self-attention. Self-attention lets the encoder and decoder capture the relationships between words and phrases by operating on learned vector representations of the input.
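To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer layer. The toy matrices below are illustrative values, not taken from any real model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each token attends to every other token."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted sum of value vectors

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
# In a real model, Q, K, V come from learned projections of X;
# here we use X directly for simplicity.
print(scaled_dot_product_attention(X, X, X))
```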
Types of LLMs:
1. BERT
2. GPT
3. LLaMA
4. Mistral
5. Open models hosted on Hugging Face, among others.
Suppose you have a task that involves classifying data based on a specific context. Leveraging Large Language Models (LLMs) can be advantageous here, but adapting the model to the specific context of your data is crucial. One approach to accomplish this is Retrieval-Augmented Generation (RAG).
RAG
Retrieval-Augmented Generation, also known as RAG, serves as a bridge between large language models and your own dataset: a retrieval system looks up relevant material from a private knowledge base, and the model references it before generating a response. This grounds the output of the LLM in data it was never trained on.
For example,
Question: “What was the revenue of Company X in the year 2024?”
Answer: The revenue of Company X is $3 trillion as of 2024.
Without RAG, this is a generic answer sourced from public websites and whatever else appeared in the training data.
The problems here are:
1. questionable accuracy of the generated response
2. lack of transparency about its sources
3. the risk of false or hallucinated information
However, if you have access to Company X’s financial statements and incorporate this data into the model, you can use the same LLM to generate a response specific to the data you know. This approach lets you validate the response and enhances the reliability and relevance of the information, since it is tailored to the specific context of Company X’s data.
After employing RAG, the response may look like:
Question: “What was the revenue of Company X in the year 2024?”
Answer: According to the report “Company X 2023–2024,” the revenue is $204,000,000.
Is RAG a fine-tuning technique?
No. RAG is not a method for fine-tuning models; rather, it is an architecture that integrates a retrieval system with a Large Language Model. This lets the user enhance response generation with personalized data. In other words, it allows users to provide more context to the model, specifying precisely what they want in the response.

Components of RAG
1. Create a vector representation of your personal data.
The data or documents that you want to use as context for the LLM are first converted into vector representations and stored in a database. Commonly used vector databases include:
a. Pinecone
b. Milvus
c. Elasticsearch
d. Deep Lake
Using a vector database [1] allows you to retrieve information quickly and accurately using similarity search.
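As a minimal sketch of this step (assuming the sentence-transformers library and the all-MiniLM-L6-v2 embedding model; any embedding model would do), the snippet below converts a few toy documents into vectors. A plain NumPy array stands in for a real vector database such as Pinecone or Milvus:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

documents = [
    "Company X 2023-2024 annual report: revenue was $204,000,000.",
    "Company X opened three new offices in 2023.",
    "Company Y reported a net loss in 2024.",
]

# Convert each document into a dense vector representation.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

# In a real system these vectors would be inserted into a vector database;
# here they simply live in a NumPy array.
print(doc_vectors.shape)  # (3, 384) -> one 384-dimensional vector per document
```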
2. Retrieval system
The retrieval system converts the query into a vector representation and compares it with the entries in the vector database. Through a similarity search, it identifies the most closely related knowledge stored there.
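Continuing the sketch above (it reuses model, documents, and doc_vectors from the previous snippet), retrieval amounts to embedding the query the same way and ranking the stored vectors by cosine similarity, which for normalized vectors is just a dot product:

```python
import numpy as np

query = "What was the revenue of Company X in the year 2024?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity against every stored document vector.
similarities = doc_vectors @ query_vector
best = int(np.argmax(similarities))
context = documents[best]
print(context)  # -> the annual-report sentence, the closest match
```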
3. Augmentation of response
The retrieved context is combined with the query to prompt the LLM, so that the generated response stays grounded in that context.
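A minimal sketch of this final step, continuing from the retrieval snippet; call_llm is a hypothetical placeholder for whatever LLM API you actually use:

```python
def build_prompt(context: str, question: str) -> str:
    """Augment the user question with the retrieved context."""
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(context, "What was the revenue of Company X in the year 2024?")
# call_llm is a placeholder for your model of choice (OpenAI, LLaMA, Mistral, ...):
# answer = call_llm(prompt)
```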
Performance Evaluation
Evaluating a RAG system involves assessing both the accuracy of the retrieval component and the quality of the generated text.
Retrieval Accuracy: evaluates how well the component fetches relevant documents or information snippets for a given query. This can be measured with metrics like precision, recall, and F1-score.
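For example, if retrieval results are compared against a ground-truth set of relevant documents, these metrics are straightforward to compute (the document IDs below are toy values):

```python
retrieved = {"doc1", "doc2", "doc5"}          # what the retriever returned
relevant  = {"doc1", "doc3", "doc5", "doc7"}  # ground-truth relevant documents

true_positives = len(retrieved & relevant)
precision = true_positives / len(retrieved)   # fraction of retrieved docs that are relevant
recall    = true_positives / len(relevant)    # fraction of relevant docs that were retrieved
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```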
Generation Quality: evaluates how well the model generates text. This is measured using metrics like BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), perplexity, LSA (Latent Semantic Analysis), METEOR (Metric for Evaluation of Translation with Explicit ORdering), and TER (Translation Edit Rate). [2]
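As one example, BLEU can be computed with NLTK (assuming nltk is installed; the reference and candidate sentences below are toy values):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction  # pip install nltk

reference = ["the revenue of company x is 204 million dollars".split()]
candidate = "company x revenue is 204 million dollars".split()

# Smoothing avoids zero scores when higher-order n-grams have no matches.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```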
Conclusion
The RAG technique presents a powerful approach for enhancing generated responses by bridging the gap between LLMs and personalized datasets, enabling output that is more contextually relevant and reliable.
References
[1] https://www.cloudflare.com/learning/ai/what-is-vector-database/