Welcome to the Enterprise RAG Evaluation Pipeline! This project is designed to solve one of the biggest bottlenecks in modern AI: trust.
Whether you are a recruiter evaluating my architectural decisions or a beginner learning how to build AI systems that don't hallucinate, this guide walks you through the entire process.
📌 The Business Problem (For Hiring Managers)
Retrieval-Augmented Generation (RAG) is the industry standard for querying private corporate documents. However, enterprises hesitate to deploy it because of hallucinations and poor retrieval quality.
If an engineer changes the chunking strategy, or a new embedding model is swapped in, how do you know the AI is still giving accurate answers?
You cannot manually test every prompt.
✅ The Solution
This project is an end-to-end, locally hosted RAG pipeline built with an Automated Evaluation Suite. It proves that AI responses can be programmatically tested for accuracy before being deployed to production.
🧠 Beginner's Tutorial: How This Actually Works
If you are new to AI engineering, here is exactly what is happening under the hood, broken down into three phases.
Phase 1: Data Ingestion (src/ingest.py)
AI models like ChatGPT or Llama don't magically know your company's private rules, so we have to provide that data ourselves.
Process:
Take a raw text file (example: corporate security policy).
Split it into smaller readable sections (chunks).
Convert each chunk into a numerical vector (an embedding) using an Embedding Model.
Store embeddings inside a Vector Database (ChromaDB).
This enables semantic search instead of keyword search.
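The chunking step above can be sketched in plain Python. This is an illustrative sketch only: the chunk size, overlap, and sample text below are made up for the example, not the settings used in src/ingest.py.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence cut at
    a chunk boundary still appears intact in the neighbouring chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars
    return chunks

# Hypothetical stand-in for a corporate security policy file.
policy = "Passwords must be at least 12 characters. " * 20
chunks = chunk_text(policy)
print(len(chunks))  # → 6
```

Each chunk would then be passed to the embedding model and stored in ChromaDB. Overlap is the key design choice here: without it, a fact spanning two chunks might never be retrieved whole.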
Phase 2: Retrieval & Generation (src/query.py)
When a user asks:
"What is the password policy?"
The pipeline:
Searches the vector database for relevant chunks.
Retrieves only the most relevant policy sections.
Sends those sections as context to Llama 3.2.
Llama 3.2 then generates an answer grounded strictly in the retrieved data; the retrieval and generation steps are composed into a single chain using LangChain Expression Language (LCEL).
Result: grounded, factual answers instead of hallucinations.
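The retrieval step can be illustrated with a stdlib-only toy. The real pipeline uses a trained embedding model and ChromaDB's nearest-neighbour search; the bag-of-words `embed` function and the sample chunks below are stand-ins invented for this sketch, but the ranking idea is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical chunks already sitting in the vector store.
chunks = [
    "The password policy requires at least 12 characters, rotated quarterly.",
    "Visitors must sign in at the front desk and wear a badge.",
    "Backups are encrypted and stored off-site for 90 days.",
]
query = "What is the password policy?"

# Rank stored chunks by similarity to the query; the top hits become the
# context handed to the language model.
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
print(ranked[0])  # the password-policy chunk ranks first
```

Because the model only sees the highest-ranked chunks as context, a bad similarity ranking means a bad answer, which is exactly why the evaluation suite tests retrieval quality, not just generation.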