A production-ready Corrective Retrieval-Augmented Generation (CRAG) system built with LangChain, LangGraph, and FastAPI. This project implements an intelligent RAG pipeline that not only retrieves relevant documents but also validates, corrects, and improves retrieval quality through an agent-based workflow.
Query → Retrieve Documents → Generate Answer
Problem: If retrieved documents are irrelevant or low-quality, the answer will be poor.
Query → Retrieve → Grade Quality → Transform Query if Needed → Web Search if Necessary → Generate
Solution: Intelligent agent workflow that self-corrects by grading document relevance and taking corrective actions.
```mermaid
graph LR
    A[User Query] --> B[Retrieve]
    B --> C[FAISS+MMR]
    C --> D[Rerank]
    D --> E{Grade}
    E -->|Relevant| F[Generate]
    E -->|Partial| G[Filter]
    E -->|Poor| H[Transform]
    G --> F
    H --> I[Web Search]
    I --> F
    F --> J[Groq LLM]
    J --> K[Answer]
```
- LLM evaluates retrieved documents for relevance
- Filters out low-quality results automatically
- Ensures only useful context reaches generation
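The grading step above can be sketched as a filter over the retrieved chunks. This is a minimal illustration, not the project's actual code: the `judge` callable stands in for the real LLM relevance call, and all names here are hypothetical.

```python
# Sketch of document grading: an LLM judges each retrieved chunk as
# relevant ("yes") or not ("no"); only relevant chunks reach generation.
from typing import Callable, List

def grade_documents(question: str, docs: List[str],
                    judge: Callable[[str, str], str]) -> List[str]:
    """Keep only documents the judge labels relevant to the question."""
    return [d for d in docs if judge(question, d).strip().lower() == "yes"]

# Stub judge for demonstration: a doc counts as relevant if it shares a
# keyword with the query (a real system would call the LLM instead).
def keyword_judge(question: str, doc: str) -> str:
    return "yes" if any(w in doc.lower() for w in question.lower().split()) else "no"

docs = ["Attention lets the model weigh tokens.", "We trained on 8 GPUs."]
print(grade_documents("What is attention", docs, keyword_judge))
# → ['Attention lets the model weigh tokens.']
```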
- Rewrites ambiguous or poor queries
- Improves retrieval on second attempt
- Adaptive query refinement
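Query transformation is driven by a rewriting prompt. The wording below is an illustrative assumption, not the project's actual `prompt_template.py` content; it only shows the shape of the step.

```python
# Sketch of the query-transformation step: when grading marks the results
# poor, the question is rewritten into a more retrieval-friendly form
# before a second attempt. Prompt text and names are placeholders.
REWRITE_PROMPT = (
    "You are a question re-writer. Improve the following question for "
    "vector-store retrieval: keep its intent, remove ambiguity, and add "
    "key terms.\n\nQuestion: {question}\nImproved question:"
)

def build_rewrite_prompt(question: str) -> str:
    return REWRITE_PROMPT.format(question=question)

print(build_rewrite_prompt("latest transformer improvements?"))
```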
- Tavily API integration for external knowledge
- Activates when local documents are insufficient
- Combines local + web results
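Combining local and web results can be sketched as a simple merge, local context first. `search_web` below is a stand-in for the real Tavily call; the function name and dedup rule are assumptions for illustration.

```python
# Sketch of the web-search fallback: a search tool supplies extra
# snippets, which are appended after the local docs with duplicates
# skipped, yielding one combined context for generation.
from typing import Callable, List

def augment_context(local_docs: List[str], query: str,
                    search_web: Callable[[str], List[str]]) -> List[str]:
    merged = list(local_docs)
    for snippet in search_web(query):
        if snippet not in merged:  # skip exact duplicates
            merged.append(snippet)
    return merged

fake_search = lambda q: ["2024 post on FlashAttention", "2024 post on FlashAttention"]
print(augment_context(["2017 paper chunk"], "transformers 2024", fake_search))
# → ['2017 paper chunk', '2024 post on FlashAttention']
```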
- FAISS vector store with MMR search
- FastEmbed (BAAI/bge-small-en-v1.5) embeddings
- FlashRank (rank-T5-flan) reranking
- Self-query retriever support
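MMR, the search mode the FAISS retriever uses here, trades off similarity to the query against similarity to documents already selected, keeping results diverse. The toy implementation below is illustrative only; the project gets MMR from FAISS itself.

```python
# Toy Maximal Marginal Relevance (MMR): each pick maximizes
# lam * sim(query, doc) - (1 - lam) * max sim(doc, already_selected).
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def mmr(query: List[float], docs: List[List[float]],
        k: int = 2, lam: float = 0.5) -> List[int]:
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((dot(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * dot(query, docs[i]) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate docs and one distinct doc: MMR picks one of each kind
# instead of the two near-duplicates a pure similarity ranking would return.
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(mmr([1.0, 0.2], docs, k=2))  # → [1, 2]
```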
- State machine orchestration
- Conditional routing logic
- Transparent decision-making
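The conditional routing can be sketched as a plain function mirroring the flow above: the grade decides whether to generate directly, filter first, or transform the query and fall back to web search. The real project wires these steps as LangGraph nodes and conditional edges; this stub only shows the decision logic.

```python
# Sketch of the agent's conditional routing, keyed on the grading result.
# Node names follow the workflow diagram; grade labels are illustrative.
def route(grade: str) -> list:
    if grade == "relevant":
        return ["generate"]
    if grade == "partial":
        return ["filter", "generate"]
    if grade == "poor":
        return ["transform", "web_search", "generate"]
    raise ValueError(f"unknown grade: {grade}")

print(route("poor"))  # → ['transform', 'web_search', 'generate']
```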
| Component | Technology |
|---|---|
| LLM | Groq (openai/gpt-oss-120b) |
| Embeddings | FastEmbed (BAAI/bge-small-en-v1.5) |
| Vector Store | FAISS |
| Reranker | FlashRank (rank-T5-flan) |
| Agent Framework | LangGraph |
| RAG Framework | LangChain 0.3.x |
| Web Search | Tavily API |
| Web Framework | FastAPI + Uvicorn |
| Observability | LangSmith (optional) |
| Document Source | "Attention Is All You Need" (Transformer paper) |
RAG Project/
├── project/
│ ├── config/
│ │ └── config.yaml # Model & pipeline configuration
│ ├── logger/
│ │ └── logging.py # Centralized logging
│ ├── exception/
│ │ └── except.py # Custom exception handling
│ ├── utils/
│ │ ├── config_loader.py # YAML config loader
│ │ └── model_loader.py # LLM & embedding initialization
│ ├── source/
│ │ └── data_preparation.py # PDF/ArXiv document loading
│ ├── model/
│ │ ├── retriever.py # FAISS retriever with MMR
│ │ └── reranking.py # FlashRank reranking
│ ├── prompts/
│ │ └── prompt_template.py # RAG, Router, WebSearch prompts
│ └── pipeline/
│ ├── rag.py # Core RAG pipeline
│ └── agents.py # CRAG agent workflow
├── templates/
│ └── index.html # Web UI template
├── static/
│ └── styles.css # Purple gradient theme
├── data/
│ └── attention-is-all-you-need.pdf
├── app.py # FastAPI application
├── main.py # CLI entry point
├── Dockerfile # Docker containerization
└── requirements.txt # Dependencies
```bash
git clone https://github.com/Abeshith/RAG-Project-PipeLine.git
cd RAG-Project-PipeLine
pip install -r requirements.txt
```

Create a `.env` file:

```
GROQ_API_KEY=your_groq_api_key
GOOGLE_API_KEY=your_google_api_key
LANGSMITH_API_KEY=your_langsmith_key
TAVILY_API_KEY=your_tavily_key
```

Run the web app:

```bash
python app.py
```

Visit: http://localhost:8000

Or run the CLI:

```bash
python main.py
```

Or run with Docker:

```bash
docker build -t rag-project .
docker run -d -p 8000:8000 --env-file .env rag-project
```

Query: "What is the attention mechanism in transformers?"
- Retrieval: FAISS finds top 3 most similar chunks from "Attention Is All You Need" paper
- Reranking: FlashRank reorders by relevance (top 3 kept)
- Grading: LLM evaluates each document:
- ✅ Doc 1: Relevant (explains attention)
- ✅ Doc 2: Relevant (shows formula)
- ❌ Doc 3: Not relevant (talks about training data)
- Decision: 2/3 relevant → Use filtered docs
- Generation: Groq LLM synthesizes answer from relevant docs
- Output: Comprehensive answer with LaTeX formulas (rendered via MathJax)
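The decision step in this walkthrough ("2/3 relevant → use filtered docs") can be sketched as a threshold rule. The 50% cutoff below is an illustrative assumption, not necessarily the project's exact rule.

```python
# Sketch of the grade-aggregation decision: if enough documents are
# relevant, generate from the filtered set; otherwise fall back to
# query transformation plus web search.
def decide(grades: list, threshold: float = 0.5) -> str:
    relevant = sum(grades)
    return "use_filtered_docs" if relevant / len(grades) >= threshold else "web_search"

print(decide([True, True, False]))  # 2/3 relevant → 'use_filtered_docs'
```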
Query: "What are the latest improvements to transformers in 2024?"
- Retrieval: Finds documents from the 2017 paper
- Grading: ❌ All documents marked "not relevant" (outdated info)
- Transform: Rewrites query → "Recent transformer architecture improvements 2024"
- Web Search: Tavily searches current web content
- Generation: Answer combines paper fundamentals + recent developments
- Modern UI: Purple gradient design with responsive layout
- MathJax Integration: Renders LaTeX formulas beautifully
- Transformer Visualization: Architecture diagram in header
- Real-time Search: Fast async FastAPI backend
- Error Handling: Graceful degradation with user-friendly messages