Advanced RAG Pipelines and Evaluation
An advanced RAG pipeline optimization framework built on DSPy. Implements modular RAG pipelines with query rewriting, sub-query decomposition, and hybrid search via Weaviate, and automates prompt tuning and few-shot example selection using the MIPRO, COPRO, and BootstrapFewShot optimizers on datasets such as FreshQA, HotpotQA, TriviaQA, Wikipedia, and PubMedQA.
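A minimal sketch (not this repository's actual code) of what a DSPy RAG module compiled with BootstrapFewShot can look like; the LM name, retriever setup, metric, and training example below are placeholder assumptions.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Placeholder LM; a retrieval model (e.g. a Weaviate retriever) would also
# need to be configured for dspy.Retrieve to return passages.
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateAnswer(dspy.Signature):
    """Answer the question using the retrieved context."""
    context = dspy.InputField(desc="retrieved passages")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="short factual answer")

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Tiny illustrative train set; real runs would draw on HotpotQA/TriviaQA examples.
trainset = [
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.strip().lower()

# BootstrapFewShot keeps only demonstrations that pass the metric.
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled_rag = optimizer.compile(RAG(), trainset=trainset)
```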
🚀 Production-ready modular RAG monorepo: Local LLM inference (vLLM) • Hybrid retrieval with Qdrant • Semantic caching • Docling document parsing • Cross-encoder reranking • DeepEval evaluation • Full observability with Langfuse • Open WebUI chat interface • OpenAI-compatible API • Fully Dockerized
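A rough sketch of a retrieve-then-rerank stage like the one described above; the collection name, payload field, and model names are assumptions, not this project's code.

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer, CrossEncoder

# Placeholder models and collection; swap in the stack's own choices.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = QdrantClient(url="http://localhost:6333")

def retrieve_and_rerank(query: str, top_k: int = 20, final_k: int = 5):
    # Dense retrieval from Qdrant (assumes each point stores its text in payload["text"]).
    query_vector = embedder.encode(query).tolist()
    hits = client.search(collection_name="docs", query_vector=query_vector, limit=top_k)
    candidates = [hit.payload["text"] for hit in hits]

    # Cross-encoder reranking of the candidate passages.
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:final_k]]
```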
BNS-LexAI is an AI-powered legal information and case understanding assistant.
A hands-on exploration of DeepEval, an open-source framework for evaluating and red-teaming large language models (LLMs). This repository documents my journey of testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.
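A minimal DeepEval-style test in the spirit of this repo; the inputs and threshold are illustrative, and DeepEval's built-in metrics assume access to an LLM judge (e.g. an OpenAI API key).

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What does DeepEval do?",
        actual_output="DeepEval evaluates LLM outputs with metrics such as answer relevancy.",
        retrieval_context=["DeepEval is an open-source framework for evaluating LLMs."],
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test if the metric score falls below the threshold.
    assert_test(test_case, [metric])
```

Such a test can be run with pytest or via `deepeval test run`, which also logs results to the Confident AI dashboard when configured.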
🧠 LLM Validation Platform: Advanced testing frameworks with DeepEval, Playwright MCP, OpenLLMetry observability, Neo4j knowledge graphs, and Burr+pytest TDD workflows
A robust, modular pipeline for automated LLM chatbot evaluation, using DeepEval, GROQ models, and Confident AI dashboard logging. Designed for systematic QA, reliable evaluation, and portfolio-quality results in AI/QA engineering.
Sandbox Q&A bot for technical docs with optional DeepEval, LangChain, RAGAS, and OpenAI Evals integrations to compare RAG evaluation workflows.
An evaluation example for Retrieval-Augmented Generation (RAG) that compares two leading retrieval techniques, vector-similarity ranking and graph-database reasoning via Cypher queries, across performance indicators such as retrieval quality, generation accuracy, and factual consistency.
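A condensed sketch of how the two retrieval paths being compared might be wired up; the Cypher query, node labels, connection details, and embedding model are illustrative assumptions.

```python
import numpy as np
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def vector_retrieve(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    # Vector-similarity ranking: cosine similarity between query and passage embeddings.
    vectors = embedder.encode([query] + passages)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    sims = passage_vecs @ query_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [passages[i] for i in np.argsort(sims)[::-1][:top_k]]

def graph_retrieve(entity: str) -> list[str]:
    # Graph-based retrieval: Cypher traversal over a hypothetical Document/Entity schema.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        result = session.run(
            "MATCH (d:Document)-[:MENTIONS]->(e:Entity {name: $name}) RETURN d.text AS text",
            name=entity,
        )
        return [record["text"] for record in result]
```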
[UNDER DEVELOPMENT] Clinical-RAG is a production-grade, citation-backed AI system designed to bridge the "Trust Gap" in medical information retrieval.
Framework for evaluating and improving LLM-generated scientific abstracts using ROUGE metrics, semantic embeddings, and LLM-as-judge techniques.
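A small sketch of how ROUGE and embedding-based scoring could be combined for such a framework; the model name is a placeholder, and an LLM-as-judge step would sit on top of these scores.

```python
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference = "Reference abstract text."
candidate = "LLM-generated abstract text."

# Lexical overlap via ROUGE.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

# Semantic similarity via sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([reference, candidate], convert_to_tensor=True)
semantic_sim = util.cos_sim(embeddings[0], embeddings[1]).item()

print(rouge["rougeL"].fmeasure, semantic_sim)
```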