Advanced RAG Pipelines and Evaluation
An advanced RAG pipeline optimization framework built on DSPy. Implements modular RAG pipelines with query rewriting, sub-query decomposition, and hybrid search via Weaviate, and automates prompt tuning and few-shot example selection using the MIPRO, COPRO, and BootstrapFewShot optimizers on datasets such as FreshQA, HotpotQA, TriviaQA, Wikipedia, and PubMedQA.
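A minimal sketch (not this repository's actual code) of what a DSPy RAG module compiled with BootstrapFewShot can look like; the LM name, retriever setup, metric, and training example below are placeholder assumptions.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Placeholder LM; a retrieval model (e.g. a Weaviate retriever) would also
# need to be configured for dspy.Retrieve to return passages.
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateAnswer(dspy.Signature):
    """Answer the question using the retrieved context."""
    context = dspy.InputField(desc="retrieved passages")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="short factual answer")

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Tiny illustrative train set; real runs would draw on HotpotQA/TriviaQA examples.
trainset = [
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.strip().lower()

# BootstrapFewShot keeps only demonstrations that pass the metric.
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled_rag = optimizer.compile(RAG(), trainset=trainset)
```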
🚀 Production-ready modular RAG monorepo: Local LLM inference (vLLM) • Hybrid retrieval with Qdrant • Semantic caching • Docling document parsing • Cross-encoder reranking • DeepEval evaluation • Full observability with Langfuse • Open WebUI chat interface • OpenAI-compatible API • Fully Dockerized
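A rough sketch of a retrieve-then-rerank stage like the one described above; the collection name, payload field, and model names are assumptions, not this project's code.

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer, CrossEncoder

# Placeholder models and collection; swap in the stack's own choices.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = QdrantClient(url="http://localhost:6333")

def retrieve_and_rerank(query: str, top_k: int = 20, final_k: int = 5):
    # Dense retrieval from Qdrant (assumes each point stores its text in payload["text"]).
    query_vector = embedder.encode(query).tolist()
    hits = client.search(collection_name="docs", query_vector=query_vector, limit=top_k)
    candidates = [hit.payload["text"] for hit in hits]

    # Cross-encoder reranking of the candidate passages.
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:final_k]]
```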
BNS-LexAI is an AI-powered legal information and case understanding assistant.
A hands-on exploration of DeepEval, an open-source framework for evaluating and red-teaming large language models (LLMs). This repository documents my journey of testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.
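A minimal DeepEval-style test in the spirit of this repo; the inputs and threshold are illustrative, and DeepEval's built-in metrics assume access to an LLM judge (e.g. an OpenAI API key).

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What does DeepEval do?",
        actual_output="DeepEval evaluates LLM outputs with metrics such as answer relevancy.",
        retrieval_context=["DeepEval is an open-source framework for evaluating LLMs."],
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test if the metric score falls below the threshold.
    assert_test(test_case, [metric])
```

Such a test can be run with pytest or via `deepeval test run`, which also logs results to the Confident AI dashboard when configured.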
🧠 LLM Validation Platform: Advanced testing frameworks with DeepEval, Playwright MCP, OpenLLMetry observability, Neo4j knowledge graphs, and Burr+pytest TDD workflows
A robust, modular pipeline for automated LLM chatbot evaluation, using DeepEval, GROQ models, and Confident AI dashboard logging. Designed for systematic QA, reliable evaluation, and portfolio-quality results in AI/QA engineering.
Sandbox Q&A bot for technical docs with optional DeepEval, LangChain, RAGAS, and OpenAI Evals integrations to compare RAG evaluation workflows.
An evaluation example for Retrieval-Augmented Generation (RAG) that compares two leading retrieval techniques, vector-similarity ranking and graph-database reasoning via Cypher queries, across performance indicators such as retrieval quality, generation accuracy, and factual consistency.
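A condensed sketch of how the two retrieval paths being compared might be wired up; the Cypher query, node labels, connection details, and embedding model are illustrative assumptions.

```python
import numpy as np
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def vector_retrieve(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    # Vector-similarity ranking: cosine similarity between query and passage embeddings.
    vectors = embedder.encode([query] + passages)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    sims = passage_vecs @ query_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [passages[i] for i in np.argsort(sims)[::-1][:top_k]]

def graph_retrieve(entity: str) -> list[str]:
    # Graph-based retrieval: Cypher traversal over a hypothetical Document/Entity schema.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        result = session.run(
            "MATCH (d:Document)-[:MENTIONS]->(e:Entity {name: $name}) RETURN d.text AS text",
            name=entity,
        )
        return [record["text"] for record in result]
```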
[UNDER DEVELOPMENT] Clinical-RAG is a production-grade, citation-backed AI system designed to bridge the "Trust Gap" in medical information retrieval.
Framework for evaluating and improving LLM-generated scientific abstracts using ROUGE metrics, semantic embeddings, and LLM-as-judge techniques.
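A small sketch of how ROUGE and embedding-based scoring could be combined for such a framework; the model name is a placeholder, and an LLM-as-judge step would sit on top of these scores.

```python
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference = "Reference abstract text."
candidate = "LLM-generated abstract text."

# Lexical overlap via ROUGE.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

# Semantic similarity via sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([reference, candidate], convert_to_tensor=True)
semantic_sim = util.cos_sim(embeddings[0], embeddings[1]).item()

print(rouge["rougeL"].fmeasure, semantic_sim)
```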