RAG System in Production, using ranking selection strategy

Ghasak/proRagSys

🚀 ProRAG System: Local-First Intelligent RAG

A local-first Retrieval-Augmented Generation (RAG) system with a two-stage retrieval pipeline, OCR support for scanned documents, and built-in observability through Arize Phoenix.

✨ Key Features

  • Two-Stage Retrieval: Vector search in Qdrant selects candidates, then a Cross-Encoder re-ranks them.
  • Intelligent OCR: Automatic fallback to Tesseract for scanned/image-based PDFs.
  • Premium CLI: Verbose, well-organized terminal output built with rich.
  • Pro Observability: Tracing dashboard powered by Arize Phoenix.
  • Local-First: Complete privacy, running entirely on your machine via Docker and Pixi.

🛠 Prerequisites

  • Docker: runs the Qdrant and Phoenix containers.
  • Pixi: manages dependencies and tasks.

⚡️ Quick Start

1. Setup Infrastructure

Start the local Qdrant database and Phoenix dashboard:

pixi run up

2. Ingest Documents

Place your PDFs in the data/ folder and run:

pixi run ingest
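
The OCR fallback mentioned under Key Features can be illustrated with a minimal sketch. This is not the project's actual ingest code: the names `needs_ocr`, `extract_pages`, and `ocr_page`, and the 20-character threshold, are assumptions made for illustration. The idea is that a page whose embedded text layer is empty or nearly empty is treated as scanned and routed to an OCR callable (which in a real pipeline might wrap Tesseract).

```python
from typing import Callable, List, Tuple

# Assumed threshold: pages with fewer stripped characters than this
# are treated as scanned images and sent to OCR.
MIN_CHARS = 20


def needs_ocr(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True when the embedded text layer looks empty or near-empty."""
    return len(text.strip()) < min_chars


def extract_pages(
    native_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = MIN_CHARS,
) -> List[Tuple[str, str]]:
    """Return (text, source) per page, where source is 'native' or 'ocr'.

    native_texts holds the text already extracted from each page's text layer;
    ocr_page is a hypothetical callable that runs OCR on the page at an index.
    """
    results = []
    for index, text in enumerate(native_texts):
        if needs_ocr(text, min_chars):
            # Fallback path: the text layer is missing, so use OCR output.
            results.append((ocr_page(index), "ocr"))
        else:
            results.append((text, "native"))
    return results
```

Passing the OCR step in as a callable keeps the decision logic testable without Tesseract installed; a stub such as `lambda i: f"ocr text for page {i}"` is enough to exercise both paths.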

3. Query the System

Search through your documents with AI re-ranking:

pixi run query "What is RAG?"
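
To make the two-stage idea concrete, here is a small standard-library sketch of retrieve-then-re-rank. It is not the project's implementation: the bag-of-words `embed` function stands in for a real embedding model and Qdrant search, and `overlap_score` is a hypothetical pair scorer standing in for a Cross-Encoder. All function names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query, docs, top_k):
    """Stage 1: cheap similarity over every document (the vector DB's role)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def overlap_score(query, doc):
    """Hypothetical pair scorer: fraction of query terms present in the doc."""
    q_terms = set(tokens(query))
    return len(q_terms & set(tokens(doc))) / len(q_terms) if q_terms else 0.0


def rerank(query, candidates, score_fn, top_n):
    """Stage 2: score each (query, candidate) pair and keep the best top_n."""
    scored = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:top_n]


docs = [
    "RAG combines retrieval with generation",
    "Qdrant stores vectors",
    "Tesseract performs OCR on scanned pages",
]
candidates = vector_search("What is RAG?", docs, top_k=2)
results = rerank("What is RAG?", candidates, overlap_score, top_n=1)
```

The split matters because the first stage must be cheap enough to scan the whole collection, while the second stage can afford a slower, more accurate scorer since it only sees the `top_k` candidates.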

⌨️ Automation (Pixi Tasks)

Command               Description
pixi run up           Start Docker containers (Qdrant + Phoenix)
pixi run down         Stop all Docker containers
pixi run ingest       Process PDFs and store embeddings
pixi run query "..."  Search and re-rank results
pixi run test         Run unit tests
pixi run stats        Check collection statistics
pixi run dashboard    Open Arize Phoenix in browser
pixi run qdrant_ui    Open Qdrant Dashboard in browser

🏗 Project Structure

.
├── data/               # Source PDF documents
├── src/
│   └── prorag/         # Main package
│       ├── core/       # Config, Database, Model managers
│       ├── ingest/     # PDF processing & pipeline
│       ├── retrieval/  # Vector search & re-ranking
│       └── cli.py      # Unified CLI entry point
├── tests/              # Unit tests
├── docker-compose.yml  # Infrastructure as Code
└── pixi.toml           # Dependency & Task management
