RAG FastAPI Demo

A minimal Retrieval-Augmented Generation (RAG) service built with FastAPI, Weaviate (v4), and OpenAI. Upload a PDF, store chunk embeddings in Weaviate, and query with semantic search; the answer is generated with OpenAI.

What's New

  • Streaming answers (ChatGPT-style): a new GET /query_stream endpoint streams tokens via SSE, and the UI renders a typewriter-style answer (a sketch of such an endpoint follows below).
  • Faster ingestion for large PDFs: larger overlapping chunks and batched OpenAI embedding requests significantly reduce processing time.
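
For reference, here is a minimal sketch of how an SSE endpoint like /query_stream can be wired up with FastAPI's StreamingResponse. The model name and the [DONE] sentinel are illustrative assumptions, and the retrieval step is omitted for brevity; see app/main.py for the actual implementation.

# Minimal SSE sketch (illustrative; retrieval omitted, model name assumed)
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/query_stream")
def query_stream(q: str):
    def event_source():
        # Stream completion tokens from OpenAI and forward each as an SSE event.
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": q}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"  # SSE framing: "data: ...\n\n"
        yield "data: [DONE]\n\n"  # end-of-stream sentinel (assumed)

    return StreamingResponse(event_source(), media_type="text/event-stream")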

Features

  • PDF upload, chunking, OpenAI embeddings, Weaviate storage (v4 Collections API)
  • Query endpoint that retrieves the top-k chunks via vector search and asks OpenAI to answer (sketched below)
  • Simple HTML UI to upload and query
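
A hedged sketch of that retrieve-then-answer flow, assuming a local Weaviate and a text property on the collection (connect_to_local, the property name, and the model names are illustrative; the app itself uses connect_to_custom):

# Illustrative retrieve-then-answer flow (names are assumptions)
import weaviate
from openai import OpenAI

oai = OpenAI()

def answer(question: str, top_k: int = 5) -> str:
    # 1. Embed the question with OpenAI.
    vec = oai.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the top-k most similar chunks from Weaviate (v4 Collections API).
    with weaviate.connect_to_local() as client:
        chunks = client.collections.get("DocumentChunk")
        hits = chunks.query.near_vector(near_vector=vec, limit=top_k)
        context = "\n\n".join(o.properties["text"] for o in hits.objects)

    # 3. Ask OpenAI to answer using only the retrieved context.
    reply = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply.choices[0].message.content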

Requirements

  • Python 3.13 (a virtual environment is provided in venv/)
  • Docker and Docker Compose (for Weaviate)
  • OpenAI API key

Environment Variables

Create a .env file in the project root with:

OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=your_org_id_optional
WEAVIATE_URL=http://localhost:8080

Notes:

  • WEAVIATE_URL should point to your Weaviate instance. The included docker-compose.yml exposes http://localhost:8080 and gRPC on 50051.
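
For example, the variables can be loaded with python-dotenv (a minimal sketch):

# Load .env from the project root and read the variables
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")  # optional
WEAVIATE_URL = os.getenv("WEAVIATE_URL", "http://localhost:8080")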

Start Weaviate

From the project root:

docker compose up -d

This starts a Weaviate instance (>= 1.27.x) that is compatible with the Python client v4.
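
To verify the container is up before starting the API, a quick check from Python (assuming the default local ports):

# Smoke test: confirm Weaviate is reachable on the default ports
import weaviate

with weaviate.connect_to_local() as client:  # http://localhost:8080, gRPC 50051
    print("Weaviate ready:", client.is_ready())
    print("Server version:", client.get_meta()["version"])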

Create and Activate Virtualenv

If needed (you may also use the provided venv/):

python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install fastapi uvicorn weaviate-client openai PyPDF2 python-dotenv

Run the API

source venv/bin/activate
uvicorn app.main:app --reload --port 8000

Using the App

  1. Open http://127.0.0.1:8000/ in your browser.
  2. Upload a real PDF (not a renamed file). The app extracts text and stores vectorized chunks in Weaviate (see the sketch after this list).
  3. Ask a question in the query box.
    • The answer will now stream live (typewriter effect) as tokens arrive.
    • If streaming is unavailable, it automatically falls back to the non‑streaming endpoint.
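
For reference, a simplified sketch of the upload path and its 400 handling (illustrative; the real endpoint also chunks, embeds, and stores the extracted text):

# Simplified upload path: extract text with PyPDF2, reject bad files with 400
import io

from fastapi import FastAPI, HTTPException, UploadFile
from PyPDF2 import PdfReader

app = FastAPI()

@app.post("/upload_pdf")
async def upload_pdf(file: UploadFile):
    data = await file.read()
    if not data:
        raise HTTPException(status_code=400, detail="Empty file")
    try:
        reader = PdfReader(io.BytesIO(data))
    except Exception:
        raise HTTPException(status_code=400, detail="Not a valid PDF")
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if not text.strip():
        raise HTTPException(status_code=400, detail="No extractable text")
    # ...chunk, embed, and store in Weaviate here...
    return {"status": "ok"}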

Alternatively, using curl:

# Upload via curl (multipart form)
curl -F "file=@/path/to/your.pdf" http://127.0.0.1:8000/upload_pdf

# Query
curl -G "http://127.0.0.1:8000/query" --data-urlencode "q=What is this document about?"

# Streamed query (Server-Sent Events)
# Note: --no-buffer helps show events as they arrive; some shells/binaries auto-buffer
curl --no-buffer -N -G "http://127.0.0.1:8000/query_stream" --data-urlencode "q=What is this document about?"
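
The same stream can also be consumed from Python, e.g. with httpx (a sketch; the [DONE] sentinel is an assumption):

# Consume the SSE stream token by token
import httpx

with httpx.stream(
    "GET",
    "http://127.0.0.1:8000/query_stream",
    params={"q": "What is this document about?"},
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if line.startswith("data: "):
            token = line[len("data: "):]
            if token == "[DONE]":  # assumed end-of-stream sentinel
                break
            print(token, end="", flush=True)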

Project Structure

app/
  main.py       # FastAPI app, Weaviate v4 client, endpoints
  index.html    # Simple UI for upload and query (now with streaming via EventSource)

docker-compose.yml # Weaviate (>= 1.27.x) with gRPC exposed

Demo

GitHub markdown does not reliably play embedded videos inline. Use the direct link below (it will preview in the browser on GitHub, or download):

If you want an inline preview in the README, convert a short clip to GIF and embed the GIF instead, e.g. assets/demo.gif.

Troubleshooting

  • Uvicorn not found: activate the virtualenv or use the full path venv/bin/uvicorn.
  • Weaviate version error: ensure Docker image is >= 1.27.x (compose uses a recent 1.27 tag) and port 8080 is free.
  • gRPC/HTTP connectivity: compose exposes 8080 and 50051. The client auto-derives security/ports from WEAVIATE_URL.
  • PDF parsing errors: the API returns a 400 if the file is empty, not a PDF, corrupted, or has no extractable text.
  • 405/404 from HTML: open the UI at http://127.0.0.1:8000/ so requests hit the API origin. The UI also falls back to http://127.0.0.1:8000 when opened from file://.

Streaming (SSE) specific

  • Open the UI at http://127.0.0.1:8000/ (not file://) so EventSource can connect to the API origin.
  • Some proxies/load balancers buffer SSE. Disable response buffering for text/event-stream or access the API directly during development.
  • If your network blocks SSE, the UI will fall back to non‑streaming /query automatically.

Performance notes

  • Ingestion uses a chunk size of ~1500 characters with ~200 characters of overlap, and batches embedding requests in groups (default 64). This reduces the number of OpenAI round-trips for large PDFs; a sketch of the scheme follows below.
  • You can adjust these parameters in app/main.py if you need different trade-offs between speed and recall.
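
An illustrative sketch of that chunking/batching scheme (CHUNK_SIZE, OVERLAP, and BATCH_SIZE mirror the defaults above; the function names and embedding model are assumptions):

# Sliding-window chunking plus batched embedding calls
from openai import OpenAI

CHUNK_SIZE, OVERLAP, BATCH_SIZE = 1500, 200, 64
client = OpenAI()

def chunk_text(text: str) -> list[str]:
    # Slide a window of CHUNK_SIZE characters, stepping by CHUNK_SIZE - OVERLAP.
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    # One API call per batch of up to BATCH_SIZE chunks instead of one per chunk.
    vectors = []
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i:i + BATCH_SIZE]
        resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
        vectors.extend(item.embedding for item in resp.data)
    return vectors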

Notes on Weaviate v4 Migration

  • Uses weaviate.connect_to_custom and the Collections API (a minimal sketch follows this list).
  • Vectorizer is set to none (vector_config=Configure.Vectorizer.none()); embeddings are provided from OpenAI.
  • Collection DocumentChunk is created if missing (no destructive reset on reload).
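
Put together, a minimal sketch of that setup (the text property is illustrative; see app/main.py for the actual schema):

# Idempotent v4 setup: connect, create DocumentChunk only if missing
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_custom(
    http_host="localhost", http_port=8080, http_secure=False,
    grpc_host="localhost", grpc_port=50051, grpc_secure=False,
)

if not client.collections.exists("DocumentChunk"):
    client.collections.create(
        "DocumentChunk",
        vector_config=Configure.Vectorizer.none(),  # vectors supplied by OpenAI
        properties=[Property(name="text", data_type=DataType.TEXT)],
    )
client.close()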

License

MIT
