A minimal Retrieval-Augmented Generation (RAG) service built with FastAPI, Weaviate (v4), and OpenAI. Upload a PDF, store chunk embeddings in Weaviate, and query with semantic search; the answer is generated with OpenAI.
- Streaming answers (ChatGPT-style): a new `GET /query_stream` endpoint streams tokens via SSE; the UI renders a typewriter-style answer.
- Faster ingestion for large PDFs: larger overlapping chunks and batched OpenAI embedding requests significantly reduce processing time.
- PDF upload, chunking, OpenAI embeddings, Weaviate storage (v4 Collections API)
- Query endpoint that retrieves top-k chunks via vector search and asks OpenAI to answer (sketched after this list)
- Simple HTML UI to upload and query
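To make the query bullet above concrete, here is a minimal retrieve-then-generate sketch. The model names, prompt wording, and function signature are illustrative assumptions, not the exact code in `app/main.py`:

```python
# Sketch of the retrieve-then-generate query flow (illustrative names).
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_question(collection, question: str, k: int = 5) -> str:
    """collection is a Weaviate v4 Collections handle (see implementation notes below)."""
    # 1) Embed the question with the same model used at ingestion time.
    q_vec = openai_client.embeddings.create(
        model="text-embedding-3-small",  # assumption: the real model is set in app/main.py
        input=question,
    ).data[0].embedding

    # 2) Vector search: fetch the top-k chunks from Weaviate.
    hits = collection.query.near_vector(near_vector=q_vec, limit=k)
    context = "\n\n".join(obj.properties["text"] for obj in hits.objects)

    # 3) Ask OpenAI to answer grounded in the retrieved context.
    chat = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: whichever chat model the app configures
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content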
- Python 3.13 (a virtual environment is provided in `venv/`)
- Docker and Docker Compose (for Weaviate)
- OpenAI API key
Create a .env file in the project root with:
```
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=your_org_id_optional
WEAVIATE_URL=http://localhost:8080
```
Notes:
- `WEAVIATE_URL` should point to your Weaviate instance. The included `docker-compose.yml` exposes `http://localhost:8080` and gRPC on `50051`.
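For context, the app can pick these up with python-dotenv (already in the dependency list below). A minimal sketch, assuming the variables are read near startup in `app/main.py`:

```python
# Sketch: loading the .env settings at startup via python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]              # required
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")                 # optional
WEAVIATE_URL = os.getenv("WEAVIATE_URL", "http://localhost:8080")
```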
From the project root:
```
docker compose up -d
```

This starts Weaviate (>= 1.27.x), compatible with the Python client v4.
If needed (you may also use the provided `venv/`):

```
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install fastapi uvicorn weaviate-client openai PyPDF2 python-dotenv
```

Run the app:

```
source venv/bin/activate
uvicorn app.main:app --reload --port 8000
```

- API docs: http://127.0.0.1:8000/docs
- Root HTML UI: http://127.0.0.1:8000/
- Open http://127.0.0.1:8000/ in your browser.
- Upload a real PDF (not a renamed file). The app extracts text and stores vectorized chunks in Weaviate.
- Ask a question in the query box.
- The answer will now stream live (typewriter effect) as tokens arrive.
- If streaming is unavailable, it automatically falls back to the non‑streaming endpoint.
Alternatively, using curl:
```
# Upload via curl (multipart form)
curl -F "file=@/path/to/your.pdf" http://127.0.0.1:8000/upload_pdf

# Query
curl -G "http://127.0.0.1:8000/query" --data-urlencode "q=What is this document about?"

# Streamed query (Server-Sent Events)
# Note: --no-buffer helps show events as they arrive; some shells/binaries auto-buffer
curl --no-buffer -N -G "http://127.0.0.1:8000/query_stream" --data-urlencode "q=What is this document about?"
```
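The `/upload_pdf` handler behind the first curl call is conceptually simple. A hedged sketch, assuming PyPDF2 for extraction as listed in the dependencies (error messages and the `chunk_and_store` helper are illustrative):

```python
# Sketch of the /upload_pdf endpoint (illustrative; see app/main.py for the real handler).
import io
from fastapi import FastAPI, File, HTTPException, UploadFile
from PyPDF2 import PdfReader
from PyPDF2.errors import PdfReadError

app = FastAPI()

@app.post("/upload_pdf")
async def upload_pdf(file: UploadFile = File(...)):
    data = await file.read()
    if not data:
        raise HTTPException(status_code=400, detail="Empty file")
    try:
        reader = PdfReader(io.BytesIO(data))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    except PdfReadError:
        raise HTTPException(status_code=400, detail="Not a valid PDF")
    if not text.strip():
        raise HTTPException(status_code=400, detail="No extractable text")
    # chunk_and_store(text) would chunk, embed, and insert into Weaviate (hypothetical helper).
    return {"status": "ok"}
```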
Project structure:

```
app/
  main.py          # FastAPI app, Weaviate v4 client, endpoints
  index.html       # Simple UI for upload and query (now with streaming via EventSource)
docker-compose.yml # Weaviate (>= 1.27.x) with gRPC exposed
```
GitHub markdown does not reliably play embedded videos inline. Use the direct link below; it will preview in the browser on GitHub, or you can download it:
If you want an inline preview in the README, convert a short clip to GIF and embed the GIF instead, e.g. assets/demo.gif.
- Uvicorn not found: activate the virtualenv or use the full path `venv/bin/uvicorn`.
- Weaviate version error: ensure the Docker image is `>= 1.27.x` (compose uses a recent 1.27 tag) and port `8080` is free.
- gRPC/HTTP connectivity: compose exposes `8080` and `50051`. The client auto-derives security/ports from `WEAVIATE_URL`.
- PDF parsing errors: the API returns a 400 if the file is empty, not a PDF, corrupted, or has no extractable text.
- 405/404 from HTML: open the UI at http://127.0.0.1:8000/ so requests hit the API origin. The UI also falls back to `http://127.0.0.1:8000` when opened from `file://`.
- Open the UI at http://127.0.0.1:8000/ (not `file://`) so `EventSource` can connect to the API origin.
- Some proxies/load balancers buffer SSE. Disable response buffering for `text/event-stream` or access the API directly during development.
- If your network blocks SSE, the UI will fall back to the non-streaming `/query` automatically.
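Under the hood, SSE from FastAPI needs little more than a generator and the `text/event-stream` media type. A minimal sketch of what `/query_stream` might look like; retrieval is elided here, and the `data:` framing and `[DONE]` sentinel are assumptions (check `index.html` for what the UI actually parses):

```python
# Sketch of an SSE endpoint streaming OpenAI tokens (illustrative, not the exact app code).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
openai_client = OpenAI()

@app.get("/query_stream")
def query_stream(q: str):
    def event_stream():
        stream = openai_client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: model name is configured in app/main.py
            messages=[{"role": "user", "content": q}],  # real code prepends retrieved context
            stream=True,
        )
        for chunk in stream:
            token = chunk.choices[0].delta.content
            if token:
                # SSE frames are "data: ...\n\n"; EventSource fires one message per frame.
                yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"  # assumption: a sentinel the UI can use to close

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```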
- Ingestion uses chunk size ~1500 characters with ~200 character overlap, and batches embedding requests in groups (default 64). This reduces the number of OpenAI round-trips for large PDFs.
- You can adjust these parameters in `app/main.py` if you need different trade-offs between speed and recall; a sketch of the chunking and batching logic follows.
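A sketch of the approach described above. The constants mirror the defaults mentioned; the embedding model name is an assumption:

```python
# Sketch: overlapping character chunks plus batched embedding requests.
from openai import OpenAI

CHUNK_SIZE = 1500      # ~1500 characters per chunk
CHUNK_OVERLAP = 200    # ~200 characters shared between neighboring chunks
EMBED_BATCH_SIZE = 64  # embeddings requested in groups of 64

openai_client = OpenAI()

def chunk_text(text: str) -> list[str]:
    chunks, start, step = [], 0, CHUNK_SIZE - CHUNK_OVERLAP
    while start < len(text):
        chunks.append(text[start:start + CHUNK_SIZE])
        start += step
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    vectors = []
    for i in range(0, len(chunks), EMBED_BATCH_SIZE):
        batch = chunks[i:i + EMBED_BATCH_SIZE]
        resp = openai_client.embeddings.create(
            model="text-embedding-3-small",  # assumption: see app/main.py for the real model
            input=batch,  # one request embeds the whole batch, cutting round-trips
        )
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```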
- Uses `weaviate.connect_to_custom` and the Collections API.
- Vectorizer is set to none (`vector_config=Configure.Vectorizer.none()`); embeddings are provided from OpenAI.
- Collection `DocumentChunk` is created if missing (no destructive reset on reload).
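Putting those notes together, the client setup might look roughly like this. Host/port parsing from `WEAVIATE_URL` is simplified to literals here, and the single `text` property is an assumed minimal schema:

```python
# Sketch: Weaviate v4 client setup and idempotent collection creation.
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_custom(
    http_host="localhost", http_port=8080, http_secure=False,   # derived from WEAVIATE_URL
    grpc_host="localhost", grpc_port=50051, grpc_secure=False,  # gRPC port from docker-compose.yml
)

# Create DocumentChunk only if it does not exist yet (no destructive reset on reload).
if not client.collections.exists("DocumentChunk"):
    client.collections.create(
        name="DocumentChunk",
        vector_config=Configure.Vectorizer.none(),  # vectors are supplied by OpenAI
        properties=[Property(name="text", data_type=DataType.TEXT)],  # assumption: minimal schema
    )

collection = client.collections.get("DocumentChunk")
# Inserts then pass the OpenAI embedding explicitly, e.g.:
# collection.data.insert(properties={"text": chunk}, vector=embedding)
```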
License: MIT