A minimal Retrieval-Augmented Generation (RAG) service built with FastAPI, Weaviate (v4), and OpenAI. Upload a PDF, store chunk embeddings in Weaviate, and query with semantic search; the answer is generated with OpenAI.
- Streaming answers (ChatGPT-style): a new `GET /query_stream` endpoint streams tokens via SSE; the UI renders a typewriter-style answer.
- Faster ingestion for large PDFs: larger overlapping chunks and batched OpenAI embedding requests significantly reduce processing time.
- PDF upload, chunking, OpenAI embeddings, Weaviate storage (v4 Collections API)
- Query endpoint that retrieves top-k chunks via vector search and asks OpenAI to answer (sketched after this list)
- Simple HTML UI to upload and query
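To make the query bullet above concrete, here is a minimal retrieve-then-generate sketch. The model names, prompt wording, and function signature are illustrative assumptions, not the exact code in `app/main.py`:

```python
# Sketch of the retrieve-then-generate query flow (illustrative names).
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_question(collection, question: str, k: int = 5) -> str:
    """collection is a Weaviate v4 Collections handle (see implementation notes below)."""
    # 1) Embed the question with the same model used at ingestion time.
    q_vec = openai_client.embeddings.create(
        model="text-embedding-3-small",  # assumption: the real model is set in app/main.py
        input=question,
    ).data[0].embedding

    # 2) Vector search: fetch the top-k chunks from Weaviate.
    hits = collection.query.near_vector(near_vector=q_vec, limit=k)
    context = "\n\n".join(obj.properties["text"] for obj in hits.objects)

    # 3) Ask OpenAI to answer grounded in the retrieved context.
    chat = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: whichever chat model the app configures
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content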
- Python 3.13 (a virtual environment is provided in `venv/`)
- Docker and Docker Compose (for Weaviate)
- OpenAI API key
Create a .env file in the project root with:
```
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=your_org_id_optional
WEAVIATE_URL=http://localhost:8080
```
Notes:
- `WEAVIATE_URL` should point to your Weaviate instance. The included `docker-compose.yml` exposes `http://localhost:8080` and gRPC on `50051`.
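For context, the app can pick these up with python-dotenv (already in the dependency list below). A minimal sketch, assuming the variables are read near startup in `app/main.py`:

```python
# Sketch: loading the .env settings at startup via python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]              # required
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")                 # optional
WEAVIATE_URL = os.getenv("WEAVIATE_URL", "http://localhost:8080")
```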
From the project root:
```
docker compose up -d
```

This starts Weaviate (>= 1.27.x), compatible with the Python client v4.
If needed (you may also use the provided `venv/`):

```
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install fastapi uvicorn weaviate-client openai PyPDF2 python-dotenv
```

Run the app:

```
source venv/bin/activate
uvicorn app.main:app --reload --port 8000
```

- API docs: http://127.0.0.1:8000/docs
- Root HTML UI: http://127.0.0.1:8000/
- Open http://127.0.0.1:8000/ in your browser.
- Upload a real PDF (not a renamed file). The app extracts text and stores vectorized chunks in Weaviate.
- Ask a question in the query box.
- The answer will now stream live (typewriter effect) as tokens arrive.
- If streaming is unavailable, it automatically falls back to the non‑streaming endpoint.
Alternatively, using curl:
```
# Upload via curl (multipart form)
curl -F "file=@/path/to/your.pdf" http://127.0.0.1:8000/upload_pdf

# Query
curl -G "http://127.0.0.1:8000/query" --data-urlencode "q=What is this document about?"

# Streamed query (Server-Sent Events)
# Note: --no-buffer helps show events as they arrive; some shells/binaries auto-buffer
curl --no-buffer -N -G "http://127.0.0.1:8000/query_stream" --data-urlencode "q=What is this document about?"
```
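The `/upload_pdf` handler behind the first curl call is conceptually simple. A hedged sketch, assuming PyPDF2 for extraction as listed in the dependencies (error messages and the `chunk_and_store` helper are illustrative):

```python
# Sketch of the /upload_pdf endpoint (illustrative; see app/main.py for the real handler).
import io
from fastapi import FastAPI, File, HTTPException, UploadFile
from PyPDF2 import PdfReader
from PyPDF2.errors import PdfReadError

app = FastAPI()

@app.post("/upload_pdf")
async def upload_pdf(file: UploadFile = File(...)):
    data = await file.read()
    if not data:
        raise HTTPException(status_code=400, detail="Empty file")
    try:
        reader = PdfReader(io.BytesIO(data))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    except PdfReadError:
        raise HTTPException(status_code=400, detail="Not a valid PDF")
    if not text.strip():
        raise HTTPException(status_code=400, detail="No extractable text")
    # chunk_and_store(text) would chunk, embed, and insert into Weaviate (hypothetical helper).
    return {"status": "ok"}
```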
Project structure:

```
app/
  main.py          # FastAPI app, Weaviate v4 client, endpoints
  index.html       # Simple UI for upload and query (now with streaming via EventSource)
docker-compose.yml # Weaviate (>= 1.27.x) with gRPC exposed
```
GitHub markdown does not reliably play embedded videos inline. Use the direct link below; it will preview in the browser on GitHub, or you can download it:
If you want an inline preview in the README, convert a short clip to GIF and embed the GIF instead, e.g. assets/demo.gif.
- Uvicorn not found: activate the virtualenv or use the full path `venv/bin/uvicorn`.
- Weaviate version error: ensure the Docker image is `>= 1.27.x` (compose uses a recent 1.27 tag) and port `8080` is free.
- gRPC/HTTP connectivity: compose exposes `8080` and `50051`. The client auto-derives security/ports from `WEAVIATE_URL`.
- PDF parsing errors: the API returns a 400 if the file is empty, not a PDF, corrupted, or has no extractable text.
- 405/404 from HTML: open the UI at http://127.0.0.1:8000/ so requests hit the API origin. The UI also falls back to `http://127.0.0.1:8000` when opened from `file://`.
- Open the UI at http://127.0.0.1:8000/ (not `file://`) so `EventSource` can connect to the API origin.
- Some proxies/load balancers buffer SSE. Disable response buffering for `text/event-stream` or access the API directly during development.
- If your network blocks SSE, the UI will fall back to the non-streaming `/query` automatically.
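Under the hood, SSE from FastAPI needs little more than a generator and the `text/event-stream` media type. A minimal sketch of what `/query_stream` might look like; retrieval is elided here, and the `data:` framing and `[DONE]` sentinel are assumptions (check `index.html` for what the UI actually parses):

```python
# Sketch of an SSE endpoint streaming OpenAI tokens (illustrative, not the exact app code).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
openai_client = OpenAI()

@app.get("/query_stream")
def query_stream(q: str):
    def event_stream():
        stream = openai_client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: model name is configured in app/main.py
            messages=[{"role": "user", "content": q}],  # real code prepends retrieved context
            stream=True,
        )
        for chunk in stream:
            token = chunk.choices[0].delta.content
            if token:
                # SSE frames are "data: ...\n\n"; EventSource fires one message per frame.
                yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"  # assumption: a sentinel the UI can use to close

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```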
- Ingestion uses chunk size ~1500 characters with ~200 character overlap, and batches embedding requests in groups (default 64). This reduces the number of OpenAI round-trips for large PDFs.
- You can adjust these parameters in `app/main.py` if you need different trade-offs between speed and recall; a sketch of the chunking and batching logic follows.
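A sketch of the approach described above. The constants mirror the defaults mentioned; the embedding model name is an assumption:

```python
# Sketch: overlapping character chunks plus batched embedding requests.
from openai import OpenAI

CHUNK_SIZE = 1500      # ~1500 characters per chunk
CHUNK_OVERLAP = 200    # ~200 characters shared between neighboring chunks
EMBED_BATCH_SIZE = 64  # embeddings requested in groups of 64

openai_client = OpenAI()

def chunk_text(text: str) -> list[str]:
    chunks, start, step = [], 0, CHUNK_SIZE - CHUNK_OVERLAP
    while start < len(text):
        chunks.append(text[start:start + CHUNK_SIZE])
        start += step
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    vectors = []
    for i in range(0, len(chunks), EMBED_BATCH_SIZE):
        batch = chunks[i:i + EMBED_BATCH_SIZE]
        resp = openai_client.embeddings.create(
            model="text-embedding-3-small",  # assumption: see app/main.py for the real model
            input=batch,  # one request embeds the whole batch, cutting round-trips
        )
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```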
- Uses `weaviate.connect_to_custom` and the Collections API.
- Vectorizer is set to none (`vector_config=Configure.Vectorizer.none()`); embeddings are provided from OpenAI.
- Collection `DocumentChunk` is created if missing (no destructive reset on reload).
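Putting those notes together, the client setup might look roughly like this. Host/port parsing from `WEAVIATE_URL` is simplified to literals here, and the single `text` property is an assumed minimal schema:

```python
# Sketch: Weaviate v4 client setup and idempotent collection creation.
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_custom(
    http_host="localhost", http_port=8080, http_secure=False,   # derived from WEAVIATE_URL
    grpc_host="localhost", grpc_port=50051, grpc_secure=False,  # gRPC port from docker-compose.yml
)

# Create DocumentChunk only if it does not exist yet (no destructive reset on reload).
if not client.collections.exists("DocumentChunk"):
    client.collections.create(
        name="DocumentChunk",
        vector_config=Configure.Vectorizer.none(),  # vectors are supplied by OpenAI
        properties=[Property(name="text", data_type=DataType.TEXT)],  # assumption: minimal schema
    )

collection = client.collections.get("DocumentChunk")
# Inserts then pass the OpenAI embedding explicitly, e.g.:
# collection.data.insert(properties={"text": chunk}, vector=embedding)
```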
License: MIT