DocuMind uses Retrieval-Augmented Generation (RAG) to let you chat with your documents, extracting exact answers with page references in seconds.
DocuMind is not just a wrapper around ChatGPT. It builds a Vector Search Engine to ground the AI's responses in your specific data to prevent hallucinations.
graph LR
A[User PDF] -->|PyPDF2| B(Text Chunks)
B -->|OpenAI Embeddings| C(Vector Store / FAISS)
D[User Question] -->|Semantic Search| C
C -->|Top 3 Matches| E[Context Window]
E -->|Prompt Engineering| F[GPT-3.5/4]
F -->|Answer| G[Streamlit UI]
- Ingestion: The app parses raw PDF text and splits it into manageable "chunks" (1000 chars) to preserve context.
- Embedding: Text chunks are converted into 1536-dimensional vectors using
text-embedding-3-small. - Storage: Vectors are stored locally using FAISS (Facebook AI Similarity Search) for O(1) retrieval speed.
- Retrieval: When a user asks a question, the system finds the most mathematically similar chunks and feeds them to the LLM.
| Component | Technology | Description |
|---|---|---|
| Frontend | Streamlit | Rapid UI development for data apps |
| Orchestration | LangChain | Framework for chaining LLM logic |
| Vector DB | FAISS (CPU) | Local, efficient similarity search |
| LLM | OpenAI GPT-3.5 | Inference engine for reasoning |
| Embeddings | OpenAI Ada | Semantic text representation |
| Feature | Description |
|---|---|
| Multi-Document | Upload and process multiple PDFs simultaneously. |
| Context-Aware | Remembers previous questions in the chat session (Conversation Memory). |
| Source Truth | Strictly answers based on the provided context to reduce hallucination. |
| Secure Design | API keys are managed via environment variables and never exposed. |
Prerequisites: Python 3.8+ and an OpenAI API Key.
-
Clone the Repository
git clone [https://github.com/elchibek5/DocuMind.git](https://github.com/elchibek5/DocuMind.git) cd DocuMind -
Install Dependencies
pip install -r requirements.txt
-
Configure Environment Create a
.envfile in the root directory and add your key:OPENAI_API_KEY=sk-proj-xxxxxxxxx...
-
Run the App
streamlit run app.py
Created by Elchibek Dastanov
