🧠 DocuMind

Stop scrolling through 100-page PDFs.

DocuMind uses Retrieval-Augmented Generation (RAG) to let you chat with your documents, extracting exact answers with page references in seconds.

🏗️ System Architecture

DocuMind is not just a wrapper around ChatGPT. It builds a vector search index over your documents so that the model's responses are grounded in your own data, which reduces hallucinations.

```mermaid
graph LR
    A[User PDF] -->|PyPDF2| B(Text Chunks)
    B -->|OpenAI Embeddings| C(Vector Store / FAISS)
    D[User Question] -->|Semantic Search| C
    C -->|Top 3 Matches| E[Context Window]
    E -->|Prompt Engineering| F[GPT-3.5/4]
    F -->|Answer| G[Streamlit UI]
```

Ingestion: The app parses raw PDF text and splits it into chunks of about 1,000 characters, small enough to embed and retrieve individually yet large enough to keep local context.
Embedding: Text chunks are converted into 1536-dimensional vectors using text-embedding-3-small.
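Storage: Vectors are stored locally in a FAISS (Facebook AI Similarity Search) index for fast nearest-neighbor retrieval. Search time depends on the index type: an exact flat index scans every vector, while approximate indexes trade some accuracy for sublinear lookups.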
Retrieval: When a user asks a question, the system embeds it, retrieves the three chunks whose vectors are most similar to it, and passes them to the LLM as context.
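
The four steps above can be sketched end to end in plain Python. This is a minimal illustration, not the code in app.py: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the OpenAI embedding model, and the brute-force loop in `top_k_chunks` stands in for a FAISS index. All function names and the 256-dimension vector size are assumptions made for the example.

```python
import math
import re
import zlib


def split_into_chunks(text, chunk_size=1000):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embed(text, dim=256):
    """Stand-in for a real embedding model: a hashed bag-of-words vector."""
    vector = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash().
        vector[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question, chunks, k=3):
    """Return the k chunks most similar to the question, best first."""
    question_vector = toy_embed(question)
    scored = [(cosine_similarity(question_vector, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

Calling `top_k_chunks(question, split_into_chunks(pdf_text))` returns the passages that would be placed in the context window. The exhaustive scan here compares the question against every chunk, which is also what an exact flat index does; real embeddings capture meaning rather than shared words, so results from this toy version will be cruder.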

🛠️ Tech Stack

| Component | Technology | Description |
| --- | --- | --- |
| Frontend | Streamlit | Rapid UI development for data apps |
| Orchestration | LangChain | Framework for chaining LLM logic |
| Vector DB | FAISS (CPU) | Local, efficient similarity search |
| LLM | OpenAI GPT-3.5 | Inference engine for reasoning |
| Embeddings | OpenAI text-embedding-3-small | Semantic text representation |

💡 Key Features

| Feature | Description |
| --- | --- |
| Multi-Document | Upload and process multiple PDFs simultaneously. |
| Context-Aware | Remembers previous questions in the chat session (Conversation Memory). |
| Source Truth | Strictly answers based on the provided context to reduce hallucination. |
| Secure Design | API keys are managed via environment variables and never exposed. |
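
To make the Context-Aware and Source Truth features concrete, here is one way a grounded prompt could be assembled. It is a sketch under assumptions, not the prompt DocuMind actually sends: the function name, the `(page_number, chunk_text)` shape of `retrieved`, and the instruction wording are all hypothetical.

```python
def build_grounded_prompt(question, retrieved, history=()):
    """Assemble a prompt that restricts the model to the retrieved context.

    retrieved: list of (page_number, chunk_text) pairs (hypothetical shape).
    history:   sequence of (earlier_question, earlier_answer) pairs.
    """
    # Tag each chunk with its page so the model can cite it in the answer.
    context = "\n\n".join(f"[page {page}] {text}" for page, text in retrieved)
    # Replay earlier turns so follow-up questions can refer back to them.
    past_turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Answer using only the context below and cite page numbers. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{past_turns or '(none)'}\n\n"
        f"Question: {question}"
    )
```

Telling the model to admit when the context lacks an answer is what keeps it from falling back on general knowledge, and including earlier turns is a simple form of conversation memory.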

🚀 How to Run Locally

Prerequisites: Python 3.8+ and an OpenAI API Key.

Clone the Repository

```
git clone https://github.com/elchibek5/DocuMind.git
cd DocuMind
```

Install Dependencies
```
pip install -r requirements.txt
```
Configure Environment

Create a .env file in the root directory and add your key:
```
OPENAI_API_KEY=sk-proj-xxxxxxxxx...
```
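
For reference, a small sketch of how an app can read this key at startup and fail early with a clear message if it is missing. The helper name `get_openai_api_key` is hypothetical. Note that Python does not read a .env file on its own; a loader such as python-dotenv's `load_dotenv()` is commonly called first, and whether this project does so is an assumption.

```python
import os


def get_openai_api_key(environ=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env or export it in your shell."
        )
    return key
```

Reading the key from the environment keeps it out of the source code, and listing .env in .gitignore keeps it out of version control.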
Run the App
```
streamlit run app.py
```

Created by Elchibek Dastanov
