
Scriptoria – AI-Powered Historical Document Analysis

Scriptoria is a Retrieval-Augmented Generation (RAG) engine purpose-built for historical document research. Upload scanned manuscripts, archival records, old letters, or any heritage PDF: Scriptoria extracts the text via OCR (even from faded prints and handwritten notes), indexes it in a vector database, and lets you interrogate centuries of knowledge with natural-language questions answered by local LLMs.

"Like having a research assistant who has read every page in your archive."

Scriptoria Screenshot

Features

  • Historical PDF ingestion – upload scanned manuscripts, archival records, and heritage documents
  • OCR optimized for aged documents – extracts text from faded prints, old typographies, and scanned pages
  • Semantic vector storage – powered by ChromaDB for intelligent document retrieval
  • Natural language queries – ask questions about your historical sources in plain language
  • Fully local & private – runs entirely on your machine using Ollama (no data leaves your system)
  • Modern web interface – clean, dark-themed UI built with Astro
  • REST API – integrate with your own tools via FastAPI endpoints
  • Docker support – one-command deployment for easy setup

Prerequisites

  • Python 3.12+
  • Node.js and npm
  • Ollama (for local LLM inference)
  • Docker and Docker Compose (optional, for containerized setup)

Installation

Option 1: Docker (Recommended)

  1. Clone the repository:

    git clone <repo-url>
    cd scriptoria
  2. Start the application:

    docker-compose up --build

    This will:

    • Start the Ollama server
    • Pull the required models (nomic-embed-text, llama3.1)
    • Build and run the application on http://localhost:8000

Option 2: Local Development

  1. Clone the repository:

    git clone <repo-url>
    cd scriptoria
  2. Install Ollama (available from https://ollama.com) and make sure the `ollama` CLI is on your PATH.

  3. Set up Python environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv sync  # or pip install -e .
  4. Set up frontend:

    cd frontend
    npm install
    npm run build
    cd ..
  5. Run the application:

    ./start.sh

    Or manually:

    uv run python main.py

    The application will be available at http://localhost:8000

Usage

  1. Open http://localhost:8000 in your browser
  2. Upload scanned historical PDFs (manuscripts, archival records, old books, letters…)
  3. Scriptoria processes them with OCR and indexes content in the vector database
  4. Ask questions in natural language: "What events are described in 1492?", "Summarize the correspondence between these two figures", etc.
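The upload step above can also be driven over HTTP from a script. Below is a minimal sketch using only the Python standard library; it assumes the `/upload` endpoint reads a multipart form field named `file`, which is not confirmed by this README, so adjust the field name to match the actual FastAPI signature.

```python
import json
import uuid
from urllib import request


def encode_pdf_multipart(filename: str, pdf_bytes: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body for a single PDF file.

    Assumes the /upload endpoint reads a form field named "file"
    (an assumption about the API, not documented here).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a local PDF to a running Scriptoria instance and return its JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = encode_pdf_multipart(path.rsplit("/", 1)[-1], fh.read())
    req = request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `upload_pdf("letters/1850_correspondence.pdf")` would send the file for OCR and indexing (the path here is purely illustrative).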

API

The backend provides a REST API:

  • POST /upload – Upload historical PDF documents for OCR processing and indexing
  • POST /ask – Query your document archive using natural language
  • GET /files – List all indexed documents
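Querying can be scripted the same way. The sketch below uses only the standard library; the JSON field name `question` and the shape of the response are assumptions, since this README does not document the request schema.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"


def build_ask_request(question: str, base_url: str = BASE_URL) -> request.Request:
    """Prepare a POST request for the /ask endpoint.

    The payload field name "question" is an assumption about the API.
    """
    payload = json.dumps({"question": question}).encode()
    return request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send a question to a running Scriptoria instance and return the JSON reply."""
    with request.urlopen(build_ask_request(question)) as resp:
        return json.load(resp)
```

For example, `ask("What events are described in 1492?")` mirrors the browser workflow from the Usage section.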

Project Structure

  • app.py – FastAPI application & API endpoints
  • main.py – Entry point with automatic Ollama management & model provisioning
  • rag/ – RAG pipeline (OCR extraction, document ingestion, semantic querying, vector store)
  • frontend/ – Astro-based web interface
  • data/ – Uploaded documents and ChromaDB vector store
  • docker-compose.yml – Full-stack Docker deployment
  • pyproject.toml – Python project metadata & dependencies

Contributing

Contributions are welcome! Please open issues or submit pull requests.

License

GPL-3.0
