Converts a PDF to text (Markdown) by rendering each page to an image and sending it through the deepseek-ocr:latest model served by Ollama via its OpenAI-compatible API (Responses API).
- Ollama running locally and the model pulled:
ollama pull deepseek-ocr:latest
- On Linux,
pdf2imagerequires Poppler:- Debian/Ubuntu:
sudo apt-get install poppler-utils
- Debian/Ubuntu:
pipx install git+https://github.com/arrase/OCR.gitocr 2512.15741v1.pdfYou can include/exclude pages (1-based) and use both at the same time; --include is applied first and then --exclude.
Examples:
# Only page 1
ocr --include 1 2512.15741v1.pdf
# Pages 1 to 5 except 3
ocr --include 1-5 --exclude 3 2512.15741v1.pdf
# Combinations
ocr --include 1,3,5-8 --exclude 6-7 2512.15741v1.pdfOutput: creates 2512.15741v1.md in the same directory.
You can configure the tool using a YAML file. The tool looks for a configuration file in the following order:
- Path specified via
--config/-c. ~/ocr_config.yaml.
A default configuration file is provided in config/default_config.yaml:
model: deepseek-ocr:latest
base_url: http://localhost:11434/v1
prompt: |
Convert the document to markdown.Environment variables override configuration files:
OLLAMA_BASE_URLOLLAMA_MODEL