This script processes a directory of images (JPG, PNG, WEBP, etc.), uses the Google Gemini API to extract text segments highlighted in yellow or green, sorts the extracted text by approximate reading order, and saves the results as a formatted Markdown bullet list. It also prints the results for each image to the console as it runs.
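The README does not say how the script computes "approximate reading order". One common approach works like this: sort text fragments by their top coordinate, group fragments whose vertical positions fall within a pixel tolerance into one line (this is the role of the `-t`/`--tolerance` option), then sort each line left to right. The sketch below illustrates that approach only. It is not the script's actual code, and the `Segment` type, its `x`/`y` fields, and `sort_reading_order` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical highlighted-text fragment with its top-left pixel position."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def sort_reading_order(segments: List[Segment], tolerance: int = 10) -> List[str]:
    """Sort segments top-to-bottom, then left-to-right within each visual line.

    A segment joins the current line if its y is within `tolerance` pixels of
    that line's first (topmost) segment; otherwise it starts a new line.
    """
    ordered = sorted(segments, key=lambda s: (s.y, s.x))
    lines: List[List[Segment]] = []
    for seg in ordered:
        if lines and abs(seg.y - lines[-1][0].y) <= tolerance:
            lines[-1].append(seg)  # close enough vertically: same line
        else:
            lines.append([seg])  # too far below: start a new line
    result: List[str] = []
    for line in lines:
        # Within one line, read left to right.
        result.extend(s.text for s in sorted(line, key=lambda s: s.x))
    return result


if __name__ == "__main__":
    # A slightly skewed line: "world" sits 7 px higher than "Hello".
    demo = [
        Segment("Hello", x=10, y=52),
        Segment("world", x=120, y=45),
        Segment("next line", x=10, y=90),
    ]
    print(sort_reading_order(demo, tolerance=10))
```

With a tolerance of 10, the two skewed words are grouped and read as "Hello", "world". With a tolerance of 5, they fall on separate lines and come out in the order "world", "Hello".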
- Python 3.7 or higher installed.
- Access to the Google Gemini API and a valid API key.
- Clone/Download:
  - Download the script (`extract_highlights.py` or your chosen name) and the `requirements.txt` file into a project directory.
- Create Virtual Environment (Highly Recommended):
  - Open your terminal or command prompt and navigate to your project directory.
  - Run:
    ```bash
    python -m venv venv
    ```
  - Activate the environment:
    - macOS / Linux:
      ```bash
      source venv/bin/activate
      ```
    - Windows (CMD):
      ```cmd
      venv\Scripts\activate
      ```
    - Windows (PowerShell):
      ```powershell
      venv\Scripts\Activate.ps1
      ```
  - You should see `(venv)` at the beginning of your terminal prompt.
- Install Dependencies:
  - With your virtual environment activated, run:
    ```bash
    pip install -r requirements.txt
    ```
    This installs `google-generativeai`, `Pillow`, and optionally `python-dotenv`.
- Set API Key:
  - Option A (Recommended): `.env` file
    - Create a file named `.env` (the filename starts with a dot) in your project directory.
    - Add your API key to this file on a single line:
      ```
      GEMINI_API_KEY=YOUR_API_KEY_HERE
      ```
    - Replace `YOUR_API_KEY_HERE` with your actual key.
    - Important: If you use Git, add `.env` to your `.gitignore` file so you don't commit your key.
  - Option B: Environment variable
    - Set the `GEMINI_API_KEY` environment variable in your terminal session before running the script. How you do this depends on your operating system:
      - macOS / Linux:
        ```bash
        export GEMINI_API_KEY="YOUR_API_KEY_HERE"
        ```
      - Windows (CMD):
        ```cmd
        set GEMINI_API_KEY=YOUR_API_KEY_HERE
        ```
      - Windows (PowerShell):
        ```powershell
        $env:GEMINI_API_KEY="YOUR_API_KEY_HERE"
        ```
    - Note: A variable set with these commands lasts only for the current terminal session.
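The README does not show how the script reads the key. The following is a minimal sketch of the two options above. It assumes the script loads a `.env` file when `python-dotenv` is installed and otherwise falls back to the plain `GEMINI_API_KEY` environment variable. The `load_api_key` helper is hypothetical.

```python
import os
from typing import Optional


def load_api_key(env_var: str = "GEMINI_API_KEY") -> Optional[str]:
    """Hypothetical helper: return the API key from .env or the environment."""
    try:
        # python-dotenv is optional. load_dotenv() locates and loads a .env file;
        # by default it does not override variables already set in the environment.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # No python-dotenv: rely on the environment variable (Option B).
    key = os.environ.get(env_var, "").strip()
    return key or None  # Treat empty or whitespace-only values as "not set".


if __name__ == "__main__":
    # Never print the key itself; only report whether it was found.
    print("API key found." if load_api_key() else "GEMINI_API_KEY is not set.")
```

Because `load_dotenv()` does not override existing variables by default, a key exported in the terminal (Option B) takes precedence over the one in `.env` in this sketch.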
Run the script from your terminal (make sure your virtual environment is activated first).

```bash
python extract_highlights.py -i <path_to_images> -o <output_markdown_file> [options]
```

This section details the arguments you can pass to the script:
- `-i`, `--input-dir` (Required):
  - Path to the directory containing the images you want to process.
  - Example: `-i ./my_scans`
- `-o`, `--output-file` (Required):
  - Path where the output Markdown file will be saved.
  - Example: `-o report.md`
- `-t`, `--tolerance` (Optional):
  - The vertical pixel tolerance used when grouping text into lines for sorting. A larger value lets text with more vertical offset count as the same line, which helps when an image is slightly skewed.
  - Default: `10`
  - Example: `-t 15`
- `-s`, `--sleep` (Optional):
  - The number of seconds to pause between processing each image. This helps manage API rate limits.
  - Default: `5`
  - Example: `-s 7`
- `-m`, `--model` (Optional):
  - The specific Gemini model name to use for the API calls.
  - Default: `gemini-1.5-flash-latest` (check the script's `DEFAULT_MODEL` constant if this differs)
  - Example: `-m gemini-1.5-pro-latest`
  - Find available model names here: ai.google.dev/models/gemini
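For reference, the options above could be declared with the standard `argparse` module as shown below. This is a hypothetical reconstruction based only on this README. The script's actual parser, help text, and the numeric type of `--sleep` (assumed `float` here) may differ.

```python
import argparse
from typing import List, Optional

# Default taken from this README; the script's own DEFAULT_MODEL may differ.
DEFAULT_MODEL = "gemini-1.5-flash-latest"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Extract yellow/green highlighted text from images using Gemini."
    )
    parser.add_argument("-i", "--input-dir", required=True,
                        help="Directory containing the images to process.")
    parser.add_argument("-o", "--output-file", required=True,
                        help="Path of the Markdown file to write.")
    parser.add_argument("-t", "--tolerance", type=int, default=10,
                        help="Vertical pixel tolerance for grouping text into lines.")
    parser.add_argument("-s", "--sleep", type=float, default=5.0,
                        help="Seconds to pause between images (rate limiting).")
    parser.add_argument("-m", "--model", default=DEFAULT_MODEL,
                        help="Gemini model name to use.")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Because `-i` and `-o` are declared with `required=True`, `argparse` prints a usage message and exits if either is missing. `argparse` converts `--input-dir` to the attribute `input_dir`.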
Here is an example of how to run the script with some options:

```bash
python extract_highlights.py -i ./path/to/my/images -o ./output/highlights_report.md -t 12 -s 5
```

This command will:
- Process images in the `./path/to/my/images` directory.
- Save the formatted Markdown output to `./output/highlights_report.md`.
- Use a sorting tolerance of 12 pixels.
- Wait 5 seconds between each image processing step.
- Use the default Gemini model specified in the script.
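The README does not show the script's main loop. The sketch below shows one way a run like the example above could proceed: collect the images, extract text from each, print per-image results, pause between images, and write a Markdown bullet list. The `IMAGE_EXTENSIONS` set, the helpers `find_images`, `to_markdown`, and `process_directory`, and the nested bullet layout are all hypothetical. The Gemini call is replaced by an `extract` callback so the sketch runs offline.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical extension list; the README only says "JPG, PNG, WEBP, etc."
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(input_dir: str) -> List[Path]:
    """Return image files directly inside input_dir (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def to_markdown(results: Dict[str, List[str]]) -> str:
    """Format {image name: [highlighted texts]} as a nested Markdown bullet list."""
    lines: List[str] = []
    for name, texts in results.items():
        lines.append(f"- {name}")
        lines.extend(f"  - {text}" for text in texts)
    return "\n".join(lines) + "\n"


def process_directory(input_dir: str, output_file: str,
                      extract: Callable[[Path], List[str]],
                      sleep_seconds: float = 5.0) -> Dict[str, List[str]]:
    """Run `extract` on each image, pause between images, then write Markdown."""
    results: Dict[str, List[str]] = {}
    images = find_images(input_dir)
    for index, image in enumerate(images):
        texts = extract(image)  # stand-in for the Gemini call plus sorting
        results[image.name] = texts
        print(f"{image.name}: {texts}")  # per-image console output
        if index < len(images) - 1:
            time.sleep(sleep_seconds)  # the -s/--sleep pause; skipped after the last image
    Path(output_file).write_text(to_markdown(results), encoding="utf-8")
    return results
```

Passing the extraction step in as a callback keeps the loop, the rate-limit pause, and the Markdown formatting testable without network access or an API key.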