Automated screening of academic articles using Large Language Models (LLMs) via OpenRouter API.
This script processes a CSV file of academic articles (with title and abstract columns), queries multiple LLM models for structured relevance decisions, and outputs an enriched CSV with the results.
It is meant to be a command-line equivalent of the AISysRev web tool.
- Bibliographic import: `bib2csv.py` converts Web of Science / Scopus `.bib` exports and PubMed MEDLINE `.txt` exports to a CSV ready for screening. Multiple files can be merged in one call.
- Structured LLM Responses: Uses Pydantic AI to enforce structured JSON output from LLMs.
- Concurrent API Calls: Efficiently processes multiple articles and models in parallel.
- Model selection: Copy the example file and adjust the OpenRouter models you want to run:

```
cp models.conf.example models.conf
```
- Inclusion / Exclusion Criteria: Customize inclusion/exclusion criteria and instructions by copying the example file and adjusting it:

```
cp criteria.conf.example criteria.conf
```
- Boolean screening: `screen_boolean.py` screens papers per criterion: each criterion is sent as a separate LLM call. Per-criterion probabilities are combined using fuzzy boolean logic (AND=MIN, OR=MAX, NOT=1−p) over a criteria tree defined in YAML. A final binary include/exclude decision is derived from the overall probability (threshold 0.5).
- Error Handling: Semi-robust retry logic and error reporting.
- Progress Tracking: Real-time progress bars with `tqdm`.
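The fuzzy combination used by the boolean screening can be sketched in a few lines of plain Python. This is a minimal illustration of AND=MIN, OR=MAX, NOT=1−p over a nested criteria tree; the dict-based tree shown here is a stand-in, not `screen_boolean.py`'s exact YAML schema:

```python
def combine(node, probs):
    """Combine per-criterion probabilities with fuzzy boolean logic.

    `node` is either a criterion name (leaf) or a one-key dict:
    {"and": [...]}, {"or": [...]}, or {"not": child}.
    `probs` maps criterion names to LLM-estimated probabilities.
    """
    if isinstance(node, str):  # leaf: the criterion's own probability
        return probs[node]
    (op, children), = node.items()
    if op == "and":            # fuzzy AND = minimum of the children
        return min(combine(c, probs) for c in children)
    if op == "or":             # fuzzy OR = maximum of the children
        return max(combine(c, probs) for c in children)
    if op == "not":            # fuzzy NOT = 1 - p
        return 1 - combine(children, probs)
    raise ValueError(f"unknown operator: {op}")

# Include if population AND intervention AND NOT an excluded study type
tree = {"and": ["population", "intervention", {"not": "excluded_type"}]}
probs = {"population": 0.9, "intervention": 0.7, "excluded_type": 0.2}
overall = combine(tree, probs)   # min(0.9, 0.7, 1 - 0.2) = 0.7
include = overall >= 0.5         # final binary decision at threshold 0.5
```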
The SESR-Eval benchmark dataset [1] (Zenodo record 16408882) contains title-abstract screening data for evaluating LLMs in systematic literature reviews.
```
python download_sesr.py              # downloads and extracts to data/sesr/
python download_sesr.py -o mydir/    # custom output directory
python download_sesr.py --no-extract # keep package.zip only
```

Each extracted subfolder in `data/sesr/` corresponds to one systematic review and contains:
- `primary_correct.csv`: papers to screen (the authoritative version for that review)
- `secondary_study_data.csv`: review metadata
- `criteria.conf`: inclusion/exclusion criteria (ready for `screen.py -c`)
- `instructions.txt`: prompt instructions (ready for `screen.py -i`)
After running `download_sesr.py` you can screen the primary studies in each systematic review folder, for example:

```
python screen.py data/sesr/A_decade_of_code_comment/primary_correct.csv -c data/sesr/A_decade_of_code_comment/criteria.conf -i data/sesr/A_decade_of_code_comment/instructions.txt
```

Then you can compare the screening results against the labels in `primary_correct.csv`.
The SYNERGY dataset [2] contains 26 systematic review datasets with title-abstract screening data for evaluating LLMs. Papers are fetched from OpenAlex.
```
python download_synergy.py              # download all active datasets to data/synergy
python download_synergy.py --list       # list available datasets
python download_synergy.py Wolters_2018 # download one dataset
```

Each extracted subfolder in `data/synergy/` corresponds to one systematic review and contains:
- `primary_correct.csv`: papers to screen (title, abstract, and ground-truth `label_included`)
- `criteria.conf`: eligibility criteria extracted from the SYNERGY dataset index
After running `download_synergy.py` you can screen the primary studies in each systematic review folder, for example:

```
python screen.py data/synergy/Wolters_2018/primary_correct.csv -c data/synergy/Wolters_2018/criteria.conf
```

Then you can compare the screening results against the `label_included` column in `primary_correct.csv`.
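The comparison can be done with a short stdlib sketch that tallies agreement between the ground truth and the model's decision. The `decision` column name and the `"include"` value are assumptions about the enriched CSV; adjust them to match your actual output columns:

```python
def confusion(rows, truth_col="label_included", pred_col="decision"):
    """Count TP/FP/FN/TN between ground-truth labels and LLM decisions.

    `rows` are dicts, e.g. from csv.DictReader over the enriched CSV.
    `pred_col` and the "include" value are assumed names; adjust them.
    """
    tp = fp = fn = tn = 0
    for row in rows:
        truth = row[truth_col].strip() == "1"
        pred = row[pred_col].strip().lower() == "include"
        tp += truth and pred
        fp += (not truth) and pred
        fn += truth and not pred
        tn += (not truth) and not pred
    return tp, fp, fn, tn

# Tiny worked example; in practice read the output with csv.DictReader
rows = [
    {"label_included": "1", "decision": "include"},
    {"label_included": "0", "decision": "include"},
    {"label_included": "1", "decision": "exclude"},
]
tp, fp, fn, tn = confusion(rows)
recall = tp / (tp + fn)   # fraction of truly relevant papers kept
```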
If your papers are in a database export rather than a CSV, convert them first:
```
# Single file (WOS, Scopus, or PubMed)
python bib2csv.py export.bib -o papers.csv
python bib2csv.py pubmed_export.txt -o papers.csv

# Merge multiple files from different databases
python bib2csv.py wos.bib scopus.bib pubmed.txt -o papers.csv

# Wildcards also work
python bib2csv.py bibs/* -o papers.csv
```

Supported formats:
- Web of Science: `.bib` export (`@article{ WOS:... }`)
- Scopus: `.bib` export (`@ARTICLE{...}`)
- PubMed: MEDLINE tagged `.txt` export (lines starting with `PMID-`)
The output CSV always contains: `title`, `abstract`, `doi`, `year`, `authors`, `journal`, `keywords`, `source_db`, `entry_key`.
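Since the screening step expects `title` and `abstract` columns, a quick header check before a long run can save credits. A minimal stdlib sketch (the helper names here are mine, not part of the tool):

```python
import csv

REQUIRED = ("title", "abstract")  # columns the screening step expects

def missing_columns(header, required=REQUIRED):
    """Return the required column names absent from a CSV header row."""
    return [col for col in required if col not in header]

def check_screening_csv(path):
    """Read only the header of `path` and fail loudly if columns are missing."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    missing = missing_columns(header)
    if missing:
        raise ValueError(f"{path} is missing required column(s): {missing}")
    return header
```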
After customizing as shown above, you can run:

```
uv run screen.py <csv_file_with_columns_named_title_and_abstract> -n all
```

You can also customize the input files:

```
uv run screen.py <csv_file_with_columns_named_title_and_abstract> -n <number of papers> -c <criteria_file> -m <models_file>
```

Then you see output like this:

After that, an enriched CSV file with the LLM responses is produced.
Not all models return valid responses, e.g., older Llama models. By default, `screen.py` runs only the first 10 rows, which is handy if you want to collect statistics on how well the models perform at returning valid JSON responses without blowing up your OpenRouter credits. If you choose your models carefully, you may see a good success rate of valid output, as in the image below.

Ensure your CSV is properly escaped. Google Sheets CSV export may produce lines that contain line breaks without being properly escaped. If that is the case, run a find-and-replace in Google Sheets (find `\n`, replace with a space, with regular expressions enabled) before exporting to CSV.
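Alternatively, if the export is properly quoted and the problem is just line breaks inside cells confusing downstream tools, you can flatten them programmatically with the standard library's `csv` module. A minimal sketch (the helper names are mine):

```python
import csv

def flatten_field(field):
    """Replace line breaks inside a single CSV field with spaces."""
    return field.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")

def flatten_newlines(src_path, dst_path):
    """Rewrite a properly quoted CSV so that no field spans multiple lines."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([flatten_field(f) for f in row])
```

Note that this only helps when the quoting itself is intact; a truly mis-escaped file still needs the find-and-replace fix in Google Sheets before export.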
[1] Huotala A, Kuutila M, Mäntylä M. SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews. In Proceedings of the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Oct 2025, pp. 1-12. IEEE. https://arxiv.org/abs/2507.19027
[2] De Bruin J, Ma Y, Ferdinands G, Teijema J, Van de Schoot R. SYNERGY: Open machine learning dataset on study selection in systematic reviews. DataverseNL, V1, 2023. https://doi.org/10.34894/HE6NAQ https://github.com/asreview/synergy-dataset