Automated screening of academic articles using Large Language Models (LLMs) via OpenRouter API.
This script processes a CSV file of academic articles (with title and abstract columns), queries multiple LLM models for structured relevance decisions, and outputs an enriched CSV with the results.
It is meant to be a command-line equivalent of the AISysRev web tool.
- Bibliographic import: `bib2csv.py` converts Web of Science / Scopus `.bib` exports and PubMed MEDLINE `.txt` exports to a CSV ready for screening. Multiple files can be merged in one call.
- Structured LLM Responses: Uses Pydantic AI to enforce structured JSON output from LLMs.
- Concurrent API Calls: Efficiently processes multiple articles and models in parallel.
- Model selection: Copy the example file and adjust the OpenRouter models you want to run:

```
cp models.conf.example models.conf
```
- Inclusion / Exclusion Criteria: Customize inclusion/exclusion criteria and instructions by copying the example file and adjusting it:

```
cp criteria.conf.example criteria.conf
```
- Boolean screening: `screen_boolean.py` screens papers per criterion: each criterion is sent as a separate LLM call. Per-criterion probabilities are combined using fuzzy boolean logic (AND=MIN, OR=MAX, NOT=1−p) over a criteria tree defined in YAML. A final binary include/exclude decision is derived from the overall probability (threshold 0.5).
- Error Handling: Semi-robust retry logic and error reporting.
- Progress Tracking: Real-time progress bars with `tqdm`.
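The fuzzy combination used by the boolean screening can be sketched in a few lines of plain Python. This is a minimal illustration of AND=MIN, OR=MAX, NOT=1−p over a nested criteria tree; the dict-based tree shown here is a stand-in, not `screen_boolean.py`'s exact YAML schema:

```python
def combine(node, probs):
    """Combine per-criterion probabilities with fuzzy boolean logic.

    `node` is either a criterion name (leaf) or a one-key dict:
    {"and": [...]}, {"or": [...]}, or {"not": child}.
    `probs` maps criterion names to LLM-estimated probabilities.
    """
    if isinstance(node, str):  # leaf: the criterion's own probability
        return probs[node]
    (op, children), = node.items()
    if op == "and":            # fuzzy AND = minimum of the children
        return min(combine(c, probs) for c in children)
    if op == "or":             # fuzzy OR = maximum of the children
        return max(combine(c, probs) for c in children)
    if op == "not":            # fuzzy NOT = 1 - p
        return 1 - combine(children, probs)
    raise ValueError(f"unknown operator: {op}")

# Include if population AND intervention AND NOT an excluded study type
tree = {"and": ["population", "intervention", {"not": "excluded_type"}]}
probs = {"population": 0.9, "intervention": 0.7, "excluded_type": 0.2}
overall = combine(tree, probs)   # min(0.9, 0.7, 1 - 0.2) = 0.7
include = overall >= 0.5         # final binary decision at threshold 0.5
```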
The SESR-Eval benchmark dataset [1] (Zenodo record 16408882) contains title-abstract screening data for evaluating LLMs in systematic literature reviews.
```
python download_sesr.py              # downloads and extracts to data/sesr/
python download_sesr.py -o mydir/    # custom output directory
python download_sesr.py --no-extract # keep package.zip only
```

Each extracted subfolder in `data/sesr/` corresponds to one systematic review and contains:
- `primary_correct.csv`: papers to screen (the authoritative version for that review)
- `secondary_study_data.csv`: review metadata
- `criteria.conf`: inclusion/exclusion criteria (ready for `screen.py -c`)
- `instructions.txt`: prompt instructions (ready for `screen.py -i`)
After running `download_sesr.py` you can screen the primary studies in each systematic review folder, for example:

```
python screen.py data/sesr/A_decade_of_code_comment/primary_correct.csv -c data/sesr/A_decade_of_code_comment/criteria.conf -i data/sesr/A_decade_of_code_comment/instructions.txt
```

Then you can compare the screening results against the labels in `primary_correct.csv`.
The SYNERGY dataset [2] contains 26 systematic review datasets with title-abstract screening data for evaluating LLMs. Papers are fetched from OpenAlex.
```
python download_synergy.py              # download all active datasets to data/synergy
python download_synergy.py --list       # list available datasets
python download_synergy.py Wolters_2018 # download one dataset
```

Each extracted subfolder in `data/synergy/` corresponds to one systematic review and contains:
- `primary_correct.csv`: papers to screen (title, abstract, and ground-truth `label_included`)
- `criteria.conf`: eligibility criteria extracted from the SYNERGY dataset index
After running `download_synergy.py` you can screen the primary studies in each systematic review folder, for example:

```
python screen.py data/synergy/Wolters_2018/primary_correct.csv -c data/synergy/Wolters_2018/criteria.conf
```

Then you can compare the screening results against the `label_included` column in `primary_correct.csv`.
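The comparison can be done with a short stdlib sketch that tallies agreement between the ground truth and the model's decision. The `decision` column name and the `"include"` value are assumptions about the enriched CSV; adjust them to match your actual output columns:

```python
def confusion(rows, truth_col="label_included", pred_col="decision"):
    """Count TP/FP/FN/TN between ground-truth labels and LLM decisions.

    `rows` are dicts, e.g. from csv.DictReader over the enriched CSV.
    `pred_col` and the "include" value are assumed names; adjust them.
    """
    tp = fp = fn = tn = 0
    for row in rows:
        truth = row[truth_col].strip() == "1"
        pred = row[pred_col].strip().lower() == "include"
        tp += truth and pred
        fp += (not truth) and pred
        fn += truth and not pred
        tn += (not truth) and not pred
    return tp, fp, fn, tn

# Tiny worked example; in practice read the output with csv.DictReader
rows = [
    {"label_included": "1", "decision": "include"},
    {"label_included": "0", "decision": "include"},
    {"label_included": "1", "decision": "exclude"},
]
tp, fp, fn, tn = confusion(rows)
recall = tp / (tp + fn)   # fraction of truly relevant papers kept
```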
If your papers are in a database export rather than a CSV, convert them first:
```
# Single file (WOS, Scopus, or PubMed)
python bib2csv.py export.bib -o papers.csv
python bib2csv.py pubmed_export.txt -o papers.csv

# Merge multiple files from different databases
python bib2csv.py wos.bib scopus.bib pubmed.txt -o papers.csv

# Wildcards also work
python bib2csv.py bibs/* -o papers.csv
```

Supported formats:
- Web of Science: `.bib` export (`@article{ WOS:... }`)
- Scopus: `.bib` export (`@ARTICLE{...}`)
- PubMed: MEDLINE tagged `.txt` export (lines starting with `PMID-`)
The output CSV always contains: `title`, `abstract`, `doi`, `year`, `authors`, `journal`, `keywords`, `source_db`, `entry_key`.
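Since the screening step expects `title` and `abstract` columns, a quick header check before a long run can save credits. A minimal stdlib sketch (the helper names here are mine, not part of the tool):

```python
import csv

REQUIRED = ("title", "abstract")  # columns the screening step expects

def missing_columns(header, required=REQUIRED):
    """Return the required column names absent from a CSV header row."""
    return [col for col in required if col not in header]

def check_screening_csv(path):
    """Read only the header of `path` and fail loudly if columns are missing."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    missing = missing_columns(header)
    if missing:
        raise ValueError(f"{path} is missing required column(s): {missing}")
    return header
```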
After customizing as shown above, you can run:

```
uv run screen.py <csv_file_with_columns_named_title_and_abstract> -n all
```

You can also customize the input files:

```
uv run screen.py <csv_file_with_columns_named_title_and_abstract> -n <number of papers> -c <criteria_file> -m <models_file>
```

Then you see output like this:

After that, an enriched CSV file with the LLM responses is produced.
Not all models return valid responses, e.g., older Llama models. By default, `screen.py` runs only the first 10 rows, which is handy if you want to collect statistics on how well the models perform at returning valid JSON responses without blowing up your OpenRouter credits. If you choose your models carefully, you may see a good success rate of valid output, as in the image below.

Ensure your CSV is properly escaped. Google Sheets CSV export may produce lines that contain line breaks without being properly escaped. If that is the case, run a find-and-replace in Google Sheets (find `\n`, replace with a space, with regular expressions enabled) before exporting to CSV.
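Alternatively, if the export is properly quoted and the problem is just line breaks inside cells confusing downstream tools, you can flatten them programmatically with the standard library's `csv` module. A minimal sketch (the helper names are mine):

```python
import csv

def flatten_field(field):
    """Replace line breaks inside a single CSV field with spaces."""
    return field.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")

def flatten_newlines(src_path, dst_path):
    """Rewrite a properly quoted CSV so that no field spans multiple lines."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([flatten_field(f) for f in row])
```

Note that this only helps when the quoting itself is intact; a truly mis-escaped file still needs the find-and-replace fix in Google Sheets before export.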
[1] Huotala A, Kuutila M, Mäntylä M. SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews. In Proceedings of the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Oct 2025, pp. 1-12. IEEE. https://arxiv.org/abs/2507.19027
[2] De Bruin J, Ma Y, Ferdinands G, Teijema J, Van de Schoot R. SYNERGY: Open machine learning dataset on study selection in systematic reviews. DataverseNL, V1, 2023. https://doi.org/10.34894/HE6NAQ https://github.com/asreview/synergy-dataset