This repository contains the open-source evaluation toolkit released alongside the paper "From Rules to Neural Graphs: Scalable Structured Prediction for Patent Prior Art Search".
## Repository layout

```
invention_graph_eval/      # Evaluation library (pip-installable)
├── graph_model.py         # Graph/Node pydantic models
├── human_readable.py      # ASCII graph rendering
├── metrics.py             # SetMetrics, FeatureGraphMetrics
├── loader.py              # Dataset loading utilities
└── tokenize.py            # Word-level tokenizer
data/
├── evaluation.tar.gz      # 5,000-patent evaluation set (gold-standard)
└── train_sample.tar.gz    # 5,000-patent training sample
tests/                     # Unit tests (no data required)
evaluate.py                # CLI evaluation script
requirements.txt
pyproject.toml
```
## Installation

```
pip install -e ".[dev]"
# or: pip install -r requirements.txt
```

## Data

The datasets are included as compressed archives in `data/`:
```
cd data
tar xzf evaluation.tar.gz
tar xzf train_sample.tar.gz
```

## Running the tests

```
python -m pytest tests/ -v

# Validate extracted data files
python -m pytest tests/test_data_validation.py --data-dir data/evaluation -v
```

## Evaluation

```
python evaluate.py \
    --testset data/evaluation \
    --predictions my_predictions/ \
    --output report.csv
```

Both `--testset` and `--predictions` are directories with the same layout:
```
<dir>/
    patent_meta.csv
    json/<ucid>.json
```
If the testset also contains a `targets/` directory with word-level CSV labels, word coverage metrics are included in the output.
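To be scored, predictions must be written into that same layout. A minimal sketch of doing so, assuming dict-based graphs in the JSON schema described below (the helper `write_predictions` is illustrative, not part of the toolkit):

```python
import json
from pathlib import Path

def write_predictions(pred_dir: str, graphs: dict, meta_csv: str) -> None:
    """Write predicted graphs into the directory layout expected by evaluate.py.

    `graphs` maps ucid -> graph dict; the testset's patent_meta.csv is copied
    so both directories describe the same patents.
    """
    root = Path(pred_dir)
    (root / "json").mkdir(parents=True, exist_ok=True)
    (root / "patent_meta.csv").write_text(Path(meta_csv).read_text())
    for ucid, graph in graphs.items():
        # One JSON file per patent, named by its ucid
        (root / "json" / f"{ucid}.json").write_text(json.dumps(graph, indent=2))
```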
## Dataset format

```
<dataset_root>/
    patent_meta.csv         # metadata per patent
    json/<ucid>.json        # gold-standard graph (JSON)
    targets/<ucid>.csv      # word-level token labels
```

### patent_meta.csv
| Column | Description |
|---|---|
| `ucid` | Unique patent identifier |
| `id` | Internal integer ID |
| `family_id` | Patent family ID |
| `country` | 2-letter country code |
| `ipc_codes` | IPC codes (pipe-separated) |
| `coverage` | Fraction of claim words in the graph |
| `length` | Number of tokens in the patent claim |
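The `coverage` column can be understood as a set ratio over distinct claim words. A simplified stand-in (the real tokenization lives in `invention_graph_eval/tokenize.py` and may differ):

```python
import re

def coverage(claim_text: str, graph_words: set) -> float:
    """Fraction of distinct claim words that appear in the graph.

    A sketch only: uses a naive \\w+ tokenizer and lowercasing,
    which is an assumption about the toolkit's word-level tokenizer.
    """
    tokens = {t.lower() for t in re.findall(r"\w+", claim_text)}
    if not tokens:
        return 0.0
    return len(tokens & {w.lower() for w in graph_words}) / len(tokens)
```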
### targets/&lt;ucid&gt;.csv

| Column | Description |
|---|---|
| `id` | Token index |
| `text` | Token text |
| `parent` | Index of the parent token in the feature tree |
| `edge` | Edge type: `none`, `normal`, `meronym`, `hyponym`, `syntactic`, `method`, `reference` |
| `relation` | Index of the relation root token (0 if none) |
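Based on the column table above, the feature-tree edges can be read out of a targets CSV with the standard library; a sketch (`read_token_edges` is illustrative, and the toolkit's own loader may handle these files differently):

```python
import csv
import io

def read_token_edges(csv_text: str):
    """Extract (child_id, parent_id, edge_type) triples from a word-level
    targets CSV, skipping tokens whose edge type is "none"."""
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["edge"] != "none":
            edges.append((int(row["id"]), int(row["parent"]), row["edge"]))
    return edges
```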
### json/&lt;ucid&gt;.json

The graph JSON follows the pydantic schema in `invention_graph_eval/graph_model.py`:
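For orientation, a graph in this schema can be flattened with a small stdlib traversal (a sketch; `walk` is illustrative and the pydantic models in `graph_model.py` remain the authoritative definitions):

```python
def walk(graph: dict):
    """Flatten the nested graph JSON into feature and relation lists.

    Feature nodes carry a string value; relation nodes carry a list
    mixing {"ref": id} and {"text": "..."} parts.
    """
    features, relations = [], []
    stack = list(graph.get("items", []))
    while stack:
        node = stack.pop()
        if node["type"] == "feature":
            features.append((node["id"], node["value"]))
        elif node["type"] == "relation":
            relations.append((node["id"], node["value"]))
        stack.extend(node.get("items", []))  # recurse into nested items
    return features, relations
```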
```json
{
  "items": [
    {
      "id": 1,
      "type": "feature",
      "value": "pump",
      "items": [
        {
          "id": 2,
          "type": "relation",
          "value": [
            {"ref": 1},
            {"text": "connects to"},
            {"ref": 3}
          ],
          "items": []
        }
      ]
    }
  ]
}
```

## Metrics

| Metric | Description |
|---|---|
| `whole graph iou` | Mean IoU of the edge sets (predicted vs. gold) |
| `whole graph iou@100` | Fraction of patents with perfect IoU (= 1.0) |
| `whole graph iou@90` | Fraction of patents with IoU ≥ 0.9 |
| `whole graph p` | Mean precision of edge sets |
| `whole graph r` | Mean recall of edge sets |
| `targets coverage` | Fraction of input-text vocabulary in the gold-standard graph |
| `predictions coverage` | Fraction of input-text vocabulary in the predicted graph |
| `targets graph depth` | Mean branch depth of gold-standard graphs |
| `targets relation count` | Mean number of relation nodes in gold-standard graphs |
The primary benchmark metric is `whole graph iou`.
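The set comparison behind the `whole graph` metrics can be sketched as follows (a simplified stand-in; the toolkit's `SetMetrics` may represent edges and handle empty sets differently):

```python
def edge_set_scores(pred: set, gold: set):
    """IoU, precision, and recall over two edge sets.

    Edges are assumed hashable (e.g. (source, target) tuples);
    that representation is an illustration, not the toolkit's.
    """
    inter = len(pred & gold)
    union = len(pred | gold)
    iou = inter / union if union else 1.0  # two empty graphs match perfectly
    p = inter / len(pred) if pred else 0.0
    r = inter / len(gold) if gold else 0.0
    return iou, p, r
```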
## Python API

```python
from invention_graph_eval.loader import load_dataset
from invention_graph_eval.metrics import FeatureGraphMetrics
from invention_graph_eval.graph_model import Graph, GraphWithNodeMap

# Load the evaluation set
meta, target_graphs, input_texts = load_dataset("data/evaluation")

# Build your predictions (list of GraphWithNodeMap, same order as meta)
# ...

metrics = FeatureGraphMetrics(predictions, target_graphs, input_texts)
print(metrics.stats())
```

## License

See LICENSE.