# GANDALF: Graph Analysis Navigator for Discovery And Link Finding
## Features

- Compressed Sparse Row (CSR) graph representation for memory efficiency
- Bidirectional search for faster pathfinding
- O(1) property lookups via hash indexing
- Predicate filtering to reduce path explosion
- Batch property enrichment for fast results
- Diagnostic tools for understanding path counts
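The CSR layout behind the first bullet can be sketched minimally as follows. This is an illustrative toy, not gandalf's actual `CSRGraph` internals; the class and field names here are assumptions.

```python
# Minimal CSR adjacency sketch (illustrative only; names are assumptions,
# not gandalf's actual CSRGraph internals).
class TinyCSR:
    def __init__(self, num_nodes, edges):
        # edges: iterable of (src, dst) pairs of integer node ids
        counts = [0] * (num_nodes + 1)
        for src, _ in edges:
            counts[src + 1] += 1
        # row_ptr[i] marks where node i's neighbor run starts in col_index
        self.row_ptr = counts[:]
        for i in range(1, len(self.row_ptr)):
            self.row_ptr[i] += self.row_ptr[i - 1]
        self.col_index = [0] * len(edges)
        fill = self.row_ptr[:-1].copy()
        for src, dst in edges:
            self.col_index[fill[src]] = dst
            fill[src] += 1

    def neighbors(self, node):
        # All out-neighbors of `node` live in one contiguous slice.
        return self.col_index[self.row_ptr[node]:self.row_ptr[node + 1]]

g = TinyCSR(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(g.neighbors(0))  # -> [1, 2]
```

Storing all neighbor lists in two flat arrays is what makes the representation compact and cache-friendly compared to per-node adjacency objects.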
## Installation

```bash
pip install -e .
```

## Quick Start

### Building a Graph

```python
from gandalf import build_graph_from_jsonl

# Build with ontology filtering
graph = build_graph_from_jsonl(
    edges_path="data/raw/edges.jsonl",
    nodes_path="data/raw/nodes.jsonl",
    excluded_predicates={'biolink:subclass_of'}
)

# Save for fast loading
graph.save("data/processed/graph_filtered.pkl")
```

### Finding Paths

```python
from gandalf import CSRGraph, find_paths

# Load graph (takes ~1-2 seconds)
graph = CSRGraph.load("data/processed/graph.pkl")

# Find paths
paths = find_paths(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979"
)
print(f"Found {len(paths)} paths")
```

### Predicate Filtering

```python
from gandalf import find_paths_filtered

# Only mechanistic relationships
paths = find_paths_filtered(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979",
    allowed_predicates={
        'biolink:treats',
        'biolink:affects',
        'biolink:has_metabolite'
    }
)
```

## Architecture

The package uses a three-stage pipeline:
1. Topology search (fast) - find all candidate paths using integer indices only
2. Filtering (medium) - apply business logic using only the node and edge properties it actually needs
3. Enrichment (batch) - load full properties for the surviving paths only

This separation makes it possible to discard millions of candidate paths before any expensive property lookups.
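The three stages above can be sketched roughly as follows. The helper functions and toy data here are assumptions for illustration, not gandalf's actual API:

```python
# Illustrative three-stage pipeline (hypothetical helpers, not gandalf's API).

def topology_search(adjacency, start, end, max_hops=2):
    # Stage 1: enumerate simple paths over integer indices only (no property I/O).
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        if len(path) > max_hops:
            continue
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

def filter_paths(paths, edge_predicate, allowed):
    # Stage 2: drop paths whose edges use disallowed predicates,
    # touching only the one edge property the rule needs.
    return [p for p in paths
            if all(edge_predicate[(a, b)] in allowed for a, b in zip(p, p[1:]))]

def enrich(paths, node_props):
    # Stage 3: resolve full properties only for paths that survived filtering.
    return [[node_props[n] for n in p] for p in paths]

adjacency = {0: [1, 2], 1: [3], 2: [3]}  # toy index-only topology
predicates = {(0, 1): 'biolink:treats', (0, 2): 'biolink:subclass_of',
              (1, 3): 'biolink:affects', (2, 3): 'biolink:affects'}
candidates = topology_search(adjacency, 0, 3)
kept = filter_paths(candidates, predicates, {'biolink:treats', 'biolink:affects'})
print(kept)  # -> [[0, 1, 3]]
```

The key design point is that stage 1 never touches node or edge properties, so the expensive property store is consulted only for the small set of paths that survive stage 2.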