📄 Manuscript • 🛠️ Installation • 📦 Data • 🧪 Demo • 🧬 Embedding Generation • 🔬 Perturbation Analysis
## 🛠️ Installation

**Option 1: create the environment from the provided YAML file.**

```bash
# clone the GitHub repository
git clone https://github.com/FunctionLab/mahi.git
cd mahi

# create the Conda environment from the YAML file
conda env create -f environment.yaml
conda activate mahi

# install the PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
```

**Option 2: create the environment manually.**

```bash
# create a new Conda environment
conda create --name mahi python=3.10 pytorch=2.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda activate mahi

# install dependencies
pip install "numpy<2"
pip install torch-geometric wandb pytorch-lightning ipykernel umap-learn biopython pyfaidx seaborn xgboost
conda install scikit-learn matplotlib pandas -c conda-forge

# install the PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

# install the transformers package
pip install "transformers[torch]"
```

## 📦 Data

Download the required datasets from:
https://drive.google.com/drive/folders/1xWfPkC8bs3aQCsI6YMqYpXnSn6f6E1-B?usp=share_link
Then unzip into the repository root:
```bash
unzip <data>.zip
```

## 🧪 Demo

This demo runs gene essentiality prediction on a single cell line to verify your installation (roughly 30 minutes, depending on your hardware):
```bash
# attach gene essentiality labels to the Mahi demo embeddings for lung tissue
python scripts/gene_essentiality/add_labels.py \
    --mahi_root data/demo/mahi_embeddings_lung \
    --data_dir data/demo

# evaluate gene essentiality (5-fold CV + held-out test evaluation)
python scripts/gene_essentiality/evaluate_mahi_gene_essentiality.py \
    --out_dir outputs/demo \
    --mahi_root data/demo/mahi_embeddings_lung \
    --mapping_file resources/cell_lines.txt \
    --cell_line ACH-000012  # cell line associated with lung tissue
```

For faster runtime on CPU-only machines, you can also submit the demo as a SLURM job:
```bash
sbatch demo.slurm
```

The demo writes its results to:

```
outputs/demo/mahi_gene_essentiality_eval/
├── mahi.metrics_by_cellline_and_tissue.csv  # summary metrics on the training set
├── cv_preds/                                # per-gene out-of-fold predictions
└── test_preds/                              # per-gene test predictions
```

Mahi can be run entirely on CPU (unless you are re-training the multigraph GNN).
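The evaluation script's internals are not shown here, but the 5-fold cross-validation it performs can be sketched on synthetic data. In the sketch below, the embedding dimension, classifier, and label construction are illustrative assumptions, not the repository's actual choices:

```python
# Illustrative sketch: 5-fold CV for gene essentiality classification.
# The 64-dim "embeddings" and binary labels are synthetic stand-ins for
# per-gene Mahi embeddings and essentiality labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))            # 500 genes x 64-dim embeddings
y = (X[:, 0] + rng.normal(size=500)) > 0  # synthetic essentiality labels

oof = np.zeros(len(y))                    # out-of-fold predictions (cf. cv_preds/)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    oof[val_idx] = clf.predict_proba(X[val_idx])[:, 1]

print(f"out-of-fold AUROC: {roc_auc_score(y, oof):.3f}")
```

The out-of-fold predictions play the role of the `cv_preds/` files above: every gene is scored by a model that never saw it during training.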
## 🧬 Embedding Generation

Please download the functional networks using the links in the manuscript, then convert the .dab files to .dat format with Dat2Dab from Sleipnir (https://github.com/FunctionLab/sleipnir.git):

```bash
./sleipnir/build/tools/Dat2Dab -i data/dab_networks/<data.dab> -o data/dat_networks/<data.dat>
```

After conversion, filter the networks to the top 3% of edges (recommended on SLURM):

```bash
sbatch scripts/networks/process_networks.slurm
```

If you do not have SLURM, you can run the same script locally:

```bash
bash scripts/networks/process_networks.slurm
```

This generates the filtered networks in:

```
data/dat_networks/*_filtered_top3.dat
```

To generate wild-type embeddings for a single tissue:

```bash
python wt_mahi.py \
    --dir data \
    --tissue lung \
    --checkpoint checkpoints/best-checkpoint.ckpt
```

For multiple tissues:

```bash
python wt_mahi.py \
    --dir data \
    --tissues lung heart kidney \
    --checkpoint checkpoints/best-checkpoint.ckpt
```

You can also provide a tissue list file, e.g. tissues.txt:

```
# tissues.txt
lung
heart
colon
```

```bash
python wt_mahi.py \
    --dir data \
    --tissues_txt tissues.txt \
    --checkpoint checkpoints/best-checkpoint.ckpt
```

You can specify a single tissue (--tissue), multiple tissues (--tissues), or a tissue list file (--tissues_txt).
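The top-3% edge filtering step above can be sketched as follows, assuming Sleipnir's .dat format of one tab-separated `gene1 gene2 weight` record per line; the repository's actual `process_networks` script may differ in details such as tie handling or streaming large files:

```python
# Illustrative sketch: keep only the heaviest 3% of network edges by weight.
# Assumes the Sleipnir .dat format: one tab-separated "gene1 gene2 weight" per line.
import io

def filter_top_edges(dat_text: str, frac: float = 0.03) -> str:
    edges = []
    for line in io.StringIO(dat_text):
        g1, g2, w = line.split("\t")
        edges.append((g1, g2, float(w)))
    edges.sort(key=lambda e: e[2], reverse=True)  # heaviest edges first
    keep = max(1, int(len(edges) * frac))         # top 3% (at least one edge)
    return "".join(f"{g1}\t{g2}\t{w}\n" for g1, g2, w in edges[:keep])

# toy network with 100 edges -> the filter keeps the 3 heaviest
toy = "".join(f"g{i}\tg{i + 1}\t{i / 100:.2f}\n" for i in range(100))
print(filter_top_edges(toy))
```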
## 🔬 Perturbation Analysis

To run an in silico perturbation of a gene:

```bash
python perturb_mahi.py \
    --dir data \
    --gene <Entrez ID> \
    --tissue lung \
    --checkpoint checkpoints/best-checkpoint.ckpt
```

As with wt_mahi.py, you can specify a single tissue (--tissue), multiple tissues (--tissues), or a tissue list file (--tissues_txt).
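Conceptually, downstream perturbation analysis can be thought of as measuring how far each gene's embedding moves between the wild-type and perturbed runs, then ranking genes by that shift. A minimal numpy sketch of this idea, using hypothetical arrays rather than the repository's actual data layout or distance measure:

```python
# Illustrative sketch: rank genes by embedding shift after an in silico
# perturbation. `wt` and `perturbed` are hypothetical (n_genes x dim) arrays
# standing in for wild-type and post-perturbation embeddings.
import numpy as np

rng = np.random.default_rng(0)
n_genes, dim = 1000, 32
wt = rng.normal(size=(n_genes, dim))
perturbed = wt.copy()
perturbed[:50] += rng.normal(scale=2.0, size=(50, dim))  # 50 genes respond strongly

shift = np.linalg.norm(perturbed - wt, axis=1)  # per-gene Euclidean shift
top = np.argsort(shift)[::-1][:100]             # the 100 most-affected genes
print(f"{np.isin(top[:50], np.arange(50)).mean():.0%} of the top 50 are truly perturbed")
```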
To retrieve the genes most affected by a perturbation:

```bash
python get_top_genes.py \
    --dir data \
    --gene <Entrez ID> \
    --tissue lung \
    --avg resources/averaged_distances.csv \
    --top 1000
```

If you use Mahi in your research, please cite:
```bibtex
@article{aggarwal2026mahi,
  title   = {Multi-modal tissue-aware graph neural network for in silico genetic discovery},
  author  = {Aggarwal, Anusha and Sokolova, Ksenia and Troyanskaya, Olga G},
  journal = {bioRxiv},
  year    = {2026},
  month   = feb,
  doi     = {10.64898/2026.02.17.706433},
  url     = {https://www.biorxiv.org/content/10.64898/2026.02.17.706433v1},
}
```
