
FunctionLab/mahi



Multi-modal tissue-aware graph neural network for in silico genetic discovery

📄 Manuscript • 🛠️ Installation • 📦 Data • 🧪 Demo • 🧬 Embedding Generation • 🔬 Perturbation Analysis


Installation

Recommended Installation (using 'environment.yaml')

# clone GitHub repository
git clone https://github.com/FunctionLab/mahi.git
cd mahi

# create Conda environment from YAML
conda env create -f environment.yaml
conda activate mahi

# install PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

Manual Installation

# create new Conda environment
conda create --name mahi python=3.10 pytorch=2.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda activate mahi

# install dependencies
pip install "numpy<2"
pip install torch-geometric wandb pytorch-lightning ipykernel umap-learn biopython pyfaidx seaborn xgboost
conda install scikit-learn matplotlib pandas -c conda-forge

# install PyTorch Geometric dependencies
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

# install transformers package
pip install "transformers[torch]"
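After either installation route, a quick check that the main packages are importable can catch a half-built environment before the demo is started. The snippet below is a minimal sketch and not part of the Mahi repository: the helper name `missing_packages` is hypothetical, and the module list is an assumption derived from the install commands above.

```python
from importlib.util import find_spec

# Import names assumed from the install commands above (illustrative subset).
REQUIRED = [
    "torch", "torch_geometric", "torch_scatter", "torch_sparse",
    "pytorch_lightning", "transformers", "numpy",
]


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    print("missing:", ", ".join(absent) if absent else "none")
```

Run it inside the activated `mahi` environment. It only locates modules without importing them, so it does not confirm that the CUDA builds match.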

Data

Download required datasets from:

https://drive.google.com/drive/folders/1xWfPkC8bs3aQCsI6YMqYpXnSn6f6E1-B?usp=share_link

Then unzip into the repository root:

unzip <data>.zip

Demo: gene essentiality prediction

This demo runs gene essentiality prediction on one cell line to verify your installation (about 30 minutes, depending on your hardware):

# attach gene essentiality labels to Mahi demo embeddings for lung tissue
python scripts/gene_essentiality/add_labels.py \
  --mahi_root data/demo/mahi_embeddings_lung \
  --data_dir data/demo

# evaluate gene essentiality (5-fold CV + test eval)
python scripts/gene_essentiality/evaluate_mahi_gene_essentiality.py \
  --out_dir outputs/demo \
  --mahi_root data/demo/mahi_embeddings_lung \
  --mapping_file resources/cell_lines.txt \
  --cell_line ACH-000012 # cell line associated with lung tissue

Optional (HPC/SLURM)

For much faster runtime on CPUs, you can also submit the demo as a SLURM job:

sbatch demo.slurm

Outputs

outputs/demo/mahi_gene_essentiality_eval/
  ├── mahi.metrics_by_cellline_and_tissue.csv    # summary metrics on training set
  ├── cv_preds/                                  # per-gene out-of-fold predictions
  └── test_preds/                                # per-gene test predictions
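To inspect the summary metrics without assuming anything about their layout, the CSV can be read generically. This is a minimal sketch: the column names are not documented here, so the hypothetical helper `load_metrics` simply returns whatever header and rows the file contains.

```python
import csv
from pathlib import Path

# Path taken from the output tree above.
METRICS_CSV = Path(
    "outputs/demo/mahi_gene_essentiality_eval/"
    "mahi.metrics_by_cellline_and_tissue.csv"
)


def load_metrics(path):
    """Return (column_names, rows); each row is a dict keyed by column name."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    if METRICS_CSV.exists():
        columns, rows = load_metrics(METRICS_CSV)
        print("columns:", columns)
        for row in rows:
            print(row)
    else:
        print(f"{METRICS_CSV} not found; run the demo first")
```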

Mahi: End-to-end

Mahi can be run entirely on CPU (unless you are re-training the multigraph GNN).

Generate Mahi embeddings

Processing functional networks

Please download the functional networks using the links from the manuscript and convert .dab files to .dat format using Dat2Dab from Sleipnir (https://github.com/FunctionLab/sleipnir.git).

./sleipnir/build/tools/Dat2Dab -i data/dab_networks/<data.dab> -o data/dat_networks/<data.dat>

After conversion, filter networks to the top 3% of edges (recommended on SLURM):

sbatch scripts/networks/process_networks.slurm

If you do not have SLURM, you can run the same script locally:

bash scripts/networks/process_networks.slurm

This generates filtered networks in:

data/dat_networks/*_filtered_top3.dat
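For intuition, the top-3% filter amounts to sorting edges by weight and keeping the highest-scoring fraction. The sketch below illustrates that idea only. It is not the logic of `process_networks.slurm`, which may handle ties, rounding, and memory differently. The function names are hypothetical, and `read_dat` assumes Sleipnir's tab-delimited .dat layout of `gene_a`, `gene_b`, `weight` per line.

```python
def top_fraction_edges(edges, fraction=0.03):
    """Keep the highest-weight `fraction` of edges.

    `edges` is a list of (gene_a, gene_b, weight) tuples. At least one edge
    is kept whenever the input is non-empty.
    """
    if not edges:
        return []
    keep = max(1, round(len(edges) * fraction))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:keep]


def read_dat(path):
    """Parse a tab-delimited file of gene_a, gene_b, weight (assumed layout)."""
    edges = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            edges.append((fields[0], fields[1], float(fields[2])))
    return edges
```

Sorting the full edge list keeps the example short; for genome-scale networks a streaming or partial-selection approach would use less memory.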

Mahi embeddings for single tissue

python wt_mahi.py \
  --dir data \
  --tissue lung \
  --checkpoint checkpoints/best-checkpoint.ckpt

Multiple tissues

python wt_mahi.py \
  --dir data \
  --tissues lung heart kidney \
  --checkpoint checkpoints/best-checkpoint.ckpt

Multiple tissues from a file

List one tissue per line in a text file:

# tissues.txt
lung
heart
colon

python wt_mahi.py \
  --dir data \
  --tissues_txt tissues.txt \
  --checkpoint checkpoints/best-checkpoint.ckpt
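When the tissue list comes from another script, the file can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper name, `write_tissue_list`. It writes bare tissue names only and omits the `# tissues.txt` header line, since it is not stated here whether `wt_mahi.py` skips comment lines.

```python
from pathlib import Path


def write_tissue_list(tissues, path="tissues.txt"):
    """Write unique, non-empty tissue names to `path`, one per line, in order."""
    cleaned = (tissue.strip() for tissue in tissues)
    unique = list(dict.fromkeys(name for name in cleaned if name))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


if __name__ == "__main__":
    print(write_tissue_list(["lung", "heart", "colon"]))
```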

Perturbation (gene KO) analysis

You can specify a single tissue (--tissue), multiple tissues (--tissues), or provide a tissue list file (--tissues_txt).

python perturb_mahi.py \
  --dir data \
  --gene <Entrez ID> \
  --tissue lung \
  --checkpoint checkpoints/best-checkpoint.ckpt
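To knock out several genes in turn, the command above can be assembled programmatically. The sketch below is an illustration under stated assumptions: `perturb_command` is a hypothetical helper, it uses only the flags shown in this README, and the Entrez IDs in the loop (7157 for TP53, 672 for BRCA1) are examples to be replaced with your own genes.

```python
import shlex


def perturb_command(gene, tissue, data_dir="data",
                    checkpoint="checkpoints/best-checkpoint.ckpt"):
    """Build the argument list for one perturb_mahi.py run."""
    return [
        "python", "perturb_mahi.py",
        "--dir", data_dir,
        "--gene", str(gene),
        "--tissue", tissue,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    # Example Entrez IDs for illustration only; substitute your own.
    for gene in [7157, 672]:
        print(shlex.join(perturb_command(gene, "lung")))
```

The loop only prints the commands. To execute them from the repository root, pass each list to `subprocess.run(cmd, check=True)`.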

Rank perturbation effects

You can specify a single tissue (--tissue), multiple tissues (--tissues), or provide a tissue list file (--tissues_txt).

python get_top_genes.py \
  --dir data \
  --gene <Entrez ID> \
  --tissue lung \
  --avg resources/averaged_distances.csv \
  --top 1000

Citation

If you use Mahi in your research, please cite:

@article{aggarwal2026mahi,
  title   = {Multi-modal tissue-aware graph neural network for in silico genetic discovery},
  author  = {Aggarwal, Anusha and Sokolova, Ksenia and Troyanskaya, Olga G},
  journal = {bioRxiv},
  year    = {2026},
  month   = feb,
  doi     = {10.64898/2026.02.17.706433},
  url     = {https://www.biorxiv.org/content/10.64898/2026.02.17.706433v1},
}

About

AI framework for in silico genetic perturbation in tissue & cell-type context.
