Bhattacharya-Lab/RNAbpFlow

RNAbpFlow: Base pair-augmented SE(3)-flow matching for conditional RNA 3D structure generation

by Sumit Tarafder and Debswapna Bhattacharya

[bioRxiv] [pdf]

Installation

  1. Use mamba to create a virtual environment and install the dependencies for RNAbpFlow.

    conda install -n base -c conda-forge mamba
    mamba env create -f RNAbpFlow.yml

  2. Activate the virtual environment.

    conda activate RNAbpFlow

Installation typically takes a few minutes on a standard desktop computer running 64-bit Linux.

Usage

Instructions for running RNAbpFlow:

  1. For each target, place the FASTA sequence and its three base pair maps in the Inputs folder (examples are provided). Alternatively, you can supply a single base pair map (.npy) per sequence, which is replicated three times to match the required channel dimension.
  2. Add a list of PDB IDs to list.txt inside the Inputs folder (an example is included).
  3. Each line in list.txt contains a target ID, optionally followed by a space and the number of sample structures to generate. If the number is omitted, RNAbpFlow uses the default value from the configuration file in the "configs" folder.
  4. Download the trained checkpoints from here and place the checkpoint folder in this repository.
  • The default checkpoint configured is: RNA3DB.ckpt
  • For CASP15 evaluation, set the "ckpt_path" field in configs/inference.yaml to checkpoint/CASP15.ckpt. For CASP16 evaluation or for prediction in general, set it to checkpoint/CASP16.ckpt.
  5. Run this command to generate sample 3D structures.

    python3 inference.py

  6. RNAbpFlow will generate 3D structures in the specified format ("pdb", "mmcif/PDBx" or both) for each ID listed in "list.txt" and place them inside the 'Prediction' folder.
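
The single-map option from step 1 can be illustrated with a short NumPy sketch. It is not RNAbpFlow's own loading code: the function name `replicate_bp_map`, the channel-first layout (3, L, L), and the toy 4-nucleotide map are assumptions made for illustration only.

```python
import numpy as np


def replicate_bp_map(single_map: np.ndarray, n_channels: int = 3) -> np.ndarray:
    """Copy one L x L base pair map into an (n_channels, L, L) array.

    Hypothetical helper; the channel-first layout is an assumption.
    """
    if single_map.ndim != 2 or single_map.shape[0] != single_map.shape[1]:
        raise ValueError("expected a square L x L base pair map")
    # Add a leading channel axis, then repeat the map along it.
    return np.repeat(single_map[np.newaxis, :, :], n_channels, axis=0)


# Toy symmetric map for a 4-nucleotide sequence (pairs 0-3 and 1-2).
bp = np.zeros((4, 4), dtype=np.float32)
bp[0, 3] = bp[3, 0] = 1.0
bp[1, 2] = bp[2, 1] = 1.0

stacked = replicate_bp_map(bp)
print(stacked.shape)
```

Every channel of `stacked` holds the same map, which is what "replicated three times" amounts to in this sketch.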

  • Sampling 10 RNA 3D structures for a typical RNA (~70 nucleotides) takes about 25 seconds on one GPU.
  • Multi-GPU support is provided for large-scale sample generation. GPU usage can be configured in the configuration file (inference.yaml).
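
The list.txt format described in the steps above (a target ID, optionally followed by a sample count) can be sketched as a small parser. This is an illustration only, not the parser used by inference.py: the function name `parse_target_list`, the target IDs, and the fallback value of 5 are hypothetical placeholders.

```python
def parse_target_list(text: str, default_samples: int) -> list[tuple[str, int]]:
    """Parse lines of the form '<target_id> [num_samples]'.

    Hypothetical helper; default_samples stands in for the value that
    would otherwise come from the configuration file.
    """
    targets = []
    for raw_line in text.splitlines():
        parts = raw_line.split()
        if not parts:
            continue  # skip blank lines
        target_id = parts[0]
        # Fall back to the default when no sample count is given.
        n_samples = int(parts[1]) if len(parts) > 1 else default_samples
        targets.append((target_id, n_samples))
    return targets


example = "targetA 10\ntargetB\n"
parsed = parse_target_list(example, default_samples=5)
print(parsed)  # [('targetA', 10), ('targetB', 5)]
```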

Datasets

  • The lists of targets used in training and benchmarking are available here.
  • For training and benchmarking, we used the train-test split provided by RNA3DB available here. We downloaded the 2024-04-26 RNA3DB release.
  • For sampling performance comparison with RNAJP, we downloaded their decoy set from here and the corresponding native structures from PDB.
  • For CASP16 blind benchmarking, we used the entire RNA3DB dataset available here. We downloaded the same 2024-04-26 RNA3DB release. For training data augmentation via cross-distillation, we downloaded the bpRNA-1m (90) dataset from here.
  • For CASP15 blind benchmarking, we filtered the RNA3DB release to collect all the chains released in PDB before 2022-04-26.
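
The date-based filtering used for the CASP15 benchmark can be sketched as follows. The record layout (chain ID paired with an ISO release date), the chain names, and the function name `released_before` are assumptions for illustration; they do not reflect the RNA3DB file format or the authors' actual filtering script.

```python
from datetime import date

CUTOFF = date(2022, 4, 26)  # cutoff date stated for the CASP15 benchmark


def released_before(records, cutoff=CUTOFF):
    """Keep chain IDs whose release date is strictly before the cutoff.

    Hypothetical helper operating on (chain_id, 'YYYY-MM-DD') pairs.
    """
    return [
        chain_id
        for chain_id, released in records
        if date.fromisoformat(released) < cutoff
    ]


# Made-up records, not real PDB entries.
records = [
    ("chainA", "2021-11-03"),
    ("chainB", "2022-04-26"),
    ("chainC", "2023-01-15"),
]
kept = released_before(records)
print(kept)  # ['chainA']
```

In this sketch a chain released on the cutoff date itself is excluded, which follows the "before" wording of the filter.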
