GitHub - Bhattacharya-Lab/RNAbpFlow: Base pair-augmented SE(3)-flow matching for conditional RNA 3D structure generation

RNAbpFlow: Base pair-augmented SE(3)-flow matching for conditional RNA 3D structure generation

by Sumit Tarafder and Debswapna Bhattacharya

Installation

Use mamba to create a virtual environment and install dependencies for RNAbpFlow.

conda install -n base -c conda-forge mamba
mamba env create -f RNAbpFlow.yml

Activate the virtual environment

conda activate RNAbpFlow

Installation typically takes a few minutes on a standard desktop computer running 64-bit Linux.

Usage

Instructions for running RNAbpFlow:

For each target, place the FASTA sequence and its three base pair maps in the Inputs folder (examples are provided). Alternatively, inference can be run with a single base pair map (.npy) per sequence, which is replicated three times to match the required channel dimension.
Add a list of PDB IDs to list.txt inside the Inputs folder (an example is included).
Each line in list.txt contains a target ID followed by the number of sample structures to generate, separated by a space. If the number is omitted, RNAbpFlow uses the default value set in the configuration file in the "configs" folder.
Download the trained checkpoints from here and place the checkpoint folder in this repository.
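
The input conventions above can be sketched in Python. This is a minimal illustration, not RNAbpFlow's own loader: the function names `replicate_bp_map` and `parse_target_list` are hypothetical, and the choice of the leading axis for the three channels is an assumption, since the channel layout is not stated here.

```python
import numpy as np


def replicate_bp_map(bp_map, axis=0):
    """Stack one (L, L) base pair map three times along `axis`.

    Assumption: axis=0 gives shape (3, L, L); the layout RNAbpFlow
    actually expects may differ.
    """
    bp_map = np.asarray(bp_map)
    if bp_map.ndim != 2 or bp_map.shape[0] != bp_map.shape[1]:
        raise ValueError("expected a square (L, L) base pair map")
    return np.stack([bp_map] * 3, axis=axis)


def parse_target_list(text, default_samples):
    """Parse list.txt-style lines of the form '<target_id> [num_samples]'.

    `default_samples` stands in for the value read from the configuration
    file when a line gives no sample count.
    """
    targets = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        target_id = fields[0]
        n_samples = int(fields[1]) if len(fields) > 1 else default_samples
        targets.append((target_id, n_samples))
    return targets
```

A single map loaded with `np.load` could be passed to `replicate_bp_map`, and the contents of list.txt to `parse_target_list`, to check inputs before running inference.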

The default configured checkpoint is RNA3DB.ckpt.
For CASP15 evaluation, edit configs/inference.yaml and set the "ckpt_path" field to checkpoint/CASP15.ckpt. For CASP16 evaluation, or for prediction in general, set it to checkpoint/CASP16.ckpt.
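
Switching checkpoints amounts to changing one value in the YAML file. The sketch below does this on the file's text with the standard library only. It assumes the key appears on a single line as `ckpt_path: <value>`; the actual layout of configs/inference.yaml is not shown here, and `set_ckpt_path` is a hypothetical helper, not part of RNAbpFlow.

```python
import re


def set_ckpt_path(yaml_text, new_path):
    """Return yaml_text with the value of the first `ckpt_path:` line replaced.

    Assumption: the key and its value share one line. Indentation and all
    other lines are left untouched.
    """
    pattern = re.compile(r"^([ \t]*ckpt_path[ \t]*:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_path, yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no ckpt_path field found")
    return new_text
```

Reading the file with `pathlib.Path.read_text`, passing the text through `set_ckpt_path`, and writing the result back would apply the change; editing the field by hand is equivalent.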

Run this command to generate sample 3D structures.
```
python3 inference.py
```
RNAbpFlow will generate 3D structures in the specified format ("pdb", "mmcif/PDBx" or both) for each ID listed in "list.txt" and place them inside the "Predictions" folder.

Sampling 10 RNA 3D structures for a typical RNA (~70 nucleotides) takes about 25 seconds on a single GPU.
Multi-GPU support is provided for large-scale sample generation. GPU usage can be configured in the configuration file (inference.yaml).

Datasets

Lists of the targets used in training and benchmarking are available here.
For training and benchmarking, we used the train-test split provided by RNA3DB, available here. We downloaded the 2024-04-26 RNA3DB release.
For sampling performance comparison with RNAJP, we downloaded their decoy set from here and the corresponding native structures from PDB.
For CASP16 blind benchmarking, we used the entire RNA3DB dataset available here. We downloaded the same 2024-04-26 RNA3DB release. For training data augmentation via cross-distillation, we downloaded the bpRNA-1m (90) dataset from here.
For CASP15 blind benchmarking, we filtered the RNA3DB release to collect all the chains released in PDB before 2022-04-26.
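
The date-based filtering for CASP15 can be illustrated as follows. This is a sketch under stated assumptions, not the authors' script: the mapping from chain ID to ISO-formatted release date is a hypothetical input format, `chains_before` is a hypothetical helper, and "before" is read as strictly earlier than the cutoff.

```python
from datetime import date

# Cutoff date stated in the text for the CASP15 benchmark.
CASP15_CUTOFF = date(2022, 4, 26)


def chains_before(release_dates, cutoff=CASP15_CUTOFF):
    """Return sorted chain IDs released strictly before `cutoff`.

    `release_dates` maps a chain ID to its PDB release date as a
    'YYYY-MM-DD' string (assumed format).
    """
    return sorted(
        chain_id
        for chain_id, released in release_dates.items()
        if date.fromisoformat(released) < cutoff
    )
```

For example, given chains released on 2021-12-01, 2022-04-26 and 2023-01-15, only the first would be kept.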
