molearn application to sampling RNA conformational space
This repository includes modified source code of molearn version 1.0.0 (https://github.com/Degiacomi-Lab/molearn/tree/diffusion) and related files for applying molearn to RNA.
Included in this repository are the following:
- Modified files to execute molearn for RNA
- Molearn scripts ('protein_handler.py' and 'network.py') are in the 'src/molearn' folder.
- Amber force field parameters for RNA (frcmod.RNA.LJbb and nucleic12.LJbb-RNA.lib) are in the 'src/parameters' folder. They are distributed with the AmberTools22 package under the GNU General Public License and are used to calculate the Torch potential energy function for RNA.
- Environment setup file (molearnA.yml) is in the 'src/environment' folder.
- Examples of how to execute training and conformation generation with molearn, which are found in the 'examples' folder.
The following are obtained from a separate repository (the Box folder linked below):
- PDB files in the 'data' folder, discussed in Ikuo Kurisaki, Michiaki Hamada (2025). Deep learning generates apo RNA conformations with cryptic ligand binding site, bioRxiv
- The training input data, consisting of 400 snapshot structures, is 'HIV1TAR_in_Two_States.pdb', and the input PDB file for conformation generation with molearn models is 'Max_RMSd_Pair.pdb'. Download them from https://waseda.app.box.com/folder/300778804035?v=data-molearnA and copy them into the 'data' folder on your machine before running an example script.
- Molearn-generated conformations (e.g., MolGen_HIVTAR_from_Model-A.pdb) are found in the 'results/pdb' folder. They can be downloaded from https://waseda.app.box.com/folder/300778804035?v=data-molearnA.
- Results discussed in Ikuo Kurisaki, Michiaki Hamada (2025). Deep learning generates apo RNA conformations with cryptic ligand binding site, bioRxiv
- Trained molearn models in the 'results/model/' folder, which should be downloaded in advance from https://waseda.app.box.com/folder/300778804035?v=data-molearnA. Grid points for MV2003-binding conformations are given in the file 'results/model/Grid_Points_for_Conformations.txt'.
molearnA-main/
├── data/ * Input datasets for training and conformation generation
├── examples/ * sample scripts for training and conformation generation
├── results/ * Output results (QRNAS-optimized structures)
│   ├── model * Models trained for 1000 epochs
│   └── pdb * Generated conformations
└── src/ * Source code
    ├── molearn/ * Modified protein_handler and network scripts
    ├── parameters/ * Amber force field parameter files
    └── environment/ * yml file from the installation test
The 'data', 'results/model' and 'results/pdb' folders are empty in this repository; their files should be obtained from https://waseda.app.box.com/folder/300778804035?v=data-molearnA.
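Because these folders start empty, a quick pre-flight check can catch a missing download before a long run fails. The sketch below uses a hypothetical helper, `missing_inputs`, which is not part of molearn or this repository. Its file list covers only the files named in this README (the Model-A network file is the one used by the conformation-generation example), so it is an assumption you should extend if you use other models.

```python
from pathlib import Path

# Files named in this README; extend this list for additional trained models.
REQUIRED_FILES = [
    "data/HIV1TAR_in_Two_States.pdb",
    "data/Max_RMSd_Pair.pdb",
    "results/model/molearn_network_1000_from_Try-A.pth",
]


def missing_inputs(repo_root, required=REQUIRED_FILES):
    """Return the relative paths in `required` that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

For example, `print(missing_inputs("PATH/To/molearnA-main"))` prints an empty list once every download is in place.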
We tested molearn with the modified code using Python 3.10 on Rocky Linux 8.6 with the following packages (and their dependencies):
- numpy
- PyTorch
- Biobox (https://github.com/Degiacomi-Lab/biobox)
- openMM (https://github.com/openmm/openmm)
Other packages and dependencies are listed in 'src/environment/molearnA.yml'.
1) Install molearn into a local environment
* Download “molearn-diffusion.zip” from https://github.com/Degiacomi-Lab/molearn/tree/diffusion in advance.
* Download “molearnA-main.zip” from https://github.com/hmdlab/molearnA in advance.
* Copy “molearn-diffusion.zip” and “molearnA-main.zip” into your working directory
% cd PATH/To/working_directory
% unzip molearn-diffusion.zip
% cd molearn-diffusion
% conda create --name molearnA python=3.10
(or: % conda env create -f PATH/TO/molearnA-main/src/environment/molearnA.yml)
% conda activate molearnA
% conda install numpy cython scipy pandas scikit-learn #<--required for installing biobox; if molearnA was created from the yml file, this step can be skipped
% pip install openmm #<-- install openmm
% git clone https://github.com/Degiacomi-Lab/biobox.git #<--download biobox
% cd biobox
% pip install . #<--install biobox
% cd ../
% pip install . #<--install molearn
% conda install pytorch torchvision torchaudio cpuonly -c pytorch #<--install torch; if molearnA was created from the yml file, this step can be skipped
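To confirm the environment is complete after these steps, you can check that the interpreter can locate each package. The sketch below is a hypothetical helper (not part of molearn). It relies on `importlib.util.find_spec`, which looks a top-level package up without importing it, and the import names (`numpy`, `torch`, `biobox`, `openmm`, `molearn`) are assumed from the packages installed above.

```python
import importlib.util

# Import names assumed for the packages installed in step 1.
PACKAGES = ["numpy", "torch", "biobox", "openmm", "molearn"]


def check_packages(names=PACKAGES):
    """Map each top-level import name to True if Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Run inside the activated molearnA environment.
for name, ok in check_packages().items():
    print(f"{name:10s} {'found' if ok else 'MISSING'}")
```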
2) Modify molearn to apply it to RNA
* Download “molearnA-main.zip” from https://github.com/hmdlab/molearnA in advance.
* Copy the modified files from molearnA-main into the installed molearn package directory:
% cp PATH/To/molearnA-main/src/molearn/*.py PATH/To/conda_local/conda/envs/molearnA/lib/python3.10/site-packages/molearn
% cp PATH/To/molearnA-main/src/parameters/* PATH/To/conda_local/conda/envs/molearnA/lib/python3.10/site-packages/molearn/parameters
% cd PATH/To/conda_local/conda/envs/molearnA/lib/python3.10/site-packages/molearn
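The site-packages path above depends on where conda is installed. As an alternative, here is a minimal sketch, assuming it is run with the molearnA environment's Python, that finds the installed molearn directory and performs the same two copies. `package_dir` and `copy_rna_patch` are hypothetical helpers, not part of molearn. Unlike `cp`, the sketch creates the `parameters` folder if it is missing.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dir(name="molearn"):
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {name!r} is not installed in this environment")
    return Path(spec.origin).parent


def copy_rna_patch(repo_root, target):
    """Mirror the two cp commands: src/molearn/*.py -> target, src/parameters/* -> target/parameters.
    Existing files are overwritten. Returns the list of destination paths."""
    repo_root, target = Path(repo_root), Path(target)
    copied = []
    for src in sorted((repo_root / "src" / "molearn").glob("*.py")):
        copied.append(shutil.copy2(src, target / src.name))
    params = target / "parameters"
    params.mkdir(exist_ok=True)  # cp would fail instead if this folder were absent
    for src in sorted((repo_root / "src" / "parameters").iterdir()):
        if src.is_file():  # like `cp dir/*` without -r, copy regular files only
            copied.append(shutil.copy2(src, params / src.name))
    return copied
```

Usage: `copy_rna_patch("PATH/To/molearnA-main", package_dir("molearn"))`.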
3) Run Examples
* Download the PDB files for the example from https://waseda.app.box.com/folder/300778804035?v=data-molearnA and copy them into PATH/To/molearnA-main/data
* Download the trained molearn models for the example from https://waseda.app.box.com/folder/300778804035?v=data-molearnA and copy them into PATH/To/molearnA-main/results/model
% cd PATH/To/molearnA-main/examples
% chmod +x *sh
% ./run_Traning_Molearn.sh
* Training may take several hours on a standard CPU machine.
* To perform a quick test run, one option is to use a smaller 'iter_per_epoch' in 'Training_Molearn_example.py'.
% ./run_Generate_Conformation.sh
* Conformations are generated for a grid of points (x, y), where x and y each range from 0 to 1 at 0.01 intervals.
* 101 files are generated, each containing 101 snapshot structures.
* A filename such as MolGen__50__GenConf_with_Model-A.pdb denotes conformations generated by results/model/molearn_network_1000_from_Try-A.pth (labeled A) at x = 0.49 ((50 - 1)/100), with y ranging from 0 to 1 at 0.01 intervals.
* Note that, before further analyses, each generated conformation should be refined with molecular mechanics methods such as QRNAS to relax unexpected steric distortions (for details, see Ikuo Kurisaki, Michiaki Hamada (2025). Deep learning generates apo RNA conformations with cryptic ligand binding site, bioRxiv (https://doi.org/10.1101/2025.01.07.631832)).
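The mapping from a generated file back to its grid point can be sketched as follows. Only the rule x = (i - 1)/100 for file index i comes from this README. Two points are assumptions not confirmed by the source: that the 101 snapshots in each file are stored in order of increasing y (so the j-th snapshot has y = (j - 1)/100), and that snapshots are separated by PDB MODEL/ENDMDL records. All function names are hypothetical.

```python
import re

# Filename pattern described above, e.g. MolGen__50__GenConf_with_Model-A.pdb
NAME_RE = re.compile(r"MolGen__(\d+)__GenConf_with_Model-(\w+)\.pdb$")


def grid_x_from_filename(filename):
    """Return (model_label, x); file index i (1..101) corresponds to x = (i - 1) / 100."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m.group(2), (int(m.group(1)) - 1) / 100


def snapshot_index_for_y(y):
    """1-based snapshot number for y, ASSUMING snapshots are stored in order of increasing y."""
    return round(y * 100) + 1


def read_model(pdb_text, model_number):
    """Return the lines between the n-th (1-based) MODEL and its ENDMDL record,
    ASSUMING MODEL/ENDMDL records separate the snapshots."""
    count, current = 0, None
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            count += 1
            current = [] if count == model_number else None
        elif line.startswith("ENDMDL"):
            if current is not None:
                return current
        elif current is not None:
            current.append(line)
    raise IndexError(f"model {model_number} not found")
```

For instance, `read_model(text, snapshot_index_for_y(0.25))` would extract the y = 0.25 snapshot from a file whose contents were read into `text`, provided the assumptions above hold for your files.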
If you use molearn with this modification in your work, please cite, in addition to the original molearn study (V.K. Ramaswamy, S.C. Musson, C.G. Willcocks, M.T. Degiacomi (2021). Learning protein conformational space with convolutions and latent interpolations, Physical Review X 11 (https://journals.aps.org/prx/abstract/10.1103/PhysRevX.11.011052)):
Ikuo Kurisaki, Michiaki Hamada (2025). Deep learning generates apo RNA conformations with cryptic ligand binding site, bioRxiv (https://doi.org/10.1101/2025.01.07.631832)
If you have any issues or questions, please contact mhamada@waseda.jp or ikuo.kurisaki@aoni.waseda.jp.