nick-youngblut/MGSIM

MGSIM

Metagenome read simulation of multiple synthetic communities

REFERENCE

DOI

DESCRIPTION

Straightforward simulation of metagenome data from a collection of reference bacterial/archaeal genomes.

Highlights

  • Can simulate Illumina, PacBio, and/or Nanopore reads
    • For Illumina, synthetic long reads (read clouds) can also be simulated
  • Generate communities differing in:
    • Sequencing depth
    • Richness
    • Beta diversity

The workflow:

  • [optional] Download reference genomes
  • Format reference genomes
    • e.g., rename contigs
  • Simulate communities
  • Simulate reads for each community

INSTALLATION

Dependencies

See environment.yml for a list of dependencies.

You can install via:

mamba env create -f environment.yml -n mgsim

mamba is much faster than conda, but conda accepts the same arguments.

Install

via pip

pip install MGSIM

via setup.py

python setup.py install

Testing

  • conda-forge::pytest>=5.3
  • conda-forge::pytest-console-scripts>=1.2

From the MGSIM base directory, run pytest to execute all of the tests.

To run tests on a particular test file:

pytest -s --script-launch-mode=subprocess path/to/the/test/file

Example:

pytest -s --script-launch-mode=subprocess ./tests/test_Reads.py

HOW-TO

See all subcommands:

MGSIM --list

Download genomes

MGSIM genome_download -h

Simulate communities

MGSIM communities -h

Simulate reads for each genome in each community

Simulating Illumina, PacBio, and/or Nanopore reads

MGSIM reads -h

Simulating haplotagging reads (aka read-cloud data)

MGSIM ht_reads -h

Tutorial

Reference genome download

Create Taxon-accession table

mkdir -p tutorial

cat <<-EOF > tutorial/taxon_accession.tsv
Taxon	Accession
Escherichia coli O104-H4	NC_018658.1
Clostridium perfringens ATCC.13124	NC_008261
Methanosarcina barkeri [MS]	NZ_CP009528.1
EOF
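
The table above is tab-delimited, and tabs are easy to lose when a here-document is copied and pasted. As a quick sanity check before downloading, the sketch below verifies that every line has exactly two tab-separated fields. The `check_tsv` helper is hypothetical (it is not part of MGSIM) and assumes the two-column layout shown above.

```shell
# check_tsv: hypothetical helper, not part of MGSIM.
# Reports any line that does not have exactly two tab-separated
# fields and exits non-zero if at least one such line was found.
check_tsv() {
  awk -F'\t' '
    NF != 2 { printf "line %d: expected 2 fields, found %d\n", NR, NF; bad = 1 }
    END     { exit bad + 0 }
  ' "$1"
}

# Usage: check_tsv tutorial/taxon_accession.tsv && echo "table OK"
```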

Download genomes

MGSIM genome_download -d tutorial/ tutorial/taxon_accession.tsv > tutorial/genomes.tsv

Simulate communities

Simulate 2 communities

MGSIM communities --n-comm 2 tutorial/genomes.tsv tutorial/communities
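
The read-simulation commands in this tutorial take tutorial/communities_abund.txt as input, so it is worth a glance before continuing. The following is a minimal sketch that assumes the abundance table is tab-delimited with a header row and that its first column identifies the community; confirm this with `head -n 1` on your own output first. `rows_per_group` is a hypothetical helper, not part of MGSIM.

```shell
# rows_per_group: hypothetical helper, not part of MGSIM.
# Skips the header line, then counts data rows per value of the
# first tab-separated column (assumed here to be the community ID).
rows_per_group() {
  awk -F'\t' 'NR > 1 { n[$1]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

# Usage: rows_per_group tutorial/communities_abund.txt
```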

Simulate reads

Illumina reads

MGSIM reads tutorial/genomes.tsv --sr-seq-depth 1e5 tutorial/communities_abund.txt tutorial/illumina_reads/

PacBio reads

MGSIM reads tutorial/genomes.tsv --pb-seq-depth 1e3 tutorial/communities_abund.txt tutorial/pacbio_reads/

Nanopore reads

MGSIM reads tutorial/genomes.tsv --np-seq-depth 1e3 tutorial/communities_abund.txt tutorial/nanopore_reads/
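
The three commands above differ only in the depth flag, the depth value, and the output directory. The sketch below generates them from a single list, which can help keep paths consistent when adapting the tutorial. `mgsim_read_cmds` is a hypothetical helper that only prints the commands (a dry run); the flags and paths are exactly those used above.

```shell
# mgsim_read_cmds: hypothetical helper, not part of MGSIM.
# Prints one `MGSIM reads` command per technology without running it.
# Each spec is "flag-prefix:depth:output-name", taken from the tutorial.
mgsim_read_cmds() {
  for spec in sr:1e5:illumina pb:1e3:pacbio np:1e3:nanopore; do
    prefix=${spec%%:*}   # text before the first colon
    rest=${spec#*:}      # text after the first colon
    depth=${rest%%:*}
    name=${rest#*:}
    echo "MGSIM reads tutorial/genomes.tsv --${prefix}-seq-depth ${depth}" \
         "tutorial/communities_abund.txt tutorial/${name}_reads/"
  done
}

mgsim_read_cmds
```

To run the commands instead of printing them, pipe the output to a shell: `mgsim_read_cmds | sh`.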

LICENSE

See LICENSE
