This repository provides a pipeline for processing CUT&RUN data, encompassing steps for trimming reads, aligning them to both primary and spike-in genomes, deduplicating reads, and calling peaks.
- Trim Reads: Remove adapter sequences and low-quality bases from raw sequencing reads.
- Align Reads to Primary Genome: Map trimmed reads to the primary reference genome.
- Align Reads to Spike-in Genome: Map trimmed reads to the spike-in reference genome for normalization purposes.
- Deduplicate Reads: Eliminate PCR duplicates to ensure data accuracy.
- Call Peaks: Identify enriched regions from the deduplicated alignments (see the warning below).
Warning
The peak calling step is still under development and is currently tuned for human data only.
Ensure that the following tools are installed and accessible in your system's PATH:
- Fastp: For trimming adapter sequences
- Bowtie2: For aligning reads to reference genomes
- Samtools: For handling SAM/BAM files
- Picard: For deduplication of reads
- Deeptools: For downstream analysis (not needed to run the pipeline)
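Before launching, you can quickly check that the required tools resolve on PATH. A minimal sketch (the command names are assumptions; Picard in particular may be invoked as java -jar picard.jar rather than through a picard wrapper, so adjust to your install):

# Report any required tool missing from PATH; adjust names to your install.
for tool in fastp bowtie2 samtools picard; do
    command -v "$tool" >/dev/null 2>&1 || echo "Missing from PATH: $tool"
done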
Alternatively, you can use the provided conda YAML recipe to create a suitable environment.
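For example (the recipe filename and environment name below are assumptions; use the YAML file shipped with this repository):

# Create and activate the environment from the provided recipe.
conda env create -f environment.yml
conda activate cutandrun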
The pipeline can be launched with the following command line:
./run_cutandrun_2.sh samplesheet.csv path/to/bowtie2-index-target path/to/bowtie2-index-spikein
The samplesheet.csv is a comma-separated table with 4 fields, for example:
SampleName,AssayName,FASTQ_R1,FASTQ_R2
WT1_IgG,cutandrun,path/to/WT1_IgG_1.fq.gz,path/to/WT1_IgG_2.fq.gz
WT1_H3k27ac,cutandrun,/path/to/WT1_H3k27ac_xxx_1.fq.gz,/path/to/WT1_H3k27ac_xxx_2.fq.gz
WT1_H3k27ac,cutandrun,/path/to/WT1_H3k27ac_xyx_1.fq.gz,/path/to/WT1_H3k27ac_xyx_2.fq.gz
Sample sheets can be built using ./build_sample_sheet.sh /path/to/raw_reads_folders
Note
When sequencing requires multiple runs to reach the expected sequencing depth, each run must have its own line in the sample sheet. Reads with the same sample name will be merged at the start of the pipeline.
The pipeline produces the following outputs:
- One folder per sample with the following structure:
|-WT1_H3k27ac
| |-merged_fastq
| | |-WT1_H3k27ac_merged_R1.fq.gz
| | |-WT1_H3k27ac_merged_R2.fq.gz
| |-trimmed
| | |-WT1_H3k27ac_trimmed_R1.fq.gz
| | |-WT1_H3k27ac_trimmed_R2.fq.gz
| | |-fastp_report.html
| | |-fastp_report.json
| |-mapped_primary
| | |-WT1_H3k27ac_primary_aligned.bam
| | |-WT1_H3k27ac_primary_flagstat.txt
| |-mapped_spikein
| | |-WT1_H3k27ac_spikein_aligned.bam
| | |-WT1_H3k27ac_spikein_flagstat.txt
| |-deduplicated
| | |-WT1_H3k27ac_primary_dedup.bam
| | |-WT1_H3k27ac_primary_dedup_flagstat.txt
| | |-WT1_H3k27ac_primary_dedup_metrics.txt
| | |-WT1_H3k27ac_spikein_dedup.bam
| | |-WT1_H3k27ac_spikein_dedup_flagstat.txt
| | |-WT1_H3k27ac_spikein_dedup_metrics.txt
- A file called flagstat_summary.tsv collecting metrics about the number of reads at each step of the pipeline, which can be useful for downstream spike-in normalization (see the sketch below).
- A timing_report.tsv file with the time spent on each step of the pipeline.
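As an illustration of how the summary can feed spike-in normalization, here is a minimal sketch that derives one scale factor per sample. It assumes flagstat_summary.tsv is tab-separated with a header, the sample name in column 1, and the deduplicated spike-in read count in column 2; check the actual header of your file before using it.

# Scale factor = C / spike-in reads, with C an arbitrary constant (10000 here).
# Column positions are assumptions; verify against the real header.
awk 'BEGIN{FS=OFS="\t"} NR>1 {print $1, 10000/$2}' flagstat_summary.tsv > scale_factors.tsv

The resulting factors can then be passed to tools such as bamCoverage via --scaleFactor.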
Correlation between samples
The script run_plot_cor.sh generates scatterplots, heatmaps, and PCA plots using the deepTools library.
Usage: run_plot_cor.sh <study_name> <bam_dir> <bin_size> [bam_pattern]
Here is an example:
run_plot_cor.sh test_run ../all_WT_bam/ 200 "S1_WT2_D_0.001*.bam"
Warning
The first step of run_plot_cor.sh is the generation of a multiBamSummary matrix, which requires all BAM files to be indexed. Because the last step of the pipeline is deduplication, the output BAMs are neither sorted nor indexed. This extra step can be done with samtools sort and samtools index, or with sambamba sort, which combines sorting and indexing.
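A minimal sketch of that extra step, assuming the per-sample output layout shown above:

# Sort and index every deduplicated primary BAM so multiBamSummary can read them.
for bam in */deduplicated/*_primary_dedup.bam; do
    samtools sort -@ 4 -o "${bam%.bam}_sorted.bam" "$bam"
    samtools index "${bam%.bam}_sorted.bam"
done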
BAM to bigWig files
The script generate_bamcoverage_command.sh writes a list of bamCoverage commands that can be launched to generate bigWig files.
Usage: ./generate_bamcoverage_command.sh <bam_dir> <bam_pattern> <normalize_method: CPM|RPKM> <blacklist.bed> <stranded: stranded|unstranded> <output_script.sh> [output_dir] [--binSize N] [--nproc N]
Parameters:
bam_dir Directory containing BAM files.
bam_pattern Pattern to match BAM files (e.g., '*.bam').
normalize_method Normalization method: CPM or RPKM.
blacklist.bed Path to blacklist BED file.
stranded 'stranded' to generate stranded bigWigs (forward & reverse), 'unstranded' for unstranded.
output_script.sh Output bash script file name to write commands.
output_dir (Optional) Directory to save bigWig files. Defaults to current directory.
Optional flags:
--binSize N Bin size for bamCoverage (default: 5).
--nproc N Number of processors/threads for bamCoverage (default: 2).
Example:
./generate_bamcoverage_command.sh /data/bams '*.bam' CPM /refs/blacklist.bed stranded bigwig_commands.sh /results --binSize 10 --nproc 4
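For reference, each line written by the script is a bamCoverage call shaped roughly like the following (an illustrative sketch, not the script's exact output; in stranded mode, expect two commands per BAM, using --filterRNAstrand forward and --filterRNAstrand reverse):

# Illustrative only; the actual commands are generated by the script.
bamCoverage -b /data/bams/sample1.bam -o /results/sample1.bw \
    --normalizeUsing CPM --blackListFileName /refs/blacklist.bed \
    --binSize 10 --numberOfProcessors 4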
Important
Make sure to quote the BAM file pattern so the shell does not expand it.
The output of this script is a file containing all the bamCoverage command lines.
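The generated script can then be run directly, or line by line in parallel if resources allow:

# Run all commands sequentially...
bash bigwig_commands.sh
# ...or, with GNU parallel, run three at a time (each command already uses
# the number of threads set via --nproc).
parallel -j 3 < bigwig_commands.sh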