
Add an option to subsample the input #58

@TCLamnidis


Currently, there is no way to randomly subsample an input BAM file within DamageProfiler. When the input BAM files are very large, this can increase the runtime considerably, even though the damage rate estimates hardly change compared to using a subset of the reads.

It would be very useful to have an option where a user could specify a number of reads to use for damage calculation, similar to how this functionality is implemented in mapDamage.

Proposed functionality

A user can specify either a number of reads (e.g. 10 000 000) or a fraction of reads (e.g. 0.5).
If an integer is given, use up to that number of randomly subsampled reads for the damage calculation. If the BAM file contains fewer reads than requested, simply use all available reads.
If a float is given, randomly subsample that fraction of the reads for the calculation.
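
To make the two modes concrete, here is a rough sketch of how the selection logic could work. It is written in Python purely for illustration and is not DamageProfiler code: the function names (`parse_subsample`, `sample_count`, `sample_fraction`, `subsample`) and the idea of a single option value are all hypothetical. The integer mode uses reservoir sampling so the reads can be streamed once without knowing the total in advance, and the fraction mode keeps each read independently with the given probability, so the number of reads kept is only approximately the requested fraction.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar, Union

T = TypeVar("T")


def parse_subsample(value: str) -> Union[int, float]:
    """Interpret a (hypothetical) option value as a read count or a fraction."""
    try:
        count = int(value)
    except ValueError:
        # Not an integer, so treat it as a fraction of reads.
        fraction = float(value)
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in the interval (0, 1]")
        return fraction
    if count < 1:
        raise ValueError("read count must be at least 1")
    return count


def sample_count(reads: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Reservoir sampling (Algorithm R): keep up to n reads, uniformly at random.

    If the input holds fewer than n reads, all of them are returned.
    """
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir


def sample_fraction(reads: Iterable[T], fraction: float, rng: random.Random) -> Iterator[T]:
    """Keep each read independently with probability `fraction`."""
    for read in reads:
        if rng.random() < fraction:
            yield read


def subsample(reads: Iterable[T], spec: Union[int, float], seed: Optional[int] = None) -> List[T]:
    """Dispatch on the type of `spec`: int means a read count, float a fraction."""
    rng = random.Random(seed)
    if isinstance(spec, int):
        return sample_count(reads, spec, rng)
    return list(sample_fraction(reads, spec, rng))
```

For example, `subsample(reads, parse_subsample("0.5"), seed=42)` would keep roughly half of the reads, while `subsample(reads, parse_subsample("10000000"))` would keep at most ten million. Exposing a seed would make the subsampled results reproducible between runs, but that is a suggestion on top of the request above, not part of it.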
