CRAM Reference Path Issue in Nextflow Environments #584

@zaka-edd

Hi,
Thanks for creating modkit. However, I am having an issue.

When running modkit pileup (v0.6.1) with CRAM files in containerized environments (e.g., Nextflow/Singularity pipelines), modkit fails with the error "Error! should be at least 1 contig" because it cannot resolve the reference genome path embedded in the CRAM header. The --reference flag does not override the header reference path, so the run fails even when a valid reference is provided.

What Happens:

  1. CRAM files contain embedded reference paths in their headers (e.g., pointing to original file locations)
  2. When modkit processes these CRAMs through Nextflow, even with --reference flag and REF_PATH/REF_CACHE environment variables set, it still attempts to use the reference path from the CRAM header
  3. If that header reference path doesn't exist in the container/work directory, modkit fails to read contigs. Error message: "Error! should be at least 1 contig"
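For anyone trying to confirm they are hitting the same situation, here is a rough sketch of how the header paths could be checked from inside the container. It is not part of modkit; `check_sq_ur` is a hypothetical helper name, `sample.cram` is a placeholder, and it assumes the UR values are local paths (optionally prefixed with `file://`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM/CRAM header on stdin and report, for each
# distinct UR: value found on @SQ lines, whether that path exists here.
# Remote URLs (http, ftp, ...) are not local files and will show as MISSING.
check_sq_ur() {
    awk -F'\t' '/^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^UR:/) print substr($i, 4)   # drop the "UR:" prefix
        }' |
    sort -u |
    while IFS= read -r ref; do
        path="${ref#file://}"   # assumption: tolerate a file:// prefix
        if [ -e "$path" ]; then
            echo "OK $ref"
        else
            echo "MISSING $ref"
        fi
    done
}

# Example use (requires samtools; sample.cram is a placeholder):
#   samtools view -H sample.cram | check_sq_ur
```

If every line comes back as MISSING inside the Nextflow work directory, the header paths are most likely the ones recorded when the CRAM was first written.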

I was able to resolve the issue by rewriting the reference path in the CRAM header before running modkit (e.g., updating the @SQ UR fields to point to a reference path accessible inside the container).
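A minimal sketch of that workaround, assuming samtools is available and that `samtools reheader` is used to swap the header. The helper name `rewrite_sq_ur`, the file names, and the path `/refs/genome_hg38.fa` are all placeholders for illustration, not the exact commands from my pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM header on stdin and replace the UR: tag on
# every @SQ line with the path given as $1. All other lines and tags pass
# through unchanged.
rewrite_sq_ur() {
    local new_ref="$1"
    awk -v ref="$new_ref" 'BEGIN { FS = OFS = "\t" }
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^UR:/) $i = "UR:" ref
        }
        { print }'
}

# Example use (requires samtools; all file names and paths are placeholders):
#   samtools view -H sample.cram | rewrite_sq_ur /refs/genome_hg38.fa > new_header.sam
#   samtools reheader new_header.sam sample.cram > sample.fixed.cram
#   samtools index sample.fixed.cram
```

This only touches header text, so the new reference must be the same sequences the CRAM was originally encoded against.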

However, this becomes cumbersome when working with many samples in automated pipelines (e.g., Nextflow).

The modkit command I ran is:

modkit pileup \
    sample_ID \
    ./output/ \
    --reference genome_hg38.fa \
    --modified-bases 5mC \
    --phased \
    --combine-strands --cpg \
    --threads 12

This might also be an issue with HTSlib or samtools, but I was wondering whether there is a way in modkit to:
Force the use of the reference specified with --reference, or override/ignore the reference path embedded in the CRAM header?
If not, would this be something that could be supported in future releases?

Thank you in advance!
