Linking to lower-level data structures (molecule information) #24

@ambrosejcarr

We have agreed that Matrix-API should be a "canonical intermediate format" for omics data, as opposed to a format that can also absorb all potential downstream and upstream representations [1].

There is interest in capturing and linking the underlying information that is used to create the aggregated (observation, feature) matrices. These data types are out of scope for storing in Matrix-API, but it would be valuable to identify the use cases for transformation of these data types into Matrix-API representations.
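To make the linking idea concrete, here is a minimal sketch of how molecule-level records could be collapsed into an (observation, feature) count matrix while remembering which source rows produced each entry. Everything in it is an assumption for illustration: the record layout `(cell barcode, gene, UMI)`, the function name `aggregate`, and the dict-based sparse representation are hypothetical and not part of Matrix-API. It also assumes each record is already one distinct molecule (no UMI deduplication).

```python
from collections import defaultdict

# Hypothetical per-molecule records: (cell barcode, gene, UMI).
# Each row is assumed to be one distinct, already-deduplicated molecule.
molecules = [
    ("cell_A", "GAPDH", "AACG"),
    ("cell_A", "GAPDH", "TTGC"),
    ("cell_A", "ACTB", "GGTA"),
    ("cell_B", "GAPDH", "CCAT"),
]


def aggregate(molecules):
    """Collapse molecule records into sparse (observation, feature) counts.

    Returns the observation names, the feature names, a dict mapping
    (obs index, feature index) to a count, and a dict mapping the same
    key to the row indices of the molecules behind that count.
    """
    obs = sorted({cell for cell, _, _ in molecules})
    var = sorted({gene for _, gene, _ in molecules})
    obs_index = {name: i for i, name in enumerate(obs)}
    var_index = {name: j for j, name in enumerate(var)}

    # links keeps the pointer from each matrix entry back to its molecules.
    links = defaultdict(list)
    for row, (cell, gene, _umi) in enumerate(molecules):
        links[(obs_index[cell], var_index[gene])].append(row)

    counts = {key: len(rows) for key, rows in links.items()}
    return obs, var, counts, dict(links)


obs, var, counts, links = aggregate(molecules)
```

The `links` mapping is the part that would live outside the matrix: the counts are what a matrix format stores, and the row indices are one possible way to point back at the lower-level data.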

Use cases, with the underlying molecular information in bold:

  • In scRNA-seq, a raw data matrix describes the number of RNA molecules observed for each gene in each cell [2].
  • In scATAC-seq, genomic alignments are counted or analyzed to create "peak", "genomic bin" or "gene activity score" features. The underlying data can be stored in WIG, bigWig, or bedGraph formats [3].
  • In spatial transcriptomics studies, RNA molecules are spatially localized in Euclidean space and assigned to cells by a segmentation algorithm. OME-NGFF is exploring how to represent these data (Table spec proposal, ome/ngff#64), and NanoString is developing the CosMx assay, which will generate this kind of information at large scale.
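As one illustration of the "genomic bin" case above, the following sketch counts alignment start positions into fixed-width bins per cell. The record layout `(cell barcode, chromosome, 0-based start)`, the 5000 bp bin width, the `chrom:start-end` feature naming, and the function name `bin_counts` are all assumptions made for this example; real pipelines typically work from fragment or alignment files and may count differently.

```python
from collections import Counter

BIN_SIZE = 5000  # assumed bin width in base pairs

# Hypothetical alignment records: (cell barcode, chromosome, 0-based start).
alignments = [
    ("cell_A", "chr1", 1200),
    ("cell_A", "chr1", 4999),
    ("cell_A", "chr1", 5000),
    ("cell_B", "chr2", 12345),
]


def bin_counts(alignments, bin_size=BIN_SIZE):
    """Count alignments per (cell, genomic bin), keyed by a bin label."""
    counts = Counter()
    for cell, chrom, start in alignments:
        # Floor the start position to the left edge of its bin.
        bin_start = (start // bin_size) * bin_size
        feature = f"{chrom}:{bin_start}-{bin_start + bin_size}"
        counts[(cell, feature)] += 1
    return counts


bin_matrix = bin_counts(alignments)
```

Here the cells are the observations and the bin labels are the features, so `bin_matrix` is a sparse view of the aggregated matrix, while the alignment records themselves stay in whatever lower-level format holds them.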

Footnotes

  1. #11, see comment

  2. Example 10x experiment, Direct download link for per-molecule information.

  3. UCSC Wig format description
