Pyfive VirtualiZarr Parser? #155

@TomNicholas

This package looks awesome. These parts of the docs particularly interested me:

The motivation for pyfive development were many, but recent developments prioritised thread-safety, lazy loading, and performance at scale in a cloud environment both standalone, and as a backend for other software such as cf-python, xarray, and h5netcdf.

We have also implemented extra methods (beyond the h5py API) to expose the chunk index directly (as well as via an iterator) and to access chunk info using the zarr indexing scheme rather than the h5py indexing scheme. This is useful for avoiding the need for a priori use of kerchunk to make a zarr index for a file.

To me, the step from those features to writing a VirtualiZarr Parser seems very small. We have a virtualizarr.parsers.HDFParser, but it would be great to have an alternative or replacement that doesn't depend on the HDF5 C library. It could even live in this package if you preferred.
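To make the idea concrete, here is a rough sketch of the mapping I have in mind. It deliberately does not call pyfive or VirtualiZarr: `ChunkInfo`, `to_zarr_key`, and `build_manifest` are hypothetical names, the record fields are assumed to mirror what h5py reports per chunk (element offset, byte offset, stored size), and the path/offset/length entry layout is my assumption of what a chunk manifest would need.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class ChunkInfo(NamedTuple):
    """Hypothetical per-chunk record (assumed fields, not pyfive's actual API)."""
    chunk_offset: Tuple[int, ...]  # element coordinates of the chunk's first element
    byte_offset: int               # where the stored chunk starts in the file
    size: int                      # number of stored (possibly compressed) bytes


def to_zarr_key(chunk_offset: Tuple[int, ...], chunk_shape: Tuple[int, ...]) -> str:
    """Turn an h5py-style element offset into a zarr-style chunk key like "1.2"."""
    # Zarr addresses chunks by grid position, so divide each offset by the chunk length.
    return ".".join(str(o // c) for o, c in zip(chunk_offset, chunk_shape))


def build_manifest(
    path: str, chunk_shape: Tuple[int, ...], chunks: Iterable[ChunkInfo]
) -> Dict[str, Dict[str, object]]:
    """Collect byte ranges keyed by zarr chunk key (assumed manifest layout)."""
    return {
        to_zarr_key(c.chunk_offset, chunk_shape): {
            "path": path,
            "offset": c.byte_offset,
            "length": c.size,
        }
        for c in chunks
    }


# Made-up chunk index for a 2-D dataset with 10 x 20 chunks.
example_chunks = [
    ChunkInfo(chunk_offset=(0, 0), byte_offset=4096, size=512),
    ChunkInfo(chunk_offset=(0, 20), byte_offset=4608, size=498),
    ChunkInfo(chunk_offset=(10, 0), byte_offset=5106, size=530),
]
manifest = build_manifest("s3://bucket/file.nc", (10, 20), example_chunks)
```

If pyfive already exposes the chunk index in the zarr indexing scheme, the key conversion step would presumably drop out and the parser would mostly be the dictionary-building part.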

Is this idea of interest?

We have also implemented extra methods (beyond the h5py API) to expose the chunk index directly (as well as via an iterator) and to access chunk info using the zarr indexing scheme rather than the h5py indexing scheme. This is useful for avoiding the need for a priori use of kerchunk to make a zarr index for a file.

Also, is this documented somewhere? I only found ZarrArrayStub, which seems somewhat similar to a virtualizarr.ManifestArray.

cc @sharkinsspatial @maxrjones
