This package looks awesome. These parts of the docs particularly interested me:
> The motivation for pyfive development were many, but recent developments prioritised thread-safety, lazy loading, and performance at scale in a cloud environment both standalone, and as a backend for other software such as cf-python, xarray, and h5netcdf.

> We have also implemented extra methods (beyond the h5py API) to expose the chunk index directly (as well as via an iterator) and to access chunk info using the zarr indexing scheme rather than the h5py indexing scheme. This is useful for avoiding the need for a priori use of kerchunk to make a zarr index for a file.
To me it seems like the step from those features to writing a VirtualiZarr Parser is very small. We already have a virtualizarr.parsers.HDFParser, but it would be great to have an alternative (or replacement) that doesn't depend on the HDF C library. It could even live in this package if you prefer.
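To make the idea concrete, here is a rough sketch of the glue I have in mind. It is not pyfive's or VirtualiZarr's actual API: `zarr_chunk_key`, `build_manifest`, and the `(chunk_offset, byte_offset, size)` record layout are hypothetical names I made up for illustration, and I am assuming the chunk index yields h5py-style element offsets that only need dividing by the chunk shape to become zarr grid indices.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption, not pyfive's API):
# (chunk_offset in elements, byte_offset in the file, size in bytes)
ChunkRecord = Tuple[Tuple[int, ...], int, int]


def zarr_chunk_key(chunk_offset: Tuple[int, ...],
                   chunk_shape: Tuple[int, ...],
                   sep: str = ".") -> str:
    """Turn an h5py-style element offset into a zarr-style chunk grid key."""
    if len(chunk_offset) != len(chunk_shape):
        raise ValueError("chunk_offset and chunk_shape must have the same rank")
    # h5py addresses a chunk by the element offset of its first item;
    # zarr addresses it by its position in the chunk grid.
    return sep.join(str(o // c) for o, c in zip(chunk_offset, chunk_shape))


def build_manifest(path: str,
                   chunk_shape: Tuple[int, ...],
                   records: Iterable[ChunkRecord]) -> Dict[str, dict]:
    """Collect byte ranges keyed by zarr chunk key (a manifest-like dict)."""
    return {
        zarr_chunk_key(offset, chunk_shape): {
            "path": path,
            "offset": byte_offset,
            "length": size,
        }
        for offset, byte_offset, size in records
    }


# Made-up chunk records for a 2-D dataset with chunks of shape (5, 10).
records = [((0, 0), 4096, 100), ((0, 10), 4196, 120), ((5, 0), 4316, 90)]
manifest = build_manifest("example.h5", (5, 10), records)
```

If the chunk index is already exposed in the zarr indexing scheme, the division step disappears and a parser would mostly be the second function plus reading dtype, shape, and filter metadata.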
Is this idea of interest?
> We have also implemented extra methods (beyond the h5py API) to expose the chunk index directly (as well as via an iterator) and to access chunk info using the zarr indexing scheme rather than the h5py indexing scheme. This is useful for avoiding the need for a priori use of kerchunk to make a zarr index for a file.
Also, are these extra methods documented somewhere? I only found ZarrArrayStub (which seems somewhat similar to a virtualizarr.ManifestArray).