Investigate alternate chunking strategy for Zarr store #851

ConnectedSystems · 2024-09-16T07:21:11Z

Currently, ADRIA run results are stored in a Zarr data store, with data chunked on a per-scenario basis.

This means potentially $n$ files are created, where $n$ is the number of scenarios.
This can be a very large number.

Alternatively, we could chunk by time step, since the number of time steps is fairly consistent between runs. This would only require creating $t$ files, where $t$ is the number of time steps.

The downside is that extracting data for a single scenario would require opening/closing $t$ files instead of 1.

The upside is that extracting data for multiple scenarios is more consistent, requiring $t$ files to be opened/closed compared to potentially thousands...
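To make the trade-off concrete, here is a minimal sketch that counts chunk files for a 2-D (time step, scenario) array. It assumes each chunk is stored as its own file (as in an unsharded Zarr directory store) and that any other dimensions are kept whole inside each chunk. The run size of 75 time steps and 4096 scenarios, and the helper names `total_files` and `files_touched`, are hypothetical.

```python
import math

# Sketch: count chunk files for a 2-D (timestep, scenario) array.
# Assumption: every chunk is its own file (unsharded Zarr directory store),
# and other dimensions (locations, etc.) are kept whole inside each chunk.

def total_files(n_timesteps, n_scenarios, ts_chunk, scen_chunk):
    """Number of chunk files needed to store the whole array."""
    return math.ceil(n_timesteps / ts_chunk) * math.ceil(n_scenarios / scen_chunk)

def files_touched(timesteps, scenarios, ts_chunk, scen_chunk):
    """Number of chunk files that must be opened to read the given 0-based indices."""
    ts_ids = {t // ts_chunk for t in timesteps}
    scen_ids = {s // scen_chunk for s in scenarios}
    return len(ts_ids) * len(scen_ids)

# Hypothetical run size: 75 time steps, 4096 scenarios.
T, N = 75, 4096
per_scenario = dict(ts_chunk=T, scen_chunk=1)  # current strategy
per_timestep = dict(ts_chunk=1, scen_chunk=N)  # proposed alternative

for name, cfg in [("per-scenario", per_scenario), ("per-timestep", per_timestep)]:
    print(name,
          "files:", total_files(T, N, **cfg),
          "| one scenario:", files_touched(range(T), [0], **cfg),
          "| all scenarios:", files_touched(range(T), range(N), **cfg))
# per-scenario files: 4096 | one scenario: 1 | all scenarios: 4096
# per-timestep files: 75 | one scenario: 75 | all scenarios: 75
```

Reading one scenario is cheapest with per-scenario chunks, while reading many scenarios is cheapest with per-time-step chunks, matching the downside and upside described above.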

Zapiano · 2024-09-16T07:24:56Z

Another downside of chunking by time step is that each file can grow indefinitely, because every time-step chunk holds all scenarios and so grows with the number of scenarios (unless you don't think that's a problem). Could we chunk by a fixed number of scenarios instead?
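A rough size comparison illustrates the concern. This sketch computes uncompressed bytes per chunk, assuming float32 values (4 bytes) and a hypothetical third dimension of 1000 locations kept whole in each chunk; compression would reduce the actual on-disk size. The 100-scenario block is only an example size.

```python
import math

# Sketch: uncompressed bytes per chunk, assuming float32 values (4 bytes).
# Hypothetical third dimension: 1000 locations kept whole inside each chunk.

def chunk_bytes(chunk_shape, itemsize=4):
    """Uncompressed size of one chunk with the given shape."""
    return math.prod(chunk_shape) * itemsize

N_LOCATIONS = 1000
T = 75

for n_scenarios in (1024, 8192):
    per_timestep = chunk_bytes((1, n_scenarios, N_LOCATIONS))  # grows with n_scenarios
    per_block = chunk_bytes((T, 100, N_LOCATIONS))             # fixed 100-scenario block
    print(n_scenarios, per_timestep, per_block)
# 1024 4096000 30000000
# 8192 32768000 30000000
```

An eightfold increase in scenarios makes each per-time-step chunk eight times larger, while a fixed scenario block keeps the chunk size constant regardless of run size.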

ConnectedSystems · 2024-09-16T07:34:15Z

Yeah, you can chunk by a fixed-size set of scenarios (e.g., 1:100, 101:200, etc.).

The downside is when you have an uneven number of scenarios (not a multiple of the block size), or when collecting data across files (e.g., you want scenarios 1, 150, 220, etc.).

But I'm not sure how often that use case will come up.
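Both downsides can be seen in a small sketch of fixed-size scenario blocks. It uses 1-based scenario IDs to mirror the 1:100, 101:200 ranges above; the helper names and the block size of 100 are hypothetical illustrations, not part of ADRIA.

```python
# Sketch: fixed-size scenario blocks, using 1-based scenario IDs (1:100, 101:200, ...).
# Block size of 100 is only the example from this thread, not a recommendation.

def scenario_blocks(n_scenarios, block=100):
    """Inclusive 1-based (first, last) scenario range stored in each block."""
    return [(first, min(first + block - 1, n_scenarios))
            for first in range(1, n_scenarios + 1, block)]

def blocks_for(scenarios, block=100):
    """Sorted 0-based indices of the blocks (files) holding the given scenario IDs."""
    return sorted({(s - 1) // block for s in scenarios})

print(scenario_blocks(250))        # [(1, 100), (101, 200), (201, 250)]  last block is partial
print(blocks_for([1, 150, 220]))   # [0, 1, 2]  scattered picks hit three files
print(blocks_for(range(1, 101)))   # [0]        a contiguous range fits in one file
```

An uneven scenario count simply leaves a partial final block, and scattered selections cost at most one file per distinct block touched, which is still bounded by the number of blocks rather than the number of scenarios.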
