Investigate alternate chunking strategy for Zarr store #851

Open
ConnectedSystems opened this issue Sep 16, 2024 · 2 comments

Comments

@ConnectedSystems
Collaborator

Currently, ADRIA results are stored in a Zarr data store, with data chunked on a per-scenario basis (one chunk per scenario).

This means up to $n$ chunk files may be created, where $n$ is the number of scenarios, and $n$ can be very large.

Alternatively, we could chunk by time step, since the number of time steps is fairly consistent across runs. This would only require creating $t$ files, where $t$ is the number of time steps.

The downside is that extracting data for a single scenario would require opening/closing $t$ files instead of 1.

The upside is that extracting data for multiple scenarios has a more predictable cost, requiring $t$ files to be opened/closed rather than potentially thousands.
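
A rough way to compare the two layouts is to count how many chunk files a read has to touch. This is a simplified sketch, assuming results form a 2-D (time step × scenario) array on a regular Zarr-style chunk grid and that every read pulls all time steps; `files_opened`, `chunks_touched` and the sizes used (75 time steps, 4,096 scenarios) are hypothetical, not ADRIA's API or defaults.

```python
import math


def chunks_touched(indices, chunk_len):
    """Chunk indices along one axis that hold the given 0-based element indices.

    On a regular chunk grid, element i lives in chunk i // chunk_len.
    """
    return {i // chunk_len for i in indices}


def files_opened(scenarios, n_timesteps, scenario_chunk, time_chunk):
    """Number of chunk files read to extract all time steps for `scenarios`
    from a hypothetical (timestep x scenario) array."""
    time_chunks = math.ceil(n_timesteps / time_chunk)  # every time chunk is needed
    return time_chunks * len(chunks_touched(scenarios, scenario_chunk))


n_t, n_s = 75, 4096  # hypothetical run size
layouts = {
    "per scenario": dict(scenario_chunk=1, time_chunk=n_t),
    "per time step": dict(scenario_chunk=n_s, time_chunk=1),
}
for name, chunks in layouts.items():
    one = files_opened([0], n_t, **chunks)
    everything = files_opened(range(n_s), n_t, **chunks)
    print(f"{name}: 1 scenario -> {one} files, all scenarios -> {everything} files")
```

With these numbers, per-scenario chunking reads 1 file for a single scenario but 4,096 for all of them, while per-time-step chunking reads 75 files in both cases.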

@Zapiano
Collaborator

Zapiano commented Sep 16, 2024

Another downside of chunking by time step is that the size of each file grows with the number of scenarios, with no upper bound (unless you don't think that's a problem). Could we chunk by a fixed number of scenarios instead?
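
To make the growth concrete, the uncompressed size of one chunk is the product of its shape and the element size. This sketch assumes a hypothetical float32 array of shape (time step, location, scenario) with 75 time steps and 200 locations; `chunk_nbytes` is an illustrative helper, and on-disk sizes will usually be smaller when a compressor is used.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=4):
    """Uncompressed size in bytes of one chunk (itemsize=4 for float32)."""
    return math.prod(chunk_shape) * itemsize


n_t, n_loc, block = 75, 200, 100  # hypothetical dimensions and block size
for n_s in (1_000, 10_000, 100_000):
    per_timestep = chunk_nbytes((1, n_loc, n_s))   # one chunk per time step
    per_block = chunk_nbytes((n_t, n_loc, block))  # one chunk per 100 scenarios
    print(f"{n_s} scenarios: per time step {per_timestep / 1e6:.1f} MB, "
          f"per block {per_block / 1e6:.1f} MB")
```

Here the per-time-step chunk grows linearly with the number of scenarios (0.8 MB, 8.0 MB, 80.0 MB), while a fixed block of 100 scenarios stays at 6.0 MB regardless of how many scenarios are run.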

@ConnectedSystems
Collaborator Author

Yeah, you can chunk by a fixed-size set of scenarios (e.g., 1:100, 101:200, etc.).

The downside is when the number of scenarios doesn't divide evenly by the chunk size, or when collecting data across files (e.g., you want scenarios 1, 150, 220, etc.).

But I'm not sure how often that use case will come up.
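
Both cases come down to a little index arithmetic. Assuming hypothetical 1-based scenario IDs grouped into fixed blocks of 100 (matching the 1:100, 101:200 ranges above), the last block simply holds whatever scenarios remain, and a scattered request such as scenarios 1, 150 and 220 touches one chunk per distinct block. `scenario_blocks` and `blocks_for` are illustrative names, not existing ADRIA functions.

```python
def scenario_blocks(n_scenarios, block):
    """Inclusive 1-based (first, last) scenario range held by each chunk."""
    return [(first, min(first + block - 1, n_scenarios))
            for first in range(1, n_scenarios + 1, block)]


def blocks_for(scenarios, block):
    """Sorted 0-based chunk indices that must be read for 1-based scenario IDs."""
    return sorted({(s - 1) // block for s in scenarios})


print(scenario_blocks(250, 100))       # → [(1, 100), (101, 200), (201, 250)]
print(blocks_for([1, 150, 220], 100))  # → [0, 1, 2]  (scattered: three chunks)
print(blocks_for(range(1, 51), 100))   # → [0]  (contiguous: one chunk)
```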
