adding datastore size to iohub info #248

Open: edyoshikun wants to merge 2 commits into main

edyoshikun (Contributor) commented Sep 26, 2024

This addresses issue #247 by adding the store size and array size (in GB) to the `iohub info` output. It is simple and useful metadata.

I wanted to know how much memory to request for caching datasets.

ziw-liu (Collaborator) commented Sep 26, 2024

Is this meant to represent the size on disk (compressed) or size in RAM (decompressed)?

ziw-liu added the labels enhancement ("New feature or request") and NGFF ("OME-NGFF (OME-Zarr format)") on Sep 26, 2024

edyoshikun (Contributor, Author) commented

I find the decompressed size more useful than the compressed size, but we can report both if needed. I think `zarr.Array` provides `nbytes_stored` for the stored (compressed) size. What do you all think?
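To make the compressed vs. decompressed distinction concrete, here is a minimal sketch, assuming NumPy is available and the data live in a directory-backed store (one file per chunk, as with Zarr v2's `DirectoryStore`). Both helper names are hypothetical, not iohub or zarr API: `uncompressed_nbytes` multiplies the element count by the item size (the quantity zarr reports as `nbytes`), and `stored_nbytes` sums file sizes on disk (roughly what `nbytes_stored` reports).

```python
import math
import os

import numpy as np


def uncompressed_nbytes(shape, dtype) -> int:
    """Decompressed (in-RAM) size: number of elements times bytes per element."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def stored_nbytes(store_dir) -> int:
    """On-disk (compressed) footprint of a directory-backed store.

    Sums the size of every file under `store_dir` (chunks plus metadata).
    Filesystem block overhead is ignored, so this can differ from `du`.
    """
    total = 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


print(uncompressed_nbytes((1000000,), "i4"))  # 4000000
```

The decompressed number is what matters when deciding how much RAM to request for caching; the stored number only reflects disk usage and depends on the compressor.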

talonchandler (Contributor) left a comment

I think the uncompressed size is the most valuable.

The reported size is the expected size, not necessarily the true size (e.g., the store may not have been fully written yet, or a write may have failed). Naming is tricky; maybe "Expected uncompressed size (GB)", "Est. size in RAM (GB)", or "Est. size (GB)"?
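The size is an expectation because it comes from array metadata rather than from the chunks themselves. A minimal sketch of that computation, assuming Zarr v2's on-disk layout (a `.zarray` JSON file holding `shape` and a simple, non-structured `dtype` string; Zarr v3 uses `zarr.json` instead) and NumPy for dtype parsing. `expected_uncompressed_nbytes` is a hypothetical helper:

```python
import json
import math
from pathlib import Path

import numpy as np


def expected_uncompressed_nbytes(array_dir) -> int:
    """Expected decompressed size of a Zarr v2 array, from metadata alone.

    Only `.zarray` is read, so the result is the same whether or not any
    chunk files have actually been written.
    """
    meta = json.loads((Path(array_dir) / ".zarray").read_text())
    itemsize = np.dtype(meta["dtype"]).itemsize  # e.g. "<i4" -> 4
    return math.prod(meta["shape"]) * itemsize
```

An array whose chunks were never written still reports its full size here, which matches the caveat above.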

iohub/reader.py (outdated, resolved)

edyoshikun (Contributor, Author) commented

I ended up adding `uncompressed size [GB]`.
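One design note on the total: the diff below multiplies the first position's `nbytes` by the number of positions, which assumes every position has the same shape and dtype. If that may not hold, summing per position avoids the assumption. A minimal sketch, where `total_uncompressed_gb` is a hypothetical helper and `positions` is assumed to be the list of `(name, position)` pairs indexed as `positions[0][1][0]` in the diff (so `position[0]` is an array-like with `.nbytes`):

```python
from types import SimpleNamespace


def total_uncompressed_gb(positions) -> float:
    """Total decompressed size in GB (1e9 bytes), summed position by position."""
    return sum(position[0].nbytes for _name, position in positions) / 1e9


# Two hypothetical positions of different sizes; SimpleNamespace stands in for arrays.
demo = [
    ("A/1/0", [SimpleNamespace(nbytes=2_000_000_000)]),
    ("B/2/0", [SimpleNamespace(nbytes=500_000_000)]),
]
print(total_uncompressed_gb(demo))  # 2.5
```

Scaling the first position instead would give 4.0 for this input, so the two approaches agree only when all positions are identical in size.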

```diff
@@ -262,11 +262,21 @@ def print_info(path: StrOrBytesPath, verbose=False):
print("Zarr hierarchy:")
reader.print_tree()
positions = list(reader.positions())
total_GB_uncompressed = (
len(positions) * (positions[0][1][0].nbytes) / 1e9
```
ziw-liu (Collaborator) commented Sep 29, 2024

This would be confusing if it shows 0.00 GB for arrays under ~10 MB. Maybe mimic the behavior of `du -h`?
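A minimal sketch of `du -h`-style formatting: 1024-based units with one decimal place, which is consistent with the `(3.8M)` annotation zarr prints next to `No. bytes`. `human_readable_size` is a hypothetical helper, and its exact rounding may differ from GNU `du`:

```python
def human_readable_size(nbytes: int) -> str:
    """Format a byte count with 1024-based suffixes, roughly like `du -h`."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    if nbytes < 1024:
        return f"{nbytes}B"
    size = float(nbytes)
    unit = "B"
    # Divide by 1024 until the value fits below 1024 in the current unit.
    for unit in ("K", "M", "G", "T", "P"):
        size /= 1024
        if size < 1024:
            break
    return f"{size:.1f}{unit}"


print(human_readable_size(4000000))  # 3.8M
```

Small arrays then read as, e.g., `3.8M` instead of `0.00 GB`.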

ziw-liu (Collaborator) commented

Zarr also does this: https://zarr.readthedocs.io/en/v2.18.3/_autoapi/zarr.core.Array.html#zarr.core.Array.info

```python
import zarr
z = zarr.zeros(1000000, chunks=100000, dtype='i4')
z.info
```

```
Type               : zarr.core.Array
Data type          : int32
Shape              : (1000000,)
Chunk shape        : (100000,)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.KVStore
No. bytes          : 4000000 (3.8M)
No. bytes stored   : 320
Storage ratio      : 12500.0
Chunks initialized : 0/10
```
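The `Chunks initialized` denominator and the `Storage ratio` in this output follow from simple arithmetic on the shape, the chunk shape, and the byte counts. A small sketch reproducing them; the function names are hypothetical, not zarr API. With no chunks written yet, the 320 stored bytes are presumably just metadata, which is why the ratio is so large and why the uncompressed figure is only an expected size:

```python
import math


def chunk_grid_size(shape, chunks) -> int:
    """Total number of chunks: ceil(dim / chunk) along each axis, multiplied."""
    # -(-a // b) is integer ceiling division, avoiding float rounding on huge dims.
    return math.prod(-(-s // c) for s, c in zip(shape, chunks))


def storage_ratio(nbytes: int, nbytes_stored: int) -> float:
    """Decompressed bytes divided by bytes actually stored."""
    return nbytes / nbytes_stored


# Numbers from the z.info output above: 1,000,000 int32 elements, 4 bytes each.
nbytes = 1000000 * 4
print(chunk_grid_size((1000000,), (100000,)))  # 10
print(storage_ratio(nbytes, 320))  # 12500.0
```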

Labels: enhancement (New feature or request), NGFF (OME-NGFF (OME-Zarr format))
Projects: none yet
3 participants