adding datastore size to iohub info #248
Is this meant to represent the size on disk (compressed) or size in RAM (decompressed)?
I find it more useful when it's decompressed rather than compressed. We can report both if needed. I think zarr.array does
I think the uncompressed size is the most valuable.
The reported size is the expected size, not the true size (e.g. the store hasn't been filled yet or there was an error). Naming is tricky; maybe "Expected uncompressed size (GB)", "Est. size in RAM (GB)", or "Est. size (GB)"?
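To make the "expected, not true" distinction concrete, here is a minimal sketch of how an expected uncompressed size follows from metadata alone, without reading any chunks. The function name `expected_uncompressed_bytes` and its parameters are hypothetical and not part of iohub; it also assumes every position holds one array of the same shape and dtype, which is the same assumption the diff makes by using only the first position.

```python
import math


def expected_uncompressed_bytes(shape, itemsize, n_positions=1):
    """Expected size in bytes if every element were materialized in RAM.

    Hypothetical helper: it only multiplies metadata together, so it reports
    the same number whether or not any chunk has actually been written.
    """
    return n_positions * math.prod(shape) * itemsize


# One int32 array (4 bytes per element) with 1,000,000 elements
print(expected_uncompressed_bytes((1_000_000,), itemsize=4))  # 4000000
```

Because nothing is read from the store, an empty or partially written dataset reports the same value as a complete one, which is why a label such as "Est. size" may be less misleading than a bare "size".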
ended up adding
@@ -262,11 +262,21 @@ def print_info(path: StrOrBytesPath, verbose=False):
print("Zarr hierarchy:")
reader.print_tree()
positions = list(reader.positions())
total_GB_uncompressed = (
len(positions) * (positions[0][1][0].nbytes) / 1e9
This would be confusing if it showed 0.00 GB for a store under 10 MB. Maybe try to mimic the behavior of du -h?
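One possible shape for this suggestion is sketched below. The helper `human_readable_size` is a hypothetical name, not an existing iohub or zarr function. It uses 1024-based units with single-letter suffixes, which only roughly approximates `du -h` (the real tool's rounding rules differ slightly).

```python
def human_readable_size(nbytes: int) -> str:
    """Format a byte count with 1024-based units, roughly like `du -h`.

    Hypothetical helper for illustration; rounding is to one decimal place.
    """
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    size = float(nbytes)
    units = ("B", "K", "M", "G", "T", "P")
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(size)}B"
            return f"{size:.1f}{unit}"
        size /= 1024


print(human_readable_size(320))        # 320B
print(human_readable_size(4_000_000))  # 3.8M
```

With this kind of formatting a small store reports a few megabytes or kilobytes instead of rounding down to 0.00 GB.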
Zarr also does this: https://zarr.readthedocs.io/en/v2.18.3/_autoapi/zarr.core.Array.html#zarr.core.Array.info
import zarr
z = zarr.zeros(1000000, chunks=100000, dtype='i4')
z.info
Type : zarr.core.Array
Data type : int32
Shape : (1000000,)
Chunk shape : (100000,)
Order : C
Read-only : False
Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type : zarr.storage.KVStore
No. bytes : 4000000 (3.8M)
No. bytes stored : 320
Storage ratio : 12500.0
Chunks initialized : 0/10
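The output above also shows why compressed and uncompressed sizes can diverge so sharply: with 0 of 10 chunks initialized, only 320 bytes are stored for an array whose full size is 4,000,000 bytes. The short sketch below reproduces the reported storage ratio from those two numbers. The variable names are illustrative, and the division is inferred from the printed values rather than taken from zarr's source.

```python
# Values copied from the z.info output above
nbytes = 4_000_000   # "No. bytes": uncompressed size of the full array
nbytes_stored = 320  # "No. bytes stored": bytes actually present in the store

# Uncompressed size divided by stored size, matching the printed ratio
storage_ratio = nbytes / nbytes_stored
print(storage_ratio)  # 12500.0
```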
This addresses issue #247 by adding the store size and array size in GB. This is useful and simple metadata.
I wanted to know how much memory to request for caching datasets.