Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] Allow for opening with known file sizes #315

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jrbourbeau
Copy link
Collaborator

The latest release of s3fs allows you to specify the file size ahead of time (if known) when opening an S3 file (this already existed for HTTPS files). This allows s3fs to skip some calls to S3 which can be expensive (especially when opening lots of files). Marking as DNM for now as I'm still experimenting with what performance impacts this has in practice.

@github-actions
Copy link

github-actions bot commented Oct 11, 2023

Binder 👈 Launch a binder notebook on this branch for commit c2bb1cd

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 0b6bb98

@MattF-NSIDC
Copy link

MattF-NSIDC commented Oct 11, 2023

DNM

Surprising to still be seeing new-to-me acronyms for PR statuses 😆

What do you think of setting PRs not ready for merge to draft status? I'm fine with whatever, and I like the idea of standardized PR labels. Just curious what everyone else prefers. Are there automations that recognize the Do Not Merge label?

@betolink
Copy link
Member

It looks good but I'm not sure where these sizes will be coming from? granules have irregular sizes based on what they contain and if we are relaying only on what CMR has, there will be discrepancies. Will it be an issue with fsspec if we use a wrong size? @jrbourbeau

@jrbourbeau
Copy link
Collaborator Author

Good point. We get the correct sizes when we create fsspec.https or s3fs file objects (they query the data store to get the size). The changes here just make sure that we keep track of that size and, in the case where we switch between https and s3 access and re-open the file, we use the already stored size to make subsequent re-openings faster (no longer need to query the data store for the file size)

@chuckwondo
Copy link
Collaborator

chuckwondo commented Apr 19, 2024

@jrbourbeau, any progress here? I'm marking this PR as Draft for now to help us know which PRs we should be actively reviewing.

@chuckwondo chuckwondo marked this pull request as draft April 19, 2024 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants