Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added parameters and exception behaviour to pqdm #792

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Sherwin-14
Copy link
Contributor

@Sherwin-14 Sherwin-14 commented Aug 24, 2024

I did all the steps outlined in the issue #581, we can now build upon this.


📚 Documentation preview 📚: https://earthaccess--792.org.readthedocs.build/en/792/

Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this PR @Sherwin-14!

There are still several places where you have to expose the new fail_fast parameter. So far, you have added it only to "protected" methods, so it is not yet exposed to the user in the "public" function and methods. You need to keep following the call chains up to the top (public) API so that users can use the parameter, if they want to choose False instead of the default True.

For example, you added fail_fast to Store._open_files, but you have not added it to all places that call Store._open_files. You did add it to Store._open_urls, which calls _open_files, but you did not yet add it to Store._open_granules, which also calls _open_files.

Further, both _open_files and _open_granules are dispatched when Store._open is called, so you must find all places where Store._open is called. In this case, it is called from Store.open, which is a public method, so you must add fail_fast to Store.open so the user can override the default, if they wish.

Further still, Store.open is called by the open top-level function in the api module, so you must add fail_fast there as well.

You must follow all such call chains from all points where pqdm is called, all the way up to the top of the call chains (up to the public functions and methods), adding fail_fast to each function and method along all such call chains.

@chuckwondo
Copy link
Collaborator

@mfisher87, I see lots of Ruff error messages regarding a number of files that @Sherwin-14 has not touched in this PR. Is there something already on main to address these errors, such that Sherwin simply needs to merge main into his own branch to silence them?

@mfisher87
Copy link
Member

Not yet, but that should be an easy fix. I'll take care of it now :) #788

@mfisher87
Copy link
Member

#793

@chuckwondo
Copy link
Collaborator

Not yet, but that should be an easy fix. I'll take care of it now :) #788
#793

Excellent! Thank you for the quick fix.

@Sherwin-14, Matt's fix is now merged into main, so please update your branch with the fix from main, which will eliminate the ruff errors in your build.

@chuckwondo
Copy link
Collaborator

@Sherwin-14, do you have enough information to proceed with the changes I requested?

@Sherwin-14
Copy link
Contributor Author

@Sherwin-14, do you have enough information to proceed with the changes I requested?

Yeah I have the info I needed, I had put this on hold since I was preoccupied with some other things for the last couple of days. I'll resume my work on this asap.

@betolink
Copy link
Member

betolink commented Sep 3, 2024

Would it be possible and in scope of this PR to use **kwargs and pass them to pqdm for both open() and download() with sane defaults? Users could be in control of the failing behavior but also the number of threads to use and the verbosity (don't display the progress bar). I'm not sure about the style, in Xarray they use **kwargs and an equivalent backend_kwargs={} that in our case could be something like pqdm_opts={}

# code in api.py and something similar in store.py
# in this example we leave threads out to ensure backwards compatibility and will use it in the store class if present
def open(granules, threads, provider, pqdm_opts):
    default_pqdm_opts = {
      n_jobs=threads,
      exception_behaviour='immediate'
      colour="green",
      disable=False # True means no output including the progress bar.
    }

    if not pqdm_opts;
        pqdm_opts = default_pqdm_opts
    earthaccess.__store__.open(granules, treads, pqdm_opts)

and actually we should do the same with fsspec so we let users tune their caching behavior when opening the files, aka a "smart_open". What do you all think?

@mfisher87
Copy link
Member

I had put this on hold since I was preoccupied with some other things for the last couple of days. I'll resume my work on this asap.

@Sherwin-14 No rush! We really appreciate any time you can spare, and don't want to ask for more than that :)

@mfisher87
Copy link
Member

@betolink I really like that idea. Basically "Allow customizing all pqdm behaviors" (and separately, "Allow customizing all fsspec behaviors")?

I'm not sure how I feel about including that as part of this PR. I lean towards separate out of respect for @Sherwin-14 's time, but I don't want to speak for you, Sherwin!

@mfisher87
Copy link
Member

mfisher87 commented Sep 3, 2024

We decided during hack day today that we want to allow users to customize pqdm behaviors like so:

def open(something, threads=16, *, pqdm_kwargs={...}, ...):
    pqdm_kwargs_defaults = {"exception_behavior": "immediate", "threads": threads}
    pqdm_kwargs = pqdm_kwargs_defaults.update(pqdm_kwargs)
    ...

We considered the threads argument as fundamental to the function of earthaccess, so we think it should remain a top-level parameter. We'll use threads whether we use pqdm or not in 5 years.

@Sherwin-14
Copy link
Contributor Author

Sherwin-14 commented Sep 5, 2024

We decided during hack day today that we want to allow users to customize pqdm behaviors like so:

def open(something, threads=16, *, pqdm_kwargs={...}, ...):
    pqdm_kwargs_defaults = {"exception_behavior": "immediate", "threads": threads}
    pqdm_kwargs = pqdm_kwargs_defaults.update(pqdm_kwargs)
    ...

We considered the threads argument as fundamental to the function of earthaccess, so we think it should remain a top-level parameter. We'll use threads whether we use pqdm or not in 5 years.

How should we now proceed with the changes we discussed during the hackday?

@chuckwondo
Copy link
Collaborator

We decided during hack day today that we want to allow users to customize pqdm behaviors like so:

def open(something, threads=16, *, pqdm_kwargs={...}, ...):
    pqdm_kwargs_defaults = {"exception_behavior": "immediate", "threads": threads}
    pqdm_kwargs = pqdm_kwargs_defaults.update(pqdm_kwargs)
    ...

We considered the threads argument as fundamental to the function of earthaccess, so we think it should remain a top-level parameter. We'll use threads whether we use pqdm or not in 5 years.

How should we now proceed with the changes we discussed during the hackday?

Do the same thing you did in the PR you already started for this, but instead of fail_fast (which is what you pretty much completed) change it to pqdm_kwargs and adjust your code changes similar to what is shown above, although the code above won't work because pqdm has an n_jobs arg, not a threads arg, so I suggest something like so, instead, within the functions/methods that invoke pqdm:

pqdm_kwargs = {
    "exception_behavior": "immediate",
    "n_jobs": threads,
    **pqdm_kwargs,
}

...

pqdm(<other args>, **pqdm_kwargs)

@mfisher87
Copy link
Member

Great point! Much clearer with the double splat too ;)

@chuckwondo
Copy link
Collaborator

@Sherwin-14, do you need some help with this?

@Sherwin-14
Copy link
Contributor Author

@Sherwin-14, do you need some help with this?

I will try to accomodate the change asap, I was caught up in something else up untill now

@chuckwondo
Copy link
Collaborator

@Sherwin-14, do you need some help with this?

I will try to accomodate the change asap, I was caught up in something else up untill now

No rush. I just wanted to check in to see if you need any help or if something else was requiring you to postpone your continuation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants