Large number of iids presenting new challenges #148

Open
jbusecke opened this issue May 7, 2024 · 3 comments
Collaborator

jbusecke commented May 7, 2024

#145 seems to have unblocked a lot of the iids that were previously unavailable! A big win in general, but we now need to work on some parts of our infrastructure.

This has led to two issues:

Speed will become more of a concern as we add more iids over time. In particular, the 'parsing' step, where we go from the input list (with wildcards and brackets) to a list of single iids, will produce more and more requests on each run of the deployment action.
The subsequent steps will presumably become more manageable over time, since we prune off the iids that are already ingested.

We are also currently handling this fairly inefficiently: we basically query for the dataset info twice, once in expand_instance_id_list and then again in get_recipe_inputs_from_iid_list (which currently takes a list of instance ids).

Going forward we should probably extract something like
{
    'instance_id': {'id': ..., 'field_a': ...},
    'other_instance_id': {'id': ..., 'field_a': ...},
    ...
}

This would make it trivial to prune off existing iids and then pass only the 'id' fields to get_recipe_inputs_from_iid_list.
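A minimal sketch of the prune-then-pass idea, assuming the info has already been collected once into a dict keyed by instance id. The helper names `prune_existing` and `ids_for_recipe` and the placeholder data are hypothetical and only for illustration; nothing here calls the real expand_instance_id_list or get_recipe_inputs_from_iid_list.

```python
from typing import Iterable

# Hypothetical shape: instance id -> info dict that contains at least 'id'.
DatasetInfo = dict[str, dict[str, str]]


def prune_existing(dataset_info: DatasetInfo, existing_iids: Iterable[str]) -> DatasetInfo:
    """Drop every entry whose instance id has already been ingested."""
    existing = set(existing_iids)  # set lookup keeps pruning cheap for many iids
    return {iid: info for iid, info in dataset_info.items() if iid not in existing}


def ids_for_recipe(dataset_info: DatasetInfo) -> list[str]:
    """Collect only the 'id' fields, in insertion order."""
    return [info["id"] for info in dataset_info.values()]


# Made-up placeholder data, not real iids.
dataset_info = {
    "iid_a": {"id": "id_a", "field_a": "foo"},
    "iid_b": {"id": "id_b", "field_a": "bar"},
    "iid_c": {"id": "id_c", "field_a": "baz"},
}
already_ingested = ["iid_b"]

remaining = prune_existing(dataset_info, already_ingested)
recipe_ids = ids_for_recipe(remaining)
print(recipe_ids)
```

With something like this, the dataset info would only need to be queried once, and `recipe_ids` is what would be handed on to the recipe-input step.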

Collaborator Author

jbusecke commented May 7, 2024

#149 implements the bq batching in the recipe, but we are still waiting for a proper fix in leap-stc/leap-data-management-utils#33.
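For readers unfamiliar with the idea, here is a generic sketch of batching, assuming it means splitting a long list of iids into fixed-size chunks so that each query stays small. This is not the code from #149; the helper name `in_batches` and the batch size are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls up to batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Placeholder iids; each batch would go into one query instead of one huge query.
iids = [f"iid_{n}" for n in range(7)]
for batch in in_batches(iids, batch_size=3):
    print(len(batch), batch)
```

The last batch is simply shorter than the others, so no iid is dropped when the list length is not a multiple of the batch size.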

Collaborator Author

jbusecke commented May 7, 2024

Waiting on the results from the merged #149 to see whether we can handle a bunch more iids (even though it might be slow for now).

Collaborator Author

jbusecke commented May 7, 2024

https://github.com/leap-stc/cmip6-leap-feedstock/actions/runs/8990218806 is actually highly successful! 200+ datasets ingested and still going strong!
