Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S1 extraction fixes - PR Re-opened #149

Merged
merged 14 commits into from
Sep 4, 2024
Merged

S1 extraction fixes - PR Re-opened #149

merged 14 commits into from
Sep 4, 2024

Conversation

GriffinBabe
Copy link
Collaborator

In this PR I re-open the changes brought and then reverted of the old s1-extraction-fixes branch.

Changes include (those can be ticked once validated here):

Miscellaneous changes

  • Fixes in the fetching module: now accepting CDSE-STAGING
  • Fixed load_collection parameters wrongly passed in the S1 fetchers
  • Added an hardcoded parameter for the CDSE S1 catalog to only query for VV & VH products. Otherwise, it would fetch for products only having VV or VH, causing a crash in the backscatter computation process.
  • New dataset uploaded to the artifactory for S2 tile job-splitting. As the original s2grid bounds contained S2 tiles unknown to the CDSE catalog. The new s2grid_bounds_v2.geojson file contains an additional column cdse_valid to filter out those tiles.

Job manager changes

  • Added a retry_on_exception wrapper for post-job actions, as internal openeo exception can disrupt the extraction process.
  • Dynamic job scaling: By default disabled, it allows to change the property parralel_jobs depending on the running time of the manager. Useful feature but a bit complex, I'm not using it in-fine for the WorldCereal project so personally I don't mind removing it.
  • Use of pickle serialization to contain the STAC catalogue. Writing the full catalogue on json format at the end of each post-job action is too heavy. Instead we are now just appending new items to the in-memory object, and serializing it in a single binary file after each post-job action. The full catalogue is written whenever all the jobs are finished. Works fast and efficiently but I already got a corrupted catalogue cache as I Ctrl-C'ed the script whenever one thread was writing the binary serialized file. So there is something to improve here.
  • Some various fixes: the thread lock to write on the STAC catalogue, safeguards such as check for missing fields in the openeo job metadata...

@GriffinBabe
Copy link
Collaborator Author

@HansVRP I still need your green fire for the remaining open conversations

@HansVRP
Copy link
Collaborator

HansVRP commented Aug 28, 2024

Thanks @GriffinBabe will dedicate some time this afternoon

Copy link
Collaborator

@HansVRP HansVRP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and comments.

The code provided is not necessarily very general. In addition I miss practical unit tests for the newly introduced functionalities. e.g. unit test for calculating the s1 area ect

@GriffinBabe
Copy link
Collaborator Author

Some questions and comments.

The code provided is not necessarily very general. In addition I miss practical unit tests for the newly introduced functionalities. e.g. unit test for calculating the s1 area ect

By the way there is a unit test available for the s1 area calculation: https://github.com/Open-EO/openeo-gfmap/blob/s1-extraction-fixes-PR2/tests/test_openeo_gfmap/test_utils.py#L24

There are missing unit-tests for:

  • JobManager, although I really don't know how to test such a complicated feature
  • Generic extractors

)
request_sessions.mount("https://", adapters.HTTPAdapter(max_retries=retries))
return request_sessions


class UncoveredS1Exception(Exception):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this class needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(basic implementation is a pass?)

Copy link
Collaborator

@VictorVerhaert VictorVerhaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for EUGW I think these change pose no problem

…1-extraction-fixes-PR2

Conflicts:
	src/openeo_gfmap/manager/job_splitters.py
@HansVRP HansVRP merged commit 0a6e65c into main Sep 4, 2024
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants