
Make GFMAP Job Manager more robust and persistent #96

Closed
3 tasks done
GriffinBabe opened this issue Apr 17, 2024 · 2 comments · Fixed by #104

Comments

@GriffinBabe
Collaborator

GriffinBabe commented Apr 17, 2024

It can happen that the GFMAPJobManager crashes, not necessarily due to errors on the gfmap side, but also because of bad user code in the post-job actions.

  • Implement the possibility of re-running jobs that previously failed. This could be a parameter of the Job Manager when running.
  • Re-run failed post-job actions. This could be done by setting the job status to an intermediate value "post-processing" before setting it to "finished" at the end of the post-job action. This could, however, conflict with the MultiBackendJobManager behavior.
  • There is also the issue that when running an extraction on the same destination folder, the STAC catalogue is overwritten instead of extended: Don't overwrite existing STAC collection when doing a new extraction #94

At the moment, persistence is handled through the job_tracking.csv file and the base logic in the MultiBackendJobManager:
https://github.com/Open-EO/openeo-python-client/blob/master/openeo/extra/job_management.py#L32
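
For illustration, a rough sketch of how re-running previously failed jobs could look, by reloading the tracking CSV and resetting the status of failed rows before handing the dataframe back to run_jobs. The helper itself is hypothetical, and the exact status column and values are assumptions based on how the MultiBackendJobManager tracking file works:

import pandas as pd

def restart_failed_jobs(tracking_df_path: str) -> pd.DataFrame:
    """Reload the tracking CSV and re-queue jobs that previously errored."""
    job_df = pd.read_csv(tracking_df_path)
    # Assumption: the tracking file has a "status" column where failed jobs
    # end up as "error"; resetting it to "not_started" lets the base
    # MultiBackendJobManager pick them up again on the next run.
    failed = job_df["status"] == "error"
    job_df.loc[failed, "status"] = "not_started"
    return job_df

job_df = restart_failed_jobs(tracking_df_path)
manager.run_jobs(job_df, create_datacube_optical, tracking_df_path)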

@GriffinBabe
Collaborator Author

GriffinBabe commented Apr 23, 2024

Whenever a crash happens in the user code, the GFMAP manager loses its STAC collection progress, as the collection is only written once the manager finishes its jobs.

One temporary way of tackling this would be to simply wrap the run in a try/except/finally clause, like so:

try:
    manager.run_jobs(job_df, create_datacube_optical, tracking_df_path)
except Exception as e:
    # Log the crash (whether it comes from gfmap or from user callbacks) instead of propagating it.
    _pipeline_log.error("Error during the job execution: %s", e)
finally:
    # Always write out the STAC collection with whatever items were completed before the crash.
    manager.create_stac(constellation='sentinel2', item_assets={"auxiliary": AUXILIARY})

This should, in theory, save only the fully initialized STAC items (the likely crashing points are the output_path_gen, post_job_action and create_job user functions, all of which are called before any item is added to the collection):

self._root_collection.add_items(job_items)
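
To make that per-job handling itself more robust, the logic around those user callbacks could be hardened along these lines. This is only a sketch; the helper names and status values are illustrative, not actual gfmap internals:

def _handle_finished_job(self, row, job):
    # Flag the job as "postprocessing" before running user code, so a crash
    # here is distinguishable from a cleanly finished job and can be retried.
    row["status"] = "postprocessing"
    try:
        # Illustrative helpers standing in for the output_path_gen /
        # post_job_action user callbacks mentioned above.
        job_items = self._collect_job_items(row, job)
        self._apply_post_job_action(row, job_items)
    except Exception:
        _pipeline_log.exception("Post-job action failed for job %s", row["id"])
        row["status"] = "postprocessing-error"  # can be re-run later
        return
    self._root_collection.add_items(job_items)
    row["status"] = "finished"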

@VincentVerelst However, I was thinking it might be better to call the create_stac function automatically within the manager, so that the STAC collection is still handled when a crash occurs. The usage of the job manager could then look like this:

manager = GFMAPJobManager(...)
manager.setup_stac(constellation='sentinel2', item_assets={'auxiliary': AUXILIARY})

manager.run_jobs(...)  # Will call _create_stac internally
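
For clarity, a rough sketch of what that could look like inside the manager, assuming the existing create_stac logic is moved into a private _create_stac (names and signatures here are illustrative only):

class GFMAPJobManager(MultiBackendJobManager):
    def setup_stac(self, constellation: str, item_assets: dict):
        # Store the STAC settings up front so run_jobs can write the
        # collection itself, even when user code crashes mid-run.
        self._stac_settings = {"constellation": constellation, "item_assets": item_assets}

    def run_jobs(self, df, start_job, output_file):
        try:
            super().run_jobs(df, start_job, output_file)
        finally:
            # Always write out whatever STAC items were collected so far.
            if getattr(self, "_stac_settings", None):
                self._create_stac(**self._stac_settings)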

Tell me what you think 😄

@VincentVerelst
Collaborator

@GriffinBabe, sounds like a good idea! I don't see any benefit in the user having to call create_stac themselves. I also like the idea of having a setup_stac. Maybe we can make it optional as well? i.e. the user only needs to call it if they are interested in changing the STAC metadata; otherwise GFMap will generate a default STAC collection based on which constellation is selected.
