Generalize job splitting configuration #156

Open
soxofaan opened this issue Sep 19, 2024 · 0 comments

soxofaan commented Sep 19, 2024

(spin-off from #150)

The aggregator currently has a couple of job splitting features, triggered from a job option:

  • "large area" job splitting (called "managed job splitting" at https://docs.openeo.cloud/federation/#managed-job-splitting)
    • triggered by something like job_options={"tile_grid": "utm-20km"}
    • for testing purposes, the "flimsy" mode of partitioned jobs (a job is just "split" into a single subjob) can also be triggered with
      • job_options={"split_strategy": "flimsy"}
  • simple cross-backend splitting based on splitting off load_collection nodes:
    • job_options={"split_strategy": "crossbackend"}

With #150 we'll add a new cross-backend split approach that splits deeper in the graph.

It will get messy to handle this with a simple split_strategy job option. Also note that we might even want to combine splitting methods, e.g. split both spatially and cross-backend. In fact, one can think of several "dimensions" along which to split a job or graph:

  • spatially (like the current "large area" tile_grid based splitting),
  • cross-backend (due to collection availability or other backend capabilities),
  • temporally (e.g. split longer time series into smaller jobs to optimize resource/credit usage),
  • by result asset (if a backend does not support multiple save_result nodes, or not all requested output formats).
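
As a rough illustration of combining split dimensions, the sketch below partitions each dimension independently and takes the Cartesian product to enumerate subjobs. The helper names (`split_temporal`, `combine_dimensions`), the half-open interval convention, and the tile labels are all assumptions made up for this example:

```python
import itertools
from datetime import date, timedelta


def split_temporal(start, end, days):
    """Split the half-open interval [start, end) into chunks of at most `days` days."""
    chunks = []
    current = start
    while current < end:
        upper = min(current + timedelta(days=days), end)
        chunks.append((current, upper))
        current = upper
    return chunks


def combine_dimensions(**partitions):
    """Build one subjob description per combination of partitions across dimensions."""
    names = list(partitions)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*partitions.values())
    ]


time_chunks = split_temporal(date(2024, 1, 1), date(2024, 1, 11), days=4)
subjobs = combine_dimensions(spatial=["tile-a", "tile-b"], temporal=time_chunks)
print(len(subjobs))
```

The number of subjobs grows multiplicatively with each added dimension, which is one reason per-dimension tuning parameters may be useful.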

The current mockup of a job option to control job splitting, based on what we already have, looks something like this (with some fictional tuning parameters):

job_options = {
    "split_strategy": {
        "crossbackend": {
            "method": "deep",
            "primary_backend": "vito",
            "max_depth": 2,
            "split_deny_list": ["aggregate_spatial", "mask"],
        },
        "spatial": {
            "tile_grid": "utm-20km",
            "align": "west"
        }
    }
}
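
One way to keep existing job options working alongside such a nested structure would be to normalize the legacy forms into it. The following is only a sketch under assumptions: the function name `normalize_split_options` is hypothetical, and mapping a legacy string like "flimsy" to an empty parameter dict is a guess at one possible convention:

```python
def normalize_split_options(job_options):
    """Translate legacy job options into a nested {dimension: parameters} mapping.

    Hypothetical helper, not part of the aggregator.
    """
    strategy = job_options.get("split_strategy")
    if strategy is None:
        normalized = {}
    elif isinstance(strategy, str):
        # Legacy form, e.g. "crossbackend" or "flimsy": no tuning parameters.
        normalized = {strategy: {}}
    elif isinstance(strategy, dict):
        # Already in the nested form: copy to avoid mutating the input.
        normalized = {name: dict(params) for name, params in strategy.items()}
    else:
        raise ValueError(f"Unsupported split_strategy: {strategy!r}")

    # Assumption: a legacy top-level "tile_grid" maps to the "spatial" dimension,
    # without overriding an explicit nested "tile_grid".
    if "tile_grid" in job_options:
        normalized.setdefault("spatial", {}).setdefault(
            "tile_grid", job_options["tile_grid"]
        )
    return normalized


print(normalize_split_options({"split_strategy": "crossbackend", "tile_grid": "utm-20km"}))
```

With this kind of normalization, downstream code would only have to deal with the nested form, regardless of how the user specified the options.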
soxofaan added a commit that referenced this issue Sep 19, 2024
…reate_crossbackend_job

and integrate it in test coverage of TestCrossBackendSplitting
soxofaan added a commit that referenced this issue Sep 19, 2024
soxofaan added a commit that referenced this issue Sep 19, 2024
soxofaan added a commit that referenced this issue Sep 20, 2024
…reate_crossbackend_job

and integrate it in test coverage of TestCrossBackendSplitting