
Minimal example for downstream inference UDF #27

Closed
kvantricht opened this issue Jan 23, 2024 · 12 comments
Assignees
Labels
enhancement New feature or request

Comments

@kvantricht
Collaborator

We need a minimal example showing how external projects can make use of OpenEO-GFMAP functionality for inference purposes:

  • Chosen backend
  • Custom bbox
  • Custom temporal range
  • Requested sensors and preprocessing
  • Custom UDF code that returns a result cube
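The points above could be sketched roughly as follows with the plain openeo Python client. Everything concrete here (backend URL, collection id, extents, band list, UDF file name) is a placeholder, and in a GFMAP-based example the fetcher wrappers would replace the raw `load_collection` call:

```python
def run_minimal_inference(udf_path: str = "inference.py") -> None:
    """Sketch of a minimal downstream inference job (placeholders throughout)."""
    # Imported lazily so this sketch can be read without openeo installed.
    import openeo

    # Chosen backend (placeholder URL).
    connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

    # Custom bbox, custom temporal range, requested sensor bands.
    cube = connection.load_collection(
        "SENTINEL2_L2A",
        spatial_extent={"west": 5.00, "south": 51.00, "east": 5.10, "north": 51.10},
        temporal_extent=["2023-01-01", "2023-12-31"],
        bands=["B02", "B03", "B04", "B08"],
    )

    # Custom UDF code that returns a result cube.
    udf = openeo.UDF.from_file(udf_path)
    result = cube.apply_neighborhood(
        udf,
        size=[{"dimension": "x", "value": 128, "unit": "px"},
              {"dimension": "y", "value": 128, "unit": "px"}],
        overlap=[],
    )
    result.execute_batch("predictions.nc")
```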
@kvantricht
Collaborator Author

@VictorVerhaert, according to Hans you would already have an inference UDF notebook for grassland watch. Would you be able to share it in a PR so @GriffinBabe can have a look at it?

@VictorVerhaert
Collaborator

Yes, I'll add it to the examples on GitHub.
If you want (and it fits in our next sprint), I could also take a look at creating an as-minimal-as-possible example notebook.

@VictorVerhaert
Collaborator

My inference notebook does not use GFMap, however.
I use a shared .py file containing the preprocessing steps. My extraction pipeline (GFMap) uses this .py file after the fetchers, but my inference pipeline just uses load_collection.

For now, I would suggest putting this example in https://github.com/Open-EO/openeo-community-examples and referencing it here.

@VictorVerhaert
Collaborator

FYI you can inspect my pipelines here: https://github.com/gisat/grasslandwatch/tree/main/lc_offline

@kvantricht
Collaborator Author

My inference notebook does not use GFMap however.

Ah, OK, interesting. Definitely useful, but we should also work on a GFMAP-based inference workflow here.

@VictorVerhaert
Collaborator

I assume the functionality of GFMap for inference would mainly be to split up the spatial extent we want to run inference on, as well as job management, right?

@kvantricht
Collaborator Author

GFMAP standardizes band names across backends, lays out typical data flow paths, takes care of loading collections and rescaling them into the most efficient datatype, applies collection-specific standardized processes, etc. That goes much broader than just the job splitting concept.
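To illustrate the band-name harmonization idea, a lookup along these lines translates one harmonized set of band names into what each backend expects. The mappings below are made-up placeholders, not GFMAP's actual tables:

```python
# Hypothetical per-backend band names, keyed by a harmonized name.
# These mappings are illustrative, not GFMAP's actual lookup tables.
BAND_ALIASES = {
    "sentinelhub": {"B04": "S2-L2A-B04", "B08": "S2-L2A-B08"},
    "terrascope": {"B04": "TOC-B04_10M", "B08": "TOC-B08_10M"},
}

def harmonized_to_backend(backend: str, bands: list) -> list:
    """Translate harmonized band names into the names a given backend expects."""
    return [BAND_ALIASES[backend][band] for band in bands]
```

User code can then always request `"B04"` regardless of which backend ends up running the job.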

@VictorVerhaert
Collaborator

Yes, of course; I meant what would be visible in the example notebook and what to focus on in the explanation.
It might indeed be good to emphasize that using the same pipeline for extraction and inference is crucial for accurate results, given the optimizations you mention happening in the background.

@GriffinBabe
Collaborator

@VictorVerhaert one thing about the extraction pipeline:

The S1 bands are scaled to uint16 in the following code block (in the fetching preprocessing): https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/fetching/s1.py#L132
This is a memory optimization for OpenEO, as the collections come in float32 power values. Those values are automatically converted back to decibels in the feature extractor, unless the user disables it with a flag: https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/features/feature_extractor.py#L110. Now, I see here that you perform some compositing operations, so we should probably do that rescaling after preprocessing and before entering the FeatureExtractor.
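The round trip described here (float32 power values compressed into uint16 via decibels, then converted back before feature extraction) can be sketched as follows. The scale and offset constants are invented for the sketch, not GFMAP's actual values:

```python
import math

# Illustrative constants for packing decibel values into uint16;
# NOT the actual constants used in openeo-gfmap.
SCALE, OFFSET = 10.0, 830.0

def power_to_uint16(power: float) -> int:
    """Compress a float32 backscatter power value into uint16 via decibels."""
    db = 10.0 * math.log10(power)  # linear power -> decibels
    return max(0, min(65535, round(db * SCALE + OFFSET)))

def uint16_to_db(value: int) -> float:
    """Invert the compression back to decibels, as the feature extractor would."""
    return (value - OFFSET) / SCALE
```

The point of the comment above is that any compositing must happen on a consistent scale, so the uint16 packing should only occur after preprocessing, just before the data enters the FeatureExtractor.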

@GriffinBabe
Collaborator

@kvantricht @VictorVerhaert

I like the idea of using the common ONNX library. I see online that it is possible to convert any scikit-learn, PyTorch, or TensorFlow model to that format. Even CatBoost is directly compatible.

Based on the inference UDF of @VictorVerhaert and the Feature Extractor functionality already implemented in GFMAP, I came up with this first idea for a Model Inference base class that a user can override to implement their own model inference pipeline. Please take a look and tell me what you think:
https://github.com/Open-EO/openeo-gfmap/blob/a7b0cd7ff05e0de73460776fb148a31d8a0167f4/src/openeo_gfmap/inference/model_inference.py

We could also provide a default Model Inference implementation that takes only a path from which to download the ONNX model and the name of the input tensor as parameters, and that returns either the probability values or directly the max-probability argument.
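The shape of such a base class might look like the following pure-Python sketch. The class and method names are hypothetical (the actual class lives in the linked model_inference.py), and the onnxruntime session handling is elided to keep the sketch dependency-free:

```python
from abc import ABC, abstractmethod

class ModelInference(ABC):
    """Hypothetical sketch of an overridable model-inference base class."""

    def __init__(self, model_url: str, input_name: str):
        self.model_url = model_url    # where to download the ONNX model from
        self.input_name = input_name  # name of the model's input tensor

    def load_session(self):
        # A real implementation would download the model and build an
        # onnxruntime.InferenceSession; elided here.
        raise NotImplementedError

    @abstractmethod
    def execute(self, probabilities):
        """Run the model on a chunk of the data cube and return the result."""

class ArgmaxInference(ModelInference):
    """Default-style implementation returning the max-probability class index."""

    def execute(self, probabilities):
        # `probabilities` stands in for per-pixel class probabilities.
        return max(range(len(probabilities)), key=lambda i: probabilities[i])
```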

One thing that needs to be taken care of by the user is the ONNX dependency within the OpenEO job. In the long term this could be included directly in the default OpenEO UDF environment, but for now we need to specify the .zip file in the udf-dependency-archives parameter at job creation, which is done manually at the moment. Maybe that's something to discuss in the redesign discussion @VincentVerelst
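For reference, attaching such a dependency archive might look roughly like this when creating the batch job; the archive URL and extraction folder name are placeholders:

```python
# Illustrative job options attaching an onnxruntime dependency archive to an
# openeo batch job; the archive URL and "#onnx_deps" folder are placeholders.
job_options = {
    "udf-dependency-archives": [
        "https://example.org/onnx_dependencies.zip#onnx_deps",
    ],
}

# Passed when creating the batch job, e.g.:
# cube.execute_batch("result.nc", job_options=job_options)
```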

@VictorVerhaert
Collaborator

On this last point: @HansVRP and I had a similar discussion this morning.
I think that in the long run onnxruntime should be included in the standard UDF environment, as we are advising different projects to use ONNX models.

@GriffinBabe
Collaborator

Closed by #88
