Math processes on the top-level / multiple save_results in process #609

Open
m-mohr opened this issue Aug 28, 2024 · 4 comments
@m-mohr
Member

m-mohr commented Aug 28, 2024

Hi,

I have Python client code:

import openeo
from openeo.processes import power

connection = openeo.connect("https://earthengine.openeo.org").authenticate_basic("group2", "test123")

aoi = {
  "type": "Polygon",
  "coordinates": [
    [[-7.664532, 38.543869], [-7.689248, 38.141037], [-7.159228, 38.151837], [-7.11289, 38.554609], [-7.664532, 38.543869]]
  ]
}

EPSG = 32629

data = connection.load_collection(
  collection_id = "COPERNICUS/S2_SR_HARMONIZED",
  spatial_extent = aoi,
  temporal_extent = ["2019-06-27T00:00:00Z", "2019-07-04T00:00:00Z"],
  bands = ["B1", "B2", "B3", "B4"]
)

# If we get multiple images (should not happen for the given extent), compute the mean
# Divide by 10000 to convert from DN to reflectance values
data = data.mean_time() / 10000

# Assign the individual bands to variables
B1 = data.band("B1")
B2 = data.band("B2")
B3 = data.band("B3")
B4 = data.band("B4")

# Density of cyanobacteria
cyanobacteria = 115530 * power((B3 * B4) / B2, 2.38)
save1 = cyanobacteria.save_result(
  format = "GTIFF",
  options = {
    "name": "cyanobacteria",
    "metadata": {
      "bands": [ { "statistics": { "minimum": 0, "maximum": 100 } } ]
    },
    "epsgCode": EPSG
  }
)

Unfortunately, I get the following error once I run result1 = connection.execute(save1):

Preflight process graph validation raised: [ProcessArgumentInvalid] The argument 'base' in process 'power' (namespace: n/a) is invalid: Schema for result 'reducedimension2' not compatible

Looking at the generated JSON, this is not overly surprising anymore:

[Screenshot: generated process graph JSON]

Why does the client put power and multiply on the top-level? It works flawlessly with the divide by 10000 in apply.

m-mohr added the bug label Aug 28, 2024
@m-mohr
Member Author

m-mohr commented Aug 28, 2024

And if I try to add more save_result nodes:

chlorophyll_a = 4.26 * power(B3 / B1, 3.94)
cyanobacteria.save_result(
  format = "GTIFF",
  options = {
    "name": "chlorophyll_a",
    "metadata": {
      "bands": [ { "statistics": { "minimum": 0, "maximum": 200 } } ]
    },
    "epsgCode": EPSG
  }
)

# Turbidity
turbidity = 8.93 * (B3 / B1) - 6.39
result = cyanobacteria.save_result(
  format = "GTIFF",
  options = {
    "name": "turbidity",
    "metadata": {
      "bands": [ { "statistics": { "minimum": 0, "maximum": 30 } } ]
    },
    "epsgCode": EPSG
  }
)

job = connection.create_job(title = "OSPD Algal Bloom usecase (Python)", process_graph=result)

So that it gets closer to:

[Image: workflow]

It seems to not pick up the additional nodes.

Seems I'm pushing some boundaries ;-)

m-mohr changed the title from "Math processes on the top-level" to "Math processes on the top-level / create_job on node / multiple save_results in process" on Aug 28, 2024
@soxofaan
Member

Yeah, it's a bit hard to explain what is going on here.

> Why does the client put power and multiply on the top-level? It works flawlessly with the divide by 10000 in apply.

Simply put: the divide is a method call (disguised by syntactic sugar), so it knows it is working on a data cube and translates the operation to an apply with the division as child process. The power is just a function that is not smart enough to know it should use apply with a child process; instead it applies the power to the full data cube, which results in this top-level power node.

A workaround is to apply the power roughly as follows:

cyanobacteria = 115530 * ((B3 * B4) / B2).apply(lambda x: x.power(2.38))

But I understand this is indeed not obvious.
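
To make the difference concrete, here is a toy model of the explanation above. It is not the openeo client's implementation: `ToyCube`, `ToyPixel`, and the free function `power` below are hypothetical stand-ins that only record nested dicts, under the assumption that operator sugar routes through `apply` while a plain function wraps the whole cube.

```python
class ToyPixel:
    """Hypothetical placeholder for the value seen inside a child process."""

    def __init__(self, node):
        self.node = node

    def divide(self, y):
        return ToyPixel({"process_id": "divide", "arguments": {"x": self.node, "y": y}})

    def power(self, p):
        return ToyPixel({"process_id": "power", "arguments": {"base": self.node, "p": p}})


class ToyCube:
    """Hypothetical stand-in for a data cube; it only records a nested process graph."""

    def __init__(self, node):
        self.node = node

    def apply(self, callback):
        # Build the child process by calling the callback on a parameter placeholder.
        child = callback(ToyPixel({"from_parameter": "x"}))
        return ToyCube({"process_id": "apply", "arguments": {"data": self.node, "process": child.node}})

    def __truediv__(self, other):
        # Operator sugar is a method call, so it can wrap the division in `apply`.
        return self.apply(lambda x: x.divide(other))


def power(base, p):
    # Plain function: it does not know about cubes, so the cube itself becomes `base`
    # and `power` ends up as a top-level node.
    return ToyCube({"process_id": "power", "arguments": {"base": base.node, "p": p}})


cube = ToyCube({"process_id": "load_collection", "arguments": {"id": "demo"}})
scaled = cube / 10000                          # top-level node: apply (child: divide)
top_level = power(cube, 2.38)                  # top-level node: power
nested = cube.apply(lambda x: x.power(2.38))   # top-level node: apply (child: power)

for name, result in [("scaled", scaled), ("top_level", top_level), ("nested", nested)]:
    print(name, "->", result.node["process_id"])
```

In this toy model, only the `apply(lambda x: x.power(...))` form keeps `power` inside a child process, which mirrors the workaround shown above.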

I'm not sure yet how to improve the situation here, e.g. make processes like power smarter to do the right thing, or throw an error pointing to a better approach.
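
The two options mentioned here could look roughly like the sketch below. Everything in it is hypothetical and only illustrates the idea: `ToyCube`, `strict_power`, and `smart_power` are made-up names, and the toy cube simply holds a list of numbers rather than building a process graph.

```python
class ToyCube:
    """Hypothetical stand-in for a data cube that holds plain numbers."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, func):
        # Run the function on every value, like a per-pixel child process.
        return ToyCube(func(v) for v in self.values)


def strict_power(base, p):
    # Option 1: refuse cubes and point the user to the `apply` form.
    if isinstance(base, ToyCube):
        raise TypeError(
            "power() got a data cube; use cube.apply(lambda x: x.power(p)) instead"
        )
    return base ** p


def smart_power(base, p):
    # Option 2: detect cubes and delegate to `apply` automatically.
    if isinstance(base, ToyCube):
        return base.apply(lambda x: x ** p)
    return base ** p


print(smart_power(ToyCube([1, 2, 3]), 2).values)
try:
    strict_power(ToyCube([1, 2, 3]), 2)
except TypeError as error:
    print("error:", error)
```

The strict variant keeps the function simple and makes the mistake visible early, while the smart variant hides the distinction at the cost of more implicit behavior.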

m-mohr changed the title from "Math processes on the top-level / create_job on node / multiple save_results in process" to "Math processes on the top-level / multiple save_results in process" on Aug 31, 2024
@m-mohr
Member Author

m-mohr commented Aug 31, 2024

Thanks, yes, this works. It would indeed be more obvious if the power function reacted somehow when one of its inputs is a data cube.

I'm not sure how to create the process graph with 4 save_results, but for now I'll just create 3 jobs I guess...
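
The one-job-per-output fallback could be sketched roughly like this. `create_jobs_per_output` is a hypothetical helper, not part of the openeo client; it only assumes that each cube offers `save_result(format=..., options=...)` and that the connection offers `create_job(title=..., process_graph=...)`, as used in the snippets above.

```python
def create_jobs_per_output(connection, outputs, epsg):
    """Create one batch job per named output (hypothetical helper).

    `outputs` maps an output name to a (cube, maximum) pair, where `maximum`
    is the upper bound written into the band statistics.
    """
    jobs = {}
    for name, (cube, maximum) in outputs.items():
        # Each output gets its own save_result node and therefore its own graph.
        saved = cube.save_result(
            format="GTIFF",
            options={
                "name": name,
                "metadata": {
                    "bands": [{"statistics": {"minimum": 0, "maximum": maximum}}]
                },
                "epsgCode": epsg,
            },
        )
        jobs[name] = connection.create_job(
            title=f"OSPD Algal Bloom usecase (Python): {name}",
            process_graph=saved,
        )
    return jobs
```

With the variables from the earlier snippets, a call might look like `create_jobs_per_output(connection, {"cyanobacteria": (cyanobacteria, 100), "chlorophyll_a": (chlorophyll_a, 200), "turbidity": (turbidity, 30)}, EPSG)`.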

@soxofaan
Member

soxofaan commented Sep 4, 2024

> I'm not sure how to create the process graph with 4 save_results, but for now I'll just create 3 jobs I guess...

Indeed, that is currently blocked by some outdated assumptions, but we're looking into improving that:

(note that we're also working on backend-side support for that)
