Requirements Analysis and Architectural Design - EOX #2202 #77

Open
1 of 2 tasks
Schpidi opened this issue Sep 13, 2024 · 8 comments

@Schpidi

Schpidi commented Sep 13, 2024

@Schpidi Schpidi added this to the Q3 milestone Sep 13, 2024
@j08lue j08lue added the EOX label Sep 30, 2024
@jankovicgd
Collaborator

Doing some research and outlining notes here: https://gitlab.eox.at/vs/stacture/-/issues/44

@j08lue
Collaborator

j08lue commented Oct 22, 2024

Hey, I tried to understand the questions in your ticket regarding OGC API Coverages behavior.

I have zero experience with WCS / coverages (serving / consuming), so I do not know what kind of behavior would be expected based on other implementations and can only share some opinions here, FWIW.

Default behavior

I see some of your questions are related to what should be the default behavior.

Generally, I prefer not to have “automagic” behavior - i.e. guessing at the most desired outcome in cases where no single behavior is clearly (say, 80%) the logical one. Instead, I would require user input.

Which assets to choose

On the question of which assets should be returned, I think it would make a lot of sense to have no default and let users specify a list. That is, btw, also how TiTiler handles it: the user has to define either a list of assets or an expression.

https://eoapi.develop.eoepca.org/raster/collections/sentinel-2-l2a/items/S2A_MSIL2A_20230917T100031_N0509_R122_T34UCF_20230917T143103/preview

Assets with different resolutions requested together

E.g. when a user requests 10m and 60m bands together, instead of defaulting to either down- or upsampling, fail and tell the user to decide what to do.
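
A minimal sketch of that fail-fast check, assuming each asset carries its ground sample distance in metres. The `gsd` key, the asset names, and `MixedResolutionError` are illustrative and not part of any existing API:

```python
class MixedResolutionError(ValueError):
    """Raised when the requested assets do not share one resolution."""


def check_single_resolution(assets, requested):
    """Return the common resolution of the requested assets, or fail.

    `assets` maps asset name -> dict with a hypothetical "gsd" key (metres).
    """
    if not requested:
        raise ValueError("no assets requested")
    distinct = sorted({assets[name]["gsd"] for name in requested})
    if len(distinct) > 1:
        # Do not guess between up- and downsampling; ask the user instead.
        raise MixedResolutionError(
            f"Requested assets have different resolutions {distinct}; "
            "please choose a resampling strategy explicitly."
        )
    return distinct[0]


assets = {"B04_10m": {"gsd": 10}, "B01_60m": {"gsd": 60}}
print(check_single_resolution(assets, ["B04_10m"]))  # 10
```

Requesting `["B04_10m", "B01_60m"]` together raises `MixedResolutionError` rather than silently resampling.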

@jankovicgd
Collaborator

Hey, thanks for the input. Unfortunately, OGC API - Coverages works against this here. The requirements packages are optional, meaning some output needs to be provided even when the properties parameter is missing from the URL. Take for example the following two requests:

  1. {{url-local}}/collections/{{collection}}/coverage?bbox={{bbox-poland-eoepca}}&bbox-crs=EPSG:4326&width=512&height=512&f=image/tiff&datetime={{datetime-poland-eoepca}}&properties=B04_10m
  2. {{url-local}}/collections/{{collection}}/coverage?bbox={{bbox-poland-eoepca}}&bbox-crs=EPSG:4326&width=512&height=512&f=image/tiff&datetime={{datetime-poland-eoepca}}

The first request is unambiguous, but what do I provide in the second? We cannot fail, because then we are not adhering to the standard. We also cannot provide all assets, because there are 36 assets with the data role (and not the archive role) and we would be reaching multiple gigabytes. So we need to define which asset(s) are the default.
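
To make the difference between the two requests concrete, here is a rough sketch of how a server could resolve the `properties` parameter. The `DEFAULT_PROPERTIES` mapping, its contents, the example host, and the `resolve_properties` helper are all hypothetical, and the parameter is assumed to be a comma-separated list; only the `properties` query parameter itself comes from the requests above:

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical per-collection defaults; who maintains them is the open question.
DEFAULT_PROPERTIES = {"sentinel-2-l2a": ["B04_10m", "B03_10m", "B02_10m"]}


def resolve_properties(url, collection):
    """Return the asset names to serve for a coverage request URL."""
    query = parse_qs(urlsplit(url).query)
    if "properties" in query:
        # Explicit request: assumed to be a comma-separated list of names.
        return [p for p in query["properties"][0].split(",") if p]
    # No properties given: fall back to the configured default.
    return DEFAULT_PROPERTIES[collection]


base = "https://example.org/collections/sentinel-2-l2a/coverage?width=512&height=512"
print(resolve_properties(base + "&properties=B04_10m", "sentinel-2-l2a"))
print(resolve_properties(base, "sentinel-2-l2a"))
```

The second call is the case under discussion: it can only succeed if a default has been configured somewhere.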

The ultimate question, not just in EOEPCA but in general, is who needs to configure this and how it affects the system.

  1. If this default behavior is configured through the chart, then it is on the operators, but any time a change is needed, there will be downtime.
  2. Or, if this behavior should be dynamic, we would think of these as entries in a database with a separate management API and even a GUI.

I am not completely familiar with the nuances of the project or with who gets to manage what in a production system, and I am trying to avoid weird behavior where

  1. the Dev, Ops and domain knowledge duties are bounced across organizational boundaries and lead to long and unnecessary communication and waiting time
  2. or where someone needs both GIS/EO domain knowledge and DevOps skills to state what this particular request returns.

I always fall back to a simple GeoServer management strategy.

  1. An Operator with superuser access, who knows how to spin up, update and monitor GeoServer for the:
  2. Data Manager, a distinct role with the necessary domain knowledge and more constrained access to the system, who adds data, enables services and default behavior, and communicates this to the:
  3. User, who consumes the OGC APIs via clients or QGIS and has the bare minimum access to the resources

@jonas-eberle can you attempt to clarify some of the questions here? If you want the TL;DR without the technical nuances:
Who operates the system?
When you have a STAC collection and you want to subset it by spatiotemporal bounds, you get the items, but which assets do you return when the user provides none?
Who manages the data in the system and exposes the data to the outside?
Is downtime acceptable when there is new visualization or when default coverages are to be configured?

@j08lue
Collaborator

j08lue commented Oct 23, 2024

We cannot fail because then we are not adhering to the standard.

Right, 💯

We also cannot provide all assets because there is 36 assets with the data and not the archive role, and we are reaching multiple gigabytes, so we need to define what asset(s) is(are) the default

Why not return them all? The user will hopefully notice the excess and economize.

A distinct Data Manager who has the necessary domain knowledge and a more constrained access to the system to add data and enable services and default behavior

Yes! I think our plan is that the Admin UI should allow "Data Managers" to set (data-related) application configuration. Ideally via STAC, IMO.

Are there reference projects we could look to? Does GeoServer manage default coverage properties somehow? pygeoapi? Rasdaman?

Or some non-OGC many-variable data distribution frameworks like Open Data Cube or XCube?

@jankovicgd
Collaborator

Why not return them all? The user will hopefully notice the excess and economize.

If these were small thumbnails I'd agree, but if 5 users do this on a collection with a lot of assets, the server gets clogged immediately. I've rechecked the standard docs, and limiting is actually the way to go here. So we need (someone) to configure the limits: https://docs.ogc.org/DRAFTS/19-087.html (search for /per/core/limits). The concept of defaults is synonymous with limits, so we can see a bit how to go about this.
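
As a rough illustration of what such a limit could look like (this is not the wording of the standard; the `CoverageLimits` fields, the numbers, and the 16-bit, one-band-per-asset assumption are made up for this sketch):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageLimits:
    """Hypothetical request limits set by an operator or data manager."""
    max_assets: int = 4
    max_bytes: int = 256 * 1024 * 1024  # 256 MiB of uncompressed output


def estimate_bytes(width, height, n_assets, bytes_per_pixel=2):
    """Uncompressed size estimate: one band per asset, 16-bit samples."""
    return width * height * n_assets * bytes_per_pixel


def within_limits(width, height, n_assets, limits=CoverageLimits()):
    """Return (ok, reason) so the caller can decide how to respond."""
    if n_assets > limits.max_assets:
        return False, f"{n_assets} assets requested, limit is {limits.max_assets}"
    size = estimate_bytes(width, height, n_assets)
    if size > limits.max_bytes:
        return False, f"estimated {size} bytes, limit is {limits.max_bytes}"
    return True, "ok"


print(within_limits(512, 512, 1))   # small single-asset request
print(within_limits(512, 512, 36))  # all 36 data assets at once
```

With these made-up numbers the first request passes and the second is rejected on the asset count alone.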

We could, in theory, simply agree on a default role or, more thoroughly, propose a coverages extension to STAC where limits are embedded both in the STAC Collection, limiting the collection coverage, and in the STAC Item, limiting the scene coverage. https://github.com/radiantearth/stac-api-spec/blob/release/v1.0.0/stac-spec/best-practices.md#list-of-asset-roles
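
The simpler of the two options, agreeing on a default role, might look like the sketch below. It works on a plain STAC Item dictionary; using `data` as the default role and excluding `archive` mirrors the discussion above, while the `default_assets` helper and the sample item are hypothetical:

```python
def default_assets(item, include_role="data", exclude_role="archive"):
    """Pick asset keys whose roles include one role but not the other."""
    selected = []
    for key, asset in item.get("assets", {}).items():
        roles = asset.get("roles", [])
        if include_role in roles and exclude_role not in roles:
            selected.append(key)
    return selected


item = {
    "assets": {
        "B04_10m": {"href": "B04_10m.tif", "roles": ["data"]},
        "thumbnail": {"href": "thumb.png", "roles": ["thumbnail"]},
        "product": {"href": "product.zip", "roles": ["data", "archive"]},
    }
}
print(default_assets(item))  # ['B04_10m']
```

Note that a role alone does not bound the response size: for a collection with 36 data assets it would still select all of them, which is why the limits matter as well.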

@jonas-eberle
Collaborator

Who manages the data in the system and exposes the data to the outside?
Is downtime acceptable when there is new visualization or when default coverages are to be configured?

It is not an option to specify this as a value in the deployment, e.g. we as platform/services operators will not restart the services when we add a new collection. In addition, users will create their own (private) collections. Thus, a default asset needs to be specified on the STAC collection.
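
One way to picture a default stored on the STAC collection itself is sketched below. The `coverages:default_assets` field name is invented for illustration and is not part of any published STAC extension; `item_assets` is used here only to check that the default refers to a known asset:

```python
def collection_default_assets(collection):
    """Read the hypothetical default-asset list from a STAC Collection dict."""
    defaults = collection.get("coverages:default_assets", [])
    known = collection.get("item_assets", {})
    unknown = [name for name in defaults if name not in known]
    if unknown:
        raise ValueError(f"default assets not defined in item_assets: {unknown}")
    return defaults


collection = {
    "id": "my-private-collection",
    "item_assets": {
        "B04_10m": {"roles": ["data"]},
        "B01_60m": {"roles": ["data"]},
    },
    "coverages:default_assets": ["B04_10m"],  # hypothetical field
}
print(collection_default_assets(collection))  # ['B04_10m']
```

Because the value travels with the collection, changing it would not require redeploying or restarting the service, and users could set it on their own private collections.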

Of course it is hard to specify a default asset for (e.g.) Sentinel-2. But otherwise I agree that, especially for datasets with lots of assets, this easily becomes critical to operate.

Is there a way to further limit the requests for specific collections? E.g., no global bounding box with full resolution on Sentinel-2?

@jankovicgd
Collaborator

@jonas-eberle yeah, the limit is something that should also be configurable somehow. If we're pushing it to STAC, we are expecting the data providers to be aware of this, but if the data providers don't operate and manage the system, they ultimately probably don't care about this detail.

@j08lue
Collaborator

j08lue commented Oct 25, 2024

Just for the record - an example of platform- or service-specific collection metadata from Microsoft Planetary Computer:

https://radiantearth.github.io/stac-browser/#/external/planetarycomputer.microsoft.com/api/stac/v1/collections/io-lulc-9-class

Btw, they did not implement a STAC extension for their extra metadata, but added these in stac-fields, which helps STAC Browser nicely render them:

[image: screenshot of the extra metadata as rendered in STAC Browser]

When we then want to enable users of the Admin UI to edit these, we would just need to create a little plugin, too: https://eoepca.readthedocs.io/projects/resource-discovery/en/latest/design/resource-admin-ui/plugins/
