crossbackend feature in aggregator: use NetCDF iso GTiff? #145

Open
soxofaan opened this issue May 31, 2024 · 3 comments

@soxofaan
Member

soxofaan commented May 31, 2024

I noticed this related issue while looking into Open-EO/openeo-geopyspark-driver#786:

The crossbackend feature in the aggregator currently uses GTiff for the load_stac bridge:

    # TODO: other/better choices for save_result format (e.g. based on backend support)?
    "process_id": "save_result",
    "arguments": {
        "data": {"from_node": node_id},
        # TODO: particular format options?
        # "format": "NetCDF",
        "format": "GTiff",
    },
    "result": True,

If I remember correctly, we picked that at the time of implementation because it was a safe choice (widely supported) and there were issues with NetCDF support in load_stac in openeo-geopyspark-driver at the time (March 2023).

We might want to revisit the situation, e.g. automatically detect a better option, or let the user choose in some way?
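
To make the "automatically detect" idea a bit more concrete, here is a minimal sketch (not the aggregator's actual code). It assumes the producing backend's `GET /file_formats` response is available as a dict with an `"output"` mapping, and that we somehow know which formats the consuming backend's load_stac can read. The names `pick_bridge_format`, `build_save_result_node`, `consumer_loadable` and the preference order are all hypothetical.

```python
from typing import Iterable

# Hypothetical preference order; nothing in the aggregator defines this.
DEFAULT_PREFERENCE = ("NetCDF", "GTiff")


def pick_bridge_format(
    producer_file_formats: dict,
    consumer_loadable: Iterable[str],
    preference: Iterable[str] = DEFAULT_PREFERENCE,
    fallback: str = "GTiff",
) -> str:
    """Pick the first preferred format that the producer can write and the consumer can load."""
    # Compare names case-insensitively, but return the producer's own spelling.
    writable = {name.lower(): name for name in producer_file_formats.get("output", {})}
    loadable = {name.lower() for name in consumer_loadable}
    for candidate in preference:
        key = candidate.lower()
        if key in writable and key in loadable:
            return writable[key]
    # Nothing matched: fall back to the current hard-coded choice.
    return fallback


def build_save_result_node(node_id: str, fmt: str) -> dict:
    """Build a save_result node like the one above, with the format as a parameter."""
    return {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": node_id}, "format": fmt},
        "result": True,
    }
```

Letting the user choose could then be a matter of passing a different `preference`, with GTiff staying as the fallback so current behaviour is preserved when nothing else matches.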

@jdries
Contributor

jdries commented Jun 3, 2024

I'm not really sure NetCDF will be better, especially because writing a single large NetCDF file is not so easy, whereas GeoTIFF output can be written as multiple files in parallel.
The only other format with some potential for this use case is Zarr, again because of the parallel write possibility.

@soxofaan
Member Author

soxofaan commented Jun 3, 2024

A reason to prefer NetCDF is that it is more standardized for multidimensional cases (e.g. encoding the time dimension). With GTiff we encode the time dimension in a more ad-hoc way, which will not scale well if more backend implementations come into play.

But indeed, this is not an urgent matter at this time.

@jdries
Contributor

jdries commented Jun 3, 2024

STAC + GeoTIFF can fully define a datacube with a time dimension in a standardized manner.
In fact, the STAC metadata becomes more complicated for NetCDF with a time dimension. I've also seen other backends write NetCDF output in rather unexpected ways that we would probably not support on our side.
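
As a rough illustration of the STAC + GeoTIFF point (a sketch only; how openeo-geopyspark-driver actually writes its result metadata is not shown in this thread): each GeoTIFF gets its own STAC Item, and the time coordinate lives in the Item's `properties.datetime`. The helper name `build_stac_item`, the asset key `"data"`, and the example URLs are made up for illustration.

```python
from datetime import datetime, timezone


def build_stac_item(item_id: str, href: str, timestamp: datetime, bbox: tuple) -> dict:
    """Build a minimal STAC Item dict describing one GeoTIFF for one timestamp.

    `timestamp` is assumed to be timezone-aware.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [[west, south], [east, south], [east, north], [west, north], [west, south]]
            ],
        },
        # The time dimension is carried here, one Item per time step.
        "properties": {
            "datetime": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        },
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff",
                "roles": ["data"],
            }
        },
        "links": [],
    }


# Hypothetical example: two time steps, two files, two Items.
dates = [
    datetime(2024, 5, 30, tzinfo=timezone.utc),
    datetime(2024, 5, 31, tzinfo=timezone.utc),
]
items = [
    build_stac_item(
        item_id=f"result-{d:%Y%m%d}",
        href=f"https://example.test/results/result-{d:%Y%m%d}.tif",
        timestamp=d,
        bbox=(3.0, 51.0, 4.0, 52.0),
    )
    for d in dates
]
```

Because every file is described independently, the files can be written in parallel and the time axis is recovered from the Items rather than from the file contents.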
