
xcube-stac


xcube-stac is a Python package and xcube plugin that adds a data store named stac to xcube. The data store is used to access data from SpatioTemporal Asset Catalogs (STAC).

Table of contents

  1. Setup
    1. Installing the xcube-stac plugin from the repository
  2. Overview
    1. General structure of a STAC catalog
    2. General functionality of xcube-stac
  3. Introduction to xcube-stac
    1. Overview of Jupyter notebooks
    2. Getting started
  4. Testing
    1. Some notes on the strategy of unit-testing

Setup

Installing the xcube-stac plugin from the repository

To install xcube-stac directly from the git repository, clone the repository, change into the xcube-stac directory, and run the following commands:

conda env create -f environment.yml
conda activate xcube-stac
pip install .

This installs all the dependencies of xcube-stac into a fresh conda environment, then installs xcube-stac into this environment from the repository.

Overview

General structure of a STAC catalog

A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog, collection, and item. Each item can contain multiple assets, each linked to a data source. Items are associated with a timestamp or temporal range and a bounding box describing the spatial extent of the data.

Items within a collection generally share common characteristics. For example, a STAC catalog might contain multiple collections corresponding to different space-borne instruments. Each item represents a measurement covering a specific spatial area at a particular timestamp. For a multi-spectral instrument, different bands can be stored as separate assets.
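To make this hierarchy concrete, here is a minimal Python sketch that models one catalog, one collection, and one item as plain dictionaries. The field names (type, id, bbox, properties.datetime, assets, href) follow the STAC JSON specification, but all IDs and hrefs are made up. Nesting children directly inside their parents is a simplification for illustration: real STAC documents reference collections and items through links.

```python
# Hypothetical, simplified STAC hierarchy. Real STAC documents link to their
# children via "links" instead of nesting them; IDs and hrefs are made up.
item = {
    "type": "Feature",
    "id": "example-item-20200705",
    "collection": "example-collection",
    # Spatial extent as [min_lon, min_lat, max_lon, max_lat]
    "bbox": [9.0, 45.0, 10.0, 46.0],
    # Timestamp of the measurement
    "properties": {"datetime": "2020-07-05T10:30:00Z"},
    # One asset per spectral band, each pointing to a data source
    "assets": {
        "red": {"href": "https://example.com/example-item/B04.tif"},
        "green": {"href": "https://example.com/example-item/B03.tif"},
        "blue": {"href": "https://example.com/example-item/B02.tif"},
    },
}

collection = {"type": "Collection", "id": "example-collection", "items": [item]}
catalog = {"type": "Catalog", "id": "example-catalog", "collections": [collection]}


def iter_assets(catalog):
    """Yield (collection_id, item_id, asset_name, href) for every asset."""
    for coll in catalog["collections"]:
        for itm in coll["items"]:
            for name, asset in itm["assets"].items():
                yield coll["id"], itm["id"], name, asset["href"]


for row in iter_assets(catalog):
    print(row)
```

Walking catalog, collection, item, and asset in this order mirrors how a client navigates a static STAC catalog.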

A STAC catalog can conform to the STAC API - Item Search conformance class, which enables server-side searches for items based on parameters such as a bounding box and a time range. If a catalog does not conform to this class, only client-side searches are possible, which can be slow for large STAC catalogs.
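Without server-side search, the client has to visit the items and filter them itself. The sketch below shows what such a client-side filter could look like for a bounding box and a time range. It is an illustration only, not xcube-stac's implementation; the item dictionaries are hypothetical and reduced to the id, bbox, and properties.datetime fields. A catalog conforming to Item Search would instead evaluate equivalent bbox and datetime parameters on the server.

```python
from datetime import datetime


def parse_datetime(value: str) -> datetime:
    # STAC uses RFC 3339 timestamps; replace a trailing "Z" so that
    # datetime.fromisoformat also accepts it on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bbox_intersects(a, b) -> bool:
    # Boxes are [min_x, min_y, max_x, max_y] in the same CRS.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def client_side_search(items, bbox, start: str, end: str):
    """Return the items whose bbox intersects `bbox` and whose datetime
    lies within [start, end] (inclusive)."""
    start_dt, end_dt = parse_datetime(start), parse_datetime(end)
    return [
        item
        for item in items
        if bbox_intersects(item["bbox"], bbox)
        and start_dt <= parse_datetime(item["properties"]["datetime"]) <= end_dt
    ]


# Hypothetical items: every item has to be inspected, which is why
# client-side search becomes slow for large catalogs.
items = [
    {"id": "a", "bbox": [9.0, 45.0, 10.0, 46.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "b", "bbox": [20.0, 50.0, 21.0, 51.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "c", "bbox": [9.5, 45.5, 10.5, 46.5],
     "properties": {"datetime": "2020-09-01T10:30:00Z"}},
]
hits = client_side_search(
    items, [9.2, 45.2, 9.8, 45.8], "2020-07-01T00:00:00Z", "2020-08-01T00:00:00Z"
)
print([item["id"] for item in hits])  # ['a']
```

Item "b" is excluded by its location and item "c" by its timestamp.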

General functionality of xcube-stac

The xcube-stac plugin reads the data sources referenced in a STAC catalog and opens the data in an analysis-ready form following the xcube dataset convention. By default, a data ID represents one item, which is opened as a dataset, with each asset becoming a data variable within the dataset.

Additionally, a stack mode is available, which stacks items using the core functionality of xcube. This allows mosaicking multiple tiles grouped by solar day and concatenating the resulting data cube along the temporal axis.
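Grouping by solar day rather than by UTC date keeps tiles from the same local daytime overpass together, even where the UTC date differs from the local date (for example, far east or west of Greenwich). The sketch below approximates the solar day by local mean solar time, that is, UTC shifted by longitude / 15 hours at the center of each item's bbox. This approximation and the helper names are assumptions for illustration; xcube-stac's actual grouping logic may differ.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def solar_day(utc_time: datetime, lon: float) -> date:
    # Local mean solar time: UTC shifted by one hour per 15 degrees of
    # longitude (assumed approximation, not necessarily what xcube-stac uses).
    return (utc_time + timedelta(hours=lon / 15.0)).date()


def group_by_solar_day(items):
    """Map each solar day to the IDs of the items acquired on it."""
    groups = defaultdict(list)
    for item in items:
        min_lon, _, max_lon, _ = item["bbox"]
        center_lon = (min_lon + max_lon) / 2
        utc_time = datetime.fromisoformat(
            item["properties"]["datetime"].replace("Z", "+00:00")
        )
        groups[solar_day(utc_time, center_lon)].append(item["id"])
    return dict(groups)


# Hypothetical tiles; t4 lies near 170 degrees E, so its UTC date (July 5)
# differs from its solar day (July 6).
items = [
    {"id": "t1", "bbox": [9.0, 45.0, 10.0, 46.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "t2", "bbox": [10.0, 45.0, 11.0, 46.0],
     "properties": {"datetime": "2020-07-05T10:30:05Z"}},
    {"id": "t3", "bbox": [9.0, 45.0, 10.0, 46.0],
     "properties": {"datetime": "2020-07-06T10:30:00Z"}},
    {"id": "t4", "bbox": [169.0, -20.0, 171.0, -19.0],
     "properties": {"datetime": "2020-07-05T23:30:00Z"}},
]
for day, ids in group_by_solar_day(items).items():
    print(day, ids)
```

Each group would then be mosaicked into a single time step before the groups are concatenated along the time axis.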

odc-stac and stackstac were also considered during the evaluation of Python libraries that support stacking of STAC items. However, both libraries depend on the GDAL driver for reading data with rasterio.open, which prevents reading data from the CDSE S3 endpoint due to the blocking of rasterio's AWS environments. Comparing odc-stac and stackstac, the benchmarking report shows that odc-stac outperforms stackstac. Furthermore, stackstac has an issue with making use of the overview levels of COG files. Still, stackstac is highly popular in the community and might be supported in the future.

Introduction to xcube-stac

Overview of Jupyter notebooks

The following Jupyter notebooks provide some examples:

  • example/notebooks/cdse_sentinel_2.ipynb: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data using the CDSE STAC API. It demonstrates both stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
  • example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data from the Earth Search STAC API by Element 84. It demonstrates both stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
  • example/notebooks/geotiff_nonsearchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a non-searchable STAC catalog.
  • example/notebooks/geotiff_searchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a searchable STAC catalog.
  • example/notebooks/netcdf_searchable_catalog.ipynb: This notebook shows how to load a NetCDF file from a searchable STAC catalog.
  • example/notebooks/xcube_server_stac_s3.ipynb: This notebook shows how to open data sources published by xcube server via the STAC API.

Getting started

The xcube data store framework makes it easy to access data in an analysis-ready format with just a few lines of code:

from xcube.core.store import new_data_store

# Create a "stac" data store connected to the Earth Search STAC API
store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1"
)
# Open a single STAC item as a dataset
ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
    data_type="dataset"
)

The data ID "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A" points to the STAC item's JSON and is the segment of the item's URL that follows the catalog's URL. The data_type can be set to dataset or mldataset, which returns an xr.Dataset or an xcube multi-resolution dataset, respectively. Note that in the above example, if data_type is not assigned, an xarray.Dataset is returned.
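Because the data ID is simply the part of an item's URL after the catalog URL, it can be derived with plain string handling. The two helpers below, item_data_id and data_id_from_url, are hypothetical and not part of the xcube-stac API; they only illustrate this relationship using the Earth Search example.

```python
def item_data_id(collection_id: str, item_id: str) -> str:
    # Build the data ID of a single STAC item in the form used above.
    return f"collections/{collection_id}/items/{item_id}"


def data_id_from_url(catalog_url: str, item_url: str) -> str:
    # Strip the catalog URL (and the separating slash) from the item URL.
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError(f"{item_url!r} does not belong to catalog {catalog_url!r}")
    return item_url[len(prefix):]


catalog_url = "https://earth-search.aws.element84.com/v1"
item_url = catalog_url + "/collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
print(data_id_from_url(catalog_url, item_url))
# collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A
```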

To use the stack mode, initialize a stac store with the argument stack_mode=True.

from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
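ds = store.open_data(
    data_id="sentinel-2-l2a",  # in stack mode, the data ID is a collection ID
    bbox=[506700, 5883400, 611416, 5984840],  # in units of crs (metres)
    time_range=["2020-07-15", "2020-08-01"],
    crs="EPSG:32632",
    spatial_res=20,  # in units of crs (metres)
    asset_names=["red", "green", "blue"],
    apply_scaling=True,
)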

In stack mode, the data IDs are the collection IDs within the STAC catalog. To get Sentinel-2 L2A data, we set data_id to "sentinel-2-l2a". The bounding box and time range define the spatial and temporal extent of the data cube. The parameters crs and spatial_res are also required and define the coordinate reference system (CRS) and the spatial resolution, respectively. Note that the bounding box and spatial resolution need to be given in the units of that CRS.
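Because bbox and spatial_res share the units of crs, the size of the resulting grid follows directly from them. The sketch below computes it for the stack-mode example, where EPSG:32632 (UTM zone 32N) uses metres. The helper grid_shape is hypothetical, and rounding partial pixels up is an assumption; xcube may snap the grid differently. The second call shows why mixing units goes wrong: a bounding box in degrees combined with a 20 m resolution collapses to a single pixel.

```python
import math


def grid_shape(bbox, spatial_res):
    """Return (height, width) in pixels for a bbox and resolution given in
    the same CRS units. Rounding partial pixels up is an assumption."""
    x_min, y_min, x_max, y_max = bbox
    width = math.ceil((x_max - x_min) / spatial_res)
    height = math.ceil((y_max - y_min) / spatial_res)
    return height, width


# bbox and resolution from the stack-mode example, both in metres (EPSG:32632)
print(grid_shape([506700, 5883400, 611416, 5984840], 20))  # (5072, 5236)

# Mistake: a bbox in degrees combined with a resolution in metres
print(grid_shape([9.0, 53.1, 10.6, 54.0], 20))  # (1, 1)
```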

Testing

To run the unit test suite:

pytest

To analyze test coverage:

pytest --cov=xcube_stac

To produce an HTML coverage report:

pytest --cov-report html --cov=xcube_stac

Some notes on the strategy of unit-testing

The unit test suite uses pytest-recording to mock STAC catalogs. During development, actual HTTP requests are sent to a STAC catalog and the responses are saved in cassettes/**.yaml files. During testing, only the cassettes/**.yaml files are used, without any actual HTTP requests. To save the responses to cassettes/**.yaml during development, run

pytest -v -s --record-mode new_episodes

Note that --record-mode new_episodes overwrites all cassettes. If the user only wants to write cassettes that have not been saved yet, --record-mode once can be used. pytest-recording supports all record modes provided by VCR.py. After recording the cassettes, the tests can then be run as usual.