Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes #119

Merged
merged 10 commits into from
Jun 25, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 18 additions & 20 deletions episodes/05-access-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@ represent a very precious data source for any activity that involves monitoring
typically provided in the form of geospatial raster data, with the measurements in each grid cell ("pixel") being
associated to accurate geographic coordinate information.

In this episode we will explore how to access open satellite data using Python. In particular, we will
In this episode we will explore how to access open satellite data using Python. In particular, we will
consider [the Sentinel-2 data collection that is hosted on Amazon Web Services (AWS)](https://registry.opendata.aws/sentinel-2-l2a-cogs).
This dataset consists of multi-band optical images acquired by the two satellite constellations of
This dataset consists of multi-band optical images acquired by the constellation of two satellites from
[the Sentinel-2 mission](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) and it is continuously updated with
new images.

Expand Down Expand Up @@ -96,20 +96,20 @@ functionality of searching its items. For the Earth Search STAC catalog the API
api_url = "https://earth-search.aws.element84.com/v1"
```

You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client).
To do so we will first import `Client` from `pystac_client` and use the [method open from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):
You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io).
To do so we will first import `Client` from `pystac_client` and use the [method `open` from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):

```python
from pystac_client import Client

client = Client.open(api_url)
```

For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).
For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).

In order to see which collections are available in the provided `api_url` the
[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.
In order to see which collections are available in the provided `api_url` the
[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.

```python
collections = client.get_collections()
Expand All @@ -136,7 +136,7 @@ for collection in collections:
As said, we want to focus to the `sentinel-2-l2a` collection. To do so, we set this collection into a variable:

```python
collection_sentinel_2_l2a = "sentinel-2-l2a"
collection_sentinel_2_l2a = "sentinel-2-l2a"
```

The data in this collection is stored in the Cloud Optimized GeoTIFF (COG) format and as JPEG2000 images. In this episode we will focus at COGs, as these offer useful functionalities for our purpose.
Expand All @@ -156,17 +156,17 @@ by a high-resolution raster can directly access the lower-resolution versions of
on the downloading time. More information on the COG format can be found [here](https://www.cogeo.org).
:::

In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
In order to do so we are using the `shapely` library to define a geometrical point.
Below we have included a center point for the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).
Below we have included a point on the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).

```python
from shapely.geometry import Point
point = Point(27.95, 36.20) # Coordinates of a point on Rhodes
```

Note: at this stage, we are only dealing with metadata, so no image is going to be downloaded yet. But even metadata can
be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`). More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's search method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).
be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`). More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's `search` method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).

We now set up our search of satellite images in the following way:

Expand Down Expand Up @@ -287,7 +287,7 @@ print(item.properties)
{'created': '2023-08-27T18:15:43.106Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 0.955362, 'proj:epsg': 32635, 'mgrs:utm_zone': 35, 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'NA', 'grid:code': 'MGRS-35SNA', 'view:sun_azimuth': 144.36354987218, 'view:sun_elevation': 59.06665363921, 's2:degraded_msi_data_percentage': 0.0126, 's2:nodata_pixel_percentage': 12.146327, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0.249403, 's2:cloud_shadow_percentage': 0.237454, 's2:vegetation_percentage': 6.073786, 's2:not_vegetated_percentage': 18.026696, 's2:water_percentage': 74.259061, 's2:unclassified_percentage': 0.198216, 's2:medium_proba_clouds_percentage': 0.613614, 's2:high_proba_clouds_percentage': 0.341423, 's2:thin_cirrus_percentage': 0.000325, 's2:snow_ice_percentage': 2.3e-05, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230827T084601_N0509_R107_T35SNA_20230827T115803.SAFE', 's2:generation_time': '2023-08-27T11:58:03.000000Z', 's2:datatake_id': 'GS2A_20230827T084601_042718_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230827T115803_S20230827T085947_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230827T115803_A042718_T35SNA_N05.09', 's2:reflectance_conversion_factor': 0.978189079756816, 'datetime': '2023-08-27T09:00:21.327000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/af0287974aaa3fbb037c6a7632f72742', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.1'}, 'updated': '2023-08-27T18:15:43.106Z'}
```

Now if we want to access one item in the dictionary, for instance the EPSG code of the projected coordinate system, you need to access the item in the dictionary as usual. For instance:
You can access items from the `properties` dictionary as usual in Python. For instance, for the EPSG code of the projected coordinate system:

```python
print(item.properties['proj:epsg'])
Expand Down Expand Up @@ -329,7 +329,7 @@ items = search.item_collection()
items.save_object("rhodes_sentinel-2.json")
```

This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet out criteria. It does not include the data itself, only their metadata.
This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet our criteria. It does not include the data itself, only their metadata and links to where the data files can be accessed.

## Access the assets

Expand Down Expand Up @@ -423,11 +423,9 @@ https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/35/S/NA/20

From the thumbnails alone we can already observe some dark spots on the island of Rhodes at the bottom right of the image!

In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the `rioxarray` library. Note that this library can both work with local and remote raster data. At this moment we will only take a sneak peek at the [to_raster function](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) of this library. We will learn more about it in the next episode.
In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the [`rioxarray` library](https://corteva.github.io/rioxarray). Note that this library can both work with local and remote raster data. At this moment, we will only take a sneak peek of the functionality of this library. We will learn more about it in the next episode.
fnattino marked this conversation as resolved.
Show resolved Hide resolved

Now let us focus on the near ´red´ band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.

Remote raster data can be directly opened via the `rioxarray` library. We will learn more about this library in the next episodes.
Now let us focus on the red band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.

For now we are using [rioxarray to open the raster file](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-open-rasterio).

Expand All @@ -454,7 +452,7 @@ Attributes:
add_offset: 0.0
```

Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) function:
Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) method:

```python
# save whole image to disk
Expand All @@ -464,7 +462,7 @@ red.rio.to_raster("red.tif")
That might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10-meter NIR band.
But we can take a smaller subset before downloading it. Because the raster is a COG, we can download just what we need!

In order to do that, we are using rioxarray´s [clip_box](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.
In order to do that, we are using rioxarray´s [`clip_box`](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.

```python
red_subset = red.rio.clip_box(
Expand Down
22 changes: 14 additions & 8 deletions episodes/06-raster-intro.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,26 +8,30 @@
- How is a raster represented by rioxarray?
- How do I read and plot raster data in Python?
- How can I handle missing data?
:::

Check warning on line 11 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

:::objectives
- Describe the fundamental attributes of a raster dataset.
- Explore raster attributes and metadata using Python.
- Read rasters into Python using the `rioxarray` package.
- Visualize single/multi-band raster data.
:::

Check warning on line 18 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag


:::callout
Morrizzzzz marked this conversation as resolved.
Show resolved Hide resolved
## Raster Data

In the [first episode](01-intro-raster-data.md) of this course we provided an introduction on what Raster datasets are and how these divert from vector data. In this episode we will dive more into raster data and focus on how to work with them. We introduce fundamental principles, python packages, metadata and raster attributes for working with this type of data. In addition, we will explore how Python handles missing and bad data values.

The Python package we will use throughout this episode to handle raster data is [`rioxarray`](https://corteva.github.io/rioxarray/stable/). This package is based on the popular [`rasterio`](https://rasterio.readthedocs.io/en/latest/) (which is build upon the GDAL library) for working with raster data and [`xarray`](https://xarray.pydata.org/en/stable/) for working with multi-dimensional arrays.

`Rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`. Since a lot of the functions, methods and attributes do not orginate from rioxarray, but come from the other packages mentioned and are accessible through the accessor, the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab and go through the various functionalities. In addition, every function or method offers the opportunity to add a questionmark `?` to see the various options.
`rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`.

:::callout

## Exploring `rioxarray` and getting help

Since a lot of the functions, methods and attributes from `rioxarray` originate from other packages (mostly `rasterio`), the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab completion and go through the various functionalities. In addition, adding a question mark `?` after every function or method offers the opportunity to see the available options.

For instance if you want to understand the options for rioxarray´s open_rasterio call:
For instance if you want to understand the options for rioxarray´s `open_rasterio` function:

```python
rioxarray.open_rasterio?
Morrizzzzz marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -130,9 +134,11 @@
Type: function
```

:::

Check warning on line 137 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

In addition to rioxarray, we will use the [`pystac`](https://github.com/stac-utils/pystac) package to load rasters from the search results that were created in [episode 5](05-access-data.md).

:::

Check warning on line 141 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

## Getting the data

Expand All @@ -142,7 +148,7 @@
In case you would like to work with raster data for this lesson without downloading data on-the-fly, you can download the raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz` file in your current working directory, and extract the archive file by double-clicking on it or by running the
following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`.

If you use choose to download the data you can skip the following part and continue with
If you use choose to download the data you can skip the following part and continue with

**Load a Raster and View Attributes**.

Expand Down Expand Up @@ -227,7 +233,7 @@

## Load a Raster and View Attributes

Now we can load `red` band using the function [`rioxarray.open_rasterio()`](), via the variable we created.
Now we can load the red band using the function [`rioxarray.open_rasterio()`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio), via the variable we created.

```python
import rioxarray
Expand All @@ -238,12 +244,12 @@

```python
import rioxarray
rhodes_red = rioxarray.open_rasterio("../data/stac_json/rhodes_red.tif")
rhodes_red = rioxarray.open_rasterio("data/sentinel2/red.tif")
```

The first call to `rioxarray.open_rasterio()` opens the file from remote or local storage, and then returns a `xarray.DataArray` object. The object is stored in a variable, i.e. `rhodes_red`. Reading in the data with `xarray` instead of `rioxarray` also returns a `xarray.DataArray`, but the output will not contain the geospatial metadata (such as projection information). You can use numpy functions or built-in Python math operators on a `xarray.DataArray` just like a numpy array. Calling the variable name of the `DataArray` also prints out all of its metadata information.

By calling the variable name we can get a quick look at the shape and attributes of the data.
By printing the variable we can get a quick look at the shape and attributes of the data.
```python
print(rhodes_red)
```
Expand Down Expand Up @@ -368,7 +374,7 @@

More options can be consulted [here](https://docs.xarray.dev/en/v2024.02.0/generated/xarray.plot.imshow.html). You will notice that these parameters are part of the `imshow` method from the plot function. Since plot originates from matplotlib and is so widely used, your python environment helps you to interpret the parameters without having to specify the method. It is a service to help you, but can be confusing when teaching it. We will explain more about this below.

:::

Check warning on line 377 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

## View Raster Coordinate Reference System (CRS) in Python
Another information that we're interested in is the CRS, and it can be accessed with `.rio.crs`. We introduced the concept of a CRS in [an earlier
Expand Down Expand Up @@ -438,8 +444,8 @@
::::solution
`crs.axis_info` tells us that the CRS for our raster has two axis and both are in meters.
We could also get this information from the attribute `rhodes_red_80.rio.crs.linear_units`.
::::

Check warning on line 447 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag
:::

Check warning on line 448 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

### Understanding pyproj CRS Summary
Let's break down the pieces of the `pyproj` CRS summary. The string contains all of the individual CRS elements that Python or another GIS might need, separated into distinct sections, and datum.
Expand Down Expand Up @@ -532,7 +538,7 @@
```

You may notice that `rhodes_red_80.quantile` and `numpy.percentile` didn't require an argument specifying the axis or dimension along which to compute the quantile. This is because `axis=None` is the default for most numpy functions, and therefore `dim=None` is the default for most xarray methods. It's always good to check out the docs on a function to see what the default arguments are, particularly when working with multi-dimensional image data. To do so, we can use`help(rhodes_red_80.quantile)` or `?rhodes_red_80.percentile` if you are using jupyter notebook or jupyter lab.
:::

Check warning on line 541 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

## Dealing with Missing Data
So far, we have visualized a band of a Sentinel-2 scene and calculated its statistics. However, as you can see on the image it also contains an artificial band to the top left where data is missing. In order to calculate meaningfull statistics, we need to take missing data into account. Raster data often has a "no data value" associated with it and for raster datasets read in by `rioxarray`. This value is referred to as `nodata`. This is a value assigned to pixels where data is missing or no data were collected. There can be different cases that cause missing data, and it's common for other values in a raster to represent different cases. The most common example is missing data at the edges of rasters.
Expand Down Expand Up @@ -684,8 +690,8 @@

![Overview of the true-color image with the correct aspect ratio](fig/E06/rhodes_multiband_80_equal_aspect.png){alt="raster plot with correct aspect ratio"}

::::

Check warning on line 693 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag
:::

Check warning on line 694 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

check for the corresponding open tag

:::keypoints
- `rioxarray` and `xarray` are for working with multidimensional arrays like pandas is for working with tabular data.
Expand Down
8 changes: 3 additions & 5 deletions episodes/08-crop-raster-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,11 +38,9 @@ We also use the cropped fields polygons `fields_cropped.shp`, which was generate

### Data loading

### Data loading

First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`.
First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`.

We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.
We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.

```python
import rioxarray
Expand Down Expand Up @@ -96,7 +94,7 @@ assets.total_bounds
array([27.7121001 , 35.87837949, 28.24591124, 36.45725024])
```

The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets
The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets

Considering the raster image has larger data volume than the vector data, we will reproject the vector data to the CRS of the raster data. We can use the `to_crs` method:

Expand Down
Loading
Loading