---
title: "Creating and Analyzing Multi-Variable Earth Observation Data Cubes in R"
author: "Marius Appel"
date: "*Aug 18, 2020*"
output:
  html_document:
    theme: flatly
    highlight: default
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
bibliography: references.bib
link-citations: yes
csl: american-statistical-association.csl
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
figtrim <- function(path) {
  img <- magick::image_trim(magick::image_read(path))
  #img <- magick::image_extent(img, magick::geometry_size_percent(width = 100, height = 110))
  magick::image_write(img, path)
  path
}
knitr::opts_chunk$set(echo = TRUE)
#knitr::opts_chunk$set(eval = FALSE)
knitr::opts_chunk$set(out.width = "100%")
knitr::opts_chunk$set(dev = "jpeg")
knitr::opts_chunk$set(fig.process = figtrim)
knitr::opts_chunk$set(fig.width = 10, fig.height = 10)
```
**OpenGeoHub Summer School 2020, Wageningen**
# Outline
- **Part I: Introduction to Earth observation (EO) data cubes and the `gdalcubes` R package**
- Part II: Examples on using data cubes to combine imagery from different satellite-based EO missions
- Part III: Summary, discussion, and practical exercises
All the material of this tutorial is online at [GitHub](https://github.com/appelmar/opengeohub_summerschool2020), including R markdown sources, rendered HTML output, and solutions to the practical exercises.
# Sample Data
Part I will use a collection of 180 Landsat 8 surface reflectance images, covering a small part of the Brazilian Amazon forest. The live demonstration will use the full resolution dataset (62 gigabytes compressed; > 200 gigabytes unzipped; available [here](https://uni-muenster.sciebo.de/s/SmiqWcCCeFSwfuY/download)), whereas we will use a downsampled version of the same dataset with coarse spatial resolution (300 meter pixel size; 740 megabytes compressed; 2 gigabytes unzipped) in the practical part (available [here](https://uni-muenster.sciebo.de/s/e5yUZmYGX0bo4u9/download)).
After downloading whichever version of the dataset you choose, make sure to unzip it. The following R code downloads the low-resolution version to your current working directory and unzips it.
```{r download, eval = FALSE}
download.file("https://uni-muenster.sciebo.de/s/e5yUZmYGX0bo4u9/download",
              destfile = "L8_Amazon.zip", mode = "wb")
unzip("L8_Amazon.zip", exdir = "L8_Amazon")
```
---------------------------------------------------------------------------------------------------------------------------
# Part I: Introduction to the gdalcubes R package
Raw satellite imagery is mostly distributed as collections of files, whether on download portals of space agencies or in cloud computing environments (Amazon Web Services, Google Cloud, ...). If we want to analyze more than a single image, or even images from multiple satellites, we quickly run into the following challenges:
- spectral bands at different spatial resolutions
- spatially overlapping images
- irregular time series for pixels from different tiles (or in overlapping areas)
- different spatial reference systems of images
- different data formats and structures
To do time series analysis, process larger areas, and/or combine datasets from different sensors or satellites, we first must restructure our data, e.g., as a data cube with a single spatial reference system and regular pixel sizes, both in time and in space.
![](cube.png)
Notice that what we call a *cube* is not really a cube: it has (up to) four dimensions, and the lengths of the dimensions may differ.
As a consequence, four-dimensional regular raster data cubes also cover simple time series, multiband time series, grayscale images, multispectral images, and time series of images.
## Existing tools (with a focus on R)
[GDAL](https://gdal.org), the Geospatial Data Abstraction Library, is a software library for reading and writing all relevant raster (and vector) data formats, and it provides functions to warp (reproject, rescale, resample, and crop) multiband raster images. It has a three-dimensional (space, bands) raster data model and solves some of the problems (data formats, image warping). However, it does *not* know about the organization of data products, nor about time. GDAL is written in C/C++, but the `rgdal` package [@rgdal] provides an easy-to-use interface in R.
There are further R packages to process satellite imagery:
`raster` [@raster]
- well established, stable, reliable
- three-dimensional only, no multispectral AND multitemporal stacks
- modern version `terra` [@terra] recently published on CRAN (see https://cran.r-project.org/package=terra)
`stars` [@stars]
- see tutorial by Edzer Pebesma yesterday
- arbitrary dimensions
- has vector data cubes
- lazy evaluation approach, compute only the pixels you see
## First steps with gdalcubes
gdalcubes is a relatively new C++ library and R package; the package mostly wraps functions of the C++ library. It uses GDAL to read, write, and warp images, but understands date/time and how complex satellite image data products are organized. We will focus on the R package here. To get started, please install the gdalcubes package from CRAN with:
```{r install, eval = FALSE}
install.packages("gdalcubes")
```
We can load the package and make sure that all computations later in this tutorial use up to 8 threads with:
```{r setup_gdalcubes}
library(gdalcubes)
gdalcubes_options(threads=8)
```
Please note that this tutorial requires package version 0.3.0, which you can check by running:
```{r pkgversion}
packageVersion("gdalcubes")
```
## Image Collections
First, we need to tell gdalcubes where our satellite images come from and how they are organized.
```{r L8files}
L8.files = list.files("/media/marius/Samsung_T5/eodata/L8_Amazon_full", pattern = ".tif", recursive = TRUE, full.names = TRUE)
head(L8.files, 15)
length(L8.files)
sum(file.size(L8.files)) / 1000^3 # gigabytes
```
We see that every image is represented by a directory, with individual files for spectral bands.
To be able to work with these images as "one object", we must scan all the files once and extract some metadata (date/time, coordinate reference system, spatial extent, etc.). Therefore, we compile the images to an _image collection_ with the following command:
```{r L8col}
L8.col = create_image_collection(L8.files, format = "L8_SR", out_file = "L8.db")
# L8.col = image_collection("L8.db")
L8.col
```
The result is essentially an index stored in a single-file (SQLite) database, typically consuming a few kilobytes per image. Later in the tutorial, this index allows us, for example, to avoid reading image data from files that do not intersect with our spatiotemporal area of interest.
The `format` argument describes how all the metadata can be extracted from the images. The gdalcubes package comes with a set of predefined **image collection formats** for particular data products. We can list available formats with:
```{r colformats}
collection_formats()
```
The number of available formats is still rather limited, but it continues to grow and is extensible (using `add_collection_format()`, as sketched below). In fact, a collection format is a single JSON (JavaScript Object Notation) file describing rules for how to extract, e.g., date/time and bands from filenames (examples at https://github.com/gdalcubes/collection_formats). Writing collection formats for your own non-standard datasets is in many cases not too difficult and is documented [here](https://gdalcubes.github.io/docs/collection_formats.html).
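As a quick sketch, a custom format could be registered from a JSON file; the file name below is hypothetical, so see `?add_collection_format` for the exact interface:
```{r addformat, eval = FALSE}
# register a custom collection format from a local JSON file (hypothetical path)
add_collection_format("my_custom_format.json")
collection_formats() # the new format should now appear in the list
```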
In our example, we used the predefined format `"L8_SR"` for Landsat 8 surface reflectance data as downloaded from the [USGS portal](https://espa.cr.usgs.gov).
The creation of image collections is typically done only once. We can add images to an existing collection with `add_images()` and we can derive the spatiotemporal extent of the collection with:
```{r L8extent}
extent(L8.col, srs="EPSG:4326")
```
Throughout the package, coordinate reference systems can be specified as strings that are understood by GDAL. This includes "EPSG:XXXX", WKT, proj4, and other formats (see [here](https://gdal.org/doxygen/classOGRSpatialReference.html#aec3c6a49533fe457ddc763d699ff8796)).
## Defining a *Data Cube View*
We can define a target data cube by its geometry, i.e., the spatiotemporal extent, the spatial reference system, the spatial size, and the temporal duration of cells. We call this a *data cube view*, i.e., the geometry of a cube without connecting it to any data.
To create a data cube view, we can use the `cube_view()` function:
```{r L8cubeview}
v.overview.500m = cube_view(srs="EPSG:3857", extent=L8.col, dx=500, dy=500, dt = "P1Y",
                            resampling="average", aggregation="median")
v.overview.500m
v.subarea.100m = cube_view(extent=list(left=-6180000, right=-6080000, bottom=-550000, top=-450000,
                                       t0="2014-01-01", t1="2018-12-31"), dt="P1Y", dx=100, dy=100,
                           srs="EPSG:3857", aggregation = "median", resampling = "average")
v.subarea.100m
v.subarea.30m.daily = cube_view(view = v.subarea.100m,
                                extent = list(t0 = "2015-01-01", t1 = "2015-12-31"),
                                dt="P1D", dx=30, dy=30)
v.subarea.30m.daily
```
Notice that the data cube view does not contain any information on bands, because it is independent from particular data products.
## Data Cube Creation
Having defined an *image collection* and a *data cube view*, a data cube is simply the combination of the two. We can create a data cube with the `raster_cube()` function:
```{r L8datacube}
L8.cube.overview = raster_cube(L8.col, v.overview.500m)
L8.cube.overview
L8.cube.subarea = raster_cube(L8.col, v.subarea.100m)
L8.cube.subarea
L8.cube.subarea.daily = raster_cube(L8.col, v.subarea.30m.daily)
L8.cube.subarea.daily
```
This is very cheap: it simply returns *proxy* objects without reading any image data. The package delays the computationally intensive parts as much as possible (e.g., until users call `plot()`). The returned object knows about the bands of the data product. We can use `select_bands()` to get only the spectral bands we are interested in:
```{r L8rgb}
L8.cube.overview.rgb = select_bands(L8.cube.overview, c("B02", "B03", "B04"))
L8.cube.overview.rgb
```
There are some utility functions on data cubes, including:
```{r datacubefun}
names(L8.cube.overview.rgb)
srs(L8.cube.overview.rgb)
bands(L8.cube.overview.rgb)
dim(L8.cube.overview.rgb)
```
## Plotting Data Cubes
The plot function can be used to visualize data cubes. Calling `plot()` will start reading and processing the data.
For a simple RGB plot, we use the `rgb` argument to specify which bands correspond to the red, green, and blue channels, and we specify the black and white points of the channels (to control contrast and brightness) in `zlim`.
```{r L8plot1}
plot(L8.cube.overview.rgb, rgb=3:1, zlim=c(0,1500))
plot(select_bands(L8.cube.subarea, c("B02", "B03", "B04")), rgb=3:1, zlim=c(0,1500))
```
Notice that we can also plot bands individually, creating a two-dimensional plot layout of bands and time. Using `key.pos = 1` and `col = viridis::viridis`, we plot a legend at the bottom of the plot and use the viridis color scales (this requires the viridis package).
```{r L8plot2}
plot(L8.cube.overview.rgb, zlim=c(0,1500), key.pos=1, col=viridis::viridis)
```
Plotting an identical data cube twice, with different visualization arguments such as `zlim` and `col`, does not reprocess the data cube: `plot()` internally writes netCDF files to a temporary directory and remembers that a specific cube is already available.
The `plot()` function also considers different types of data cubes. For example, if the number of cells in x and y direction equals one, we get a simple time series plot, as we will see later in this tutorial.
## Masking
The `raster_cube()` function supports an additional mask band, e.g., to leave out cloudy pixels when creating a data cube.
To define a mask, we can use the `image_mask()` function. This function expects the name of the mask band as its first `band` argument. Additionally, we can either pass a vector of values that are masked (all bands set to NA if the specified mask band has one of the provided values) as the `values` argument, or give a range of mask values by passing minimum and maximum values as `min` and `max` arguments. Masks can be inverted by setting `invert = TRUE`. For bit field masks (as for MODIS data), it is possible to extract specific bits (applying a logical AND) of the band values, before comparing them to the values or range of the mask.
```{r L8masking}
L8.clear_mask = image_mask("PIXEL_QA", values=c(322, 386, 834, 898, 1346, 324, 388, 836, 900, 1348), invert = TRUE)
x = raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask)
x = select_bands(x, c("B02","B03","B04"))
plot(x, rgb=3:1, zlim=c(0,1500))
```
## Animations
The data cube representation makes it straightforward to create animations by plotting time slices of the cube individually and using these plots as animation frames:
```{r L8anim1}
animate(select_bands(raster_cube(L8.col, v.subarea.100m, mask = L8.clear_mask), c("B02","B03","B04")), rgb=3:1, zlim=c(0,1300))
```
## Exporting Data Cubes
Sometimes we want to process or visualize data cubes with external software or other packages. We can export data cubes either as single netCDF files or as a collection of (cloud-optimized) GeoTIFF files, where each time slice of a cube is stored as one (multiband) file.
Both netCDF and GeoTIFF export support *compression* and *packing* (converting double-precision numeric values to smaller integer types by applying an offset and scale) to reduce the file size if needed (see documentation at `?write_ncdf` and `?write_tif`).
```{r L8export}
write_ncdf(L8.cube.overview.rgb, tempfile(pattern = "cube", tmpdir = getwd(), fileext = ".nc"))
```
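As a minimal sketch of compression/packing (assuming the `pack` and `COG` arguments and the `pack_minmax()` helper as described in `?write_ncdf` and `?write_tif`; check the exact interface of your installed version):
```{r L8export_packed, eval = FALSE}
# pack values into 16-bit unsigned integers (scale/offset chosen for reflectance values)
write_ncdf(L8.cube.overview.rgb,
           tempfile(pattern = "cube_packed", tmpdir = getwd(), fileext = ".nc"),
           pack = pack_minmax(type = "uint16", min = 0, max = 10000))
# export as cloud-optimized GeoTIFFs, one (multiband) file per time slice
write_tif(L8.cube.overview.rgb, dir = tempdir(), prefix = "L8_overview_", COG = TRUE)
```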
Similarly, the package includes an `st_as_stars()` function to convert data cubes to `stars` objects [@stars], and the `write_tif()` function can be used to process (three-dimensional only) data cubes with the `raster` package [@raster]:
```{r}
x = raster::stack(
  write_tif(
    select_bands(
      raster_cube(L8.col, v.subarea.100m), "B05")))
x
```
The exported GeoTIFF files can also be used for interactive plotting using the `mapview` package.
```{r}
library(mapview)
mapView(x[[1]], maxpixels = 4e+06, layer.name = "B05",
        map.types = c("Esri.WorldImagery", "OpenTopoMap"))
```
## Data Cube Processing
The gdalcubes package comes with some built-in operations to process data cubes. The following operations produce a derived data cube from one or more input data cubes.
| Operator | Description |
|----------|------------------------------------------------------------------------------------|
| `apply_pixel` | Apply arithmetic expressions on band values per pixel. |
| `fill_time` | Fill missing values by simple time-series interpolation. |
| `filter_pixel` | Filter pixels based on logical expressions. |
| `filter_geom` | Filter pixels that do not intersect with a given input geometry |
| `join_bands` | Combine bands of two or more identically shaped input data cubes. |
| `reduce_space` | Apply a reducer function over time slices of a data cube. |
| `reduce_time` | Apply a reducer function over individual pixel time series. |
| `select_bands` | Select a subset of a data cube's bands. |
| `window_time` | Apply a moving window reducer or kernel over individual pixel time series. |
These operations can be chained, e.g., using the pipe operator `%>%` from the `magrittr` package, which passes the left-hand-side R expression as the first argument to the function on the right-hand side (e.g., `rnorm(100) %>% mean`).
The implementation of these operations in gdalcubes works chunk-wise, i.e., it reads only the chunk of the input data cube that is currently needed. This makes sure that only small parts of a cube must be kept in main memory. The chunk sizes can be controlled via an additional argument to the `raster_cube()` function (see the Internals section below).
### Arithmetic expressions on pixel band values
The `apply_pixel()` function can be used to apply per-pixel arithmetic expressions on band values of a data cube. Examples include the calculation of vegetation indices. The function takes a data cube, a string vector of arithmetic expressions, and a vector of result band names as arguments. Below, we derive the normalized difference vegetation index (NDVI) from the red and near-infrared (NIR) channels. We can apply multiple expressions at the same time by providing a vector of expressions (and names).
```{r L8ndvi}
library(magrittr)
library(colorspace)
ndvi.col = function(n) {
rev(sequential_hcl(n, "Green-Yellow"))
}
L8.ndvi = raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands=FALSE)
L8.ndvi
plot(L8.ndvi, col=ndvi.col, zlim=c(-0.3,1), key.pos = 1)
```
Creating a chain of data cube operations still returns proxy objects that know the size and shape of the output data cube; computations start only once `plot()` is called. In the example, we do not need the original bands after computing the NDVI and hence set `keep_bands = FALSE` (this is the default).
Similar to `apply_pixel()`, we can filter pixels by logical expressions with `filter_pixel()`. Values of all bands for pixels not fulfilling the expression will be set to NA.
```{r L8ndvi_filter}
raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)" , names = "NDVI") %>%
filter_pixel("NDVI < 0.7") %>%
plot(col=ndvi.col, zlim=c(-0.3,1), na.color = "aliceblue", key.pos = 1)
```
### Reduction over time and space
Data cubes can be reduced over the space and time dimensions. The `reduce_time()` function applies one or more reducer functions over pixel time series, producing a single (multiband) result image, whereas `reduce_space()` reduces time slices in the cube to single values (per band), resulting in a single (multiband) time series.
The example below derives median NDVI values over all pixel time series.
```{r L8reducetime1}
raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands=FALSE) %>%
reduce_time("median(NDVI)") %>%
plot(col=ndvi.col, nbreaks=100, zlim=c(-0.3,1), key.pos = 1)
```
Possible reducers include `"min"`, `"mean"`, `"median"`, `"max"`, `"count"` (count non-missing values), `"sum"`, `"var"` (variance), and `"sd"` (standard deviation). Reducer expressions are always given as a string starting with the reducer name followed by the band name in parentheses. Notice that it is not possible to apply more complex arithmetic expressions here. It is however possible to mix reducers and bands:
```{r L8reducetime2}
raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands=TRUE) %>%
reduce_time("median(NDVI)", "mean(NDVI)","max(B05)")
```
Results of `reduce_space()` are plotted as simple time series.
```{r L8space1}
raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
reduce_space("median(NDVI)", "sd(NDVI)") %>%
plot()
```
The `"count"` reducer is often very useful to get an initial understanding of an image collection.
```{r L8reducetime_count}
raster_cube(L8.col, cube_view(view=v.overview.500m, dt="P1D"), mask=L8.clear_mask) %>%
select_bands(c("B01")) %>%
reduce_time("count(B01)") %>%
plot(key.pos=1)
```
```{r L8reducespace_count}
raster_cube(L8.col, cube_view(view=v.overview.500m, dt="P1M"), mask=L8.clear_mask) %>%
select_bands("B01") %>%
reduce_space("count(B01)") %>%
plot()
```
We can see that there are almost no observations during the months from October to May, because the download was limited to images with low cloud percentages.
## Time-series methods
There are two more built-in functions that operate on individual pixel time series.
The `fill_time()` function fills missing values from preceding or succeeding values (using simple linear or nearest-neighbor interpolation, or carrying observations forward or backward). The `window_time()` function can either apply a moving window kernel or apply a reducer function over moving windows.
In the example below, we sum NDVI changes between subsequent time slices in the data cube, and visualize the result using a diverging color scale from the `colorspace` package.
```{r L8timeseries}
col.trend = function(n) {
rev(diverging_hcl(n = n, palette = "Green-Brown"))
}
raster_cube(L8.col, v.subarea.100m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
fill_time(method = "locf") %>%
window_time(kernel = c(-1,1), window=c(1,0)) %>%
reduce_time("sum(NDVI)") %>%
plot(zlim=c(-0.4,0.4),col = col.trend, key.pos=1, nbreaks = 10)
```
## Applying user-defined R functions
So far, we have provided expressions and reducers as characters/strings. The reason is that these automatically translate to built-in C++ functions. In the current version, the following functions may also receive R functions as arguments.
| Operator | Input | Output
|----------|-----------------------------------------|-----------------------------------------|
| `apply_pixel` | Vector of band values for one pixel | Vector of band values of one pixel
| `reduce_time` | Multi-band time series as a matrix | Vector of band values
| `reduce_space` | Three-dimensional array with dimensions bands, x, and y | Vector of band values
| `apply_time` | Multi-band time series as a matrix | Multi-band time series as a matrix
This opens up many possibilities, e.g., using functions from our favorite R packages to process pixel time series.
In the example below, we simply fit a line to individual NDVI pixel time series and return its slope (trend).
```{r L8udf}
raster_cube(L8.col, cube_view(view = v.subarea.100m, dx=180), mask = L8.clear_mask) %>%
  select_bands(c("B04","B05")) %>%
  apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
  reduce_time(names = c("ndvi_trend"), FUN = function(x) {
    # fit a linear model to the NDVI time series of one pixel and return its slope
    z = data.frame(t = 1:ncol(x), ndvi = x["NDVI", ])
    result = NA
    if (sum(!is.na(z$ndvi)) > 3) {
      result = coef(lm(ndvi ~ t, z, na.action = na.exclude))[2]
    }
    return(result)
  }) %>%
  plot(key.pos = 1, col = col.trend, nbreaks = 10, zlim = c(-0.2, 0.2))
```
There is no limit to what we can do in the provided R function, but we must take care of a few things (a minimal skeleton follows below):
1. The reducer function is executed in a new R process without access to the current workspace. It is not possible to access variables defined outside of the function, and packages must be loaded **within** the function.
2. The reducer function **must** always return a vector of the same length (for all time series).
3. It is a good idea to think about `NA` values, i.e., you should check whether the complete time series is `NA` and make sure that missing values do not produce errors.
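The following skeleton is a minimal sketch illustrating these three points; the band name `NDVI` and the output names are placeholders for whatever your own reducer computes:
```{r udf_skeleton, eval = FALSE}
my_reducer = function(x) {
  # (1) any packages needed here must be loaded inside this function,
  #     because it runs in a separate R process without the current workspace
  ndvi = x["NDVI", ]                # x is a band x time matrix
  if (all(is.na(ndvi))) {
    return(c(NA_real_, NA_real_))   # (2) and (3): fixed length, safe for all-NA series
  }
  c(min(ndvi, na.rm = TRUE), max(ndvi, na.rm = TRUE))
}
# usage sketch: cube %>% reduce_time(names = c("ndvi_min", "ndvi_max"), FUN = my_reducer)
```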
Similarly, we could create a maximum NDVI RGB composite image for 2016 with the following function:
```{r L8udf2}
raster_cube(L8.col, cube_view(view = v.subarea.100m, dx=180, dt = "P16D",
                              extent = list(t0 = "2016-01-01", t1 = "2016-12-31")),
            mask = L8.clear_mask) %>%
  select_bands(c("B02", "B03", "B04", "B05")) %>%
  apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands = TRUE) %>%
  reduce_time(names = c("R", "G", "B"), FUN = function(x) {
    # for each pixel, return the RGB band values at the date of maximum NDVI
    i = which.max(x["NDVI", ])
    return(x[c("B04", "B03", "B02"), i])
  }) %>%
  plot(rgb = 1:3, zlim = c(0, 1200))
```
## Extraction from Data Cubes
Besides exporting data cubes as netCDF or GeoTIFF files and converting them to `stars` objects, package version 0.3.0 adds functions to extract points (`query_points()`), time series (`query_timeseries()`), and summary statistics over polygons (`zonal_statistics()`). These functions are especially useful for integration with vector data, e.g., to prepare training datasets for machine learning methods or to enrich movement trajectories with environmental data.
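As a small sketch, `query_points()` takes vectors of coordinates, dates, and their spatial reference system; the points below are arbitrary locations inside the subarea defined earlier (see `?query_points` for the exact interface):
```{r L8query, eval = FALSE}
cube = raster_cube(L8.col, v.subarea.100m, mask = L8.clear_mask) %>%
  select_bands(c("B04", "B05"))
query_points(cube,
             px  = c(-6150000, -6120000),          # x coordinates (EPSG:3857)
             py  = c(-500000, -480000),            # y coordinates (EPSG:3857)
             pt  = c("2016-07-01", "2017-07-01"),  # dates of the observations
             srs = "EPSG:3857")                    # CRS of the point coordinates
```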
## Internals
### Data cubes in memory
Internally, data cubes are simple chunked arrays. A full data cube is not kept in memory (unless it has only one chunk). A chunk is simply a dense four-dimensional C++ array of type double. Chunks that are completely empty, however, do not consume main memory, and this property is propagated through cube operations. For example, if one spatiotemporal chunk does not intersect with any images in a collection, it will be empty and operations on it will be skipped. For this reason, working with a higher temporal resolution (e.g., daily for Landsat 8) is not necessarily much slower. Exceptions include the application of user-defined R functions and the writing of dense files (netCDF/GeoTIFF). However, using compression should be effective in the latter case.
For performance tuning, the size of chunks can be specified as an additional argument to the `raster_cube()` function, as sketched below. This also affects parallelization.
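A minimal sketch, assuming the argument is named `chunking` and takes chunk sizes in the order (time, y, x) as described in `?raster_cube`:
```{r chunking, eval = FALSE}
# one time slice and 512 x 512 pixels per chunk
raster_cube(L8.col, v.overview.500m, mask = L8.clear_mask, chunking = c(1, 512, 512))
```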
### Proxy Objects
The package mostly wraps functions and objects implemented in C++. It makes heavy use of the `Rcpp` package and of external pointers, which point to temporary in-memory objects managed in C++. Therefore, it is not possible to use `save()` to store a data cube and reload it in a new R session. Internally, proxy objects can be serialized as processing graphs: JSON text documents containing all the information needed to recreate a proxy object.
```{r asjson}
raster_cube(L8.col, cube_view(view = v.subarea.100m, extent=list(t0="2014-01",t1="2018-12")), mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
fill_time(method = "locf") %>%
window_time(kernel = c(-1,1), window=c(1,0)) %>%
reduce_time("sum(NDVI)") %>%
as_json() %>%
print()
```
### Further package options
Further global package options are documented in `?gdalcubes_options`.
# References