-
Notifications
You must be signed in to change notification settings - Fork 1
/
using_scidbst.Rmd
244 lines (183 loc) · 14.6 KB
/
using_scidbst.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
---
title: "Interacting with R and the spatio-temporal extension of SciDB"
output: rmarkdown::html_vignette
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{Interacting with R and the spatio-temporal extension of SciDB}
---
```{r, include=FALSE}
library(scidb)
library(raster)
library(zoo)
library(xts)
library(knitr)
library(scidbst)
opts_chunk$set(collapse = T, comment = "#>")
opts_chunk$set(cache.extra = list(R.version, sessionInfo(), format(Sys.Date(), '%Y-%m')))
opts_chunk$set(fig.width=10, fig.height=8)
outputFolder = "/assests/"
```
## Introduction
The package `scidbst` was implemented with the intention to make life easier to deal with operations in SciDB based on spatio-temporal imagery and maintaining their respective metadata. SciDB with the `scidb4geo` extension is currently capable to maintain spatial and/or temporal dimensions. The extension was developed with the intention to utilize GDAL as a data exchange tool and SciDB as a workhorse to perform calculation intensive operations on big data sets.
With the release of a package for SciDB (`scidb`) it was possible to perform queries and calculations from R. This package provides the functionalities for the base version of SciDB, so this package will bridge the gap to the missing functionalities to the `scidb4geo` extension, namely the spatial and temporal referenciation.
## Aim and Data
The aim of this vignette is to illustrate the usage of the methods in this package with simple examples. Since this package requires a running instance of SciDB with the 'scidb4geo' extension, the examples stated here are not directly reproduceable. As data for this example we used:
- SRTM data for Africa [SRTM_AFRICA]
- Landsat 7 scene in the Brazilian Rainforest (scene LS230063_2001088, Level 1, band composition: 1,2,3,4,5,6a,6b,7, scaled onto Byte) [LS7_BRAZIL]
- Landsat 7 subset containing just the NDVI band for a scene in Ethopia [L7_SW_ETHOPIA]
- daily Tropical Rainfall measurement data (TRMM) from 1998 to 2015
The data has already been loaded into scidb with its spatial and temporal reference. The import was done using the GDAL with the extension for scidb (scidb4gdal). A detailed tutorial with examples on how to import spatio-temporal images into SciDB was provided by Appel and Pebesma [1] and hence shall not be topic of this vignette.
## The Basics
As for every operation with \code{scidb} we need to establish a connection to the database first. \code{scidbconnect} will do exactly that. For SciDB versions 15.7 and greater the parameter 'protocol' and 'auth_type' are necessary to make good use of the new user authentication. Using \code{scidbst.ls} will result in a data frame object containing all the arrays that have a referenced dimension as well as their type.
```{r loadCredentials, include=FALSE}
source("/home/lahn/GitHub/scidbst/vignettes/assets/credentials.R")
```
```{r connect}
scidbconnect(host=host,port=port,user=user,password=password,protocol="https",auth_type = "digest")
array.list = scidbst.ls()
array.list
```
Next, we will create a array proxy in a similar way as for \code{scidb} objects by using the \code{scidbst} constructor. This will create a scidb object with the specialty that the spatial and temporal extent will be calculated and the references will be set. Calling the object by name returns an overview about the array structure with the dimension indices, whereas \code{extent} will return the spatial bounding box in coordinates according to the stated coordinate reference system (\code{crs}). If the array has a temporal dimension, then \code{t.extent} will return the temporal extent denoted by a list with the minimum and maximum POSIX date.
```{r, cache=1}
l7_ethiopia = scidbst("L7_SW_ETHOPIA")
l7_ethiopia
extent(l7_ethiopia)
crs(l7_ethiopia)
textent(l7_ethiopia)
trs(l7_ethiopia)
```
One typical task in the Geoscience domain is the creation of spatial subsets. In SciDB such subsets are created by using the dimension index values, which is complicated since it is not obvious which index refers to which real world coordinate. In this package we allow spatial subsetting with the \code{crop} method that was introduced by the \code{raster} package. Other subsetting operations that refer to the dimension index notion are \cdoe{subarray} and \code{subset}.
```{r ehtiopiaSubset, cache=1}
subset.extent = extent(35,36.5,6,8.5)
ethiopia.subset = crop(l7_ethiopia,subset.extent)
extent(ethiopia.subset)
```
In order to plot the image we need to further reduce the dimensionality to simply the two spatial dimensions, this can be done by \code{slice}, which sets the subset to a specific dimension value. This is interesting, when you want to extract an image at a certain point in time. In this example we take a slice from dimension 't' (the temporal dimension) and we set the selection value to a certain date, which is internally recalculated as the according dimension value. Instead of the timestamp, we could also have used the dimension index directly (e.g. 0).
```{r ethiopiaSlice, cache=1}
ethiopia.subset = slice(ethiopia.subset,"t","2003-07-21")
```
Now that we have a two dimensional subset, we can either plot the image or download its values. \code{spplot} is used to for the visualization purpose. To access access or download the data, you need to coerce the scidbst object into another object. Most common for this would be to coerce into a \code{SpatialPointsDataFrame}, \code{RasterBrick} or \code{data.frame}, but there are also other formats supported.
```{r ethiopiaPlot, cache=1}
ethiopia.brick = as(ethiopia.subset,"RasterBrick")
spplot(ethiopia.brick)
```
## NDVI calculation
In this use case we will calculate the NDVI of a Landsat 7 scene in the Brazilian rainforest. To make the image smaller and calculations faster, we will first resample the image and store selected bands. Then we will calculate the NDVI for this scene. Afterwards we will again store the calculated NDVI and plot the results.
```{r ndviNames, cache=1}
ls.brazil.name = "LS7_BRAZIL"
regrid.name = "LS7_BRAZIL_REGRID"
ndvi.name = "LS7_BRAZIL_REGRID_NDVI"
```
```{r ndviDeletes, include=FALSE}
if (any(array.list$name == regrid.name)) {
scidbrm(regrid.name,force=TRUE)
}
if (any(array.list$name == ndvi.name)) {
scidbrm(ndvi.name, force=TRUE)
}
```
```{r ndviResolution, cache=1}
ls7_brazil = scidbst(ls.brazil.name)
xres(ls7_brazil)
yres(ls7_brazil)
```
We loaded the Landsat 7 scene that is located in Brazil and checked the resolution with \code{xres} and \code{yres}. The usual resolution for Landsat scenes is 30m x 30m. To make computations faster and to show case the regrid (or resample) functionality we are going the downscale the image to 300m x 300m.
The downscaling is done using \code{regrid}. The function takes the scidbst array, the amount of cells to aggregate for each dimension and the aggregation function. The aggregation function statement we used is SciDB AFL styled syntax. We used is, because it is more reliable. As output we will get an scidbst array that has the same dimensions as the input image, but with a different granularity, as well as only the stated aggregated attributes as attributes.
```{r ndviRegrid, cache=1}
ls7_brazil_regrid = regrid(ls7_brazil,c(10,10,1),"avg(band1),avg(band3),avg(band4),avg(band5),avg(band8)")
xres(ls7_brazil_regrid)
yres(ls7_brazil_regrid)
```
As it can be seen, we now got a scidbst array with a resolution of about 300m x 300m. As it is done in scidb, the results are neither stored nor calculated in SciDB unless the evaluation function is called explicitly. For scidb the function \code{scidbeval} is used and for scidbst arrays the according function is \code{scidbsteval}.
```{r ndviEval, cache=1}
ls7_brazil_regrid = scidbsteval(ls7_brazil_regrid,regrid.name)
```
As the next steps we will calculate the NDVI and the MDVI of the image. The indices are calculated as follows:
$$ NDVI = \frac{NIR-RED}{NIR+RED} $$ and $$ MDVI =\frac{MIR-NIR}{MIR+NIR} $$
*NIR is the near infrared band, MIR the middle infrared band and RED the band for the visible red interval.*
The calculated attributes are attached as new attributes. We will extract those new calculated attributes by using \code{project} and save them as a new scidbst array in SciDB.
```{r ndviCalculation, cache=1}
ls7_calc = transform(ls7_brazil_regrid, ndvi = "(band4_avg - band3_avg) / (band4_avg + band3_avg)", mdvi = "(band8_avg - band3_avg) / (band8_avg + band3_avg)")
ls7_calc = project(ls7_calc,c("ndvi","mdvi"))
ls7_calc = scidbsteval(ls7_calc,ndvi.name)
```
To show the results we will first download the data with \code{as(x,"RasterBrick")} for one point in time and then we will plot each attribute. To select the attributes, we used \code{project} again. In this case the spatial plotting internally queries for the data on the server and the data will not be returned to the outside of the \code{spplot} function, but it will be plotted.
```{r ndviPlot, cache=1}
ndvi = slice(project(ls7_calc,c("ndvi")),"t","2001-088")
mdvi = slice(project(ls7_calc,c("mdvi")),"t","2001-088")
spplot(ndvi,main="NDVI calculation")
spplot(mdvi,main="MDVI calculation")
```
In this use case we highlighted the following functions:
- extracting resolution information with \code{xres} and \code{yres}
- resampling with \code{regrid}
- storing arrays with \code{scidbsteval}
- performing attribute calculations with \code{transform}
- selecting attributes with \code{project}
## Cloud mask
In the next use case we will perform a very simplistic approach for cloud detection to showcase the the filter/subset function. In this case we assume that clouds appear as white objects in an image. This means that in the visual light spectrum they have a whiteish color. We can filter the array for all cells that have values around the maximum data type value. In the Brazilian Landsat scene this will relate to selecting each cell that has a value of 254 or 255 for bands 1 to 3.
```{r cloudMask, cache=1}
ls7_brazil = scidbst("LS7_BRAZIL")
sliced.regridded = regrid(slice(ls7_brazil,"t",0),c(10,10),"avg(band1),avg(band2),avg(band3)")
brazil.brick = as(sliced.regridded,"RasterBrick")
plotRGB(brazil.brick,r=3,g=2,b=1)
transformed = transform(sliced.regridded,cloud = "iif(band1_avg >= 252 and band2_avg >= 252 and band3_avg >= 252, 1 , 0)")
p2 = project(transformed,c("cloud"))
clouds = subset(p2,"cloud = 1")
points = as(clouds,"SpatialPointsDataFrame")
plot(points)
brick = as(clouds,"RasterBrick")
spplot(brick)
```
In this example we resampled the landsat image on the fly and selected just bands 1 to 3. To see the results as a RGB image, we used plotRGB to do so. After that we used \code{transform} to create a new attribute called 'cloud' with values 1 and 0 to mark whether or not the stated inline if-statement is met. To minimize data the attribute 'cloud' is selected as the only attribute for the result array. \code{subset} is used afterwards to selected cell values that have the value '1' (meeting the statement of transform).
To visualize the results we coerce the scidbst object into a \code{SpatialPointsDataFrame} and into a \code{RasterBrick} to plot them afterwards. (Note: those two coercions methods work only with non temporal arrays)
In this use case we highlighted the uses of:
- selection of cell values by filtering \code{subset}
- coercing a scidbst object to different representations \code{as}
## TRMM Precipitation Data
In this use case we are going to highlight the use of the aggregation function\code{aggregate}. For the aggregation use case we will aggregate over space, which will leave us with a timeseries for a certain region. The TRMM data set contains daily precipitation values for the area -180° to 180° longitude and -50° to 50° latitude. We are going to use \code{crop} to make a spatial subset for the Landsat 7 scene in Ethiopia. Afterwards we are going to aggregate over the spatial domain with an averaging function.
As a result we will get a time series, which is coerced into a \code{xts} object and plotted afterwards.
```{r trmmAggregation, cache=1}
trmm = scidbst("TRMM3B42_DAILY")
trmm.prec = project(trmm,"band5")
trmm.prec.crop = crop(trmm.prec,l7_ethiopia)
daily.avg.ethiopia = aggregate(trmm.prec.crop,list("t"),FUN="avg(band5)")
te = textent(as.POSIXct("2010-01-01"),as.POSIXct("2013-01-01"))
tsubset = subarray(daily.avg.ethiopia,limits=te,between=TRUE)
ts = as(tsubset,"xts")
plot(ts,major.ticks="months",minor.ticks=FALSE,main="Average precipitation for Ethiopia Landsat scene from 2010 to 2013")
```
## Joining spatial arrays with different resolution
In this use case we show how to combine the attributes of two spatial arrays. As input arrays we use the Landsat NDVI dataset of Ethiopia and the SRTM dataset of Africa. As a requirement both datasets need to share the same spatial reference system (in this case WGS84 with latitude and longitude coordinates). We use the landsat scene as the main extent and therefore we create a spatial subset of the SRTM dataset using the spatial extent of the Landsat 7 slice.
```{r joinArraysPreperation, cache=1}
joined.name = "join_ls7ndvi_srtm_ethiopia"
ls7_ethiopia = scidbst("L7_SW_ETHOPIA")
srtm = scidbst("SRTM_AFRICA")
ls7.slice = slice(ls7_ethiopia,"t",7)
extent(ls7.slice)
extent(srtm)
srtm.sub = crop(srtm,ls7.slice)
```
```{r joinArraysResolution, cache=1}
res(ls7.slice)
res(srtm.sub)
```
We can see that the spatial resolution of this two arrays differs, which makes it difficult to join cells approriately. Internally the join function will use the array with the lower resolution as target and resamples the higher resolution array with an aggregation function.
```{r removeJoined, include=FALSE}
if (any(array.list$name == joined.name)) {
scidbrm(joined.name,force=TRUE)
}
```
```{r joinExecution, cache=1}
joined.array = join(ls7.slice,srtm.sub,storeTemp=TRUE,name=joined.name)
joined.array.brick = as(joined.array,"RasterBrick")
# joined.array.brick = writeRaster(joined.array.brick,filename="joined_slice.tif",format="GTiff")
```
To reuse arrays that otherwise need to be recalculated on the fly, we set the 'storeTemp' parameter to TRUE, which creates temporary arrays during the join process. Last, the 'name' specifies the array name under which the joined array can be found.
```{r joinVisual, cache=1}
spplot(subset(joined.array.brick,1)/max(values(subset(joined.array.brick,1)),na.rm=TRUE),main="Landsat NDVI (regridded)")
spplot(subset(joined.array.brick,2),main="SRTM heights in m")
```
## References
[1] Appel, M. and Pebesma, E. (2016). "Scalable Earth Observation analytics with R and SciDB". Web blog post. r-spatial.org, 11.05.2016. Online: <a href="http://r-spatial.org/r/2016/05/11/scalable-earth-observation-analytics.html">http://r-spatial.org/r/2016/05/11/scalable-earth-observation-analytics.html</a>