-
Notifications
You must be signed in to change notification settings - Fork 0
/
Session3.Rmd
687 lines (523 loc) · 19.8 KB
/
Session3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
---
title: "Geospatial Data Carpentry | Session 3: Working with raster data"
author: "Claudiu Forgaci"
date: "2022-11-21"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
library(tidyverse) # Meta package for data science
library(here) # Working with paths
library(raster) # Accessing and manipulating raster data
library(rgdal) # Interface to the GDAL utility
```
# Intro to raster data
<!-- (30 + 20 minutes) -->
In this part of the workshop we will talk about raster data. We will start with an introduction of the fundamental principles, packages, and metadata needed to work with raster data in R. We will discuss some of the most important metadata elements, including CRS and resolution. We continue to work with the `tidyverse` and `here` packages and we will use two additional packages to work with raster data: `raster` and `rgdal`.
The dataset we chose is a set of GeoTIFF files on Delft or subdivisions of it. The files were included in the archive you downloaded before the workshop.
## View Raster File Attributes
The GeoTIFF format contains a set of embedded tags with metadata about the raster data. We can use the function GDALinfo() from the `rgdal` package to get information about our raster data before we read that data into R. It is recommended to do this before importing your data.
```{r attr}
GDALinfo(here("data","tud-dsm.tif"))
```
## Open a Raster in R
Now that we've previewed the metadata for our GeoTIFF, let's import this raster dataset. We are going to work with a Digital Surface Model (DSM) which is in the GeoTIFF format. We can use the `raster()` function to import a raster file.
```{r}
DSM_TUD <- raster(here("data","tud-dsm.tif"))
DSM_TUD
```
Similar to other data structures in R like vectors and data frame columns, descriptive statistics for raster data can be retreived with the `summary()` function.
```{r}
summary(DSM_TUD)
```
But note the warning. Unless you force R to calculate these statistics using every cell, it will take a random sample of 100,000 cells and calculate from them instead. Now, raster objects are not data frames so you cannot count the cells with `nrow()`, but with the `ncell()` function of the `raster` package.
```{r}
summary(DSM_TUD, maxsamp = ncell(DSM_TUD))
```
To visualise raster data in `ggplot2`, we will need to convert it to a data frame. The `raster` package has a `as.data.frame` function for that purpose.
```{r}
DSM_TUD_df <- as.data.frame(DSM_TUD, xy = TRUE)
```
Now when we view the structure of our data, we will see a standard dataframe format.
```{r}
str(DSM_TUD_df)
```
We can use `ggplot()` to plot this data with another `geom_` function called `geom_raster()`. The `coord_quickmap()` gives a quick approximation that preserves straight lines. This approximation is suitable for small areas that are not too close to the poles.
```{r}
ggplot() +
geom_raster(data = DSM_TUD_df , aes(x = x, y = y, fill = tud.dsm)) +
scale_fill_viridis_c() + # remember, this color palette was introduced in the first lesson
coord_quickmap()
```
These map, a so-called Digital Surface Model, shows the elevation of our study site, including buildings and vegetation.
For faster previews, you can use the `plot()` function.
```{r}
plot(DSM_TUD)
```
But what units are these? This information is specified in the CRS, so let's have a closer look at the CRS of our raster.
## View Raster Coordinate Reference System (CRS) in R
With raster objects we use the `crs()` function from the `raster` package.
```{r}
crs(DSM_TUD)
```
We see that the units are in meters in the Proj4 string: `+units=m`.
## Calculate the Min and Max value
It is useful to know the minimum and maximum values of a raster dataset. In the case of a DSM, those values represent the min/max elevation range of our site.
```{r}
minValue(DSM_TUD)
```
```{r}
maxValue(DSM_TUD)
```
If Min and Max values haven't been calculated, you can set them with the `raster::setMinMax()` function.
```{r}
DSM_TUD <- raster::setMinMax(DSM_TUD)
```
## Raster bands
To see how many bands a raster dataset has, use the `raster::nlayers()` function.
```{r}
nlayers(DSM_TUD)
```
This dataset has only 1 band. We will discuss multi-band raster data in a later episode.
## Creating a histogram of raster values
A histogram can be used to inspect the distribution of raster values visually. It can show if there are values above the max or below the min of the expected range. We can create a histogram with the ggplot2 function `geom_histogram()`.
```{r}
ggplot() +
geom_histogram(data = DSM_TUD_df, aes(tud.dsm))
```
Adjust the level of desired detail by setting the number of bins.
```{r}
ggplot() +
geom_histogram(data = DSM_TUD_df, aes(tud.dsm), bins = 40)
```
Looking at the distribution can help us identify bad data values, that is, values that are out of the min/max range.
### Challenge 1 (2 minutes)
Use `GDALinfo()` to determine the following about the `tud-dsm-hill.tif` file:
1. Does this file have the same CRS as `DSM_TUD`?
2. What is resolution of the raster data?
3. How large would a 5x5 pixel area be on the Earth’s surface?
4. Is the file a multi- or single-band raster?
```{r}
GDALinfo(here("data","tud-dsm-hill.tif"))
```
# Plot raster data
<!-- (40 + 30 minutes - 10 minutes about plot formatting - 15 minutes details = 45 minutes) -->
In this part we will plot our raster object using `ggplot2` with customized coloring schemes. In the previous plot, our DSM was colored with a continuous colour range. For clarity and visibility, we may prefer to view the data “symbolized” or colored according to ranges of values. This is comparable to a “classified” map. For that, we need to tell `ggplot` how many groups to break our data into and where those breaks should be. To make these decisions, it is useful to first explore the distribution of the data using a bar plot. To begin with, we will use dplyr’s `mutate()` function combined with `cut()` to split the data into 3 bins.
```{r}
DSM_TUD_df <- DSM_TUD_df %>%
mutate(fct_elevation = cut(tud.dsm, breaks = 3))
ggplot() +
geom_bar(data = DSM_TUD_df, aes(fct_elevation))
```
To see the cutoff values:
```{r}
unique(DSM_TUD_df$fct_elevation)
```
To show count number of pixels in each group:
```{r}
DSM_TUD_df %>%
group_by(fct_elevation) %>%
count()
```
To customize cutoff values:
```{r}
custom_bins <- c(-10, 0, 5, 100)
head(DSM_TUD_df)
DSM_TUD_df <- DSM_TUD_df %>%
mutate(fct_elevation_cb = cut(tud.dsm, breaks = custom_bins))
head(DSM_TUD_df)
unique(DSM_TUD_df$fct_elevation_cb)
```
```{r}
ggplot() +
geom_bar(data = DSM_TUD_df, aes(fct_elevation_cb))
```
```{r}
DSM_TUD_df %>%
group_by(fct_elevation_cb) %>%
count()
```
```{r}
ggplot() +
geom_raster(data = DSM_TUD_df , aes(x = x, y = y, fill = fct_elevation_cb)) +
coord_quickmap()
```
The plot above uses the default colors inside ggplot for raster objects. We can specify our own colors to make the plot look a little nicer. R has a built in set of colors for plotting terrain, which are built in to the `terrain.colors()` function. Since we have three bins, we want to create a 3-color palette:
```{r}
terrain.colors(3)
```
```{r}
ggplot() +
geom_raster(data = DSM_TUD_df , aes(x = x, y = y,
fill = fct_elevation_cb)) +
scale_fill_manual(values = terrain.colors(3)) +
coord_quickmap()
```
```{r}
my_col <- terrain.colors(3)
ggplot() +
geom_raster(data = DSM_TUD_df , aes(x = x, y = y,
fill = fct_elevation_cb)) +
scale_fill_manual(values = my_col, name = "Elevation") +
coord_quickmap()
```
### Challenge 2 (5 minutes)
Create a plot of the TU Delft Digital Surface Model (`DSM_TUD`) that has:
1. Six classified ranges of values (break points) that are evenly divided among the range of pixel values.
2. Axis labels.
3. A plot title.
```{r}
DSM_TUD_df <- DSM_TUD_df %>%
mutate(fct_elevation_6 = cut(tud.dsm, breaks = 6))
unique(DSM_TUD_df$fct_elevation_6)
```
```{r}
my_col <- terrain.colors(6)
ggplot() +
geom_raster(data = DSM_TUD_df, aes(x = x, y = y,
fill = fct_elevation_6)) +
scale_fill_manual(values = my_col, name = "Elevation") +
coord_quickmap() +
xlab("X") +
ylab("Y") +
labs(title = "Elevation Classes of the Digital Surface Model (DSM)")
```
## Layering rasters
We can layer a raster on top of a hillshade raster for the same area, and use a transparency factor to create a 3-dimensional shaded effect. A hillshade is a raster that maps the shadows and texture that you would see from above when viewing terrain. We will add a custom color, making the plot grey.
```{r}
DSM_hill_TUD <- raster(here("data","tud-dsm-hill.tif"))
DSM_hill_TUD
```
```{r}
DSM_hill_TUD_df <- as.data.frame(DSM_hill_TUD, xy = TRUE)
str(DSM_hill_TUD_df)
```
```{r}
ggplot() +
geom_raster(data = DSM_hill_TUD_df,
aes(x = x, y = y, alpha = tud.dsm.hill)) +
scale_alpha(range = c(0.15, 0.65), guide = "none") +
coord_quickmap()
```
We can layer another raster on top of our hillshade by adding another call to the `geom_raster()` function. Let’s overlay DSM_TUD on top of the `DSM_hill_TUD`.
```{r}
ggplot() +
geom_raster(data = DSM_TUD_df ,
aes(x = x, y = y,
fill = tud.dsm)) +
geom_raster(data = DSM_hill_TUD_df,
aes(x = x, y = y,
alpha = tud.dsm.hill)) +
scale_fill_viridis_c() +
scale_alpha(range = c(0.15, 0.65), guide = "none") +
ggtitle("Elevation with hillshade") +
coord_quickmap()
```
### Challenge 3 (8 minutes)
Use the `tud-dtm.tif` and `tud-dtm-hill.tif` files from the `data` directory to create a Digital Terrain Model map of the TU Delft area.
Make sure to:
- include hillshade in the maps,
- label axes,
- include a title for each map,
- experiment with various alpha values and color palettes to represent the data.
```{r}
# import DTM
DTM_TUD <- raster(here("data","tud-dtm.tif"))
DTM_TUD_df <- as.data.frame(DTM_TUD, xy = TRUE)
# DTM Hillshade
DTM_hill_TUD <- raster(here("data","tud-dtm-hill.tif"))
DTM_hill_TUD_df <- as.data.frame(DTM_hill_TUD, xy = TRUE)
ggplot() +
geom_raster(data = DTM_TUD_df ,
aes(x = x, y = y,
fill = tud.dtm,
alpha = 2.0)
) +
geom_raster(data = DTM_hill_TUD_df,
aes(x = x, y = y,
alpha = tud.dtm.hill)
) +
scale_fill_viridis_c() +
guides(fill = guide_colorbar()) +
scale_alpha(range = c(0.4, 0.7), guide = "none") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank()) +
ggtitle("DTM with Hillshade") +
coord_quickmap()
```
# Reproject raster data
<!-- (40 + 20 minutes - 10 minutes instruction details - 20 minutes challenges = 30 minutes) -->
What happens when maps don't line up? That is usually a sign that layers are in different coordinate reference systems (CRS).
In this episode, we will be working with the digital terrain model.
```{r}
DTM_TUD <- raster(here("data","tud-dtm.tif"))
DTM_hill_TUD <- raster(here("data","tud-dtm-hill-ETRS89.tif"))
```
```{r}
DTM_TUD_df <- as.data.frame(DTM_TUD, xy = TRUE)
DTM_hill_TUD_df <- as.data.frame(DTM_hill_TUD, xy = TRUE)
```
```{r}
ggplot() +
geom_raster(data = DTM_TUD_df ,
aes(x = x, y = y,
fill = tud.dtm)) +
geom_raster(data = DTM_hill_TUD_df,
aes(x = x, y = y,
alpha = tud.dtm.hill.ETRS89)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()
```
Our results are curious - neither the Digital Terrain Model (`DTM_TUD_df`) nor the DTM Hillshade (`DTM_hill_TUD_df`) plotted. Let’s try to plot the DTM on its own to make sure there are data there.
```{r}
ggplot() +
geom_raster(data = DTM_TUD_df,
aes(x = x, y = y,
fill = tud.dtm)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()
```
```{r}
ggplot() +
geom_raster(data = DTM_hill_TUD_df,
aes(x = x, y = y,
alpha = tud.dtm.hill.ETRS89)) +
coord_quickmap()
```
If we look at the axes, we can see that the projections of the two rasters are different.
### Challenge 4 (2 minutes)
View the CRS for each of these two datasets. What projection does each use?
```{r}
crs(DTM_TUD)
```
```{r}
crs(DTM_hill_TUD)
```
## Reproject rasters
We can use the `projectRaster()` function to reproject a raster into a new CRS. Keep in mind that reprojection only works when you first have a defined CRS for the raster object that you want to reproject. It cannot be used if no CRS is defined.
```{r}
DTM_hill_EPSG28992_TUD <- projectRaster(DTM_hill_TUD,
crs = crs(DTM_TUD))
```
```{r}
crs(DTM_hill_EPSG28992_TUD)
```
```{r}
crs(DTM_hill_TUD)
```
```{r}
extent(DTM_hill_EPSG28992_TUD)
```
```{r}
extent(DTM_hill_TUD)
```
### Question
Why do you think the two extents differ?
## Dealing with raster resolution
```{r}
res(DTM_hill_EPSG28992_TUD)
```
```{r}
res(DTM_TUD)
```
These two resolutions are different, but they’re representing the same data. We can tell R to force our newly reprojected raster to be the same as `DTM_TUD` by using `res(DTM_TUD)`.
```{r}
DTM_hill_EPSG28992_TUD <- projectRaster(DTM_hill_TUD,
crs = crs(DTM_TUD),
res = res(DTM_TUD))
```
```{r}
res(DTM_hill_EPSG28992_TUD)
```
```{r}
res(DTM_TUD)
```
```{r}
DTM_hill_TUD_2_df <- as.data.frame(DTM_hill_EPSG28992_TUD, xy = TRUE)
```
```{r}
ggplot() +
geom_raster(data = DTM_TUD_df ,
aes(x = x, y = y,
fill = tud.dtm)) +
geom_raster(data = DTM_hill_TUD_2_df,
aes(x = x, y = y,
alpha = tud.dtm.hill.ETRS89)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()
```
# Raster calculations
<!-- (40 + 20 minutes - 15 minutes raster math - 15 minutes challenges = 30 minutes) -->
We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees and buildings across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees and buildings) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.
## Raster calculations in R
```{r}
GDALinfo(here("data","tud-dtm.tif"))
```
```{r}
GDALinfo(here("data","tud-dsm.tif"))
```
```{r}
ggplot() +
geom_raster(data = DTM_TUD_df ,
aes(x = x, y = y, fill = tud.dtm)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()
```
```{r}
ggplot() +
geom_raster(data = DSM_TUD_df ,
aes(x = x, y = y, fill = tud.dsm)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()
```
## Raster math and Canopy Height Models
```{r}
CHM_TUD <- DSM_TUD - DTM_TUD
CHM_TUD_df <- as.data.frame(CHM_TUD, xy = TRUE)
```
```{r}
ggplot() +
geom_raster(data = CHM_TUD_df ,
aes(x = x, y = y, fill = layer)) +
scale_fill_gradientn(name = "Canopy Height", colors = terrain.colors(10)) +
coord_quickmap()
```
```{r}
ggplot(CHM_TUD_df) +
geom_histogram(aes(layer))
```
### Challenge 5 (5 minutes)
It’s often a good idea to explore the range of values in a raster dataset just like we might explore a dataset that we collected in the field.
1. What is the min and maximum value for the Canopy Height Model `CHM_TUD` that we just created?
2. What are two ways you can check this range of data for `CHM_TUD`?
3. What is the distribution of all the pixel values in the CHM?
4. Plot a histogram with 6 bins instead of the default and change the color of the histogram.
5. Plot the CHM_TUD raster using breaks that make sense for the data. Include an appropriate color palette for the data, plot title and no axes ticks / labels.
```{r}
min(CHM_TUD_df$layer, na.rm = TRUE)
max(CHM_TUD_df$layer, na.rm = TRUE)
ggplot(CHM_TUD_df) +
geom_histogram(aes(layer))
ggplot(CHM_TUD_df) +
geom_histogram(aes(layer), colour="black",
fill="darkgreen", bins = 6)
custom_bins <- c(0, 10, 20, 30, 100)
CHM_TUD_df <- CHM_TUD_df %>%
mutate(canopy_discrete = cut(layer, breaks = custom_bins))
ggplot() +
geom_raster(data = CHM_TUD_df , aes(x = x, y = y,
fill = canopy_discrete)) +
scale_fill_manual(values = terrain.colors(4)) +
coord_quickmap()
```
## Export a GeoTIFF
```{r}
writeRaster(CHM_TUD, here("fig_output","CHM_TUD.tiff"),
format="GTiff",
overwrite=TRUE,
NAflag=-9999)
```
# Work with multi-band rasters
<!-- (40 + 20 minutes - 15 minutes details - 15 minutes challenges = 30 minutes) -->
## Getting Started with Multi-Band Data in R
In this episode, the multi-band data that we are working with is high resolution imagery for the Netherlands.
The `raster()` function only reads in the first band, in this case the red band of an RGB raster.
```{r}
RGB_band1_TUD <- raster(here("data","tudlib-rgb.tif"))
```
```{r}
RGB_band1_TUD_df <- as.data.frame(RGB_band1_TUD, xy = TRUE)
```
```{r}
ggplot() +
geom_raster(data = RGB_band1_TUD_df,
aes(x = x, y = y, alpha = tudlib.rgb)) +
coord_quickmap() # use `coord_equal()` instead
```
```{r}
RGB_band1_TUD
```
```{r}
nbands(RGB_band1_TUD)
```
To import the green band:
```{r}
RGB_band2_TUD <- raster(here("data","tudlib-rgb.tif"), band = 2)
```
```{r}
RGB_band2_TUD_df <- as.data.frame(RGB_band2_TUD, xy = TRUE)
```
```{r}
ggplot() +
geom_raster(data = RGB_band2_TUD_df,
aes(x = x, y = y, alpha = tudlib.rgb)) +
coord_equal()
```
## Raster stacks
There is a better way of reading in all bands. The `stack()` function brings in all bands
```{r}
RGB_stack_TUD <- stack(here("data","tudlib-rgb.tif"))
```
```{r}
RGB_stack_TUD
```
```{r}
RGB_stack_TUD@layers
```
```{r}
RGB_stack_TUD[[2]]
```
```{r}
RGB_stack_TUD_df <- as.data.frame(RGB_stack_TUD, xy = TRUE)
```
```{r}
str(RGB_stack_TUD_df)
```
```{r}
ggplot() +
geom_histogram(data = RGB_stack_TUD_df, aes(tudlib.rgb.1))
```
```{r}
ggplot() +
geom_raster(data = RGB_stack_TUD_df,
aes(x = x, y = y, alpha = tudlib.rgb.2)) +
coord_equal()
```
## Create a three-band image
To create an RGB image, we will use the ``plotRGB` function from the `raster` package.
```{r}
plotRGB(RGB_stack_TUD,
r = 1, g = 2, b = 3)
```
The image above looks pretty good. We can explore whether applying a stretch to the image might improve clarity and contrast using `stretch="lin"` or `stretch="hist"`.
```{r}
plotRGB(RGB_stack_TUD,
r = 1, g = 2, b = 3,
scale = 800,
stretch = "lin")
```
```{r}
plotRGB(RGB_stack_TUD,
r = 1, g = 2, b = 3,
scale = 800,
stretch = "hist")
```
## RasterStack vs. RasterBrick
The R RasterStack and RasterBrick object types can both store multiple bands. However, how they store each band is different. The bands in a RasterStack are stored as links to raster data that is located somewhere on our computer. A RasterBrick contains all of the objects stored within the actual R object. In most cases, we can work with a RasterBrick in the same way we might work with a RasterStack. However a RasterBrick is often more efficient and faster to process - which is important when working with larger files.
```{r}
object.size(RGB_stack_TUD)
```
```{r}
RGB_brick_TUD <- brick(RGB_stack_TUD)
object.size(RGB_brick_TUD)
```
```{r}
plotRGB(RGB_brick_TUD)
```