-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
260 lines (178 loc) · 9.71 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# gpkg - Utilities for the OGC 'GeoPackage' Format
<!-- badges: start -->
[![R-CMD-check](https://github.com/brownag/gpkg/actions/workflows/R-CMD-check.yml/badge.svg?branch=main)](https://github.com/brownag/gpkg/actions/workflows/R-CMD-check.yml)
[![gpkg HTML Manual](http://img.shields.io/badge/docs-HTML-informational)](https://humus.rocks/gpkg/)
[![CRAN status](https://www.r-pkg.org/badges/version/gpkg)](https://CRAN.R-project.org/package=gpkg)
[![Codecov test coverage](https://codecov.io/gh/brownag/gpkg/branch/main/graph/badge.svg)](https://app.codecov.io/gh/brownag/gpkg?branch=main)
<!-- badges: end -->
High-level wrapper functions to build [Open Geospatial Consortium (OGC) 'GeoPackage' files](https://www.geopackage.org/). [GDAL](https://gdal.org/) utilities for read and write of spatial data ([vector](https://gdal.org/drv_geopackage.html) and [gridded](https://gdal.org/drv_geopackage_raster.html)) are provided via the {[terra](https://cran.r-project.org/package=terra)} package. Additional 'GeoPackage' and 'SQLite' specific functions manipulate attributes and tabular data via the {[RSQLite](https://cran.r-project.org/package=RSQLite)} package.
<a href="https://raw.githubusercontent.com/brownag/gpkg/main/man/figures/gpkg_sticker_v1.png">
<img src = "https://raw.githubusercontent.com/brownag/gpkg/main/man/figures/gpkg_sticker_v1.png" alt = "gpkg hexsticker" title = "gpkg hexsticker: {gpkg} provides high-level wrapper functions to build GeoPackages containing a variety of different data." width = "35%" height = "35%" hspace="25" vspace="25" align="right"/></a>
## Installation
Install the latest release from CRAN:
```r
install.packages("gpkg")
```
The development package can be installed from GitHub with {remotes}
```r
if (!requireNamespace("remotes"))
install.packages("remotes")
remotes::install_github("brownag/gpkg")
```
## Background
### What is a GeoPackage?
[GeoPackage](https://www.geopackage.org/) is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information. The [GeoPackage Encoding Standard](https://www.ogc.org/standard/geopackage/) describes a set of conventions for storing the following within an SQLite database:
* vector features
* tile matrix sets of imagery and raster maps at various scales
* attributes (non-spatial data)
* extensions
## Create a Geopackage
`gpkg_write()` can handle a variety of different input types. Here we start by adding two DEM (GeoTIFF) files.
```{r}
library(gpkg)
library(terra)
dem <- system.file("extdata", "dem.tif", package = "gpkg")
stopifnot(nchar(dem) > 0)
gpkg_tmp <- tempfile(fileext = ".gpkg")
if (file.exists(gpkg_tmp))
file.remove(gpkg_tmp)
# write a gpkg with two DEMs in it
gpkg_write(
dem,
destfile = gpkg_tmp,
RASTER_TABLE = "DEM1",
FIELD_NAME = "Elevation"
)
gpkg_write(
dem,
destfile = gpkg_tmp,
append = TRUE,
RASTER_TABLE = "DEM2",
FIELD_NAME = "Elevation",
NoData = -9999
)
```
## Insert Vector Layers
We can also write vector data to GeoPackage. Here we use `gpkg_write()` to add a bounding box polygon layer derived from extent of `"DEM1"`.
```{r}
# add bounding polygon vector layer via named list
r <- gpkg_tables(geopackage(gpkg_tmp))[['DEM1']]
v <- terra::as.polygons(r, ext = TRUE)
gpkg_write(list(bbox = v), destfile = gpkg_tmp, append = TRUE)
```
## Insert Attribute Table
Similarly, `data.frame`-like objects (non-spatial "attributes") can be written to GeoPackage.
```{r}
z <- data.frame(a = 1:10, b = LETTERS[1:10])
gpkg_write(list(myattr = z), destfile = gpkg_tmp, append = TRUE)
```
## Read a GeoPackage
`geopackage()` is a constructor that can create a simple container for working with geopackages from several types of inputs. Often you will have a _character_ file path to a GeoPackage (.gpkg) file.
```{r}
g <- geopackage(gpkg_tmp, connect = TRUE)
g
class(g)
```
Other times you may have a list of tables and layers you want to be in a GeoPackage that does not exist yet.
```{r}
g2 <- geopackage(list(dem = r, bbox = v))
g2
class(g2)
```
Note that a temporary GeoPackage (`r g2$dsn`) is automatically created when using the `geopackage(<list>)` constructor.
You also may have a _DBIConnection_ to a GeoPackage database already opened that you want to use. In any case (_character_, _list_, _SQLiteConnection_) there is an S3 method to facilitate creating the basic _geopackage_ class provided by {gpkg}. All other methods are designed to be able to work smoothly with _geopackage_ class input.
## Inspect Contents of GeoPackage
We can list the table names in a GeoPackage with `gpkg_list_tables()` and fetch pointers (SpatRaster, SpatVectorProxy, and lazy data.frame) to the data in them with `gpkg_table()`. We can check the status of the internal `geopackage` class `SQLiteConnection` with `gpkg_is_connected()` and disconnect it with `gpkg_disconnect()`.
```{r}
# enumerate tables
gpkg_list_tables(g)
# inspect tables
gpkg_tables(g)
# inspect a specific table
gpkg_table(g, "myattr", collect = TRUE)
```
Note that the `collect = TRUE` forces data be loaded into R memory for vector and attribute data; this is the difference in result object class of _SpatVectorProxy_/_SpatVector_ and _tbl_SQLiteConnection_/_data.frame_ for vector and attribute data, respectively.
`gpkg_collect()` is a helper method to call `gpkg_table(..., collect = TRUE)` for in-memory loading of specific tables.
```{r}
gpkg_collect(g, "DEM1")
```
Note that with grid data returned from `gpkg_collect()` you get a table result with the tile contents in a blob column of a _data.frame_ instead of _SpatRaster_ object.
The inverse function of `gpkg_collect()` is `gpkg_tbl()` which always returns a _tbl_SQLiteConnection_.
```{r}
gpkg_tbl(g, "gpkg_contents")
```
More on how to use this type of result next.
### Lazy Data Access
There are several other methods that can be used for working with tabular data in a GeoPackage in a "lazy" fashion.
#### Method 1: `gpkg_table_pragma()`
`gpkg_table_pragma()` is a low-frills `data.frame` result containing important table information, but not values. The `PRAGMA table_info()` is stored as a nested data.frame `table_info`. This representation has no dependencies beyond {RSQLite} and is efficient for inspection of table structure and attributes, though it is less useful for data analysis.
```{r}
head(gpkg_table_pragma(g))
```
#### Method 2: `gpkg_vect()` and `gpkg_query()`
`gpkg_vect()` is a wrapper around `terra::vect()` you can use to create 'terra' `SpatVector` objects from the tables found in a GeoPackage.
```{r}
gpkg_vect(g, 'bbox')
```
The table of interest need not have a geometry column, but this method does not work on GeoPackage that contain only gridded data, and some layer in the GeoPackage must have some geometry.
```{r}
gpkg_vect(g, 'gpkg_ogr_contents')
```
The _SpatVectorProxy_ is used for "lazy" references to of vector and attribute contents of a GeoPackage; this object for vector data is analogous to the _SpatRaster_ for gridded data. The 'terra' package provides "GDAL plumbing" for filter and query utilities.
`gpkg_query()` by default uses the 'RSQLite' driver, but the richer capabilities of OGR data sources can be harnessed with [SQLite SQL dialect](https://gdal.org/user/sql_sqlite_dialect.html). These additional features can be utilized with the `ogr=TRUE` argument to `gpkg_query()`, or `gpkg_ogr_query()` for short. This assumes that GDAL is built with support for SQLite (and ideally also with support for Spatialite).
For example, we use built-in functions such as `ST_MinX()` to calculate summaries for `"bbox"` table, geometry column `"geom"`. In this case we expect the calculated quantities to match the coordinates/boundaries of the bounding box:
```{r}
res <- gpkg_ogr_query(g, "SELECT
ST_MinX(geom) AS xmin,
ST_MinY(geom) AS ymin,
ST_MaxX(geom) AS xmax,
ST_MaxY(geom) AS ymax
FROM bbox")
as.data.frame(res)
```
#### Method 3: `gpkg_rast()`
Using `gpkg_rast()` you can quickly access references to all tile/gridded datasets in a GeoPackage.
For example:
```{r}
gpkg_rast(g)
```
#### Method 4: `gpkg_table()`
With the `gpkg_table()` method you access a specific table (by name) and get a "lazy" `tibble` object referencing that table.
This is achieved via {dplyr} and the {dbplyr} database connection to the GeoPackage via the {RSQLite} driver. The resulting object's data can be used in more complex analyses by using other {dbplyr}/{tidyverse} functions.
For example, we inspect the contents of the `gpkg_contents` table that contains critical information on the data contained in a GeoPackage.
```{r}
gpkg_table(g, "gpkg_contents")
```
As a more complicated example we access the `gpkg_2d_gridded_tile_ancillary` table, and perform some data processing.
We `dplyr::select()` `mean` and `std_dev` columns from the `dplyr::filter()`ed rows where `tpudt_name == "DEM2"`. Finally we materialize a `tibble` with `dplyr::collect()`:
```{r}
library(dplyr, warn.conflicts = FALSE)
gpkg_table(g, "gpkg_2d_gridded_tile_ancillary") %>%
filter(tpudt_name == "DEM2") %>%
select(mean, std_dev) %>%
collect()
```
### Managing Connections
Several helper methods are available for checking GeoPackage `SQLiteConnection` status, as well as connecting and disconnecting an existing `geopackage` object (`g`).
```{r}
# still connected
gpkg_is_connected(g)
# disconnect geopackage
gpkg_disconnect(g)
# reconnect
gpkg_connect(g)
# disconnect
gpkg_disconnect(g)
```