New function merge_camtrapdp() #112

Merged 147 commits on Nov 21, 2024
147 commits
ffa9ee6
new function merge_camtrapdp()
sannegovaert Jul 23, 2024
0b677c5
Create test-merge_camtrapdp.R
sannegovaert Jul 24, 2024
10d9227
document()
sannegovaert Jul 24, 2024
04afa53
add test "merge_camtrapdp() returns no duplicated deploymentID's"
sannegovaert Jul 24, 2024
2df5162
fix typo
sannegovaert Jul 25, 2024
d8f6f0a
import digest
sannegovaert Jul 25, 2024
2ebdcc0
update test
sannegovaert Jul 25, 2024
bcea3dc
give unique deploymentIDs to deployments
sannegovaert Jul 25, 2024
82fa057
set unique mediaID's and observationID's
sannegovaert Jul 25, 2024
429f201
update comment
sannegovaert Jul 25, 2024
8ce8ad2
update documentation
sannegovaert Jul 25, 2024
9297cd9
update examples
sannegovaert Jul 25, 2024
fefac69
correct typo
sannegovaert Jul 25, 2024
e80eb49
delete space in messages
sannegovaert Jul 25, 2024
298b459
test merge_camtrapdp() returns message when ID's are replaced
sannegovaert Jul 25, 2024
61bb20d
update test "merge_camtrapdp() returns unique deplpymentID's, mediaID…
sannegovaert Jul 26, 2024
53eaded
character limit
sannegovaert Jul 26, 2024
03051d9
create helper function `replace_duplicated_deploymentID()`
sannegovaert Jul 26, 2024
af034cf
create helper function `vdigest_crc32()`
sannegovaert Jul 26, 2024
279faaa
rename and update function `replace_duplicated_deploymentID()` to `ge…
sannegovaert Jul 26, 2024
2c6c71f
update comments
sannegovaert Jul 26, 2024
585d14e
new helper function `generate_mediaID()`
sannegovaert Jul 26, 2024
5f007fb
add helper function `generate_observationID()`
sannegovaert Jul 26, 2024
94f3c36
update `generate_observationID()` to `replace_observationID()`
sannegovaert Jul 26, 2024
0924691
update `generate_mediaID()` to `replace_mediaID()`
sannegovaert Jul 29, 2024
40e0a14
update `generate_deploymentID()` to `replace_deploymentID()`
sannegovaert Jul 29, 2024
f6caca2
new helper function `replace_duplicatedIDs()`
sannegovaert Jul 29, 2024
b533e4e
update comment
sannegovaert Jul 29, 2024
63bb091
check for length ID
sannegovaert Jul 29, 2024
a740d25
test for valid name and title
sannegovaert Jul 29, 2024
13442c1
capitilize comments
sannegovaert Jul 30, 2024
adf5fa1
new helper function `normalize_list()`
sannegovaert Jul 30, 2024
3454961
new helper function `is_subset()`
sannegovaert Jul 30, 2024
9d07cf7
add and use helper functions `update_unique()` and `remove_duplicates()`
sannegovaert Jul 30, 2024
ce1cba2
uncomment camtrapdp_error_length_ aborts because it cannot be tested …
sannegovaert Jul 30, 2024
a07f3d0
replace non-ASCII characters
sannegovaert Jul 30, 2024
3dad912
replace stats::setNames
sannegovaert Jul 30, 2024
1690c3c
grammar
sannegovaert Jul 31, 2024
ccbaea6
Merge branch 'main' into merge_datasets
peterdesmet Sep 9, 2024
8a17dc0
Merge branch 'main' into merge_datasets
sannegovaert Sep 26, 2024
693465d
leave name and title empty (don't have the user set those in the func…
sannegovaert Sep 26, 2024
16f5f28
typo
sannegovaert Sep 26, 2024
c534494
remove title and name arguments from tests
sannegovaert Sep 26, 2024
a00e6cd
Do not generate an id. That also solves the problem of having meaning…
sannegovaert Sep 26, 2024
9c72c1b
remove params title and name from documentation
sannegovaert Sep 26, 2024
500262d
documen()
sannegovaert Sep 26, 2024
7e7011b
replace project with projects
sannegovaert Sep 26, 2024
4519945
add helper function check_duplicate_ids()
sannegovaert Sep 27, 2024
7c3f9c6
remove `replace_ ()` helper functions
sannegovaert Sep 27, 2024
3f145ea
add documentation
sannegovaert Sep 27, 2024
5e52a7b
new helper function 'add_suffx()`
sannegovaert Sep 27, 2024
4f1146d
remove helper function `replace_duplicatedIDs()`
sannegovaert Sep 27, 2024
af0876e
use `add_suffx()`
sannegovaert Sep 27, 2024
4f2ea1e
add param suffix
sannegovaert Sep 27, 2024
d198a68
keep NAs in mediaID when adding suffix
sannegovaert Sep 27, 2024
6cf7e0e
do not merge objects in helper function
sannegovaert Sep 27, 2024
8328065
also add suffix to eventIDs and individualDs
sannegovaert Sep 27, 2024
7ac0413
add warnings
sannegovaert Sep 27, 2024
42bdbde
avoid warning message of `any()`
sannegovaert Sep 27, 2024
db1266c
update tests
sannegovaert Sep 27, 2024
57c0bb3
individualIDs are allowed to be duplicated between data packages
sannegovaert Sep 27, 2024
08f668d
replace suffix with prefix
sannegovaert Sep 27, 2024
f7ca592
typo
sannegovaert Sep 27, 2024
ee90618
merge_camtrapdp() adds prefixes to all values of identifiers
sannegovaert Sep 27, 2024
c0119f0
devtools::document()
sannegovaert Sep 27, 2024
af85ae6
test on warning invalid prefix
sannegovaert Sep 27, 2024
4509b47
Update merge_camtrapdp.R
sannegovaert Sep 27, 2024
8f759b0
raise error, not warning
sannegovaert Sep 27, 2024
fecb8a4
merge_camtrapdp() returns error on duplicate Data Package id
sannegovaert Sep 27, 2024
99e91ad
correction: not a warning but error
sannegovaert Sep 27, 2024
fe7b09b
give unique ids to example datasets to merge
sannegovaert Sep 27, 2024
da0cb18
reorder tests
sannegovaert Sep 30, 2024
dac7973
camtrapdp id must be character
sannegovaert Sep 30, 2024
abe30ab
change default prefix
sannegovaert Sep 30, 2024
b5c0992
set default in function
sannegovaert Sep 30, 2024
006f217
typo
sannegovaert Sep 30, 2024
cbda9db
add tests for metadata
sannegovaert Sep 30, 2024
d76b491
account for ID == NULL
sannegovaert Sep 30, 2024
472f3d3
correct mistake in keywords
sannegovaert Sep 30, 2024
4386be9
add tests (work in progress)
sannegovaert Sep 30, 2024
2ff6d69
typo
sannegovaert Oct 1, 2024
96b9322
update test on metadata
sannegovaert Oct 1, 2024
6aa0983
small update
sannegovaert Oct 1, 2024
7da1b70
update prefix
sannegovaert Oct 8, 2024
c3e874c
update parameter names
sannegovaert Oct 8, 2024
d2acd21
update parameter names and prefix
sannegovaert Oct 8, 2024
649f0f1
fix name merged DP
sannegovaert Oct 8, 2024
3c25c05
rename merged DP
sannegovaert Oct 8, 2024
13d9b44
add tests on custom prefixes
sannegovaert Oct 8, 2024
8822b7e
Update test-merge_camtrapdp.R
sannegovaert Oct 8, 2024
156d6d8
add test on piping
sannegovaert Oct 8, 2024
a9c07d9
set id to NULL instead of NA
sannegovaert Oct 8, 2024
b6b8c68
add tests for description
sannegovaert Oct 8, 2024
f35e46d
id is set to NULL instead of NA
sannegovaert Oct 8, 2024
3c4bb10
taxonomic scope should also be updated in filter_deployments()
sannegovaert Oct 9, 2024
633171b
set directory
sannegovaert Oct 9, 2024
d9943ce
update documentation
sannegovaert Oct 9, 2024
bb70934
Merge branch 'main' into merge_datasets
peterdesmet Oct 16, 2024
56454a8
Update test-merge_camtrapdp.R
sannegovaert Oct 16, 2024
913eb98
Update NEWS.md
sannegovaert Oct 16, 2024
f8327b6
add documentation
sannegovaert Oct 16, 2024
99ebe0d
update on project(s)
sannegovaert Oct 16, 2024
637ee27
use x and y instead of x1 and x2
sannegovaert Oct 16, 2024
0ed0511
fix example
sannegovaert Oct 16, 2024
e166bb6
add visible binding for global variables
sannegovaert Oct 16, 2024
eea76b6
typo
sannegovaert Oct 16, 2024
ccc479b
document()
sannegovaert Oct 16, 2024
800a49c
Update DESCRIPTION
sannegovaert Oct 16, 2024
e86a7b7
avoid error on lacking visible binding
sannegovaert Oct 16, 2024
dce3167
undo mistake
sannegovaert Oct 17, 2024
74983fd
Merge branch 'main' into merge_datasets
peterdesmet Oct 17, 2024
e659911
Merge branch 'main' into merge_datasets
peterdesmet Oct 25, 2024
b959be5
check for additional resources
sannegovaert Oct 25, 2024
f3c6633
add additional resources
sannegovaert Oct 25, 2024
82d2d26
move to helper functions
sannegovaert Oct 25, 2024
19f98bd
typo's
sannegovaert Oct 25, 2024
4451fee
Update test-merge_camtrapdp.R
sannegovaert Oct 25, 2024
427226f
fix `merge_additional_resources()`
sannegovaert Oct 25, 2024
f3f2a45
update documentation
sannegovaert Oct 25, 2024
28802d3
reorder
sannegovaert Oct 28, 2024
fd068f9
Add new helper function
sannegovaert Oct 28, 2024
572d5af
Replace NULL values (generated because of reading JSON) with NA
sannegovaert Oct 28, 2024
84dc619
Update test-write_camtrapdp.R
sannegovaert Oct 28, 2024
7adf2c3
Use resources()
peterdesmet Oct 28, 2024
5e7360f
avoid tidyselect warning
sannegovaert Oct 29, 2024
b01c581
Create a new helper function to prefix duplicates/identifiers
peterdesmet Oct 29, 2024
add540a
Silence downloads
peterdesmet Oct 29, 2024
f3e29d9
Always use and require dataset$id, remove prefix argument
peterdesmet Oct 29, 2024
9dacf3f
Print name of dataset + add additional_resources() helper
peterdesmet Nov 5, 2024
13f6fc8
Use name, not identifier as prefix
peterdesmet Nov 5, 2024
b36f0b1
Create and use utils-merge helpers
peterdesmet Nov 5, 2024
78585d0
Update tests
peterdesmet Nov 5, 2024
db04d9e
Use snapshot files to see if expected metadata is in merged camtrapdp
peterdesmet Nov 7, 2024
32a9177
Test writing of merged as part of merged tests
peterdesmet Nov 7, 2024
0f9a28a
Don't use subdirs in tempdir()
peterdesmet Nov 7, 2024
ece86df
Correct typo
peterdesmet Nov 7, 2024
c386aff
Don't keep created in snapshot
peterdesmet Nov 7, 2024
050b099
Change variable name to xy
peterdesmet Nov 20, 2024
ade305b
Set name and title to NULL
peterdesmet Nov 20, 2024
cec0e79
Use simple unique() to detect duplicates + remove helpers
peterdesmet Nov 20, 2024
ae7e461
Update test comments
peterdesmet Nov 20, 2024
2af4b62
Import rlang for %||%
peterdesmet Nov 20, 2024
e06c490
Combine project info, rather than having 2
peterdesmet Nov 20, 2024
712ee32
Add DOIs as properly-defined relatedIdentifiers
peterdesmet Nov 20, 2024
4c99849
Fix example
peterdesmet Nov 21, 2024
72f7310
Use chuck for required properties
peterdesmet Nov 21, 2024
722b71f
Update doc
peterdesmet Nov 21, 2024
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -14,6 +14,7 @@ export(filter_media)
export(filter_observations)
export(locations)
export(media)
export(merge_camtrapdp)
export(observations)
export(read_camtrapdp)
export(round_coordinates)
@@ -23,6 +24,7 @@ export(version)
export(write_camtrapdp)
export(write_dwc)
export(write_eml)
import(rlang)
importFrom(dplyr,"%>%")
importFrom(dplyr,.data)
importFrom(memoise,memoise)
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,6 +1,7 @@
# camtrapdp (development version)

* New function `write_camtrapdp()` writes a Camera Trap Data Package to disk as a `datapackage.json` and CSV files (#137).
* New function `merge_camtrapdp()` allows merging two datasets (#112).
* New function `write_eml()` transforms Camtrap DP metadata to EML (#99).
* New function `round_coordinates()` allows to fuzzy/generalize location information by rounding deployment `latitude` and `longitude`. It also updates `coordinateUncertainty` in the deployments and `coordinatePrecision` and spatial scope in the metadata (#106).
* New function `shift_time()` allows to shift/correct date-times in data and metadata for specified deploymentIDs and duration (#108).
1 change: 1 addition & 0 deletions R/camtrapdp-package.R
@@ -3,5 +3,6 @@

## usethis namespace: start
#' @importFrom dplyr %>% .data
#' @import rlang
## usethis namespace: end
NULL
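
The `rlang` import added here makes the null-default operator `%||%` available inside the package; `merge_camtrapdp()` uses it to test a possibly missing `id` for a DOI. A minimal sketch of that guard follows. It assumes a local stand-in for `%||%` so the snippet runs without rlang, and the helper name `looks_like_doi()` and the example DOI are placeholders, not part of the PR.

```r
# Local stand-in for rlang's `%||%`: return `y` when `x` is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: TRUE when an id contains "doi", FALSE otherwise.
# Falling back to "" keeps grepl() from returning logical(0) for a NULL id,
# which would make a surrounding if() fail.
looks_like_doi <- function(id) {
  grepl("doi", id %||% "")
}

looks_like_doi("https://doi.org/10.1234/example") # placeholder DOI
looks_like_doi(NULL)
```

Without the fallback, `grepl("doi", NULL)` yields a zero-length logical, and `if ()` cannot evaluate that as a condition.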
165 changes: 165 additions & 0 deletions R/merge_camtrapdp.R
@@ -0,0 +1,165 @@
#' Merge two Camera Trap Data Packages
#'
#' Merges two Camera Trap Data Package objects into one.
#'
#' @param x,y Camera Trap Data Package objects, as returned by
#'   [read_camtrapdp()].
#' @return A single Camera Trap Data Package object that is the combination of
#'   `x` and `y`.
#' @family transformation functions
#' @export
#' @section Transformation details:
#'
#' Both `x` and `y` must have a unique dataset name `x$name` and `y$name`.
#' This name is used to prefix identifiers in the data that occur in both
#' datasets.
#' For example:
#' - `x` contains `deploymentID`s `c("a", "b")`.
#' - `y` contains `deploymentID`s `c("b", "c")`.
#' - Then merged `xy` will contain `deploymentID`s `c("a", "x_b", "y_b", "c")`.
#'
#' Data are merged as follows:
#' - Deployments are combined, with `deploymentID` kept unique.
#' - Media are combined, with `mediaID`, `deploymentID` and `eventID` kept
#'   unique.
#' - Observations are combined, with `observationID`, `deploymentID`, `mediaID`
#'   and `eventID` kept unique.
#' - Additional resources are retained, with the resource name kept unique.
#'
#' Metadata properties are merged as follows:
#' - **name**: Removed.
#' - **id**: Removed.
#' - **created**: Set to current timestamp.
#' - **title**: Removed.
#' - **contributors**: Combined, with duplicates removed.
#' - **description**: Combined as two paragraphs.
#' - **version**: Set to `1.0`.
#' - **keywords**: Combined, with duplicates removed.
#' - **image**: Removed.
#' - **homepage**: Removed.
#' - **sources**: Combined, with duplicates removed.
#' - **licenses**: Combined, with duplicates removed.
#' - **bibliographicCitation**: Removed.
#' - **project$id**: Removed.
#' - **project$title**: Combined.
#' - **project$acronym**: Removed.
#' - **project$description**: Combined as two paragraphs.
#' - **project$path**: Removed.
#' - **project$samplingDesign**: Sampling design of `x`.
#' - **project$captureMethod**: Combined, with duplicates removed.
#' - **project$individuals**: `TRUE` if one of the datasets has `TRUE`.
#' - **project$observationLevel**: Combined, with duplicates removed.
#' - **coordinatePrecision**: Set to the least precise `coordinatePrecision`.
#' - **spatial**: Reset based on the new deployments.
#' - **temporal**: Reset based on the new deployments.
#' - **taxonomic**: Combined, with duplicates removed.
#' - **relatedIdentifiers**: Combined, with duplicates removed.
#' - **references**: Combined, with duplicates removed.
#' - Custom properties of `x` are also retained.
#' @examples
#' x <- example_dataset() %>%
#'   filter_deployments(deploymentID %in% c("00a2c20d", "29b7d356"))
#' y <- example_dataset() %>%
#'   filter_deployments(deploymentID %in% c("577b543a", "62c200a9"))
#' x$name <- "x"
#' y$name <- "y"
#' merge_camtrapdp(x, y)
merge_camtrapdp <- function(x, y) {
  check_camtrapdp(x)
  check_camtrapdp(y)

  # Check names
  check_name <- function(name, arg) {
    if (is.null(name) || is.na(name) || !is.character(name)) {
      cli::cli_abort(
        c(
          "{.arg {arg}} must have a unique (character) name.",
          "i" = "Assign one to {.field {arg}$name}."
        ),
        class = "camtrapdp_error_name_invalid"
      )
    }
  }
  check_name(x$name, "x")
  check_name(y$name, "y")
  if (x$name == y$name) {
    cli::cli_abort(
      c(
        "{.arg x} and {.arg y} must have different unique names.",
        "x" = "{.field x$name} and {.field y$name} currently have the same
               value: {.val {x$name}}."
      ),
      class = "camtrapdp_error_name_duplicated"
    )
  }
  prefixes <- c(x$name, y$name)

  # Create xy from x
  xy <- x

  # Merge resources
  xy$resources <- merge_resources(x, y, prefixes)

  # Merge data
  deployments(xy) <- merge_deployments(x, y, prefixes)
  media(xy) <- merge_media(x, y, prefixes)
  observations(xy) <- merge_observations(x, y, prefixes)

  # Merge/update metadata
  xy$name <- NULL
  xy$id <- NULL
  # Format in UTC so the literal "Z" suffix is correct
  xy$created <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  xy$title <- NULL
  xy$contributors <- unique(c(x$contributors, y$contributors))
  xy$description <- paste(x$description, y$description, sep = "\n")
  xy$version <- "1.0"
  xy$keywords <- unique(c(x$keywords, y$keywords))
  xy$image <- NULL
  xy$homepage <- NULL
  xy$sources <- unique(c(x$sources, y$sources))
  xy$licenses <- unique(c(x$licenses, y$licenses))
  xy$bibliographicCitation <- NULL
  xy$project$id <- NULL
  xy$project$title <- paste(x$project$title, y$project$title, sep = " / ")
  xy$project$acronym <- NULL
  xy$project$description <-
    paste(x$project$description, y$project$description, sep = "\n")
  xy$project$path <- NULL
  xy$project$samplingDesign <- x$project$samplingDesign # Second one ignored
  xy$project$captureMethod <-
    unique(c(x$project$captureMethod, y$project$captureMethod))
  xy$project$individuals <- any(x$project$individuals, y$project$individuals)
  xy$project$observationLevel <-
    unique(c(x$project$observationLevel, y$project$observationLevel))
  xy$coordinatePrecision <-
    max(x$coordinatePrecision, y$coordinatePrecision, na.rm = TRUE)
  xy$relatedIdentifiers <- unique(c(x$relatedIdentifiers, y$relatedIdentifiers))
  xy$references <- unique(c(x$references, y$references))
  xy$directory <- "."

  # Add package$id to related identifiers if it is a DOI
  add_related_id <- function(id, related_ids) {
    if (grepl("doi", id %||% "")) {
      new_related_id <- list(
        relationType = "isDerivedFrom",
        relatedIdentifier = id,
        resourceTypeGeneral = "Dataset",
        relatedIdentifierType = "DOI"
      )
      related_ids <- c(related_ids, list(new_related_id))
    }
    return(related_ids)
  }
  xy$relatedIdentifiers <- add_related_id(x$id, xy$relatedIdentifiers)
  xy$relatedIdentifiers <- add_related_id(y$id, xy$relatedIdentifiers)

  # Update scopes
  xy <-
    xy %>%
    update_spatial() %>%
    update_temporal() %>%
    update_taxonomic()

  return(xy)
}
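
The prefixing rule described in the documentation of `merge_camtrapdp()` (only identifiers that occur in both datasets get the dataset name as prefix) can be sketched in base R. This is an illustration under stated assumptions, not the package's `merge_*()` helpers, whose bodies are not shown in this diff: the function name `prefix_shared()` and the `_` separator handling are hypothetical, chosen to reproduce the documented `c("a", "x_b", "y_b", "c")` example.

```r
# Hypothetical helper, not the package's implementation: prefix only the
# values that occur in both vectors, then combine them.
prefix_shared <- function(a, b, prefixes) {
  shared <- intersect(a, b)
  shared <- shared[!is.na(shared)] # Leave missing identifiers untouched
  a_new <- ifelse(a %in% shared, paste(prefixes[1], a, sep = "_"), a)
  b_new <- ifelse(b %in% shared, paste(prefixes[2], b, sep = "_"), b)
  c(a_new, b_new)
}

# Same values as in the documentation: "b" occurs in both datasets
prefix_shared(c("a", "b"), c("b", "c"), prefixes = c("x", "y"))
```

Prefixing only the shared values leaves identifiers that were already unique unchanged, so references to them from outside the merged package stay valid.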
18 changes: 9 additions & 9 deletions R/print.R
@@ -20,26 +20,26 @@ print.camtrapdp <- function(x, ...) {
# check_camtrapdp() not necessary: print only triggered for camtrapdp object

# Calculate number of rows for the tables (resources in x$data)
- tables_rows <-
+ tables <-
purrr::pluck(x, "data") %>%
purrr::map(nrow)
+ tables_length <- length(tables)

- # List tables
- tables <- names(tables_rows)
+ # Show name and tables
+ name <- if (!is.null(x$name)) cli::format_inline("{.val {x$name}} ") else ""
cli::cat_line(
cli::format_inline(
- "A Camera Trap Data Package with {length(tables)} table{?s}{?./:/:}"
+ "A Camera Trap Data Package {name}with {tables_length} table{?s}{?./:/:}"
)
)
purrr::walk2(
- names(tables_rows),
- tables_rows,
+ names(tables),
+ tables,
~ cli::cat_bullet(cli::format_inline("{.x}: {.val {.y}} rows"))
)

- # List additional resources (not in x$data), if any
- resources <- frictionless::resources(x)
- extra_resources <- resources[!resources %in% tables]
+ # List additional resources, if any
+ extra_resources <- additional_resources(x)
if (length(extra_resources) > 0) {
cli::cat_line("")
cli::cat_line(
2 changes: 1 addition & 1 deletion R/taxa.R
@@ -23,7 +23,7 @@ taxa <- function(x) {
dplyr::select("scientificName", dplyr::starts_with("taxon.")) %>%
dplyr::distinct() %>%
dplyr::rename_with(~ sub("^taxon.", "", .x)) %>%
- dplyr::arrange(scientificName)
+ dplyr::arrange(.data$scientificName)

# Remove duplicates without taxonID
if ("taxonID" %in% names(taxa)) {
3 changes: 3 additions & 0 deletions R/taxonomic.R
@@ -15,6 +15,9 @@ taxonomic <- function(x) {
return(NULL)
}

# Replace NULL with NA
taxonomic_list <- replace_null_recursive(taxonomic_list)

# Convert list into a data.frame
taxa <-
purrr::map(
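
The change to `R/taxonomic.R` calls `replace_null_recursive()`, but its body is not part of the visible diff. A minimal sketch of what such a helper might do, assuming it walks a nested list and swaps every `NULL` for `NA`; the name `replace_null_sketch()` and the sample record are hypothetical and this is not the package's code:

```r
# Hypothetical stand-in for replace_null_recursive(): walk a nested list and
# replace every NULL element with NA, leaving other values as they are.
replace_null_sketch <- function(x) {
  if (is.null(x)) {
    return(NA)
  }
  if (is.list(x)) {
    return(lapply(x, replace_null_sketch))
  }
  x
}

# A record as it might look after reading JSON, with a missing taxonID
taxon <- list(scientificName = "Anas platyrhynchos", taxonID = NULL)
cleaned <- replace_null_sketch(taxon)
as.data.frame(cleaned)
```

Replacing `NULL` with `NA` keeps the property in the list as an explicit missing value, so a record with a missing property still produces a column for it when converted to a data frame.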