Add read_intchron_csv(). See #4.
joeroe committed Oct 19, 2020
1 parent c8c8f91 commit c33e222
Showing 6 changed files with 108 additions and 1 deletion.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -27,7 +27,8 @@ Imports:
     curl,
     httr,
     tibble,
-    dplyr
+    dplyr,
+    readr
 Suggests:
     knitr,
     roxygen2,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -8,3 +8,4 @@ export(intchron_hosts)
 export(intchron_request)
 export(intchron_tabulate)
 export(intchron_url)
+export(read_intchron_csv)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -9,4 +9,6 @@ Initial release, including:
 * `intchron_request()` and `intchron_url()` help construct requests to IntChron.
 * `intchron_crawl()` recursively retrieves records.
 * `intchron_extract()` and `intchron_tabulate()` help wrangle response data.
+* Read and write functions:
+  * `read_intchron_csv()` for CSV files.
 * Vignettes: `vignette("rintchron")` and `vignette("intchron-api")`
59 changes: 59 additions & 0 deletions R/read.R
@@ -0,0 +1,59 @@
# Read functions

#' Read a CSV file from IntChron
#'
#' @description
#' Reads records in the CSV format exported by IntChron. These are regular
#' CSV files with a few elements of non-standard formatting that mean they can't
#' be directly parsed by e.g. [read.csv()] or [readr::read_csv()] (see details).
#'
#' It is usually more robust to retrieve data from IntChron in JSON format using
#' [intchron()] or [intchron_request()].
#'
#' @param file CSV records exported from IntChron; either a path to a downloaded
#' file, a URL, or literal data. Use [readr::clipboard()] to read from the
#' system clipboard.
#'
#' @details
#' CSV files exported from IntChron have the following non-standard formatting:
#'
#' * Comment lines are denoted with '#' and contain metadata before and after
#'   the table of data itself.
#' * The comment line immediately above the data contains the column headings.
#' * A variable number of empty columns occur at the beginning of rows.
#' * A trailing comma occurs at the end of every row except the header.
#' * Missing values may be coded as: "", "-"
#'
#' Beyond this, some data tables are malformed (e.g. they contain unmatched
#' quotes) and cannot be parsed without an error.
#'
#' @return
#' A `tibble` containing the data from the record. Associated metadata is
#' discarded.
#'
#' @family read and write functions
#'
#' @export
read_intchron_csv <- function(file) {
  lines <- readr::read_lines(file)

  # Check whether there are actually any non-comment lines
  if (all(grepl("^#", lines) | grepl("^$", lines))) {
    return(data.frame(NA))
  }

  # Reformat the header row: strip the leading '#' and append a trailing comma
  # so that it has the same number of fields as the data rows
  nheader <- grep("^,", lines)[1] - 1
  lines[nheader] <- sub("#", "", lines[nheader])
  lines[nheader] <- paste0(lines[nheader], ",")

  # Read data table
  data <- utils::read.csv(text = lines, stringsAsFactors = FALSE,
                          comment.char = "#", na.strings = c("", "-"))

  # Drop unnamed columns (assumed to be empty)
  data <- data[!grepl("^X(\\.[0-9]+)?$", names(data))]

  data <- tibble::as_tibble(data)
  return(data)
}
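
To make the formatting quirks listed in the documentation concrete, here is a minimal base-R sketch that applies the same steps as `read_intchron_csv()` to a small invented sample. The sample lines, the lab codes, the column names, and the exact layout of the header comment (`#,,labcode,...`) are assumptions for illustration, not taken from a real IntChron export. The sketch stops before the `tibble` conversion so that it needs no packages beyond base R.

```r
# Invented sample in the shape described in the docs; NOT a real IntChron export.
lines <- c(
  "# Example record (made-up metadata)",
  "#,,labcode,r_date,r_date_sd,material",  # header: comment line, no trailing comma
  ",,OxA-0001,4500,35,charcoal,",          # data rows: leading empty columns,
  ",,OxA-0002,-,40,bone,",                 # trailing comma, "-" for a missing value
  "# End of record"
)

# The header is assumed to be the line directly above the first row that
# starts with a comma. Strip its '#' and add the trailing comma it lacks.
nheader <- grep("^,", lines)[1] - 1
lines[nheader] <- paste0(sub("#", "", lines[nheader]), ",")

# Remaining comment lines are skipped; "" and "-" are read as NA.
data <- utils::read.csv(text = lines, stringsAsFactors = FALSE,
                        comment.char = "#", na.strings = c("", "-"))

# Empty header cells receive placeholder names (X, X.1, ...) from read.csv().
print(names(data))

# Drop those placeholder columns, as the function does.
data <- data[!grepl("^X(\\.[0-9]+)?$", names(data))]
print(data)
```

Note that the placeholder-name filter would also drop a genuine column that happened to be called `X`; the function's own comment flags this as an assumption.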
4 changes: 4 additions & 0 deletions _pkgdown.yml
@@ -19,6 +19,10 @@ reference:
   desc: "High-level functions for querying databases indexed by IntChron."
 - contents:
   - has_concept("functions for querying IntChron")
+- title: "Read and write"
+  desc: "Functions for reading and writing data in IntChron's file formats."
+- contents:
+  - has_concept("read and write functions")
 - title: "IntChron API"
   desc: "Low-level functions for interacting with the IntChron API directly."
 - contents:
40 changes: 40 additions & 0 deletions man/read_intchron_csv.Rd

Some generated files are not rendered by default.
