diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web.Rmd b/_posts/2024-09-22-fetch-files-web/fetch-files-web.Rmd similarity index 89% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web.Rmd rename to _posts/2024-09-22-fetch-files-web/fetch-files-web.Rmd index 3c3ed003..0163f76b 100644 --- a/_posts/2024-09-01-fetch-files-web/fetch-files-web.Rmd +++ b/_posts/2024-09-22-fetch-files-web/fetch-files-web.Rmd @@ -1,7 +1,7 @@ --- title: 'Read files on the web into R' description: | - Mostly a compilation of some code-snippets for my own use + For the download-button-averse of us categories: - tutorial base_url: https://yjunechoe.github.io @@ -10,7 +10,7 @@ author: affiliation: University of Pennsylvania Linguistics affiliation_url: https://live-sas-www-ling.pantheon.sas.upenn.edu/ orcid_id: 0000-0002-0701-921X -date: 09-01-2024 +date: 09-22-2024 output: distill::distill_article: include-after-body: "highlighting.html" @@ -20,7 +20,6 @@ output: editor_options: chunk_output_type: console preview: github-dplyr-starwars.jpg -draft: true --- ```{r setup, include=FALSE} @@ -36,7 +35,7 @@ knitr::opts_chunk$set( Every so often I'll have a link to some file on hand and want to read it in R without going out of my way to browse the web page, find a download link, download it somewhere onto my computer, grab the path to it, and then finally read it into R. -Over the years I've accumulated some tricks to get data into R "straight from a url", even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I'd write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I'm someone who primarily works with tabular data and use GitHub and OSF as data repositories. +Over the years I've accumulated some tricks to get data into R "straight from a url", even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I'd write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I'm someone who primarily works with tabular data and interface with GitHub and OSF as data repositories. ## GitHub (public repos) @@ -91,9 +90,9 @@ emphatic::hl_diff( ## GitHub (gists) -It's a similar idea with GitHub Gists (sometimes I like to store small datasets for demos as gists). For example, here's a link to a simulated data for a [Stroop experiment](https://en.wikipedia.org/wiki/Stroop_effect) `stroop.csv`: . +It's a similar idea with GitHub Gists, where I sometimes like to store small toy datasets for use in demos. For example, here's a link to a simulated data for a [Stroop experiment](https://en.wikipedia.org/wiki/Stroop_effect) `stroop.csv`: . -But that's a full on webpage. The url which actually hosts the csv contents is , which you can again get to by clicking the **Raw** button at the top-right corner of the gist +But that's again a full-on webpage. 
The url which actually hosts the csv contents is , which you can again get to by clicking the **Raw** button at the top-right corner of the gist ```{r, echo=FALSE, fig.align='center', out.width="100%", out.extra="class=external"} knitr::include_graphics("github-gist-stroop.jpg", error = FALSE) @@ -121,7 +120,7 @@ We now turn to the harder problem of accessing a file in a private GitHub reposi Except this time, when you open the file at that url (assuming it can display in plain text), you'll see the url come with a "token" attached at the end (I'll show an example further down). This token is necessary to remotely access the data in a private repo. Once a token is generated, the file can be accessed using that token from anywhere, but note that it *will expire* at some point as GitHub refreshes tokens periodically (so treat them as if they're for single use). -For a more robust approach, you can use the [GitHub Contents API](https://docs.github.com/en/rest/repos/contents). If you have your credentials set up in [`{gh}`](https://gh.r-lib.org/) (which you can check with `gh::gh_whoami()`), you can request a token-tagged url to the private file using the syntax:[^Thanks [@tanho](https://fosstodon.org/@tanho) for pointing me to this at the [R4DS/DSLC](https://fosstodon.org/@DSLC) slack.] +For a more robust approach, you can use the [GitHub Contents API](https://docs.github.com/en/rest/repos/contents). If you have your credentials set up in [`{gh}`](https://gh.r-lib.org/) (which you can check with `gh::gh_whoami()`), you can request a token-tagged url to the private file using the syntax:^[Thanks [@tanho](https://fosstodon.org/@tanho) for pointing me to this at the [R4DS/DSLC](https://fosstodon.org/@DSLC) slack.] ```{r, eval=FALSE} gh::gh("/repos/{user}/{repo}/contents/{path}")$download_url @@ -173,7 +172,7 @@ arrow::read_feather("https://osf.io/download/9vztj/") |> You might have already caught on to this, but the pattern is to simply point to `osf.io/download/` instead of `osf.io/`. -This method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects . Navigating to this link will show a web preview of the csv file contents, just like in the GitHub example with `dplyr::starwars`. +This method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects . Navigating to this link will show a web preview of the csv file contents. By inserting `/download` into this url, we can read the csv file contents directly: @@ -186,9 +185,9 @@ See also the [`{osfr}`](https://docs.ropensci.org/osfr/reference/osfr-package.ht ## Aside: Can't go wrong with a copy-paste! -Reading remote files aside, I think it's severly under-rated how base R has a `readClipboard()` function and a collection of `read.*()` functions which can also read directly from a `"clipboard"` connection.^[The special value `"clipboard"` works for most base-R read functions that take a `file` or `con` argument.] +Reading remote files aside, I think it's severely underrated how base R has a `readClipboard()` function and a collection of `read.*()` functions which can also read directly from a `"clipboard"` connection.^[The special value `"clipboard"` works for most base-R read functions that take a `file` or `con` argument.] 
-I sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. For such relatively small chunks of data that you just want to quickly get into R, you can also lean on base R's clipboard functionalities. +I sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. For such relatively small chunks of data that you just want to quickly get into R, you can lean on base R's clipboard functionalities. For example, given this markdown table: @@ -197,7 +196,7 @@ aggregate(mtcars, mpg ~ cyl, mean) |> knitr::kable() ``` -You can copy it and run the following code to get that data back as an R data frame: +You can copy its contents and run the following code to get that data back as an R data frame: ```{r, eval=FALSE} read.delim("clipboard") @@ -257,9 +256,13 @@ For this example I will use a [parquet file](https://duckdb.org/docs/data/parque ```{r} # A parquet file of tokens from a sample of child-directed speech file <- "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet" + +# For comparison, reading its contents with {arrow} +arrow::read_parquet(file) |> + head(5) ``` -In duckdb, the `httpfs` extension allows `PARQUET_SCAN`^[Or `READ_PARQUET` - [same thing](https://duckdb.org/docs/data/parquet/overview.html#read_parquet-function).] to read a remote parquet file. +In duckdb, the `httpfs` extension we loaded above allows `PARQUET_SCAN`^[Or `READ_PARQUET` - [same thing](https://duckdb.org/docs/data/parquet/overview.html#read_parquet-function).] to read a remote parquet file. ```{r} query1 <- glue::glue_sql(" @@ -310,11 +313,11 @@ To get the file tree of the repo on the master branch, we use: files <- gh::gh("/repos/yjunechoe/repetition_events/git/trees/master?recursive=true")$tree ``` -With `recursive=true`, this returns all files in the repo. We can filter for just the parquet files we want with a little regex: +With `recursive=true`, this returns all files in the repo. Then, we can filter for just the parquet files we want with a little regex: ```{r} parquet_files <- sapply(files, `[[`, "path") |> - grep(x = _, pattern = ".*data/tokens_data/.*parquet$", value = TRUE) + grep(x = _, pattern = ".*/tokens_data/.*parquet$", value = TRUE) length(parquet_files) head(parquet_files) ``` @@ -423,21 +426,21 @@ Lastly, I inadvertently(?) started some discussion around remotely accessing spa I also have some random tricks that are more situational. Unfortunately, I can only recall like 20% of them at any given moment, so I'll be updating this space as more come back to me: -- When reading remote `.rda` or `.RData` files with `load()`, you need to wrap the link in `url()` first (ref: [stackoverflow](https://stackoverflow.com/questions/26108575/loading-rdata-files-from-url)). +- When reading remote `.rda` or `.RData` files with `load()`, you may need to wrap the link in `url()` first (ref: [stackoverflow](https://stackoverflow.com/questions/26108575/loading-rdata-files-from-url)). - [`{vroom}`](https://vroom.r-lib.org/) can [remotely read gzipped files](https://vroom.r-lib.org/articles/vroom.html#reading-remote-files), without having to `download.file()` and `unzip()` first. - [`{curl}`](https://jeroen.cran.dev/curl/), of course, will always have the most comprehensive set of low-level tools you need to read any arbitrary data remotely. 
For example, using `curl::curl_fetch_memory()` to read the `dplyr::storms` data again from the GitHub raw contents link: - ```{r} - fetched <- curl::curl_fetch_memory( - "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv" +```{r} +fetched <- curl::curl_fetch_memory( + "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv" ) read.csv(text = rawToChar(fetched$content)) |> - dplyr::glimpse() - ``` + dplyr::glimpse() +``` - And even if you're going the route of downloading the file first, `curl::multi_download()` can offer big performance improvements over `download.file()`.^[See an example implemented for [`{openalexR}`](https://github.com/ropensci/openalexR/pull/63), an API package.] Many `{curl}` functions can also handle [retries and stop/resumes](https://fosstodon.org/@eliocamp@mastodon.social/111885424355264237) which is cool too. +- Even if you're going the route of downloading the file first, `curl::multi_download()` can offer big performance improvements over `download.file()`.^[See an example implemented for [`{openalexR}`](https://github.com/ropensci/openalexR/pull/63), an API package.] Many `{curl}` functions can also handle [retries and stop/resumes](https://fosstodon.org/@eliocamp@mastodon.social/111885424355264237) which is cool too. - [`{httr2}`](https://httr2.r-lib.org/) can capture a *continuous data stream* with `httr2::req_perform_stream()` up to a set time or size. diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web.html b/_posts/2024-09-22-fetch-files-web/fetch-files-web.html similarity index 91% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web.html rename to _posts/2024-09-22-fetch-files-web/fetch-files-web.html index 06589d7f..b0083a46 100644 --- a/_posts/2024-09-01-fetch-files-web/fetch-files-web.html +++ b/_posts/2024-09-22-fetch-files-web/fetch-files-web.html @@ -32,7 +32,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } @@ -90,32 +90,32 @@ Read files on the web into R - + - - + + - + - + @@ -1524,7 +1524,7 @@ @@ -1541,13 +1541,13 @@

Read files on the web into R

tutorial
-

Mostly a compilation of some code-snippets for my own use

+

For the download-button-averse of us

@@ -1571,7 +1571,7 @@

Contents

Every so often I’ll have a link to some file on hand and want to read it in R without going out of my way to browse the web page, find a download link, download it somewhere onto my computer, grab the path to it, and then finally read it into R.

-

Over the years I’ve accumulated some tricks to get data into R “straight from a url”, even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I’d write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I’m someone who primarily works with tabular data and use GitHub and OSF as data repositories.

+

Over the years I’ve accumulated some tricks to get data into R “straight from a url”, even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I’d write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I’m someone who primarily works with tabular data and interface with GitHub and OSF as data repositories.

GitHub (public repos)

GitHub has a nice point-and-click interface for browsing repositories and previewing files. For example, you can navigate to the dplyr::starwars dataset from tidyverse/dplyr, at https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv:

@@ -1632,8 +1632,8 @@

GitHub (public repos)

GitHub (gists)

-

It’s a similar idea with GitHub Gists (sometimes I like to store small datasets for demos as gists). For example, here’s a link to a simulated data for a Stroop experiment stroop.csv: https://gist.github.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6.

-

But that’s a full on webpage. The url which actually hosts the csv contents is https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv, which you can again get to by clicking the Raw button at the top-right corner of the gist

+

It’s a similar idea with GitHub Gists, where I sometimes like to store small toy datasets for use in demos. For example, here’s a link to a simulated data for a Stroop experiment stroop.csv: https://gist.github.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6.

+

But that’s again a full-on webpage. The url which actually hosts the csv contents is https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv, which you can again get to by clicking the Raw button at the top-right corner of the gist

@@ -1666,7 +1666,7 @@

GitHub (gists)

GitHub (private repos)

We now turn to the harder problem of accessing a file in a private GitHub repository. If you already have the GitHub webpage open and you’re signed in, you can follow the same step of copying the link that the Raw button redirects to.

Except this time, when you open the file at that url (assuming it can display in plain text), you’ll see the url come with a “token” attached at the end (I’ll show an example further down). This token is necessary to remotely access the data in a private repo. Once a token is generated, the file can be accessed using that token from anywhere, but note that it will expire at some point as GitHub refreshes tokens periodically (so treat them as if they’re for single use).

-

For a more robust approach, you can use the GitHub Contents API. If you have your credentials set up in {gh} (which you can check with gh::gh_whoami()), you can request a token-tagged url to the private file using the syntax:

+

For a more robust approach, you can use the GitHub Contents API. If you have your credentials set up in {gh} (which you can check with gh::gh_whoami()), you can request a token-tagged url to the private file using the syntax:1

gh::gh("/repos/{user}/{repo}/contents/{path}")$download_url
@@ -1686,9 +1686,9 @@

GitHub (private repos)

# truncating gsub(x = _, "^(.{100}).*", "\\1...")
-
  [1] "https://raw.githubusercontent.com/yjunechoe/my-super-secret-repo/main/README.md?token=AMTCUR6BQGEERA..."
+
  [1] "https://raw.githubusercontent.com/yjunechoe/my-super-secret-repo/main/README.md?token=AMTCUR2JPXCIX5..."
-

I can then use this url to read the private file:1

+

I can then use this url to read the private file:2

gh::gh("/repos/yjunechoe/my-super-secret-repo/contents/README.md")$download_url |> 
@@ -1718,7 +1718,7 @@ 

OSF

$ yield <int> 1545, 1440, 1440, 1520, 1580, 1540, 1555, 1490, 1560, 1495, 1595…

You might have already caught on to this, but the pattern is to simply point to osf.io/download/ instead of osf.io/.

-

This method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects https://osf.io/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad. Navigating to this link will show a web preview of the csv file contents, just like in the GitHub example with dplyr::starwars.

+

This method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects https://osf.io/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad. Navigating to this link will show a web preview of the csv file contents.

By inserting /download into this url, we can read the csv file contents directly:

@@ -1735,8 +1735,8 @@

OSF

See also the {osfr} package for a more principled interface to OSF.

Aside: Can’t go wrong with a copy-paste!

-

Reading remote files aside, I think it’s severly under-rated how base R has a readClipboard() function and a collection of read.*() functions which can also read directly from a "clipboard" connection.2

-

I sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. For such relatively small chunks of data that you just want to quickly get into R, you can also lean on base R’s clipboard functionalities.

+

Reading remote files aside, I think it’s severely underrated how base R has a readClipboard() function and a collection of read.*() functions which can also read directly from a "clipboard" connection.3

+

I sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. For such relatively small chunks of data that you just want to quickly get into R, you can lean on base R’s clipboard functionalities.

For example, given this markdown table:

@@ -1745,28 +1745,28 @@

Aside: Can’t go wrong with a co

  cyl       mpg
    4  26.66364
    6  19.74286
    8  15.10000
-

You can copy it and run the following code to get that data back as an R data frame:

+

You can copy its contents and run the following code to get that data back as an R data frame:

read.delim("clipboard")
@@ -1779,7 +1779,7 @@ 

Aside: Can’t go wrong with a co 2 6 19.74286 3 8 15.10000

-

If you’re instead copying something flat like a list of numbers or strings, you can also use scan() and specify the appropriate sep to get that data back as a vector:3

+

If you’re instead copying something flat like a list of numbers or strings, you can also use scan() and specify the appropriate sep to get that data back as a vector:4

paste(1:10, collapse = ", ") |> 
@@ -1816,10 +1816,22 @@ 

Streaming with {duckdb}

# A parquet file of tokens from a sample of child-directed speech
-file <- "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet"
-
-
-

In duckdb, the httpfs extension allows PARQUET_SCAN4 to read a remote parquet file.

+file <- "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet"
+
+# For comparison, reading its contents with {arrow}
+arrow::read_parquet(file) |> 
+  head(5)
+
+
  # A tibble: 5 × 3
+    utterance_id gloss   part_of_speech
+           <int> <chr>   <chr>         
+  1            1 www     ""            
+  2            2 bye     "co"          
+  3            3 mhm     "co"          
+  4            4 Mommy's "n:prop"      
+  5            4 here    "adv"
+
+

In duckdb, the httpfs extension we loaded above allows PARQUET_SCAN5 to read a remote parquet file.

query1 <- glue::glue_sql("
@@ -1885,7 +1897,7 @@ 

Streaming with {duckdb}

4 4 Mommy's n:prop 1 5 4 here adv 1
-

To do this more programmatically over all (parquet) files under /tokens_data in the repository, we need to transition to using the GitHub Trees API. The idea is similar to using the Contents API but now we are requesting a list of all files using the following syntax:

+

To do this more programmatically over all parquet files under /tokens_data in the repository, we need to transition to using the GitHub Trees API. The idea is similar to using the Contents API but now we are requesting a list of all files using the following syntax:

gh::gh("/repos/{user}/{repo}/git/trees/{branch/tag/commitSHA}?recursive=true")$tree
@@ -1897,11 +1909,11 @@

Streaming with {duckdb}

files <- gh::gh("/repos/yjunechoe/repetition_events/git/trees/master?recursive=true")$tree
-

With recursive=true, this returns all files in the repo. We can filter for just the parquet files we want with a little regex:

+

With recursive=true, this returns all files in the repo. Then, we can filter for just the parquet files we want with a little regex:

parquet_files <- sapply(files, `[[`, "path") |> 
-  grep(x = _, pattern = ".*data/tokens_data/.*parquet$", value = TRUE)
+  grep(x = _, pattern = ".*/tokens_data/.*parquet$", value = TRUE)
 length(parquet_files)
  [1] 70
@@ -1931,7 +1943,7 @@

Streaming with {duckdb}

[5] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=13/part-1.parquet" [6] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=14/part-2.parquet"
-

Back on duckdb, we can use PARQUET_SCAN to read multiple files by supplying a vector ['file1.parquet', 'file2.parquet', ...].5 This time, we also ask for a quick computation to count the number of distinct childIDs:

+

Back on duckdb, we can use PARQUET_SCAN to read multiple files by supplying a vector ['file1.parquet', 'file2.parquet', ...].6 This time, we also ask for a quick computation to count the number of distinct childIDs:

query3 <- glue::glue_sql("
@@ -1955,7 +1967,7 @@ 

Streaming with {duckdb}

1 70

This returns 70 which matches the length of the parquet_files vector listing the files that had been partitioned by childID.

-

For further analyses, we can CREATE TABLE6 our data in our in-memory database con:

+

For further analyses, we can CREATE TABLE7 our data in our in-memory database con:

query4 <- glue::glue_sql("
@@ -2046,21 +2058,22 @@ 

Streaming with {duckdb}

Other sources for data

In writing this blog post, I’m indebted to all the knowledgeable folks on Mastodon who suggested their own recommended tools and workflows for various kinds of remote data. Unfortunately, I’m not familiar enough with most of them enough to do them justice, but I still wanted to record the suggestions I got from there for posterity.

First, a post about reading remote files would not be complete without a mention of the wonderful {googlesheets4} package for reading from Google Sheets. I debated whether I should include a larger discussion of {googlesheets4}, and despite using it quite often myself I ultimately decided to omit it for the sake of space and because the package website is already very comprehensive. I would suggest starting from the Get Started vignette if you are new and interested.

-

Second, along the lines of {osfr}, there are other similar rOpensci packages for retrieving data from the kinds of data sources that may be of interest to academics, such as {deposits} for zenodo and figshare, and {piggyback} for GitHub release assets (Maëlle Salmon’s comment pointed me to the first two; I responded with some of my experiences). I was also reminded that {pins} exists - I’m not familiar with it myself so I thought I wouldn’t write anything for it here BUT Isabella Velásquez came in clutch with a whole talk on dynamically loading up-to-date data with {pins} which is a great usecase demo of the unique strength of {pins}.

+

Second, along the lines of {osfr}, there are other similar rOpensci packages for retrieving data from the kinds of data sources that may be of interest to academics, such as {deposits} for zenodo and figshare, and {piggyback} for GitHub release assets (Maëlle Salmon’s comment pointed me to the first two; I responded with some of my experiences). I was also reminded that {pins} exists - I’m not familiar with it myself so I thought I wouldn’t write anything for it here BUT Isabella Velásquez came in clutch sharing a recent talk on dynamically loading up-to-date data with {pins} which is a great demo of the unique strengths of {pins}.

Lastly, I inadvertently(?) started some discussion around remotely accessing spatial files. I don’t work with spatial data at all but I can totally imagine how the hassle of the traditional click-download-find-load workflow would be even more pronounced for spatial data which are presumably much larger in size and more difficult to preview. On this note, I’ll just link to Carl Boettiger’s comment about the fact that GDAL has a virtual file system that you can interface with from R packages wrapping this API (ex: {gdalraster}), and to Michael Sumner’s comment/gist + Chris Toney’s comment on the fact that you can even use this feature to stream non-spatial data!

Miscellaneous tips and tricks

I also have some random tricks that are more situational. Unfortunately, I can only recall like 20% of them at any given moment, so I’ll be updating this space as more come back to me:

-  • When reading remote .rda or .RData files with load(), you need to wrap the link in url() first (ref: stackoverflow).
+  • When reading remote .rda or .RData files with load(), you may need to wrap the link in url() first (ref: stackoverflow).
   • {vroom} can remotely read gzipped files, without having to download.file() and unzip() first.
   • {curl}, of course, will always have the most comprehensive set of low-level tools you need to read any arbitrary data remotely. For example, using curl::curl_fetch_memory() to read the dplyr::storms data again from the GitHub raw contents link:
fetched <- curl::curl_fetch_memory(
-  "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"
-    )
-    read.csv(text = rawToChar(fetched$content)) |> 
-  dplyr::glimpse()
+ "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv" +) +read.csv(text = rawToChar(fetched$content)) |> + dplyr::glimpse()
  Rows: 87
   Columns: 14
@@ -2079,7 +2092,8 @@ 

Miscellaneous tips and tricks

$ vehicles <chr> "Snowspeeder, Imperial Speeder Bike", "", "", "", "Imperial… $ starships <chr> "X-wing, Imperial shuttle", "", "", "TIE Advanced x1", "", …
-

And even if you’re going the route of downloading the file first, curl::multi_download() can offer big performance improvements over download.file().[^See an example implemented for {openalexR}, an API package.] Many {curl} functions also take a retry parameter in some form which is cool too.

+
    +
  • Even if you’re going the route of downloading the file first, curl::multi_download() can offer big performance improvements over download.file().8 Many {curl} functions can also handle retries and stop/resumes which is cool too.

  • {httr2} can capture a continuous data stream with httr2::req_perform_stream() up to a set time or size.
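As a rough sketch of what that can look like (assuming the callback-plus-`timeout_sec` interface of `httr2::req_perform_stream()`, and reusing the public starwars csv url from earlier; not code from the post itself):

```{r, eval=FALSE}
library(httr2)

# Accumulate raw bytes as they stream in, for at most ~2 seconds
chunks <- raw()
request("https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv") |> 
  req_perform_stream(
    callback = function(bytes) {
      chunks <<- c(chunks, bytes)
      TRUE  # returning TRUE keeps the stream alive
    },
    timeout_sec = 2
  )

# Parse whatever arrived within the time budget
read.csv(text = rawToChar(chunks)) |> 
  dplyr::glimpse()
```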

sessionInfo()

@@ -2131,16 +2145,18 @@

sessionInfo()

-  1. Note that the API will actually generate a new token every time you send a request (and again, these tokens will expire with time).↩︎
-  2. The special value "clipboard" works for most base-R read functions that take a file or con argument.↩︎
-  3. Thanks @coolbutuseless for pointing me to textConnection()!↩︎
-  4. Or READ_PARQUET - same thing.↩︎
-  5. We can also get this formatting with a combination of shQuote() and toString().↩︎
-  6. Whereas CREATE TABLE results in a physical copy of the data in memory, CREATE VIEW will dynamically fetch the data from the source every time you query the table. If the data fits into memory (as in this case), I prefer CREATE as queries will be much faster (though you pay up-front for the time copying the data). If the data is larger than memory, CREATE VIEW will be your only option.↩︎
+  1. Thanks @tanho for pointing me to this at the R4DS/DSLC slack.↩︎
+  2. Note that the API will actually generate a new token every time you send a request (and again, these tokens will expire with time).↩︎
+  3. The special value "clipboard" works for most base-R read functions that take a file or con argument.↩︎
+  4. Thanks @coolbutuseless for pointing me to textConnection()!↩︎
+  5. Or READ_PARQUET - same thing.↩︎
+  6. We can also get this formatting with a combination of shQuote() and toString().↩︎
+  7. Whereas CREATE TABLE results in a physical copy of the data in memory, CREATE VIEW will dynamically fetch the data from the source every time you query the table. If the data fits into memory (as in this case), I prefer CREATE as queries will be much faster (though you pay up-front for the time copying the data). If the data is larger than memory, CREATE VIEW will be your only option.↩︎
+  8. See an example implemented for {openalexR}, an API package.↩︎

diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/anchor-4.2.2/anchor.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/anchor-4.2.2/anchor.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/anchor-4.2.2/anchor.min.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/anchor-4.2.2/anchor.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/bowser-1.9.3/bowser.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/bowser-1.9.3/bowser.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/bowser-1.9.3/bowser.min.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/bowser-1.9.3/bowser.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/distill-2.2.21/template.v2.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/distill-2.2.21/template.v2.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/distill-2.2.21/template.v2.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/distill-2.2.21/template.v2.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/header-attrs-2.27/header-attrs.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/header-attrs-2.27/header-attrs.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/header-attrs-2.27/header-attrs.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/header-attrs-2.27/header-attrs.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.map b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.map similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.map rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/jquery-3.6.0/jquery-3.6.0.min.map diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/popper-2.6.0/popper.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/popper-2.6.0/popper.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/popper-2.6.0/popper.min.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/popper-2.6.0/popper.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-bundle.umd.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-bundle.umd.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-bundle.umd.min.js 
rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-bundle.umd.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-light-border.css b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-light-border.css similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-light-border.css rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy-light-border.css diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.css b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.css similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.css rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.css diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.umd.min.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.umd.min.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.umd.min.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/tippy-6.2.7/tippy.umd.min.js diff --git a/_posts/2024-09-01-fetch-files-web/fetch-files-web_files/webcomponents-2.0.0/webcomponents.js b/_posts/2024-09-22-fetch-files-web/fetch-files-web_files/webcomponents-2.0.0/webcomponents.js similarity index 100% rename from _posts/2024-09-01-fetch-files-web/fetch-files-web_files/webcomponents-2.0.0/webcomponents.js rename to _posts/2024-09-22-fetch-files-web/fetch-files-web_files/webcomponents-2.0.0/webcomponents.js diff --git a/_posts/2024-09-01-fetch-files-web/github-dplyr-starwars-csv.jpg b/_posts/2024-09-22-fetch-files-web/github-dplyr-starwars-csv.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/github-dplyr-starwars-csv.jpg rename to _posts/2024-09-22-fetch-files-web/github-dplyr-starwars-csv.jpg diff --git a/_posts/2024-09-01-fetch-files-web/github-dplyr-starwars-raw.jpg b/_posts/2024-09-22-fetch-files-web/github-dplyr-starwars-raw.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/github-dplyr-starwars-raw.jpg rename to _posts/2024-09-22-fetch-files-web/github-dplyr-starwars-raw.jpg diff --git a/_posts/2024-09-01-fetch-files-web/github-dplyr-starwars.jpg b/_posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/github-dplyr-starwars.jpg rename to _posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg diff --git a/_posts/2024-09-01-fetch-files-web/github-gist-stroop.jpg b/_posts/2024-09-22-fetch-files-web/github-gist-stroop.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/github-gist-stroop.jpg rename to _posts/2024-09-22-fetch-files-web/github-gist-stroop.jpg diff --git a/_posts/2024-09-01-fetch-files-web/osf-MixedModels-dyestuff-download.jpg b/_posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff-download.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/osf-MixedModels-dyestuff-download.jpg rename to _posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff-download.jpg diff --git a/_posts/2024-09-01-fetch-files-web/osf-MixedModels-dyestuff.jpg b/_posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff.jpg similarity index 100% rename from _posts/2024-09-01-fetch-files-web/osf-MixedModels-dyestuff.jpg rename to 
_posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff.jpg diff --git a/docs/blog.html b/docs/blog.html index e61d3dc8..3daf1b4e 100644 --- a/docs/blog.html +++ b/docs/blog.html @@ -2784,6 +2784,22 @@

${suggestion.title}

Blog Posts

+ + + +
+ +
+
+

Read files on the web into R

+
+
tutorial
+
+

For the download-button-averse of us

+
+
- +

2023 Year in Review

@@ -3411,7 +3427,7 @@

Categories

  • Articles -(35) +(36)
  • args @@ -3531,7 +3547,7 @@

    Categories

  • tutorial -(8) +(9)
  • typography diff --git a/docs/blog.xml b/docs/blog.xml index b20bc019..c5ad6754 100644 --- a/docs/blog.xml +++ b/docs/blog.xml @@ -12,7 +12,17 @@ https://yjunechoe.github.io Distill - Sun, 21 Jul 2024 00:00:00 +0000 + Sun, 22 Sep 2024 00:00:00 +0000 + + Read files on the web into R + June Choe + https://yjunechoe.github.io/posts/2024-09-22-fetch-files-web + For the download-button-averse of us + tutorial + https://yjunechoe.github.io/posts/2024-09-22-fetch-files-web + Sun, 22 Sep 2024 00:00:00 +0000 + + Naming patterns for boolean enums June Choe diff --git a/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-csv.jpg b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-csv.jpg new file mode 100644 index 00000000..cff11217 Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-csv.jpg differ diff --git a/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-raw.jpg b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-raw.jpg new file mode 100644 index 00000000..1b12043d Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars-raw.jpg differ diff --git a/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg new file mode 100644 index 00000000..f455dcd0 Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg differ diff --git a/docs/posts/2024-09-22-fetch-files-web/github-gist-stroop.jpg b/docs/posts/2024-09-22-fetch-files-web/github-gist-stroop.jpg new file mode 100644 index 00000000..80fec00c Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/github-gist-stroop.jpg differ diff --git a/docs/posts/2024-09-22-fetch-files-web/index.html b/docs/posts/2024-09-22-fetch-files-web/index.html new file mode 100644 index 00000000..68928f2b --- /dev/null +++ b/docs/posts/2024-09-22-fetch-files-web/index.html @@ -0,0 +1,3332 @@ + + + + + + + + + + + + + + + + + + + + +June Choe: Read files on the web into R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Read files on the web into R

    + + + + +

    For the download-button-averse of us

    +
    + + + +
    + +

    Every so often I’ll have a link to some file on hand and want to read it in R without going out of my way to browse the web page, find a download link, download it somewhere onto my computer, grab the path to it, and then finally read it into R.

    +

    Over the years I’ve accumulated some tricks to get data into R “straight from a url”, even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I’d write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I’m someone who primarily works with tabular data and interface with GitHub and OSF as data repositories.

    +

    GitHub (public repos)

    +

GitHub has a nice point-and-click interface for browsing repositories and previewing files. For example, you can navigate to the dplyr::starwars dataset from tidyverse/dplyr, at https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv:

    +
    +

    +
    +

    That url, despite ending in a .csv, does not point to the raw data - instead, the contents of the page is a full html document:

    +
    +
    +
    rvest::read_html("https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv")
    +
    +
    +
      {html_document}
    +  <html lang="en" data-color-mode="auto" data-light-theme="light" ...
    +  [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
    +  [2] <body class="logged-out env-production page-responsive" style="word-wrap: ...
    +

    To actually point to the csv contents, we want to click on the Raw button to the top-right corner of the preview:

    +
    +

    +
    +

    That gets us to the comma separated values we want, which is at a new url https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv:

    +
    +

    +
    +

    We can then read from that URL at “raw.githubusercontent.com/…” using read.csv():

    +
    +
    +
    read.csv("https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv") |> 
    +  dplyr::glimpse()
    +
    +
      Rows: 87
    +  Columns: 14
    +  $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
    +  $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
    +  $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
    +  $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
    +  $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
    +  $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
    +  $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
    +  $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
    +  $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
    +  $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
    +  $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
    +  $ films      <chr> "A New Hope, The Empire Strikes Back, Return of the Jedi, R…
    +  $ vehicles   <chr> "Snowspeeder, Imperial Speeder Bike", "", "", "", "Imperial…
    +  $ starships  <chr> "X-wing, Imperial shuttle", "", "", "TIE Advanced x1", "", …
    +
    +

    But note that this method of “click the Raw button to get the corresponding raw.githubusercontent.com/… url to the file contents” will not work for file formats that cannot be displayed in plain text (clicking the button will instead download the file via your browser). So sometimes (especially when you have a binary file) you have to construct this “remote-readable” url to the file manually.

    +

    Fortunately, going from one link to the other is pretty formulaic. To demonstrate the difference with the url for the starwars dataset again:

    +
    +
    +
    emphatic::hl_diff(
    +  "https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv",
    +  "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"
    +)
    +
    +
    +[1] "https://    github           .com/tidyverse/dplyr/blob/main/data-raw/starwars.csv"
    [1] "https://raw.githubusercontent.com/tidyverse/dplyr /main/data-raw/starwars.csv" +
    +
    +
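For a binary file, where clicking Raw would just trigger a download, a minimal sketch of building the raw url by hand (this borrows the parquet file that reappears in the duckdb section below; the `paste()` template is the only real point here):

```{r, eval=FALSE}
# Pieces of the "remote-readable" url
user   <- "yjunechoe"
repo   <- "repetition_events"
branch <- "master"
path   <- "data/tokens_data/childID%3D1/part-7.parquet"  # a binary (parquet) file

raw_url <- paste("https://raw.githubusercontent.com", user, repo, branch, path, sep = "/")

# Hand the constructed url to whatever reader understands the format
arrow::read_parquet(raw_url) |> 
  head(5)
```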

    GitHub (gists)

    +

    It’s a similar idea with GitHub Gists, where I sometimes like to store small toy datasets for use in demos. For example, here’s a link to a simulated data for a Stroop experiment stroop.csv: https://gist.github.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6.

    +

    But that’s again a full-on webpage. The url which actually hosts the csv contents is https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv, which you can again get to by clicking the Raw button at the top-right corner of the gist

    +
    +

    +
    +

    But actually, that long link you get by default points to the current commit, specifically. If you instead want the link to be kept up to date with the most recent commit, you can omit the second hash that comes after raw/:

    +
    +
    +
    emphatic::hl_diff(
    +  "https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv",
    +  "https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/stroop.csv"
    +)
    +
    +
    +[1] "https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv"
    [1] "https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw /stroop.csv" +
    +
    +

    In practice, I don’t use gists to store replicability-sensitive data, so I prefer to just use the shorter link that’s not tied to a specific commit.

    +
    +
    +
    read.csv("https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/stroop.csv") |> 
    +  dplyr::glimpse()
    +
    +
      Rows: 240
    +  Columns: 5
    +  $ subj      <chr> "S01", "S01", "S01", "S01", "S01", "S01", "S01", "S01", "S02…
    +  $ word      <chr> "blue", "blue", "green", "green", "red", "red", "yellow", "y…
    +  $ condition <chr> "match", "mismatch", "match", "mismatch", "match", "mismatch…
    +  $ accuracy  <int> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
    +  $ RT        <int> 400, 549, 576, 406, 296, 231, 433, 1548, 561, 1751, 286, 710…
    +
    +

    GitHub (private repos)

    +

    We now turn to the harder problem of accessing a file in a private GitHub repository. If you already have the GitHub webpage open and you’re signed in, you can follow the same step of copying the link that the Raw button redirects to.

    +

    Except this time, when you open the file at that url (assuming it can display in plain text), you’ll see the url come with a “token” attached at the end (I’ll show an example further down). This token is necessary to remotely access the data in a private repo. Once a token is generated, the file can be accessed using that token from anywhere, but note that it will expire at some point as GitHub refreshes tokens periodically (so treat them as if they’re for single use).

    +

    For a more robust approach, you can use the GitHub Contents API. If you have your credentials set up in {gh} (which you can check with gh::gh_whoami()), you can request a token-tagged url to the private file using the syntax:1

    +
    +
    +
    gh::gh("/repos/{user}/{repo}/contents/{path}")$download_url
    +
    +
    +

    Note that this is actually also a general solution to getting a url to GitHub file contents. So for example, even without any credentials set up you can point to dplyr’s starwars.csv since that’s publicly accessible. This method produces the same “raw.githubusercontent.com/…” url we saw earlier:

    +
    +
    +
    gh::gh("/repos/tidyverse/dplyr/contents/data-raw/starwars.csv")$download_url
    +
    +
      [1] "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"
    +
    +

    Now for a demonstration with a private repo, here is one of mine that you cannot access https://github.com/yjunechoe/my-super-secret-repo. But because I set up my credentials in {gh}, I can generate a link to a content within that repo with the access token attached (“?token=…”):

    +
    +
    +
    gh::gh("/repos/yjunechoe/my-super-secret-repo/contents/README.md")$download_url |> 
    +  # truncating
    +  gsub(x = _, "^(.{100}).*", "\\1...")
    +
    +
      [1] "https://raw.githubusercontent.com/yjunechoe/my-super-secret-repo/main/README.md?token=AMTCUR2JPXCIX5..."
    +
    +

    I can then use this url to read the private file:2

    +
    +
    +
    gh::gh("/repos/yjunechoe/my-super-secret-repo/contents/README.md")$download_url |> 
    +  readLines()
    +
    +
      [1] "Surprise!"
    +
    +
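If you do this a lot, the two-step pattern of requesting the `download_url` and then reading from it can be wrapped in a small helper. A sketch, assuming `gh::gh()` fills the `{user}`/`{repo}`/`{path}` placeholders from named arguments and that the target file is a csv (the function name here is made up):

```{r, eval=FALSE}
read_github_csv <- function(user, repo, path) {
  tagged_url <- gh::gh(
    "/repos/{user}/{repo}/contents/{path}",
    user = user, repo = repo, path = path
  )$download_url
  read.csv(tagged_url)
}

# Works for public files too, e.g. the starwars csv from earlier
read_github_csv("tidyverse", "dplyr", "data-raw/starwars.csv") |> 
  dplyr::glimpse()
```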

    OSF

    +

    OSF (the Open Science Framework) is another data repository that I interact with a lot, and reading files off of OSF follows a similar strategy to fetching public files on GitHub.

    +

    Consider, for example, the dyestuff.arrow file in the OSF repository for MixedModels.jl. Browsing the repository through the point-and-click interface can get you to the page for the file at https://osf.io/9vztj/, where it shows:

    +
    +

    +
    +

    The download button can be found inside the dropdown menubar to the right:

    +
    +

    +
    +

    But instead of clicking on the icon (which will start a download via the browser), we can grab the embedded link address: https://osf.io/download/9vztj/. That url can then be passed directly into a read function:

    +
    +
    +
    arrow::read_feather("https://osf.io/download/9vztj/") |> 
    +  dplyr::glimpse()
    +
    +
      Rows: 30
    +  Columns: 2
    +  $ batch <fct> A, A, A, A, A, B, B, B, B, B, C, C, C, C, C, D, D, D, D, D, E, E…
    +  $ yield <int> 1545, 1440, 1440, 1520, 1580, 1540, 1555, 1490, 1560, 1495, 1595…
    +
    +

    You might have already caught on to this, but the pattern is to simply point to osf.io/download/ instead of osf.io/.

    +
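Since that rewrite is purely mechanical, it can live in a tiny helper. A sketch (the function is hypothetical, demonstrated on the dyestuff.arrow link from above):

```{r, eval=FALSE}
# Rewrite "https://osf.io/<id>..." into "https://osf.io/download/<id>..."
osf_download_url <- function(url) {
  sub("^https://osf\\.io/", "https://osf.io/download/", url)
}

osf_download_url("https://osf.io/9vztj/")
#> [1] "https://osf.io/download/9vztj/"

arrow::read_feather(osf_download_url("https://osf.io/9vztj/")) |> 
  dplyr::glimpse()
```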

    This method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects https://osf.io/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad. Navigating to this link will show a web preview of the csv file contents.

    +

    By inserting /download into this url, we can read the csv file contents directly:

    +
    +
    +
    read.csv("https://osf.io/download/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad") |> 
    +  head()
    +
    +
            Item  plaus_bias trans_bias
    +  1 Awakened -0.29631221 -1.2200901
    +  2   Calmed  0.09877074 -0.4102332
    +  3   Choked  1.28401957 -1.4284905
    +  4  Dressed -0.59262442 -1.2087228
    +  5   Failed -0.98770736  0.1098839
    +  6  Groomed -1.08647810  0.9889550
    +
    +

    See also the {osfr} package for a more principled interface to OSF.

    +

    Aside: Can’t go wrong with a copy-paste!

    +

    Reading remote files aside, I think it’s severely underrated how base R has a readClipboard() function and a collection of read.*() functions which can also read directly from a "clipboard" connection.3

    +

    I sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. For such relatively small chunks of data that you just want to quickly get into R, you can lean on base R’s clipboard functionalities.

    +

    For example, given this markdown table:

    +
    +
    +
    aggregate(mtcars, mpg ~ cyl, mean) |> 
    +  knitr::kable()
    +
      cyl       mpg
        4  26.66364
        6  19.74286
        8  15.10000
    +
    +

    You can copy its contents and run the following code to get that data back as an R data frame:

    +
    +
    +
    read.delim("clipboard")
    +# Or, `read.delim(text = readClipboard())`
    +
    +
    +
    +
        cyl      mpg
    +  1   4 26.66364
    +  2   6 19.74286
    +  3   8 15.10000
    +
    +

    If you’re instead copying something flat like a list of numbers or strings, you can also use scan() and specify the appropriate sep to get that data back as a vector:4

    +
    +
    +
    paste(1:10, collapse = ", ") |> 
    +  cat()
    +
    +
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    +
    +
    +
    +
    scan("clipboard", sep = ",")
    +# Or, `scan(textConnection(readClipboard()), sep = ",")`
    +
    +
    +
    +
       [1]  1  2  3  4  5  6  7  8  9 10
    +
    +

    It should be noted though that parsing clipboard contents is not a robust feature in base R. If you want a more principled approach to reading data from clipboard, you should use {datapasta}. And for printing data for others to copy-paste into R, use {constructive}. See also {clipr} which extends clipboard read/write functionalities.

    +

    Other goodies

    +

⚠️ What lies ahead is denser than the kinds of “low-tech” advice I wrote about above.

    +

    Streaming with {duckdb}

    +

    One caveat to all the “read from web” approaches I covered above is that it often does not actually circumvent the action of downloading the file onto your computer. For example, when you read a file from “raw.githubusercontent.com/…” with read.csv(), there is an implicit download.file() of the data into the current R session’s tempdir().

    +
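In other words, something roughly like the following happens behind the scenes (a simplified sketch in base R, not the actual internals of any particular reader):

```{r, eval=FALSE}
url <- "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"

# "Reading from a url" typically amounts to: fetch to a temporary file, then read that
tmp <- tempfile(fileext = ".csv")
download.file(url, tmp, quiet = TRUE)
read.csv(tmp) |> 
  dplyr::glimpse()
```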

An alternative that actually reads the data straight into memory is streaming. Streaming is more a feature of database languages, but there’s good integration of such tools with R, so this option is available from within R as well.

    +

    Here, I briefly outline what I learned from (mostly) reading a blog post by François Michonneau, which covers how to stream remote files using {duckdb}. It’s pretty comprehensive but I wanted to make a template for just one method that I prefer.

    +

    We start by loading the {duckdb} package, creating a connection to an in-memory database, installing the httpfs extension (if not installed already), and loading httpfs for the database.

    +
    +
    +
    library(duckdb)
    +con <- dbConnect(duckdb())
    +# dbExecute(con, "INSTALL httpfs;") # You may also need to "INSTALL parquet;"
    +invisible(dbExecute(con, "LOAD httpfs;"))
    +
    +
    +

    For this example I will use a parquet file from one of my projects which is hosted on GitHub: https://github.com/yjunechoe/repetition_events. The data I want to read is at the relative path /data/tokens_data/childID=1/part-7.parquet. I went ahead and converted that into the “raw contents” url shown below:

    +
    +
    +
    # A parquet file of tokens from a sample of child-directed speech
    +file <- "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet"
    +
    +# For comparison, reading its contents with {arrow}
    +arrow::read_parquet(file) |> 
    +  head(5)
    +
    +
      # A tibble: 5 × 3
    +    utterance_id gloss   part_of_speech
    +           <int> <chr>   <chr>         
    +  1            1 www     ""            
    +  2            2 bye     "co"          
    +  3            3 mhm     "co"          
    +  4            4 Mommy's "n:prop"      
    +  5            4 here    "adv"
    +
    +

    In duckdb, the httpfs extension we loaded above allows PARQUET_SCAN5 to read a remote parquet file.


query1 <- glue::glue_sql("
  SELECT *
  FROM PARQUET_SCAN({`file`})
  LIMIT 5;
", .con = con)
cat(query1)

  SELECT *
  FROM PARQUET_SCAN("https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet")
  LIMIT 5;

dbGetQuery(con, query1)

    utterance_id   gloss part_of_speech
  1            1     www               
  2            2     bye             co
  3            3     mhm             co
  4            4 Mommy's         n:prop
  5            4    here            adv

    And actually, in my case, the parquet file represents one of many files that had been previously split up via hive partitioning. To preserve this metadata even as I read in just a single file, I need to do two things:


1. Specify hive_partitioning=true when calling PARQUET_SCAN.
2. Ensure that the hive-partitioning syntax is represented in the url with URLdecode() (since the = character can sometimes be escaped, as in this case).

emphatic::hl_diff(file, URLdecode(file))

[1] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet"
[1] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID= 1/part-7.parquet"

    With that, the data now shows that the observations are from child #1 in the sample.


file <- URLdecode(file)
query2 <- glue::glue_sql("
  SELECT *
  FROM PARQUET_SCAN(
    {`file`},
    hive_partitioning=true
  )
  LIMIT 5;
", .con = con)
cat(query2)

  SELECT *
  FROM PARQUET_SCAN(
    "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=1/part-7.parquet",
    hive_partitioning=true
  )
  LIMIT 5;

dbGetQuery(con, query2)

    utterance_id   gloss part_of_speech childID
  1            1     www                      1
  2            2     bye             co       1
  3            3     mhm             co       1
  4            4 Mommy's         n:prop       1
  5            4    here            adv       1

    To do this more programmatically over all parquet files under /tokens_data in the repository, we need to transition to using the GitHub Trees API. The idea is similar to using the Contents API but now we are requesting a list of all files using the following syntax:


gh::gh("/repos/{user}/{repo}/git/trees/{branch/tag/commitSHA}?recursive=true")$tree

    To get the file tree of the repo on the master branch, we use:


files <- gh::gh("/repos/yjunechoe/repetition_events/git/trees/master?recursive=true")$tree

    With recursive=true, this returns all files in the repo. Then, we can filter for just the parquet files we want with a little regex:


parquet_files <- sapply(files, `[[`, "path") |> 
  grep(x = _, pattern = ".*/tokens_data/.*parquet$", value = TRUE)
length(parquet_files)

  [1] 70

head(parquet_files)

  [1] "data/tokens_data/childID=1/part-7.parquet" 
  [2] "data/tokens_data/childID=10/part-0.parquet"
  [3] "data/tokens_data/childID=11/part-6.parquet"
  [4] "data/tokens_data/childID=12/part-3.parquet"
  [5] "data/tokens_data/childID=13/part-1.parquet"
  [6] "data/tokens_data/childID=14/part-2.parquet"

    Finally, we complete the path using the “https://raw.githubusercontent.com/…” url:


parquet_files <- paste0(
  "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/",
  parquet_files
)
head(parquet_files)

  [1] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=1/part-7.parquet" 
  [2] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=10/part-0.parquet"
  [3] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=11/part-6.parquet"
  [4] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=12/part-3.parquet"
  [5] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=13/part-1.parquet"
  [6] "https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=14/part-2.parquet"

Back on duckdb, we can use PARQUET_SCAN to read multiple files by supplying a vector ['file1.parquet', 'file2.parquet', ...].^6 This time, we also ask for a quick computation to count the number of distinct childIDs:


query3 <- glue::glue_sql("
  SELECT count(DISTINCT childID)
  FROM PARQUET_SCAN(
    [{parquet_files*}],
    hive_partitioning=true
  )
", .con = con)
cat(gsub("^(.{80}).*(.{60})$", "\\1 ... \\2", query3))

  SELECT count(DISTINCT childID)
  FROM PARQUET_SCAN(
    ['https://raw.githubusercont ... data/childID=9/part-64.parquet'],
    hive_partitioning=true
  )

dbGetQuery(con, query3)

    count(DISTINCT childID)
  1                      70

This returns 70, which matches the length of the parquet_files vector listing the files that had been partitioned by childID.


For further analyses, we can CREATE TABLE^7 the data in our in-memory database con:


query4 <- glue::glue_sql("
  CREATE TABLE tokens_data AS
  SELECT *
  FROM PARQUET_SCAN([{parquet_files*}], hive_partitioning=true)
", .con = con)
invisible(dbExecute(con, query4))
dbListTables(con)

  [1] "tokens_data"

    That lets us reference the table via dplyr::tbl(), at which point we can switch over to another high-level interface like {dplyr} to query it using its familiar functions:


library(dplyr)
tokens_data <- tbl(con, "tokens_data")

# Q: What are the most common verbs spoken to children in this sample?
tokens_data |> 
  filter(part_of_speech == "v") |> 
  count(gloss, sort = TRUE) |> 
  head() |> 
  collect()

  # A tibble: 6 × 2
    gloss     n
    <chr> <dbl>
  1 go    13614
  2 see   13114
  3 do    11829
  4 have  10794
  5 want  10560
  6 put    9190

    Combined, here’s one (hastily put together) attempt at wrapping this workflow into a function:


load_dataset_from_gh <- function(con, tblname, user, repo, branch, regex,
                                 partition = TRUE, lazy = TRUE) {
  
  allfiles <- gh::gh(glue::glue("/repos/{user}/{repo}/git/trees/{branch}?recursive=true"))$tree
  files_relpath <- grep(regex, sapply(allfiles, `[[`, "path"), value = TRUE)
  # Use the actual Contents API here instead, if the repo is private
  files <- glue::glue("https://raw.githubusercontent.com/{user}/{repo}/{branch}/{files_relpath}")
  
  type <- if (lazy) quote(VIEW) else quote(TABLE)
  partition <- as.integer(partition)
  
  dbExecute(con, "LOAD httpfs;")
  dbExecute(con, glue::glue_sql("
    CREATE {type} {`tblname`} AS
    SELECT *
    FROM PARQUET_SCAN([{files*}], hive_partitioning={partition})
  ", .con = con)) # uses the `files` urls built above from the Trees API
  
  invisible(TRUE)

}

con2 <- dbConnect(duckdb())
load_dataset_from_gh(
  con = con2,
  tblname = "tokens_data",
  user = "yjunechoe",
  repo = "repetition_events",
  branch = "master",
  regex = ".*data/tokens_data/.*parquet$"
)
tbl(con2, "tokens_data")

  # Source:   table<tokens_data> [?? x 4]
  # Database: DuckDB v1.0.0 [jchoe@Windows 10 x64:R 4.4.1/:memory:]
     utterance_id gloss   part_of_speech childID
            <int> <chr>   <chr>            <dbl>
   1            1 www     ""                   1
   2            2 bye     "co"                 1
   3            3 mhm     "co"                 1
   4            4 Mommy's "n:prop"             1
   5            4 here    "adv"                1
   6            5 wanna   "mod:aux"            1
   7            5 sit     "v"                  1
   8            5 down    "adv"                1
   9            6 there   "adv"                1
  10            7 let's   "v"                  1
  # ℹ more rows

    Other sources for data


In writing this blog post, I’m indebted to all the knowledgeable folks on Mastodon who suggested their own recommended tools and workflows for various kinds of remote data. Unfortunately, I’m not familiar enough with most of them to do them justice, but I still wanted to record the suggestions I got from there for posterity.


    First, a post about reading remote files would not be complete without a mention of the wonderful {googlesheets4} package for reading from Google Sheets. I debated whether I should include a larger discussion of {googlesheets4}, and despite using it quite often myself I ultimately decided to omit it for the sake of space and because the package website is already very comprehensive. I would suggest starting from the Get Started vignette if you are new and interested.
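
For a taste of it, reading a public, link-shared sheet usually takes just a couple of lines (a minimal sketch — the sheet url here is a placeholder, and gs4_deauth() is only appropriate for sheets that don't require sign-in):

library(googlesheets4)
gs4_deauth() # skip authentication for publicly shared sheets
# Placeholder url: substitute a link to a real Google Sheet
read_sheet("https://docs.google.com/spreadsheets/d/<sheet-id>/edit")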


    Second, along the lines of {osfr}, there are other similar rOpensci packages for retrieving data from the kinds of data sources that may be of interest to academics, such as {deposits} for zenodo and figshare, and {piggyback} for GitHub release assets (Maëlle Salmon’s comment pointed me to the first two; I responded with some of my experiences). I was also reminded that {pins} exists - I’m not familiar with it myself so I thought I wouldn’t write anything for it here BUT Isabella Velásquez came in clutch sharing a recent talk on dynamically loading up-to-date data with {pins} which is a great demo of the unique strengths of {pins}.
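
I haven't used these enough to vouch for the details, but to sketch the kind of usage they enable (the repo, url, and pin name below are placeholders, and the calls reflect my best understanding of the respective APIs):

# {piggyback}: fetch a file attached to a GitHub release
piggyback::pb_download("data.csv", repo = "user/repo", tag = "v1.0.0")

# {pins}: read a pin that someone has published at a url
board <- pins::board_url(c(my_data = "https://example.com/pins/my_data/"))
pins::pin_read(board, "my_data")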


    Lastly, I inadvertently(?) started some discussion around remotely accessing spatial files. I don’t work with spatial data at all but I can totally imagine how the hassle of the traditional click-download-find-load workflow would be even more pronounced for spatial data which are presumably much larger in size and more difficult to preview. On this note, I’ll just link to Carl Boettiger’s comment about the fact that GDAL has a virtual file system that you can interface with from R packages wrapping this API (ex: {gdalraster}), and to Michael Sumner’s comment/gist + Chris Toney’s comment on the fact that you can even use this feature to stream non-spatial data!
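
For the curious, the virtual file system trick boils down to prefixing a url with GDAL's /vsicurl/ handler (a hedged sketch with a placeholder file — see the linked comments for real workflows):

# Read a remote, GDAL-readable file by streaming it over http
sf::read_sf("/vsicurl/https://example.com/path/to/file.gpkg")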


    Miscellaneous tips and tricks


    I also have some random tricks that are more situational. Unfortunately, I can only recall like 20% of them at any given moment, so I’ll be updating this space as more come back to me:


• When reading remote .rda or .RData files with load(), you may need to wrap the link in url() first (ref: stackoverflow).

• {vroom} can remotely read gzipped files, without having to download.file() and unzip() first.

• {curl}, of course, will always have the most comprehensive set of low-level tools you need to read any arbitrary data remotely. For example, using curl::curl_fetch_memory() to read the dplyr::starwars data again from the GitHub raw contents link:

fetched <- curl::curl_fetch_memory(
  "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"
)
read.csv(text = rawToChar(fetched$content)) |> 
  dplyr::glimpse()

  Rows: 87
  Columns: 14
  $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
  $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
  $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
  $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
  $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
  $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
  $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
  $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
  $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
  $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
  $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
  $ films      <chr> "A New Hope, The Empire Strikes Back, Return of the Jedi, R…
  $ vehicles   <chr> "Snowspeeder, Imperial Speeder Bike", "", "", "", "Imperial…
  $ starships  <chr> "X-wing, Imperial shuttle", "", "", "TIE Advanced x1", "", …

• Even if you’re going the route of downloading the file first, curl::multi_download() can offer big performance improvements over download.file().^8 Many {curl} functions can also handle retries and stop/resumes, which is cool too.

• {httr2} can capture a continuous data stream with httr2::req_perform_stream() up to a set time or size (see the sketch below).

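A rough sketch of those last two, with placeholder urls (and a callback that just counts bytes):

# {curl}: download many files in parallel
urls <- c("https://example.com/a.csv", "https://example.com/b.csv")
curl::multi_download(urls, destfiles = file.path(tempdir(), basename(urls)))

# {httr2}: process a continuous stream in chunks, for up to 5 seconds
httr2::request("https://example.com/stream") |> 
  httr2::req_perform_stream(
    callback = function(x) { cat(length(x), "bytes\n"); TRUE }, # TRUE = keep streaming
    timeout_sec = 5
  )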

    sessionInfo()


sessionInfo()

  R version 4.4.1 (2024-06-14 ucrt)
  Platform: x86_64-w64-mingw32/x64
  Running under: Windows 11 x64 (build 22631)
  
  Matrix products: default
  
  
  locale:
  [1] LC_COLLATE=English_United States.utf8 
  [2] LC_CTYPE=English_United States.utf8   
  [3] LC_MONETARY=English_United States.utf8
  [4] LC_NUMERIC=C                          
  [5] LC_TIME=English_United States.utf8    
  
  time zone: America/New_York
  tzcode source: internal
  
  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     
  
  other attached packages:
  [1] dplyr_1.1.4        duckdb_1.0.0       DBI_1.2.3          ggplot2_3.5.1.9000
  
  loaded via a namespace (and not attached):
   [1] rappdirs_0.3.3    sass_0.4.9        utf8_1.2.4        generics_0.1.3   
   [5] xml2_1.3.6        distill_1.6       digest_0.6.35     magrittr_2.0.3   
   [9] evaluate_0.24.0   grid_4.4.1        blob_1.2.4        fastmap_1.1.1    
  [13] jsonlite_1.8.8    processx_3.8.4    chromote_0.3.1    ps_1.7.5         
  [17] promises_1.3.0    httr_1.4.7        rvest_1.0.4       purrr_1.0.2      
  [21] fansi_1.0.6       scales_1.3.0      httr2_1.0.3.9000  jquerylib_0.1.4  
  [25] cli_3.6.2         rlang_1.1.4       dbplyr_2.5.0      gitcreds_0.1.2   
  [29] bit64_4.0.5       munsell_0.5.1     withr_3.0.1       cachem_1.0.8     
  [33] yaml_2.3.8        tools_4.4.1       tzdb_0.4.0        memoise_2.0.1    
  [37] colorspace_2.1-1  assertthat_0.2.1  curl_5.2.1        vctrs_0.6.5      
  [41] R6_2.5.1          lifecycle_1.0.4   emphatic_0.1.8    bit_4.0.5        
  [45] arrow_16.1.0      pkgconfig_2.0.3   pillar_1.9.0      bslib_0.7.0      
  [49] later_1.3.2       gtable_0.3.5      glue_1.7.0        gh_1.4.0         
  [53] Rcpp_1.0.12       xfun_0.47         tibble_3.2.1      tidyselect_1.2.1 
  [57] highr_0.11        rstudioapi_0.16.0 knitr_1.47        htmltools_0.5.8.1
  [61] websocket_1.4.1   rmarkdown_2.27    compiler_4.4.1    downlit_0.4.4

1. Thanks @tanho for pointing me to this at the R4DS/DSLC slack.↩︎

2. Note that the API will actually generate a new token every time you send a request (and again, these tokens will expire with time).↩︎

3. The special value "clipboard" works for most base-R read functions that take a file or con argument.↩︎

4. Thanks @coolbutuseless for pointing me to textConnection()!↩︎

5. Or READ_PARQUET - same thing.↩︎

6. We can also get this formatting with a combination of shQuote() and toString().↩︎

7. Whereas CREATE TABLE results in a physical copy of the data in memory, CREATE VIEW will dynamically fetch the data from the source every time you query the table. If the data fits into memory (as in this case), I prefer CREATE TABLE as queries will be much faster (though you pay up-front for the time copying the data). If the data is larger than memory, CREATE VIEW will be your only option.↩︎

8. See an example implemented for {openalexR}, an API package.↩︎
    +
    + + + +
    + +
    +
    + + + + + +
    + + + + + + + + + + + diff --git a/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff-download.jpg b/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff-download.jpg new file mode 100644 index 00000000..33565d0f Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff-download.jpg differ diff --git a/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff.jpg b/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff.jpg new file mode 100644 index 00000000..e35df02b Binary files /dev/null and b/docs/posts/2024-09-22-fetch-files-web/osf-MixedModels-dyestuff.jpg differ diff --git a/docs/posts/posts.json b/docs/posts/posts.json index a5832764..cdeebe72 100644 --- a/docs/posts/posts.json +++ b/docs/posts/posts.json @@ -1,4 +1,23 @@ [ + { + "path": "posts/2024-09-22-fetch-files-web/", + "title": "Read files on the web into R", + "description": "For the download-button-averse of us", + "author": [ + { + "name": "June Choe", + "url": {} + } + ], + "date": "2024-09-22", + "categories": [ + "tutorial" + ], + "contents": "\r\n\r\nContents\r\nGitHub (public repos)\r\nGitHub (gists)\r\nGitHub (private repos)\r\nOSF\r\nAside: Can’t go wrong with a copy-paste!\r\nOther goodies\r\nStreaming with {duckdb}\r\nOther sources for data\r\nMiscellaneous tips and tricks\r\n\r\nsessionInfo()\r\n\r\nEvery so often I’ll have a link to some file on hand and want to read it in R without going out of my way to browse the web page, find a download link, download it somewhere onto my computer, grab the path to it, and then finally read it into R.\r\nOver the years I’ve accumulated some tricks to get data into R “straight from a url”, even if the url does not point to the raw file contents itself. The method varies between data sources though, and I have a hard time keeping track of them in my head, so I thought I’d write some of these down for my own reference. This is not meant to be comprehensive though - keep in mind that I’m someone who primarily works with tabular data and interface with GitHub and OSF as data repositories.\r\nGitHub (public repos)\r\nGitHub has nice a point-and-click interface for browsing repositories and previewing files. 
For example, you can navigate to the dplyr::starwars dataset from tidyverse/dplyr, at https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv:\r\n\r\n\r\n\r\nThat url, despite ending in a .csv, does not point to the raw data - instead, the contents of the page is a full html document:\r\n\r\n\r\nrvest::read_html(\"https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv\")\r\n\r\n\r\n {html_document}\r\n \\n \r\n dplyr::glimpse()\r\n\r\n Rows: 87\r\n Columns: 14\r\n $ name \"Luke Skywalker\", \"C-3PO\", \"R2-D2\", \"Darth Vader\", \"Leia Or…\r\n $ height 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…\r\n $ mass 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…\r\n $ hair_color \"blond\", NA, NA, \"none\", \"brown\", \"brown, grey\", \"brown\", N…\r\n $ skin_color \"fair\", \"gold\", \"white, blue\", \"white\", \"light\", \"light\", \"…\r\n $ eye_color \"blue\", \"yellow\", \"red\", \"yellow\", \"brown\", \"blue\", \"blue\",…\r\n $ birth_year 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …\r\n $ sex \"male\", \"none\", \"none\", \"male\", \"female\", \"male\", \"female\",…\r\n $ gender \"masculine\", \"masculine\", \"masculine\", \"masculine\", \"femini…\r\n $ homeworld \"Tatooine\", \"Tatooine\", \"Naboo\", \"Tatooine\", \"Alderaan\", \"T…\r\n $ species \"Human\", \"Droid\", \"Droid\", \"Human\", \"Human\", \"Human\", \"Huma…\r\n $ films \"A New Hope, The Empire Strikes Back, Return of the Jedi, R…\r\n $ vehicles \"Snowspeeder, Imperial Speeder Bike\", \"\", \"\", \"\", \"Imperial…\r\n $ starships \"X-wing, Imperial shuttle\", \"\", \"\", \"TIE Advanced x1\", \"\", …\r\n\r\nBut note that this method of “click the Raw button to get the corresponding raw.githubusercontent.com/… url to the file contents” will not work for file formats that cannot be displayed in plain text (clicking the button will instead download the file via your browser). So sometimes (especially when you have a binary file) you have to construct this “remote-readable” url to the file manually.\r\nFortunately, going from one link to the other is pretty formulaic. To demonstrate the difference with the url for the starwars dataset again:\r\n\r\n\r\nemphatic::hl_diff(\r\n \"https://github.com/tidyverse/dplyr/blob/main/data-raw/starwars.csv\",\r\n \"https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv\"\r\n)\r\n\r\n\r\n[1] \"https:// github .com/tidyverse/dplyr/blob/main/data-raw/starwars.csv\"[1] \"https://raw.githubusercontent.com/tidyverse/dplyr /main/data-raw/starwars.csv\"\r\n\r\n\r\nGitHub (gists)\r\nIt’s a similar idea with GitHub Gists, where I sometimes like to store small toy datasets for use in demos. For example, here’s a link to a simulated data for a Stroop experiment stroop.csv: https://gist.github.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6.\r\nBut that’s again a full-on webpage. The url which actually hosts the csv contents is https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv, which you can again get to by clicking the Raw button at the top-right corner of the gist\r\n\r\n\r\n\r\nBut actually, that long link you get by default points to the current commit, specifically. 
If you instead want the link to be kept up to date with the most recent commit, you can omit the second hash that comes after raw/:\r\n\r\n\r\nemphatic::hl_diff(\r\n \"https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv\",\r\n \"https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/stroop.csv\"\r\n)\r\n\r\n\r\n[1] \"https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/c643b9760126d92b8ac100860ac5b50ba492f316/stroop.csv\"[1] \"https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw /stroop.csv\"\r\n\r\n\r\nIn practice, I don’t use gists to store replicability-sensitive data, so I prefer to just use the shorter link that’s not tied to a specific commit.\r\n\r\n\r\nread.csv(\"https://gist.githubusercontent.com/yjunechoe/17b3787fb7aec108c19b33d71bc19bc6/raw/stroop.csv\") |> \r\n dplyr::glimpse()\r\n\r\n Rows: 240\r\n Columns: 5\r\n $ subj \"S01\", \"S01\", \"S01\", \"S01\", \"S01\", \"S01\", \"S01\", \"S01\", \"S02…\r\n $ word \"blue\", \"blue\", \"green\", \"green\", \"red\", \"red\", \"yellow\", \"y…\r\n $ condition \"match\", \"mismatch\", \"match\", \"mismatch\", \"match\", \"mismatch…\r\n $ accuracy 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …\r\n $ RT 400, 549, 576, 406, 296, 231, 433, 1548, 561, 1751, 286, 710…\r\n\r\nGitHub (private repos)\r\nWe now turn to the harder problem of accessing a file in a private GitHub repository. If you already have the GitHub webpage open and you’re signed in, you can follow the same step of copying the link that the Raw button redirects to.\r\nExcept this time, when you open the file at that url (assuming it can display in plain text), you’ll see the url come with a “token” attached at the end (I’ll show an example further down). This token is necessary to remotely access the data in a private repo. Once a token is generated, the file can be accessed using that token from anywhere, but note that it will expire at some point as GitHub refreshes tokens periodically (so treat them as if they’re for single use).\r\nFor a more robust approach, you can use the GitHub Contents API. If you have your credentials set up in {gh} (which you can check with gh::gh_whoami()), you can request a token-tagged url to the private file using the syntax:1\r\n\r\n\r\ngh::gh(\"/repos/{user}/{repo}/contents/{path}\")$download_url\r\n\r\n\r\nNote that this is actually also a general solution to getting a url to GitHub file contents. So for example, even without any credentials set up you can point to dplyr’s starwars.csv since that’s publicly accessible. This method produces the same “raw.githubusercontent.com/…” url we saw earlier:\r\n\r\n\r\ngh::gh(\"/repos/tidyverse/dplyr/contents/data-raw/starwars.csv\")$download_url\r\n\r\n [1] \"https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv\"\r\n\r\nNow for a demonstration with a private repo, here is one of mine that you cannot access https://github.com/yjunechoe/my-super-secret-repo. 
But because I set up my credentials in {gh}, I can generate a link to a content within that repo with the access token attached (“?token=…”):\r\n\r\n\r\ngh::gh(\"/repos/yjunechoe/my-super-secret-repo/contents/README.md\")$download_url |> \r\n # truncating\r\n gsub(x = _, \"^(.{100}).*\", \"\\\\1...\")\r\n\r\n [1] \"https://raw.githubusercontent.com/yjunechoe/my-super-secret-repo/main/README.md?token=AMTCUR2JPXCIX5...\"\r\n\r\nI can then use this url to read the private file:2\r\n\r\n\r\ngh::gh(\"/repos/yjunechoe/my-super-secret-repo/contents/README.md\")$download_url |> \r\n readLines()\r\n\r\n [1] \"Surprise!\"\r\n\r\nOSF\r\nOSF (the Open Science Framework) is another data repository that I interact with a lot, and reading files off of OSF follows a similar strategy to fetching public files on GitHub.\r\nConsider, for example, the dyestuff.arrow file in the OSF repository for MixedModels.jl. Browsing the repository through the point-and-click interface can get you to the page for the file at https://osf.io/9vztj/, where it shows:\r\n\r\n\r\n\r\nThe download button can be found inside the dropdown menubar to the right:\r\n\r\n\r\n\r\nBut instead of clicking on the icon (which will start a download via the browser), we can grab the embedded link address: https://osf.io/download/9vztj/. That url can then be passed directly into a read function:\r\n\r\n\r\narrow::read_feather(\"https://osf.io/download/9vztj/\") |> \r\n dplyr::glimpse()\r\n\r\n Rows: 30\r\n Columns: 2\r\n $ batch A, A, A, A, A, B, B, B, B, B, C, C, C, C, C, D, D, D, D, D, E, E…\r\n $ yield 1545, 1440, 1440, 1520, 1580, 1540, 1555, 1490, 1560, 1495, 1595…\r\n\r\nYou might have already caught on to this, but the pattern is to simply point to osf.io/download/ instead of osf.io/.\r\nThis method also works for view-only links to anonymized OSF projects as well. For example, this is an anonymized link to a csv file from one of my projects https://osf.io/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad. Navigating to this link will show a web preview of the csv file contents.\r\nBy inserting /download into this url, we can read the csv file contents directly:\r\n\r\n\r\nread.csv(\"https://osf.io/download/tr8qm?view_only=998ad87d86cc4049af4ec6c96a91d9ad\") |> \r\n head()\r\n\r\n Item plaus_bias trans_bias\r\n 1 Awakened -0.29631221 -1.2200901\r\n 2 Calmed 0.09877074 -0.4102332\r\n 3 Choked 1.28401957 -1.4284905\r\n 4 Dressed -0.59262442 -1.2087228\r\n 5 Failed -0.98770736 0.1098839\r\n 6 Groomed -1.08647810 0.9889550\r\n\r\nSee also the {osfr} package for a more principled interface to OSF.\r\nAside: Can’t go wrong with a copy-paste!\r\nReading remote files aside, I think it’s severely underrated how base R has a readClipboard() function and a collection of read.*() functions which can also read directly from a \"clipboard\" connection.3\r\nI sometimes do this for html/markdown summary tables that a website might display, or sometimes even for entire excel/googlesheets tables after doing a select-all + copy. 
For such relatively small chunks of data that you just want to quickly get into R, you can lean on base R’s clipboard functionalities.\r\nFor example, given this markdown table:\r\n\r\n\r\naggregate(mtcars, mpg ~ cyl, mean) |> \r\n knitr::kable()\r\n\r\ncyl\r\nmpg\r\n4\r\n26.66364\r\n6\r\n19.74286\r\n8\r\n15.10000\r\n\r\nYou can copy its contents and run the following code to get that data back as an R data frame:\r\n\r\n\r\nread.delim(\"clipboard\")\r\n# Or, `read.delim(text = readClipboard())`\r\n\r\n\r\n\r\n cyl mpg\r\n 1 4 26.66364\r\n 2 6 19.74286\r\n 3 8 15.10000\r\n\r\nIf you’re instead copying something flat like a list of numbers or strings, you can also use scan() and specify the appropriate sep to get that data back as a vector:4\r\n\r\n\r\npaste(1:10, collapse = \", \") |> \r\n cat()\r\n\r\n 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\r\n\r\n\r\n\r\nscan(\"clipboard\", sep = \",\")\r\n# Or, `scan(textConnection(readClipboard()), sep = \",\")`\r\n\r\n\r\n\r\n [1] 1 2 3 4 5 6 7 8 9 10\r\n\r\nIt should be noted though that parsing clipboard contents is not a robust feature in base R. If you want a more principled approach to reading data from clipboard, you should use {datapasta}. And for printing data for others to copy-paste into R, use {constructive}. See also {clipr} which extends clipboard read/write functionalities.\r\nOther goodies\r\n⚠️ What lies ahead are denser than the kinds of “low-tech” advice I wrote about above.\r\nStreaming with {duckdb}\r\nOne caveat to all the “read from web” approaches I covered above is that it often does not actually circumvent the action of downloading the file onto your computer. For example, when you read a file from “raw.githubusercontent.com/…” with read.csv(), there is an implicit download.file() of the data into the current R session’s tempdir().\r\nAn alternative that actually reads the data straight into memory is streaming. Streaming is moreso a feature of database languages, but there’s good integration of such tools with R, so this option is available from within R as well.\r\nHere, I briefly outline what I learned from (mostly) reading a blog post by François Michonneau, which covers how to stream remote files using {duckdb}. It’s pretty comprehensive but I wanted to make a template for just one method that I prefer.\r\nWe start by loading the {duckdb} package, creating a connection to an in-memory database, installing the httpfs extension (if not installed already), and loading httpfs for the database.\r\n\r\n\r\nlibrary(duckdb)\r\ncon <- dbConnect(duckdb())\r\n# dbExecute(con, \"INSTALL httpfs;\") # You may also need to \"INSTALL parquet;\"\r\ninvisible(dbExecute(con, \"LOAD httpfs;\"))\r\n\r\n\r\nFor this example I will use a parquet file from one of my projects which is hosted on GitHub: https://github.com/yjunechoe/repetition_events. The data I want to read is at the relative path /data/tokens_data/childID=1/part-7.parquet. 
I went ahead and converted that into the “raw contents” url shown below:\r\n\r\n\r\n# A parquet file of tokens from a sample of child-directed speech\r\nfile <- \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet\"\r\n\r\n# For comparison, reading its contents with {arrow}\r\narrow::read_parquet(file) |> \r\n head(5)\r\n\r\n # A tibble: 5 × 3\r\n utterance_id gloss part_of_speech\r\n \r\n 1 1 www \"\" \r\n 2 2 bye \"co\" \r\n 3 3 mhm \"co\" \r\n 4 4 Mommy's \"n:prop\" \r\n 5 4 here \"adv\"\r\n\r\nIn duckdb, the httpfs extension we loaded above allows PARQUET_SCAN5 to read a remote parquet file.\r\n\r\n\r\nquery1 <- glue::glue_sql(\"\r\n SELECT *\r\n FROM PARQUET_SCAN({`file`})\r\n LIMIT 5;\r\n\", .con = con)\r\ncat(query1)\r\n\r\n SELECT *\r\n FROM PARQUET_SCAN(\"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet\")\r\n LIMIT 5;\r\n\r\ndbGetQuery(con, query1)\r\n\r\n utterance_id gloss part_of_speech\r\n 1 1 www \r\n 2 2 bye co\r\n 3 3 mhm co\r\n 4 4 Mommy's n:prop\r\n 5 4 here adv\r\n\r\nAnd actually, in my case, the parquet file represents one of many files that had been previously split up via hive partitioning. To preserve this metadata even as I read in just a single file, I need to do two things:\r\nSpecify hive_partitioning=true when calling PARQUET_SCAN.\r\nEnsure that the hive-partitioning syntax is represented in the url with URLdecode() (since the = character can sometimes be escaped, as in this case).\r\n\r\n\r\nemphatic::hl_diff(file, URLdecode(file))\r\n\r\n\r\n[1] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID%3D1/part-7.parquet\"[1] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID= 1/part-7.parquet\"\r\n\r\n\r\nWith that, the data now shows that the observations are from child #1 in the sample.\r\n\r\n\r\nfile <- URLdecode(file)\r\nquery2 <- glue::glue_sql(\"\r\n SELECT *\r\n FROM PARQUET_SCAN(\r\n {`file`},\r\n hive_partitioning=true\r\n )\r\n LIMIT 5;\r\n\", .con = con)\r\ncat(query2)\r\n\r\n SELECT *\r\n FROM PARQUET_SCAN(\r\n \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=1/part-7.parquet\",\r\n hive_partitioning=true\r\n )\r\n LIMIT 5;\r\n\r\ndbGetQuery(con, query2)\r\n\r\n utterance_id gloss part_of_speech childID\r\n 1 1 www 1\r\n 2 2 bye co 1\r\n 3 3 mhm co 1\r\n 4 4 Mommy's n:prop 1\r\n 5 4 here adv 1\r\n\r\nTo do this more programmatically over all parquet files under /tokens_data in the repository, we need to transition to using the GitHub Trees API. The idea is similar to using the Contents API but now we are requesting a list of all files using the following syntax:\r\n\r\n\r\ngh::gh(\"/repos/{user}/{repo}/git/trees/{branch/tag/commitSHA}?recursive=true\")$tree\r\n\r\n\r\nTo get the file tree of the repo on the master branch, we use:\r\n\r\n\r\nfiles <- gh::gh(\"/repos/yjunechoe/repetition_events/git/trees/master?recursive=true\")$tree\r\n\r\n\r\nWith recursive=true, this returns all files in the repo. 
Then, we can filter for just the parquet files we want with a little regex:\r\n\r\n\r\nparquet_files <- sapply(files, `[[`, \"path\") |> \r\n grep(x = _, pattern = \".*/tokens_data/.*parquet$\", value = TRUE)\r\nlength(parquet_files)\r\n\r\n [1] 70\r\n\r\nhead(parquet_files)\r\n\r\n [1] \"data/tokens_data/childID=1/part-7.parquet\" \r\n [2] \"data/tokens_data/childID=10/part-0.parquet\"\r\n [3] \"data/tokens_data/childID=11/part-6.parquet\"\r\n [4] \"data/tokens_data/childID=12/part-3.parquet\"\r\n [5] \"data/tokens_data/childID=13/part-1.parquet\"\r\n [6] \"data/tokens_data/childID=14/part-2.parquet\"\r\n\r\nFinally, we complete the path using the “https://raw.githubusercontent.com/…” url:\r\n\r\n\r\nparquet_files <- paste0(\r\n \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/\",\r\n parquet_files\r\n)\r\nhead(parquet_files)\r\n\r\n [1] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=1/part-7.parquet\" \r\n [2] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=10/part-0.parquet\"\r\n [3] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=11/part-6.parquet\"\r\n [4] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=12/part-3.parquet\"\r\n [5] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=13/part-1.parquet\"\r\n [6] \"https://raw.githubusercontent.com/yjunechoe/repetition_events/master/data/tokens_data/childID=14/part-2.parquet\"\r\n\r\nBack on duckdb, we can use PARQUET_SCAN to read multiple files by supplying a vector ['file1.parquet', 'file2.parquet', ...].6 This time, we also ask for a quick computation to count the number of distinct childIDs:\r\n\r\n\r\nquery3 <- glue::glue_sql(\"\r\n SELECT count(DISTINCT childID)\r\n FROM PARQUET_SCAN(\r\n [{parquet_files*}],\r\n hive_partitioning=true\r\n )\r\n\", .con = con)\r\ncat(gsub(\"^(.{80}).*(.{60})$\", \"\\\\1 ... \\\\2\", query3))\r\n\r\n SELECT count(DISTINCT childID)\r\n FROM PARQUET_SCAN(\r\n ['https://raw.githubusercont ... 
data/childID=9/part-64.parquet'],\r\n hive_partitioning=true\r\n )\r\n\r\ndbGetQuery(con, query3)\r\n\r\n count(DISTINCT childID)\r\n 1 70\r\n\r\nThis returns 70 which matches the length of the parquet_files vector listing the files that had been partitioned by childID.\r\nFor further analyses, we can CREATE TABLE7 our data in our in-memory database con:\r\n\r\n\r\nquery4 <- glue::glue_sql(\"\r\n CREATE TABLE tokens_data AS\r\n SELECT *\r\n FROM PARQUET_SCAN([{parquet_files*}], hive_partitioning=true)\r\n\", .con = con)\r\ninvisible(dbExecute(con, query4))\r\ndbListTables(con)\r\n\r\n [1] \"tokens_data\"\r\n\r\nThat lets us reference the table via dplyr::tbl(), at which point we can switch over to another high-level interface like {dplyr} to query it using its familiar functions:\r\n\r\n\r\nlibrary(dplyr)\r\ntokens_data <- tbl(con, \"tokens_data\")\r\n\r\n# Q: What are the most common verbs spoken to children in this sample?\r\ntokens_data |> \r\n filter(part_of_speech == \"v\") |> \r\n count(gloss, sort = TRUE) |> \r\n head() |> \r\n collect()\r\n\r\n # A tibble: 6 × 2\r\n gloss n\r\n \r\n 1 go 13614\r\n 2 see 13114\r\n 3 do 11829\r\n 4 have 10794\r\n 5 want 10560\r\n 6 put 9190\r\n\r\nCombined, here’s one (hastily put together) attempt at wrapping this workflow into a function:\r\n\r\n\r\nload_dataset_from_gh <- function(con, tblname, user, repo, branch, regex,\r\n partition = TRUE, lazy = TRUE) {\r\n \r\n allfiles <- gh::gh(glue::glue(\"/repos/{user}/{repo}/git/trees/{branch}?recursive=true\"))$tree\r\n files_relpath <- grep(regex, sapply(allfiles, `[[`, \"path\"), value = TRUE)\r\n # Use the actual Contents API here instead, if the repo is private\r\n files <- glue::glue(\"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{files_relpath}\")\r\n \r\n type <- if (lazy) quote(VIEW) else quote(TABLE)\r\n partition <- as.integer(partition)\r\n \r\n dbExecute(con, \"LOAD httpfs;\")\r\n dbExecute(con, glue::glue_sql(\"\r\n CREATE {type} {`tblname`} AS\r\n SELECT *\r\n FROM PARQUET_SCAN([{parquet_files*}], hive_partitioning={partition})\r\n \", .con = con))\r\n \r\n invisible(TRUE)\r\n\r\n}\r\n\r\ncon2 <- dbConnect(duckdb())\r\nload_dataset_from_gh(\r\n con = con2,\r\n tblname = \"tokens_data\",\r\n user = \"yjunechoe\",\r\n repo = \"repetition_events\",\r\n branch = \"master\",\r\n regex = \".*data/tokens_data/.*parquet$\"\r\n)\r\ntbl(con2, \"tokens_data\")\r\n\r\n # Source: table [?? x 4]\r\n # Database: DuckDB v1.0.0 [jchoe@Windows 10 x64:R 4.4.1/:memory:]\r\n utterance_id gloss part_of_speech childID\r\n \r\n 1 1 www \"\" 1\r\n 2 2 bye \"co\" 1\r\n 3 3 mhm \"co\" 1\r\n 4 4 Mommy's \"n:prop\" 1\r\n 5 4 here \"adv\" 1\r\n 6 5 wanna \"mod:aux\" 1\r\n 7 5 sit \"v\" 1\r\n 8 5 down \"adv\" 1\r\n 9 6 there \"adv\" 1\r\n 10 7 let's \"v\" 1\r\n # ℹ more rows\r\n\r\nOther sources for data\r\nIn writing this blog post, I’m indebted to all the knowledgeable folks on Mastodon who suggested their own recommended tools and workflows for various kinds of remote data. Unfortunately, I’m not familiar enough with most of them enough to do them justice, but I still wanted to record the suggestions I got from there for posterity.\r\nFirst, a post about reading remote files would not be complete without a mention of the wonderful {googlesheets4} package for reading from Google Sheets. 
I debated whether I should include a larger discussion of {googlesheets4}, and despite using it quite often myself I ultimately decided to omit it for the sake of space and because the package website is already very comprehensive. I would suggest starting from the Get Started vignette if you are new and interested.\r\nSecond, along the lines of {osfr}, there are other similar rOpensci packages for retrieving data from the kinds of data sources that may be of interest to academics, such as {deposits} for zenodo and figshare, and {piggyback} for GitHub release assets (Maëlle Salmon’s comment pointed me to the first two; I responded with some of my experiences). I was also reminded that {pins} exists - I’m not familiar with it myself so I thought I wouldn’t write anything for it here BUT Isabella Velásquez came in clutch sharing a recent talk on dynamically loading up-to-date data with {pins} which is a great demo of the unique strengths of {pins}.\r\nLastly, I inadvertently(?) started some discussion around remotely accessing spatial files. I don’t work with spatial data at all but I can totally imagine how the hassle of the traditional click-download-find-load workflow would be even more pronounced for spatial data which are presumably much larger in size and more difficult to preview. On this note, I’ll just link to Carl Boettiger’s comment about the fact that GDAL has a virtual file system that you can interface with from R packages wrapping this API (ex: {gdalraster}), and to Michael Sumner’s comment/gist + Chris Toney’s comment on the fact that you can even use this feature to stream non-spatial data!\r\nMiscellaneous tips and tricks\r\nI also have some random tricks that are more situational. Unfortunately, I can only recall like 20% of them at any given moment, so I’ll be updating this space as more come back to me:\r\nWhen reading remote .rda or .RData files with load(), you may need to wrap the link in url() first (ref: stackoverflow).\r\n{vroom} can remotely read gzipped files, without having to download.file() and unzip() first.\r\n{curl}, of course, will always have the most comprehensive set of low-level tools you need to read any arbitrary data remotely. 
For example, using curl::curl_fetch_memory() to read the dplyr::storms data again from the GitHub raw contents link:\r\n\r\n\r\nfetched <- curl::curl_fetch_memory(\r\n \"https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv\"\r\n)\r\nread.csv(text = rawToChar(fetched$content)) |> \r\n dplyr::glimpse()\r\n\r\n Rows: 87\r\n Columns: 14\r\n $ name \"Luke Skywalker\", \"C-3PO\", \"R2-D2\", \"Darth Vader\", \"Leia Or…\r\n $ height 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…\r\n $ mass 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…\r\n $ hair_color \"blond\", NA, NA, \"none\", \"brown\", \"brown, grey\", \"brown\", N…\r\n $ skin_color \"fair\", \"gold\", \"white, blue\", \"white\", \"light\", \"light\", \"…\r\n $ eye_color \"blue\", \"yellow\", \"red\", \"yellow\", \"brown\", \"blue\", \"blue\",…\r\n $ birth_year 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …\r\n $ sex \"male\", \"none\", \"none\", \"male\", \"female\", \"male\", \"female\",…\r\n $ gender \"masculine\", \"masculine\", \"masculine\", \"masculine\", \"femini…\r\n $ homeworld \"Tatooine\", \"Tatooine\", \"Naboo\", \"Tatooine\", \"Alderaan\", \"T…\r\n $ species \"Human\", \"Droid\", \"Droid\", \"Human\", \"Human\", \"Human\", \"Huma…\r\n $ films \"A New Hope, The Empire Strikes Back, Return of the Jedi, R…\r\n $ vehicles \"Snowspeeder, Imperial Speeder Bike\", \"\", \"\", \"\", \"Imperial…\r\n $ starships \"X-wing, Imperial shuttle\", \"\", \"\", \"TIE Advanced x1\", \"\", …\r\n\r\nEven if you’re going the route of downloading the file first, curl::multi_download() can offer big performance improvements over download.file().8 Many {curl} functions can also handle retries and stop/resumes which is cool too.\r\n{httr2} can capture a continuous data stream with httr2::req_perform_stream() up to a set time or size.\r\nsessionInfo()\r\n\r\n\r\nsessionInfo()\r\n\r\n R version 4.4.1 (2024-06-14 ucrt)\r\n Platform: x86_64-w64-mingw32/x64\r\n Running under: Windows 11 x64 (build 22631)\r\n \r\n Matrix products: default\r\n \r\n \r\n locale:\r\n [1] LC_COLLATE=English_United States.utf8 \r\n [2] LC_CTYPE=English_United States.utf8 \r\n [3] LC_MONETARY=English_United States.utf8\r\n [4] LC_NUMERIC=C \r\n [5] LC_TIME=English_United States.utf8 \r\n \r\n time zone: America/New_York\r\n tzcode source: internal\r\n \r\n attached base packages:\r\n [1] stats graphics grDevices utils datasets methods base \r\n \r\n other attached packages:\r\n [1] dplyr_1.1.4 duckdb_1.0.0 DBI_1.2.3 ggplot2_3.5.1.9000\r\n \r\n loaded via a namespace (and not attached):\r\n [1] rappdirs_0.3.3 sass_0.4.9 utf8_1.2.4 generics_0.1.3 \r\n [5] xml2_1.3.6 distill_1.6 digest_0.6.35 magrittr_2.0.3 \r\n [9] evaluate_0.24.0 grid_4.4.1 blob_1.2.4 fastmap_1.1.1 \r\n [13] jsonlite_1.8.8 processx_3.8.4 chromote_0.3.1 ps_1.7.5 \r\n [17] promises_1.3.0 httr_1.4.7 rvest_1.0.4 purrr_1.0.2 \r\n [21] fansi_1.0.6 scales_1.3.0 httr2_1.0.3.9000 jquerylib_0.1.4 \r\n [25] cli_3.6.2 rlang_1.1.4 dbplyr_2.5.0 gitcreds_0.1.2 \r\n [29] bit64_4.0.5 munsell_0.5.1 withr_3.0.1 cachem_1.0.8 \r\n [33] yaml_2.3.8 tools_4.4.1 tzdb_0.4.0 memoise_2.0.1 \r\n [37] colorspace_2.1-1 assertthat_0.2.1 curl_5.2.1 vctrs_0.6.5 \r\n [41] R6_2.5.1 lifecycle_1.0.4 emphatic_0.1.8 bit_4.0.5 \r\n [45] arrow_16.1.0 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.7.0 \r\n [49] later_1.3.2 gtable_0.3.5 glue_1.7.0 gh_1.4.0 \r\n [53] Rcpp_1.0.12 xfun_0.47 tibble_3.2.1 tidyselect_1.2.1 \r\n [57] highr_0.11 rstudioapi_0.16.0 knitr_1.47 htmltools_0.5.8.1\r\n [61] 
websocket_1.4.1 rmarkdown_2.27 compiler_4.4.1 downlit_0.4.4\r\n\r\n\r\n\r\n\r\n\r\nThanks @tanho for pointing me to this at the R4DS/DSLC slack.↩︎\r\nNote that the API will actually generate a new token every time you send a request (and again, these tokens will expire with time).↩︎\r\nThe special value \"clipboard\" works for most base-R read functions that take a file or con argument.↩︎\r\nThanks @coolbutuseless for pointing me to textConnection()!↩︎\r\nOr READ_PARQUET - same thing.↩︎\r\nWe can also get this formatting with a combination of shQuote() and toString().↩︎\r\nWhereas CREATE TABLE results in a physical copy of the data in memory, CREATE VIEW will dynamically fetch the data from the source every time you query the table. If the data fits into memory (as in this case), I prefer CREATE as queries will be much faster (though you pay up-front for the time copying the data). If the data is larger than memory, CREATE VIEW will be your only option.↩︎\r\nSee an example implemented for {openalexR}, an API package.↩︎\r\n", + "preview": "posts/2024-09-22-fetch-files-web/github-dplyr-starwars.jpg", + "last_modified": "2024-09-22T18:49:08-04:00", + "input_file": {} + }, { "path": "posts/2024-07-21-enumerate-possible-options/", "title": "Naming patterns for boolean enums", diff --git a/docs/search.json b/docs/search.json index 4be93676..9042ca1e 100644 --- a/docs/search.json +++ b/docs/search.json @@ -5,7 +5,7 @@ "title": "Blog Posts", "author": [], "contents": "\r\n\r\n\r\n\r\n\r\n", - "last_modified": "2024-09-20T10:56:03-04:00" + "last_modified": "2024-09-22T18:50:13-04:00" }, { "path": "index.html", @@ -13,21 +13,21 @@ "description": "Ph.D. Candidate in Linguistics", "author": [], "contents": "\r\n\r\n\r\n\r\n\r\n\r\n\r\n Education\r\n\r\n\r\nB.A. (hons.) Northwestern University (2016–20)\r\n\r\n\r\nPh.D. University of Pennsylvania (2020 ~)\r\n\r\n\r\n Interests\r\n\r\n\r\n(Computational) Psycholinguistics\r\n\r\n\r\nLanguage Acquisition\r\n\r\n\r\nSentence Processing\r\n\r\n\r\nProsody\r\n\r\n\r\nQuantitative Methods\r\n\r\n\r\n\r\n\r\n\r\n Methods:\r\n\r\nWeb-based experiments, eye-tracking, self-paced reading, corpus analysis\r\n\r\n\r\n\r\n Programming:\r\n\r\nR (fluent) | HTML/CSS, Javascript, Julia (proficient) | Python (coursework)\r\n\r\n\r\n\r\n\r\n\r\nI am a PhD candidate in Linguistics at the University of Pennsylvania, and a student affiliate of Penn MindCORE and the Language and Communication Sciences program. I am a psycholinguist broadly interested in experimental approaches to studying meaning, of various flavors. My advisor is Anna Papafragou and I am a member of the Language & Cognition Lab.\r\nI received my B.A. in Linguistics from Northwestern University, where I worked with Jennifer Cole, Masaya Yoshida, and Annette D’Onofrio. I also worked as a research assistant for the Language, Education, and Reading Neuroscience Lab. My thesis explored the role of prosodic focus in garden-path reanalysis.\r\nBeyond linguistics research, I have interests in data visualization, science communication, and the R programming language. I author packages in statistical computing and graphics (ex: ggtrace, jlmerclusterperm) and collaborate on other open-source software (ex: openalexR, pointblank). 
I also maintain a technical blog as a hobby and occasionally take on small statistical consulting projects.\r\n\r\n\r\n\r\n\r\ncontact me: yjchoe@sas.upenn.edu\r\n\r\n\r\n\r\n\r\n\r\n\r\n", - "last_modified": "2024-09-20T11:03:58-04:00" + "last_modified": "2024-09-22T18:50:15-04:00" }, { "path": "news.html", "title": "News", "author": [], "contents": "\r\n\r\n\r\nFor more of my personal news external/tangential to research\r\n2023\r\nAugust\r\nI was unfortunately not able to make it in person to JSM 2023 but have my pre-recorded talk has been uploaded!\r\nJune\r\nMy package jlmerclusterperm was published on CRAN!\r\nApril\r\nI was accepted to SMLP (Summer School on Statistical Methods for Linguistics and Psychology), to be held in September at the University of Potsdam, Germany! I will be joining the “Advanced methods in frequentist statistics with Julia” stream. Huge thanks to MindCORE for funding my travels to attend!\r\nJanuary\r\nI received the ASA Statistical Computing and Graphics student award for my paper Sublayer modularity in the Grammar of Graphics! I will be presenting my work at the 2023 Joint Statistical Meetings in Toronto in August.\r\n2022\r\nSeptember\r\nI was invited to a Korean data science podcast dataholic (데이터홀릭) to talk about my experience presenting at the RStudio and useR conferences! Part 1, Part 2\r\nAugust\r\nI led a workshop on IBEX and PCIbex with Nayoun Kim at the Seoul International Conference on Linguistics (SICOL 2022).\r\nJuly\r\nI attended my first in-person R conference at rstudio::conf(2022) and gave a talk on ggplot internals.\r\nJune\r\nI gave a talk on my package {ggtrace} at the useR! 2022 conference. I was awarded the diversity scholarship which covered my registration and workshop fees. My reflections\r\nI gave a talk at RLadies philly on using dplyr’s slice() function for row-relational operations.\r\n2021\r\nJuly\r\nMy tutorial on custom fonts in R was featured as a highlight on the R Weekly podcast!\r\nJune\r\nI gave a talk at RLadies philly on using icon fonts for data viz! I also wrote a follow-up blog post that goes deeper into font rendering in R.\r\nMay\r\nSnowGlobe, a project started in my undergrad, was featured in an article by the Northwestern University Library. We also had a workshop for SnowGlobe which drew participants from over a hundred universities!\r\nJanuary\r\nI joined Nayoun Kim for a workshop on experimental syntax conducted in Korean and held at Sungkyunkwan University (Korea). I helped design materials for a session on scripting online experiments with IBEX, including interactive slides made with R!\r\n2020\r\nNovember\r\nI joined designer Will Chase on his stream to talk about the psycholinguistics of speech production for a data viz project on Michael’s speech errors in The Office. It was a very cool and unique opportunity to bring my two interests together!\r\nOctober\r\nMy tutorial on {ggplot2} stat_*() functions was featured as a highlight on the R Weekly podcast, which curates weekly updates from the R community.\r\nI became a data science tutor at MindCORE to help researchers at Penn with data visualization and R programming.\r\nSeptember\r\nI have moved to Philadelphia to start my PhD in Linguistics at the University of Pennsylvania!\r\nJune\r\nI graduated from Northwestern University with a B.A. in Linguistics (with honors)! 
I was also elected into Phi Beta Kappa and appointed as the Senior Marshal for Linguistics.\r\n\r\n\r\n\r\n", - "last_modified": "2024-09-20T10:56:06-04:00" + "last_modified": "2024-09-22T18:50:19-04:00" }, { "path": "research.html", "title": "Research and activities", "author": [], "contents": "\r\n\r\nContents\r\nAcademic research output\r\nPeer-reviewed Papers\r\nConference Talks\r\nConference Presentations\r\n\r\nResearch activities in FOSS\r\nPapers\r\nTalks\r\nSoftware\r\n\r\nTeaching\r\nPositions held\r\nWorkshops led\r\nGuest lectures\r\n\r\nProfessional activities\r\nEditor\r\nReviewer\r\nMembership\r\n\r\n\r\nLinks: Google Scholar, Github, OSF\r\nAcademic research output\r\nPeer-reviewed Papers\r\nJune Choe, and Anna Papafragou. (2023). The acquisition of subordinate nouns as pragmatic inference. Journal of Memory and Language, 132, 104432. DOI: https://doi.org/10.1016/j.jml.2023.104432. PDF OSF\r\nJune Choe, Yiran Chen, May Pik Yu Chan, Aini Li, Xin Gao, and Nicole Holliday. (2022). Language-specific Effects on Automatic Speech Recognition Errors for World Englishes. In Proceedings of the 29th International Conference on Computational Linguistics, 7177–7186.\r\nMay Pik Yu Chan, June Choe, Aini Li, Yiran Chen, Xin Gao, and Nicole Holliday. (2022). Training and typological bias in ASR performance for world Englishes. In Proceedings of Interspeech 2022, 1273-1277. DOI: 10.21437/Interspeech.2022-10869\r\nJune Choe, Masaya Yoshida, and Jennifer Cole. (2022). The role of prosodic focus in the reanalysis of garden path sentences: Depth of semantic processing impedes the revision of an erroneous local analysis. Glossa Psycholinguistics, 1(1). DOI: 10.5070/G601136\r\nJune Choe, and Anna Papafragou. (2022). The acquisition of subordinate nouns as pragmatic inference: Semantic alternatives modulate subordinate meanings. In Proceedings of the Annual Meeting of the Cognitive Science Society, 44, 2745-2752.\r\nSean McWeeny, Jinnie S. Choi, June Choe, Alexander LaTourette, Megan Y. Roberts, and Elizabeth S. Norton. (2022). Rapid automatized naming (RAN) as a kindergarten predictor of future reading in English: A systematic review and meta-analysis. Reading Research Quarterly, 57(4), 1187–1211. DOI: 10.1002/rrq.467\r\nConference Talks\r\nJune Choe, and Anna Papafragou. Children’s sensitivity to informativeness in naming: basic-level vs. superordinate nouns. Talk at the 101st Linguistic Society of America (LSA) conference. 9-12 January 2025. Philadelphia.\r\nJune Choe. Distributional signatures of superordinate nouns. Talk at the 10th MACSIM conference. 6 April 2024. University of Maryland, College Park, MD.\r\nJune Choe. Sub-layer modularity in the Grammar of Graphics. Talk at the 2023 Joint Statistical Meetings, 5-10 August 2023. Toronto, Canada. American Statistical Association (ASA) student paper award in Statistical Computing and Graphics. Paper\r\nJune Choe. Persona-based social expectations in sentence processing and comprehension. Talk at the Language, Stereotypes & Social Cognition workshop, 22-23 May, 2023. University of Pennsylvania, PA.\r\nJune Choe, and Anna Papafragou. Lexical alternatives and the acquisition of subordinate nouns. Talk at the 47th Boston University Conference on Language Development (BUCLD), 3-6 November, 2022. Boston University, Boston, MA. Slides\r\nJune Choe, Yiran Chen, May Pik Yu Chan, Aini Li, Xin Gao and Nicole Holliday. (2022). Language-specific Effects on Automatic Speech Recognition Errors in American English. 
Talk at the 28th International Conference on Computational Linguistics (CoLing), 12-17 October, 2022. Gyeongju, South Korea. Slides\r\nMay Pik Yu Chan, June Choe, Aini Li, Yiran Chen, Xin Gao and Nicole Holliday. (2022). Training and typological bias in ASR performance for world Englishes. Talk at the 23rd Conference of the International Speech Communication Association (INTERSPEECH), 18-22 September, 2022. Incheon, South Korea.\r\nConference Presentations\r\nJune Choe, and Anna Papafragou. Distributional signatures of superordinate nouns. Poster presented at the 48th Boston University Conference on Language Development (BUCLD), 2-5 November, 2023. Boston University, Boston, MA. Abstract Poster\r\nJune Choe, and Anna Papafragou. Pragmatic underpinnings of the basic-level bias. Poster presented at the 48th Boston University Conference on Language Development (BUCLD), 2-5 November, 2023. Boston University, Boston, MA. Abstract Poster\r\nJune Choe and Anna Papafragou. Discourse effects on the acquisition of subordinate nouns. Poster presented at the 9th Mid-Atlantic Colloquium of Studies in Meaning (MACSIM), 15 April 2023. University of Pennsylvania, PA.\r\nJune Choe and Anna Papafragou. Discourse effects on the acquisition of subordinate nouns. Poster presented at the 36th Annual Conference on Human Sentence Processing, 9-11 March 2022. University of Pittsburg, PA. Abstract Poster\r\nJune Choe, and Anna Papafragou. Acquisition of subordinate nouns as pragmatic inference: Semantic alternatives modulate subordinate meanings. Poster at the 2nd Experiments in Linguistic Meaning (ELM) conference, 18-20 May 2022. University of Pennsylvania, Philadelphia, PA.\r\nJune Choe, and Anna Papafragou. Beyond the basic level: Levels of informativeness and the acquisition of subordinate nouns. Poster at the 35th Annual Conference on Human Sentence Processing (HSP), 24-26 March 2022. University of California, Santa Cruz, CA.\r\nJune Choe, Jennifer Cole, and Masaya Yoshida. Prosodic Focus Strengthens Semantic Persistence. Poster at The 26th Architectures and Mechanisms for Language Processing (AMLaP), 3-5 September 2020. Potsdam, Germany. Abstract Video Slides\r\nJune Choe. Computer-assisted snowball search for meta-analysis research. Poster at The 2020 Undergraduate Research & Arts Exposition. 27-28 May 2020. Northwestern University, Evanston, IL. 2nd Place Poster Award. Abstract\r\nJune Choe. Social Information in Sentence Processing. Talk at The 2019 Undergraduate Research & Arts Exposition. 29 May 2019. Northwestern University, Evanston, IL. Abstract\r\nJune Choe, Shayne Sloggett, Masaya Yoshida and Annette D’Onofrio. Personae in syntactic processing: Socially-specific agents bias expectations of verb transitivity. Poster at The 32nd CUNY Conference on Human Sentence Processing. 29-31 March 2019. University of Colorado, Boulder, CO.\r\nD’Onofrio, Annette, June Choe and Masaya Yoshida. Personae in syntactic processing: Socially-specific agents bias expectations of verb transitivity. Poster at The 93rd Annual Meeting of the Linguistics Society of America. 3-6 January 2019. New York City, NY.\r\nResearch activities in FOSS\r\nPapers\r\nMassimo Aria, Trang Le, Corrado Cuccurullo, Alessandra Belfiore, and June Choe. (2024). openalexR: An R-tool for collecting bibliometric data from OpenAlex. The R Journal, 15(4), 166-179. Paper, Github\r\nJune Choe. (2022). Sub-layer modularity in the Grammar of Graphics. American Statistical Association (ASA) student paper award in Statistical Computing and Graphics. 
Paper, Github\r\nTalks\r\nJune Choe. Sub-layer modularity in the Grammar of Graphics. Talk at the 2023 Joint Statistical Meetings, 5-10 August 2023. Toronto, Canada.\r\nJune Choe. Fast cluster-based permutation test using mixed-effects models. Talk at the Integrated Language Science and Technology (ILST) seminar, 21 April 2023. University of Pennsylvania, PA.\r\nJune Choe. Cracking open ggplot internals with {ggtrace}. Talk at the 2022 RStudio Conference, 25-28 July 2022. Washington D.C. https://github.com/yjunechoe/ggtrace-rstudioconf2022\r\nJune Choe. Stepping into {ggplot2} internals with {ggtrace}. Talk at the 2022 useR! Conference, 20-23 June 2022. Vanderbilt University, TN. https://github.com/yjunechoe/ggtrace-user2022\r\nSoftware\r\nJune Choe. (2024). jlmerclusterperm: Cluster-Based Permutation Analysis for Densely Sampled Time Data. R package version 1.1.3. https://cran.r-project.org/package=jlmerclusterperm. Github\r\nRich Iannone, June Choe, Mauricio Vargas Sepulveda. (2024). pointblank: Data Validation and Organization of Metadata for Local and Remote Tables. R package version 0.12.1. https://CRAN.R-project.org/package=pointblank. Github\r\nMassimo Aria, Corrado Cuccurullo, Trang Le, June Choe. (2024). openalexR: Getting Bibliographic Records from ‘OpenAlex’ Database Using ‘DSL’ API. R package version 1.4.0. https://CRAN.R-project.org/package=openalexR. Github\r\nJune Choe. (2024). jlme: Regression Modelling with ‘GLM.jl’ and ‘MixedModels.jl’ in ‘Julia’. R package version 0.3.0. https://cran.r-project.org/package=jlme. Github\r\nSean McWeeny, June Choe, & Elizabeth S. Norton. (2021). SnowGlobe: An Iterative Search Tool for Systematic Reviews and Meta-Analyses [Computer Software]. OSF\r\nTeaching\r\nPositions held\r\nTeaching assistant for “Introduction to Linguistics”. Instructor: Aletheia Cui. Spring 2024. University of Pennsylvania.\r\nTeaching assistant for “Data science for language and the mind”. Instructor: Katie Schuler. Fall 2021, Spring 2023, and Fall 2023. University of Pennsylvania.\r\nWorkshops led\r\nIntroduction to mixed-effects models in Julia. Workshop at Penn MindCORE. 1 December 2023. Philadelphia, PA. Github Colab notebook\r\nExperimental syntax using IBEX/PCIBEX with Dr. Nayoun Kim. Workshop at the 2022 Seoul International Conference on Linguistics. 11-12 August 2022. Seoul, South Korea. PDF\r\nExperimental syntax using IBEX: a walkthrough with Dr. Nayoun Kim. 2021 BK Winter School-Workshop on Experimental Linguistics/Syntax at Sungkyunkwan University, 19-22 January 2021. Seoul, South Korea. PDF\r\nGuest lectures\r\nHard words and (syntactic) bootstrapping. LING 5750 “The Acquisition of Meaning”. Instructor: Dr. Anna Papafragou. Spring 2024. University of Pennsylvania.\r\nIntroduction to R for psychology research. PSYC 4997 “Senior Honors Seminar in Psychology”. Instructor: Dr. Coren Apicella. Spring 2024. University of Pennsylvania. Colab notebook\r\nModel fitting and diagnosis with MixedModels.jl in Julia. LING 5670 “Quantitative Study of Linguistic Variation”. Instructor: Dr. Meredith Tamminga. Fall 2023. University of Pennsylvania.\r\nSimulation-based power analysis for mixed-effects models. LING 5670 “Quantitative Study of Linguistic Variation”. Instructor: Dr. Meredith Tamminga. Spring 2023. 
University of Pennsylvania.\r\nProfessional activities\r\nEditor\r\nPenn Working Papers in Linguistics (PWPL), Volume 30, Issue 1.\r\nReviewer\r\nCognition\r\nLanguage Learning and Development\r\nJournal of Open Source Software\r\nProceedings of the Annual Meeting of the Cognitive Science Society\r\nMembership\r\nLinguistic Society of America\r\nAmerican Statistical Association\r\n\r\n\r\n\r\n",
-    "last_modified": "2024-09-20T10:56:08-04:00"
+    "last_modified": "2024-09-22T18:50:22-04:00"
  },
  {
    "path": "resources.html",
@@ -35,14 +35,14 @@
    "description": "Mostly for R and data visualization\n",
    "author": [],
    "contents": "\r\n\r\nContents\r\nLinguistics\r\nData Visualization\r\nPackages and software\r\nTutorial Blog Posts\r\nBy others\r\n\r\nLinguistics\r\nScripting online experiments with IBEX (workshop slides & materials with Nayoun Kim)\r\nData Visualization\r\n{ggplot2} style guide and showcase - most recent version (2/10/2021)\r\nCracking open the internals of ggplot: A {ggtrace} showcase - slides\r\nPackages and software\r\n{ggtrace}: R package for exploring, debugging, and manipulating ggplot internals by exposing the underlying object-oriented system in functional programming terms.\r\n{penngradlings}: R package for the University of Pennsylvania Graduate Linguistics Society.\r\n{LingWER}: R package for linguistic analysis of Word Error Rate for evaluating transcriptions and other speech-to-text output, using a deterministic matrix-based search algorithm optimized for R.\r\n{gridAnnotate}: R package for interactively annotating figures from the plot pane, using {grid} graphical objects.\r\nSnowGlobe: A tool for meta-analysis research. Developed with Jinnie Choi, Sean McWeeny, and Elizabeth Norton, with funding from the Northwestern University Library. Currently under development but basic features are functional. Validation experiments and guides at OSF repo.\r\nTutorial Blog Posts\r\n{ggplot2} stat_*() functions [post]\r\nCustom fonts in R [post]\r\n{purrr} reduce() family [post1, post2]\r\nThe correlation parameter in {lme4} mixed effects models [post]\r\nShortcuts for common chain of {dplyr} functions [post]\r\nPlotting highly-customizable treemaps with {treemap} and {ggplot2} [post]\r\nBy others\r\nTutorials:\r\nA ggplot2 Tutorial for Beautiful Plotting in R by Cédric Scherer\r\nggplot2 Wizardry Hands-On by Cédric Scherer\r\nggplot2 workshop by Thomas Lin Pedersen\r\nBooks:\r\nR for Data Science by Hadley Wickham and Garrett Grolemund\r\nR Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, and Garrett Grolemund\r\nggplot2: elegant graphics for data analysis by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen\r\nFundamentals of Data Visualization by Claus O. Wilke\r\nEfficient R Programming by Colin Gillespie and Robin Lovelace\r\nAdvanced R by Hadley Wickham\r\n\r\n\r\n\r\n",
-    "last_modified": "2024-09-20T10:56:09-04:00"
+    "last_modified": "2024-09-22T18:50:24-04:00"
  },
  {
    "path": "software.html",
    "title": "Software",
    "author": [],
    "contents": "\r\n\r\nContents\r\nggtrace\r\njlmerclusterperm\r\npointblank\r\nopenalexR\r\nggcolormeter\r\nddplot\r\nSnowglobe (retired)\r\n\r\nMain: Github profile, R-universe profile\r\nggtrace\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R\r\nLinks: Github, website, talks (useR! 2022, rstudio::conf 2022), paper\r\n\r\nProgrammatically explore, debug, and manipulate ggplot internals. 
Package {ggtrace} offers a low-level interface that extends base R capabilities of trace, as well as a family of workflow functions that make interactions with ggplot internals more accessible.\r\n\r\njlmerclusterperm\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R, Julia\r\nLinks: CRAN, Github, website\r\n\r\nAn implementation of fast cluster-based permutation analysis (CPA) for densely-sampled time data developed in Maris & Oostenveld (2007). Supports (generalized, mixed-effects) regression models for the calculation of timewise statistics. Provides both a wholesale and a piecemeal interface to the CPA procedure with an emphasis on interpretability and diagnostics. Integrates Julia libraries MixedModels.jl and GLM.jl for performance improvements, with additional functionalities for interfacing with Julia from ‘R’ powered by the JuliaConnectoR package.\r\n\r\npointblank\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R, HTML/CSS, Javascript\r\nLinks: Github, website\r\n\r\nData quality assessment and metadata reporting for data frames and database tables\r\n\r\nopenalexR\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R\r\nLinks: Github, website\r\n\r\nA set of tools to extract bibliographic content from the OpenAlex database using API https://docs.openalex.org.\r\n\r\nggcolormeter\r\nRole: Author\r\nLanguage: R\r\nLinks: Github\r\n\r\n{ggcolormeter} adds guide_colormeter(), a {ggplot2} color/fill legend guide extension in the style of a dashboard meter.\r\n\r\nddplot\r\nRole: Contributor\r\nLanguage: R, JavaScript\r\nLinks: Github, website\r\n\r\nCreate ‘D3’ based ‘SVG’ (‘Scalable Vector Graphics’) graphics using a simple ‘R’ API. The package aims to simplify the creation of many ‘SVG’ plot types using a straightforward ‘R’ API. The package relies on the ‘r2d3’ ‘R’ package and the ‘D3’ ‘JavaScript’ library. See https://rstudio.github.io/r2d3/ and https://d3js.org/ respectively.\r\n\r\nSnowglobe (retired)\r\nRole: Author\r\nLanguage: R, SQL\r\nLinks: Github, OSF, poster\r\n\r\nAn iterative search tool for systematic reviews and meta-analyses, implemented as a Shiny app. Retired due to the discontinuation of the Microsoft Academic Graph service in 2021. I now contribute to {openalexR}.\r\n\r\n\r\n\r\n\r\n", - "last_modified": "2024-09-20T10:56:11-04:00" + "last_modified": "2024-09-22T18:50:25-04:00" }, { "path": "visualizations.html", @@ -50,7 +50,7 @@ "description": "Select data visualizations", "author": [], "contents": "\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n", - "last_modified": "2024-09-20T10:56:14-04:00" + "last_modified": "2024-09-22T18:50:28-04:00" } ], "collections": ["posts/posts.json"] diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 6a76546b..aac761f7 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -28,6 +28,10 @@ https://yjunechoe.github.io/visualizations.html 2022-11-13T09:17:01-05:00 + + https://yjunechoe.github.io/posts/2024-09-22-fetch-files-web/ + 2024-09-22T18:49:08-04:00 + https://yjunechoe.github.io/posts/2024-07-21-enumerate-possible-options/ 2024-09-01T17:53:55-04:00