04-iteration-02.qmd

---
title: "useR to programmeR"
subtitle: "Iteration 2"
author: "Emma Rand and Ian Lyttle"
format: 
  revealjs:
    theme: [simple, styles.scss]
    footer: <https://pos.it/programming-r-conf-2023>
    slide-number: true
    chalkboard: true
    code-link: true
    code-line-numbers: false
    width: 1600
    height: 900
bibliography: references.bib
---

## Learning objectives

In this session, we will discuss:

:::{.incremental}
  - using `purrr::map()` to read a bunch of files
  - using `purrr::walk()` to write a bunch of files
  - functional programming, more generally
:::

. . .

<hr>

For coding, we will use `r-programming-exercises`:

 - `R/iteration-02-01-reading-files.R`, etc.
 - restart R

## Reading multiple files

Using {purrr} to iterate can help you perform many tasks repeatably and reproducibly.

. . .

### Example

Read Excel files from a directory, then combine into a single data-frame.

## Aside: {here} package

When you first call `here::here()`, (simplified):

:::{.incremental}
-   climbs your local directory until it finds a `.RProj` file
-   sets directory containing `.RProj` as reference-path
-   `here::here()` prepends reference-path to argument
:::

. . .

If project in `/Users/ian/important-project/`:

``` r
here("data/file.csv")
```

```         
"/Users/ian/important-project/data/file.csv"
```

## Our turn

In the `programming-r-exercises` repository:

-   open `iteration-02-01-reading-files.R`
-   restart R

## Our turn: reading data manually

Here's our starting code:

``` r
data1952 <- read_excel(here("data/gapminder/1952.xlsx"))
data1957 <- read_excel(here("data/gapminder/1957.xlsx"))
data1962 <- read_excel(here("data/gapminder/1952.xlsx"))
data1967 <- read_excel(here("data/gapminder/1967.xlsx"))

data_manual <- bind_rows(data1952, data1957, data1962, data1967)
```

. . .

What problems do you see?

(I see two real problems, and one philisophical problem)

Run this example code, discuss with your neighbor.

## Our turn: make list of paths

I see this as a two step problem:

:::{.incremental}
-   make a named list of paths, name is year
-   use list of paths to read data frames, combine
:::

. . .

Let's work together to improve this code to get paths:

``` r
paths <-
  # get the filepaths from the directory
  fs::dir_ls(here("data/gapminder")) |>
  # convert to list
  # extract the year as names
  print()
```

## Our turn: read data

Let's work together to improve this code to read data:

``` r
data <-
  paths |>
  # read each file from excel, into data frame
  # keep only non-null elements
  # set list-names as column `year`
  # bind into single data-frame
  # convert year to number
  print()
```

## Handling failures

If we have a failure, we may not want to stop everything.

. . .

```{r}
#| error: true
library("readr")
read_csv("not/a/file.csv")
```

## Function operators

Function operators:

  - take a function
  - return a modified function

. . .

```{r}
library("purrr")

poss_read_csv <- possibly(read_csv, otherwise = NULL, quiet = FALSE)
```

. . .

<hr>

```{r}
#| message: true
poss_read_csv("not/a/file.csv")
```

. . .

<hr>

```{r}
poss_read_csv(I("a, b\n 1, 2"), col_types = "dd")
```


## Our turn: handle failure

In the `programming-r-exercises` repository:

-   look at `data/gapminder_party/`
-   try running your script using this directory

Create a new function:

``` r
possibly_read_excel <- possibly() # we do the rest
```

Use this function in your script.

## If we have time

Functional programming has three fundamental operations:

:::{.incremental}
 - `filter()` - like spaghetti, not coffee: `purrr::keep()`
 - `map()` - do *this* to each element: `purrr::map()`
 - `reduce()` - combine into new thing: `purrr::reduce()`
:::

## Functional sandwiches

![[Anjana Vakil's Functional Sandwiches](https://observablehq.com/collection/@anjana/functional-javascript-first-steps)](images/anjana-vakil-functional-sanwiches.png){fig-alt="Shows ingredients of a sandwich: onions and pickles *filtered* out, remaining ingredients *mapped* to a slicer-function, then *reduced* to a sandwich" fig-align="center"}


## Horrible example

Implement `list_rbind()` using functional-programming techniques:

``` r
list_rbind2 <- function(df, names_to) {
  df |>
    purrr::keep(\(x) !is.null(x)) |>
    purrr::imap(\(d, name) dplyr::mutate(d, "{names_to}" := name)) |>
    purrr::reduce(rbind)
}
```

:::{.incremental}
-   *filters* in not-`NULL` values, `purrr::keep()`
-   *maps* name of element to data-column, `purrr::imap()`
-   *reduces* list to single data-frame, `purrr::reduce()`
:::

## Our turn: saving multiple outputs

**Goal**: write out a csv file *for each* value of `clarity` within ggplot2's `diamonds` dataset.

. . .

<hr>

When we read "for each", we might think of using `map()`:

-   Writing out a file is a *side effect*.

-   We aren't interested in the return value.

. . .

{purrr} has a function for that: `walk()` (and friends).

## Our turn - starter code

`iteration-02-02-writing-files.R`

``` r
# ?dplyr::group_nest(), ?stringr::str_glue()
# from diamonds, create tibble with columns: clarity, data, filename
by_clarity_csv <-
  diamonds |>
  # nest by clarity
  # create column for filename
  print()

# ?readr::write_csv()
# using the data and filename, write out csv files
walk2(
  by_clarity_csv$data,
  by_clarity_csv$filename,
  \(data, filename) NULL # replace with actual code
)
```

## Our turn: writing multiple plots

**Goal**: Save histogram for `carat` for each value of `clarity` within `diamonds` dataset.

. . .

<hr>

Create a `plot` column, where each element is a ggplot.
This will be a list-column.

. . .

You can use `map()`:

 - within `mutate()`, with all the tidy-eval goodness!
 - with additional arguments (after the function), e.g.:
   
```r
mutate(
  plot = map(data, histogram, carat)
)
```

. . .

equivalent to 

```r
plot[[1]] = histogram(data[[1]], carat)
plot[[2]] = histogram(data[[2]], carat)
...
```

## Our turn: starter-code

``` r
# from diamonds, create tibble with columns: clarity, data, plot, filename
by_clarity_plot <-
  diamonds |>
  # nest by clarity
  group_nest(clarity) |>
  # create columns for plot, filename
  mutate(
    filename = str_glue("clarity-{clarity}.png")#,
    #plot = map(),
  ) |>
  print()
```

## Our turn: more starter-code

```r
# ?ggplot2::ggsave()
ggsave_local <- function(filename, plot) {

}

# using filename and plot, write out plots to png files
walk2(
  by_clarity_plot$filename,
  by_clarity_plot$plot,
  # write plot file to data/clarity directory
  ggsave_local
)
```

## Functions as arguments

```{r}
library("tidyverse")
library("palmerpenguins")

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  scale_color_discrete(labels = tolower) # tolower is a function
```


## If we have time (2)

Three fundamental operations in functional programming

Given a list and a function:

:::{.incremental}
-   `filter()`: make a new list, subset of old list
-   `map()`: make a new list, operating on each element
-   `reduce()`: make a new "thing"
:::

## dplyr using purrr?

We can use `map()`, `filter()`, `reduce()` to "implement", using purrr:

-   `dplyr::mutate()`
-   `dplyr::filter()`
-   `dplyr::summarise()`

. . .

I claim it's possible, I don't claim it's a good idea.

## Tabular data: two perspectives

:::{.incremental}
-   column-based: named list of column vectors

    ``` json
    {
      mpg: [21.0, 22.8, ...],
      cyl: [6, 4, ...],
      ...
    }
    ```

-   row-based: collection of rows, each a named list

    ``` json
    [
      {mpg: 21.0, cyl: 6, ...}, 
      {mpg: 22.8, cyl: 4, ...}, 
      ...
    ]
    ```
:::

## `dpurrr_filter()`

```{r}
dpurrr_filter <- function(df, predicate) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::keep(predicate) |>
    purrr::list_transpose() |>
    as.data.frame() 
}
```

. . .

<hr>

```{r}
dpurrr_filter(mtcars, \(d) d$gear == 3) |> head()
```

## `dpurrr_mutate()`

```{r}
dpurrr_mutate <- function(df, mapper) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::map(\(d) c(d, mapper(d))) |>
    purrr::list_transpose() |>
    as.data.frame() 
}
```

. . .

<hr>

```{r}
mtcars |> 
  dpurrr_mutate(\(d) list(wt_kg = d$wt * 1000 / 2.2)) |> 
  head()
```

## `dpurrr_summarise()`

```{r}
dpurrr_summarise <- function(df, reducer, .init) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::reduce(reducer, .init = .init) |>
    as.data.frame()
}
```

. . .

<hr>

```{r}
mtcars |> 
  dpurrr_summarise(
    reducer = \(acc, val) list(
      wt_min = min(acc$wt_min, val$wt), 
      wt_max = max(acc$wt_max, val$wt)
    ),
    .init = list(wt_min = Inf, wt_max = -Inf)
  )
```

## With grouping

First, a little prep work:

```{r}
ireduce <- function(x, reducer, .init) {
  purrr::reduce2(x, names(x), reducer, .init = .init)
}

summariser <- purrr::partial(
  dpurrr_summarise,
  reducer = \(acc, val) list(
    wt_min = min(acc$wt_min, val$wt), 
    wt_max = max(acc$wt_max, val$wt)
  ),
  .init = list(wt_min = Inf, wt_max = -Inf)
)
```

## Et voilà

```{r}
mtcars |> 
  split(mtcars$gear) |>
  purrr::map(summariser) |> 
  ireduce( 
    reducer = \(acc, x, y) rbind(acc, c(list(gear = y), x)),
    .init = data.frame()
  ) 
```

. . .

We can agree this presents no danger to dplyr.

. . .

In JavaScript, data frames are often arrays of objects (lists), so you'll see formulations like this (e.g. **tidyjs**).

## Summary

:::{.incremental}
- you can use `purrr::map()` to read a bunch of files
- you can use `purrr::walk()` to write a bunch of files
- functional programming has three foundational operations:
  - filter (`purrr::keep()`)
  - map
  - reduce
:::

. . .

<hr>

Functional programming comes up a lot in JavaScript

## Wrap-up

Please go to [pos.it/conf-workshop-survey](https://pos.it/conf-workshop-survey). 

Your feedback is crucial! 

Data from the survey informs curriculum and format decisions for future conf workshops, and we really appreciate you taking the time to provide it.

<hr>

### Thank you!

:::{.incremental}
- Emma
- Steph, Mouna, Garrett
- Mine Çetinkaya-Rundel, Posit
- **You** 🤗
:::