updates

yjunechoe · Apr 6, 2024 · df12507
1 parent 6f9566f
commit df12507
Showing 7 changed files with 143 additions and 8 deletions.
diff --git a/_posts/2024-03-16-ggplot2-metaprogramming-patterns/ggplot2-metaprogramming-patterns.Rmd b/_posts/2024-03-16-ggplot2-metaprogramming-patterns/ggplot2-metaprogramming-patterns.Rmd
@@ -0,0 +1,98 @@
+---
+title: '{ggplot2} metaprogramming patterns'
+description: |
+  A ggplot blog post that only uses `aes()`
+categories:
+  - ggplot2
+  - metaprogramming
+base_url: https://yjunechoe.github.io
+author:
+  - name: June Choe
+    affiliation: University of Pennsylvania Linguistics
+    affiliation_url: https://live-sas-www-ling.pantheon.sas.upenn.edu/
+    orcid_id: 0000-0002-0701-921X
+date: "`r Sys.Date()`"
+output:
+  distill::distill_article:
+    include-after-body: "highlighting.html"
+    toc: true
+    self_contained: false
+    css: "../../styles.css"
+editor_options: 
+  chunk_output_type: console
+preview: preview.png
+---
+
+```{r setup, include=FALSE}
+library(ggplot2)
+knitr::opts_chunk$set(
+  comment = " ",
+  echo = TRUE,
+  message = FALSE,
+  warning = FALSE,
+  R.options = list(width = 80)
+)
+```
+
+## ggplot2 metaprogramming
+
+In `{ggplot2}`, aesthetic mappings are declared using `aes()`. If you want to plot a time series of daily average temperature from a dataset where the `day` column is mapped to the x-axis and the `temperature` column is mapped to the y-axis, you'd write something like:
+
+```{r, eval=FALSE}
+aes(day, temperature)
+```
+
+Or more explicitly,
+
+```{r, eval=FALSE}
+aes(x = day, y = temperature)
+```
+
+When we write ggplot code, we don't really do much with the `aes()` function alone. I do this when I'm teaching ggplot too - for the sake of simplicity, I actively try not to draw attention to the fact that `aes()` is itself a function. I just tell my students that `aes()` is "a place where you write down the aesthetic mappings", and that simple mental model can get users very far.
+
+But this is a deceptively simple understanding of `aes()` - one that we have to unlearn when we start doing more advanced stuff, like writing *functions* that return ggplot objects.
+
+The world of `aes()` will look simultaneously familiar, yet at times overwhelmingly foreign. This blog post will try to showcase a little bit of that in (hopefully) a gentle way.
+
+We start with the most obvious yet most under-appreciated fact: `aes()` returns a stand-alone object.
+
+```{r}
+x <- aes(day, temperature)
+x
+```
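+
+Because it's a stand-alone object, the mapping can be stored in a variable and handed to `ggplot()` later. A minimal sketch, assuming a made-up `weather` data frame (hypothetical, not part of the example above) that happens to have `day` and `temperature` columns:
+
+```{r}
+library(ggplot2)
+# `weather` is hypothetical data invented for this illustration
+weather <- data.frame(day = 1:3, temperature = c(10.5, 12, 9.8))
+# Create the mapping on its own, without any plot around it
+mapping <- aes(day, temperature)
+# Pass the stored mapping as the `mapping` argument of ggplot()
+p <- ggplot(weather, mapping) + geom_line()
+p
+```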
+
+## The structure of `aes()`
+
+`aes()` returns an object of class `<uneval>`:
+
+```{r}
+class(x)
+```
+
+It means "unevaluated expression(s)". It's unevaluated because it "captures" (a.k.a. ["defuses"](https://rlang.r-lib.org/reference/topic-defuse.html)) what we, as the user, provided to `aes()`.
+
+We know `day` and `temperature` are unevaluated by `aes()` because if we _were_ to evaluate them, they would error. And of course they do - they're undefined variables!
+
+```{r, error=TRUE}
+day
+temperature
+```
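+
+The captured expressions only become meaningful once they're evaluated against data that actually has those columns. As a rough sketch of that idea (calling `rlang::eval_tidy()` directly is my own illustration here, not a claim about ggplot2's exact internal code path, and `weather` is a hypothetical data frame):
+
+```{r}
+library(ggplot2)
+x <- aes(day, temperature)
+# `weather` is hypothetical data invented for this illustration
+weather <- data.frame(day = 1:3, temperature = c(10.5, 12, 9.8))
+# Pull out the expression captured for `y` and evaluate it,
+# letting `weather` supply the `temperature` column
+evaluated_y <- rlang::eval_tidy(x$y, data = weather)
+evaluated_y
+```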
+
+This `<uneval>` object returned by `aes()` is actually just a list: 
+
+```{r}
+typeof(x)
+```
+
+And you can see its list-like nature when you strip away its class:
+
+```{r}
+unclass(x)
+```
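+
+Since it's a list underneath, the usual list tools apply to it. A small sketch (the `rlang` helpers are just my choice for inspecting the elements):
+
+```{r}
+library(ggplot2)
+x <- aes(day, temperature)
+# Names of the aesthetics that the expressions were matched to
+aes_names <- names(x)
+# Each element is a captured expression (a quosure)
+x_is_quosure <- rlang::is_quosure(x$x)
+# The captured code for `y`, as a string
+y_label <- rlang::as_label(x$y)
+list(aes_names, x_is_quosure, y_label)
+```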
+
+## sessionInfo()
+
+```{r}
+sessionInfo()
+```
+
diff --git a/docs/posts/posts.json b/docs/posts/posts.json
@@ -16,8 +16,8 @@
     ],
     "contents": "\r\n\r\nContents\r\nargs()\r\nargs(args)\r\nargs(args)(args)\r\nargs(args(args)(args))\r\nad infinitum\r\nHad enough args() yet?\r\nTL;DR: str()\r\nCoda (serious): redesigning args()\r\nTake 1) Display is the side-effect; output is trivial\r\nTake 2) Display is the side-effect; output is meaningful\r\nTake 3) Just remove the NULL\r\n\r\nsessionInfo()\r\n\r\nThe kind of blog posts that I have the most fun writing are those where I hyperfocus on a single function, like dplyr::slice(), purrr::reduce(), and ggplot2::stat_summary(). In writing blog posts of this kind, I naturally come across a point where I need to introduce the argument(s) that the function takes. I usually talk about them one at a time as needed, but I could start by front-loading that important piece of information first.\r\nIn fact, there’s a function in R that lets me do exactly that, called args().\r\nargs()\r\nargs() is, in theory, a very neat function. According to ?args:\r\n\r\nDisplays the argument names and corresponding default values of a (non-primitive or primitive) function.\r\n\r\nSo, for example, I know that sum() takes the arguments ... and na.rm (with the na.rm = FALSE default). The role of args() is to display exactly that piece of information using R code. This blog runs on rmarkdown, so surely I can use args() as a convenient and fancy way of showing information about a function’s arguments to my readers.\r\nIn this blog post, I want to talk about args(). So let’s start by looking at the argument that args() takes.\r\nOf course, I could just print args in the console:\r\n\r\n\r\nargs\r\n\r\n  function (name) \r\n  .Internal(args(name))\r\n  <bytecode: 0x0000024f98dbd180>\r\n  <environment: namespace:base>\r\n\r\nBut wouldn’t it be fun if I used args() itself to get this information?\r\nargs(args)\r\n\r\n\r\nargs(args)\r\n\r\n  function (name) \r\n  NULL\r\n\r\nOkay, so I get the function (name) piece, which is the information I wanted to show. 
We can see that args() takes one argument, called name, with no defaults.\r\nBut wait - what’s that NULL doing there in the second line?\r\nHmm, I wonder if they forgot to invisible()-y return the NULL. args() is a function for displaying a function’s arguments after all, so maybe the arguments are printed to the console as a side-effect and the actual output of args() is NULL.\r\nIf that is true, we should be able to suppress the printing of NULL with invisible():\r\n\r\n\r\ninvisible(args(args))\r\n\r\n\r\nUh oh, now everything is invisible.\r\nAlright, enough games! What exactly are you, output of args()?!\r\n\r\n\r\ntypeof(args(args))\r\n\r\n  [1] \"closure\"\r\n\r\nWhat?\r\nargs(args)(args)\r\nTurns out that args(args) is actually returning a whole function that’s a copy of args(), except with its body replaced with NULL.\r\nSo args(args) is itself a function that takes an argument called name and then returns NULL. Let’s assign it to a variable and call it like a function:\r\n\r\n\r\nabomination <- args(args)\r\n\r\n\r\n\r\n\r\nabomination(123)\r\n\r\n  NULL\r\n\r\nabomination(mtcars)\r\n\r\n  NULL\r\n\r\nabomination(stop())\r\n\r\n  NULL\r\n\r\nThe body is just NULL, so the function doesn’t care what it receives1 - it just returns NULL.\r\nIn fact, we could even pass it… args:\r\n\r\n\r\nargs(args)(args)\r\n\r\n  NULL\r\n\r\nargs(args(args)(args))\r\nBut wait, that’s not all! args() doesn’t just accept a function as its argument. 
From the documentation:\r\n\r\nValue\r\nNULL in case of a non-function.\r\n\r\nSo yeah - if args() receives a non-function, it just returns NULL:\r\n\r\n\r\nargs(123)\r\n\r\n  NULL\r\n\r\nargs(mtcars)\r\n\r\n  NULL\r\n\r\nThis applies to any non-function, including… NULL:\r\n\r\n\r\nargs(NULL)\r\n\r\n  NULL\r\n\r\nAnd recall that:\r\n\r\n\r\nis.null( args(args)(args) )\r\n\r\n  [1] TRUE\r\n\r\nTherefore, this is a valid expression in base R:\r\n\r\n\r\nargs(args(args)(args))\r\n\r\n  NULL\r\n\r\nad infinitum\r\nFor our cursed usecase of using args(f) to return a copy of f with it’s body replaced with NULL only to then immediately call args(f)(f) to return NULL, it really doesn’t matter what the identity of f is as long as it’s a function.\r\nThat function can even be … args(args)!\r\nSo let’s take our args(args(args)(args)):\r\n\r\n\r\nargs( args( args )( args ))\r\n\r\n  NULL\r\n\r\nAnd swap every args() with args(args):\r\n\r\n\r\nargs(args)( args(args)( args(args) )( args(args) ))\r\n\r\n  NULL\r\n\r\nOr better yet, swap every args() with args(args(args)):\r\n\r\n\r\nargs(args(args))( args(args(args))( args(args(args)) )( args(args(args)) ))\r\n\r\n  NULL\r\n\r\nThe above unhinged examples are a product of two patterns:\r\nThe fact that you always get function (name) NULL from wrapping args()s over args:\r\n\r\n\r\nlist(\r\n   args(          args),\r\n   args(     args(args)),\r\n   args(args(args(args)))\r\n )\r\n\r\n  [[1]]\r\n  function (name) \r\n  NULL\r\n\r\n  [[2]]\r\n  function (name) \r\n  NULL\r\n\r\n  [[3]]\r\n  function (name) \r\n  NULL\r\n\r\nThe fact that you can get this whole thing to return NULL by having function (name) NULL call the function object args. 
You can do this anywhere in the stack and the NULL will simply propagate:\r\n\r\n\r\nlist(\r\n   args(args(args(args))) (args)   ,\r\n   args(args(args(args))  (args) ) ,\r\n   args(args(args(args)   (args) ))\r\n )\r\n\r\n  [[1]]\r\n  NULL\r\n\r\n  [[2]]\r\n  NULL\r\n\r\n  [[3]]\r\n  NULL\r\n\r\nWe could keep going but it’s tiring to type out and read all these nested args()… but did you know that there’s this thing called the pipe %>% that’s the solution to all code readability issues?\r\nHad enough args() yet?\r\nLet’s make an args() factory ARGS() …\r\n\r\n\r\nlibrary(magrittr)\r\nARGS <- function(n) {\r\n  Reduce(\r\n    f = \\(x,y) bquote(.(x) %>% args()),\r\n    x = seq_len(n),\r\n    init = quote(args)\r\n  )\r\n}\r\n\r\n\r\n… to produce a sequence of args() …\r\n\r\n\r\nARGS(10)\r\n\r\n  args %>% args() %>% args() %>% args() %>% args() %>% args() %>% \r\n      args() %>% args() %>% args() %>% args() %>% args()\r\n\r\neval(ARGS(10))\r\n\r\n  function (name) \r\n  NULL\r\n\r\n… and tidy it up!\r\n\r\n\r\nARGS(10) %>% \r\n  deparse1() %>% \r\n  styler::style_text()\r\n\r\n  args %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args() %>%\r\n    args()\r\n\r\nWanna see even more unhinged?\r\nLet’s try to produce a “matrix” of args(). 
You get a choice of i “rows” of piped lines, and j “columns” of args()-around-args each time - all to produce a NULL.\r\nReady?\r\n\r\n\r\nARGS2 <- function(i, j) {\r\n  Reduce(\r\n    f = \\(x,y) bquote(.(x) %>% (.(y))),\r\n    x = rep(list(Reduce(\\(x,y) call(\"args\", x), seq_len(j), quote(args))), i)\r\n  )\r\n}\r\n\r\n\r\n\r\n\r\nARGS2(5, 1) %>% \r\n  deparse1() %>%\r\n  styler::style_text()\r\n\r\n  args(args) %>%\r\n    (args(args)) %>%\r\n    (args(args)) %>%\r\n    (args(args)) %>%\r\n    (args(args))\r\n\r\n\r\n\r\nARGS2(5, 3) %>% \r\n  deparse1() %>%\r\n  styler::style_text()\r\n\r\n  args(args(args(args))) %>%\r\n    (args(args(args(args)))) %>%\r\n    (args(args(args(args)))) %>%\r\n    (args(args(args(args)))) %>%\r\n    (args(args(args(args))))\r\n\r\n\r\n\r\nARGS2(10, 5) %>% \r\n  deparse1() %>%\r\n  styler::style_text()\r\n\r\n  args(args(args(args(args(args))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args)))))) %>%\r\n    (args(args(args(args(args(args))))))\r\n\r\n\r\n\r\nlist(\r\n  eval(ARGS2(5, 1)),\r\n  eval(ARGS2(5, 3)),\r\n  eval(ARGS2(10, 5))\r\n)\r\n\r\n  [[1]]\r\n  NULL\r\n  \r\n  [[2]]\r\n  NULL\r\n  \r\n  [[3]]\r\n  NULL\r\n\r\nYay!\r\nTL;DR: str()\r\nIf you want a version of args() that does what it’s supposed to, use str() instead:2\r\n\r\n\r\nstr(args)\r\n\r\n  function (name)\r\n\r\nstr(sum)\r\n\r\n  function (..., na.rm = FALSE)\r\n\r\nargs() is hereafter banned from my blog.\r\nCoda (serious): redesigning args()\r\nThe context for my absurd rant above is that I was just complaining about how I think args() is a rather poorly designed function.\r\nLet’s try to redesign args(). 
I’ll do three takes:\r\nTake 1) Display is the side-effect; output is trivial\r\nIf the whole point of args() is to display a function’s arguments for inspection in interactive usage, then that can simply be done as a side-effect.\r\nAs I said above, str() surprisingly has this more sensible behavior out of the box. So let’s write our first redesign of args() which just calls str():\r\n\r\n\r\nargs1 <- function(name) {\r\n  str(name)\r\n}\r\nargs1(sum)\r\n\r\n  function (..., na.rm = FALSE)\r\n\r\nIn args1()/str(), information about the function arguments are sent to the console.3 We know this because we can’t suppress this with invisible but we can grab this via capture.output:\r\n\r\n\r\ninvisible( args1(sum) )\r\n\r\n  function (..., na.rm = FALSE)\r\n\r\ncapture.output( args1(sum) )\r\n\r\n  [1] \"function (..., na.rm = FALSE)  \"\r\n\r\nFor functions whose purpose is to signal information to the console (and whose usage is limited to interactive contexts), we don’t particularly care about the output. In fact, because the focus isn’t on the output, the return value should be as trivial as possible.\r\nA recommended option is to just invisibly return NULL. This is now how args1() does it (via str()).4:\r\n\r\n\r\nprint( args1(sum) )\r\n\r\n  function (..., na.rm = FALSE)  \r\n  NULL\r\n\r\nis.null( args1(sum) )\r\n\r\n  function (..., na.rm = FALSE)\r\n  [1] TRUE\r\n\r\nAlternatively, the function could just invisibly return what it receives,5 which is another common pattern for cases like this. 
Again, we return invisibly to avoid distracting from the fact that the point of the function is to display as the side-effect.\r\n\r\n\r\nargs2 <- function(name) {\r\n  str(sum)\r\n  invisible(name)\r\n}\r\n\r\n\r\n\r\n\r\nargs2(rnorm)\r\n\r\n  function (..., na.rm = FALSE)\r\n\r\n\r\n\r\nargs2(rnorm)(5)\r\n\r\n  function (..., na.rm = FALSE)\r\n  [1] -0.5494891  1.2861975 -1.2755454  1.0817387 -0.7248563\r\n\r\nTake 2) Display is the side-effect; output is meaningful\r\nOne thing I neglected to mention in this blog post is that there are other ways to extract a function’s arguments. One of them is formals():6\r\n\r\n\r\nformals(args)\r\n\r\n  $name\r\n\r\nformals(rnorm)\r\n\r\n  $n\r\n  \r\n  \r\n  $mean\r\n  [1] 0\r\n  \r\n  $sd\r\n  [1] 1\r\n\r\nformals() returns the information about a function’s arguments in a list which is pretty boring, but it’s an object we can manipulate (unlike the return value of str()). So there’s some pros and cons.\r\nActually, we could just combine both formals() and str():\r\n\r\n\r\nargs3 <- function(name) {\r\n  str(name)\r\n  invisible(formals(name))\r\n}\r\n\r\n\r\n\r\n\r\narguments <- args3(rnorm)\r\n\r\n  function (n, mean = 0, sd = 1)\r\n\r\narguments\r\n\r\n  $n\r\n  \r\n  \r\n  $mean\r\n  [1] 0\r\n  \r\n  $sd\r\n  [1] 1\r\n\r\narguments$mean\r\n\r\n  [1] 0\r\n\r\nYou get the nice display as a side-effect (via str()) and then an informative output (via formals()). 
You could even turn this into a class with a print method, which is definitely the better way to go about this, but I’m running out of steam here and I don’t like OOP, so I won’t touch that here.\r\nTake 3) Just remove the NULL\r\nThis last redesign is the simplest of the three, and narrowly deals with the problem of that pesky NULL shown alongside the function arguments:\r\n\r\n\r\nargs(sum)\r\n\r\n  function (..., na.rm = FALSE) \r\n  NULL\r\n\r\nFine, I’ll give them that args() must, for compatibility with S whatever reason, return a whole new function object, which in turn requires a function body. But if that function is just as a placeholder and not meant to be called, can’t you just make the function body, like, empty?\r\n\r\n\r\nargs4 <- function(name) {\r\n  f <- args(name)\r\n  body(f) <- quote(expr=)\r\n  f\r\n}\r\nargs4(sum)\r\n\r\n  function (..., na.rm = FALSE)\r\n\r\nargs4(rnorm)\r\n\r\n  function (n, mean = 0, sd = 1)\r\n\r\ntypeof( args4(rnorm) )\r\n\r\n  [1] \"closure\"\r\n\r\nLike, come on!\r\nsessionInfo()\r\n\r\n\r\nsessionInfo()\r\n\r\n  R version 4.3.3 (2024-02-29 ucrt)\r\n  Platform: x86_64-w64-mingw32/x64 (64-bit)\r\n  Running under: Windows 11 x64 (build 22631)\r\n  \r\n  Matrix products: default\r\n  \r\n  \r\n  locale:\r\n  [1] LC_COLLATE=English_United States.utf8 \r\n  [2] LC_CTYPE=English_United States.utf8   \r\n  [3] LC_MONETARY=English_United States.utf8\r\n  [4] LC_NUMERIC=C                          \r\n  [5] LC_TIME=English_United States.utf8    \r\n  \r\n  time zone: America/New_York\r\n  tzcode source: internal\r\n  \r\n  attached base packages:\r\n  [1] stats     graphics  grDevices utils     datasets  methods   base     \r\n  \r\n  other attached packages:\r\n  [1] magrittr_2.0.3\r\n  \r\n  loaded via a namespace (and not attached):\r\n   [1] crayon_1.5.2      vctrs_0.6.5       cli_3.6.1         knitr_1.45       \r\n   [5] rlang_1.1.2       xfun_0.41         purrr_1.0.2       styler_1.10.2    \r\n   [9] jsonlite_1.8.8    
htmltools_0.5.7   sass_0.4.7        fansi_1.0.5      \r\n  [13] rmarkdown_2.25    R.cache_0.16.0    evaluate_0.23     jquerylib_0.1.4  \r\n  [17] distill_1.6       fastmap_1.1.1     yaml_2.3.7        lifecycle_1.0.4  \r\n  [21] memoise_2.0.1     compiler_4.3.3    prettycode_1.1.0  downlit_0.4.3    \r\n  [25] rstudioapi_0.15.0 R.oo_1.25.0       R.utils_2.12.3    digest_0.6.33    \r\n  [29] R6_2.5.1          R.methodsS3_1.8.2 bslib_0.6.1       tools_4.3.3      \r\n  [33] withr_3.0.0       cachem_1.0.8\r\n\r\n\r\nYou can even see lazy evaluation in action when it receives stop() without erroring.↩︎\r\nThough you have to remove the \"srcref\" attribute if the function has one. But also don’t actually do this!↩︎\r\nTechnically, the \"output\" stream.↩︎\r\nFor the longest time, I thought args() was doing this from how its output looked.↩︎\r\nEssentially acting like identity().↩︎\r\nBut note that it has a special behavior of returning NULL for primitive functions (written in C) that clearly have user-facing arguments on the R side. See also formalArgs(), for a shortcut to names(formals())↩︎\r\n",
     "preview": "posts/2024-03-04-args-args-args-args/preview.png",
-    "last_modified": "2024-03-05T17:04:52-05:00",
-    "input_file": "args-args-args-args.knit.md",
+    "last_modified": "2024-03-05T17:04:54-05:00",
+    "input_file": {},
     "preview_width": 419,
     "preview_height": 300
   },