diff --git a/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.Rmd b/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.Rmd
index 0be7e53..b3c8552 100644
--- a/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.Rmd
+++ b/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.Rmd
@@ -133,7 +133,7 @@ I like this strategy of "naming the scale" because it gives off the impression t
Sometimes a boolean argument may encode a genuinely binary choice of a true/false, on/off, yes/no option. But refactoring the boolean options as enum may still offer some benefits. In those cases, I prefer the strategy of **name the obvious/absence**.
-Some cases for improvement are easier to spot than others. An easy case is something like the `REML` argument in `lme4::lmer()`. Without going into too much detail, when `REML = TRUE` (default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it's not like the model doesn't use any criteria for goodness of fit when `REML = FALSE`. Instead, when `REML = FALSE`, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather betweem tje ise pf REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):
+Some cases for improvement are easier to spot than others. An easy case is something like the `REML` argument in `lme4::lmer()`. Without going into too much detail, when `REML = TRUE` (default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it's not like the model doesn't use _any_ criteria for goodness of fit when `REML = FALSE`. Instead, when `REML = FALSE`, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather between the choice of REML vs. ML. The enum version lets us spell out the assumed default and make the choice between the two explicit (again, with room for introducing other criteria in the future):
```{r, eval = FALSE}
# Boolean options
@@ -145,7 +145,7 @@ lmer::lme4(criterion = "REML")
lmer::lme4(criterion = "ML")
```
-A somewhat harder cases is a true presence-or-absence kind of a situation, where setting the argument to true/false essentially boils down to triggering an `if` block inside the function. For example, say a function has an option to use an optimizer called "MyOptim". This may be implemented as:
+A somewhat harder case is a true presence-or-absence kind of a situation, where setting the argument to true/false essentially boils down to triggering an `if` block inside the function. For example, say a function has an option to use an optimizer called "MyOptim". This may be implemented as:
```{r, eval = FALSE}
# Boolean options
diff --git a/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.html b/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.html
index 2d86268..738f8a6 100644
--- a/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.html
+++ b/_posts/2024-07-21-enumerate-possible-options/enumerate-possible-options.html
@@ -1655,7 +1655,7 @@
I like this strategy of “naming the scale” because it gives off the impression to users that the possible options are values that lie on the scale. In the example above, it could either be the extremes "all"
or "none"
, but also possibly somewhere in between if the writer of the function chooses to introduce more granular settings later.
Sometimes a boolean argument may encode a genuinely binary choice of a true/false, on/off, yes/no option. But refactoring the boolean options as enum may still offer some benefits. In those cases, I prefer the strategy of name the obvious/absence.
-Some cases for improvement are easier to spot than others. An easy case is something like the REML
argument in lme4::lmer()
. Without going into too much detail, when REML = TRUE
(default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE
. Instead, when REML = FALSE
, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather betweem tje ise pf REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):
+Some cases for improvement are easier to spot than others. An easy case is something like the REML
argument in lme4::lmer()
. Without going into too much detail, when REML = TRUE
(default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE
. Instead, when REML = FALSE
, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather between the choice of REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):
# Boolean options
diff --git a/docs/posts/2024-07-21-enumerate-possible-options/index.html b/docs/posts/2024-07-21-enumerate-possible-options/index.html
index 479d2df..f2b6ee3 100644
--- a/docs/posts/2024-07-21-enumerate-possible-options/index.html
+++ b/docs/posts/2024-07-21-enumerate-possible-options/index.html
@@ -2789,7 +2789,7 @@ Is the ar
I like this strategy of “naming the scale” because it gives off the impression to users that the possible options are values that lie on the scale. In the example above, it could either be the extremes "all"
or "none"
, but also possibly somewhere in between if the writer of the function chooses to introduce more granular settings later.
Is the argument truly binary? Still prefer enum and name the obvious/absence.
Sometimes a boolean argument may encode a genuinely binary choice of a true/false, on/off, yes/no option. But refactoring the boolean options as enum may still offer some benefits. In those cases, I prefer the strategy of name the obvious/absence.
-Some cases for improvement are easier to spot than others. An easy case is something like the REML
argument in lme4::lmer()
. Without going into too much detail, when REML = TRUE
(default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE
. Instead, when REML = FALSE
, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather betweem tje ise pf REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):
+Some cases for improvement are easier to spot than others. An easy case is something like the REML
argument in lme4::lmer()
. Without going into too much detail, when REML = TRUE
(default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE
. Instead, when REML = FALSE
, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather between the choice of REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):
# Boolean options
diff --git a/docs/posts/posts.json b/docs/posts/posts.json
index 262306b..49e7474 100644
--- a/docs/posts/posts.json
+++ b/docs/posts/posts.json
@@ -13,9 +13,9 @@
"categories": [
"design"
],
- "contents": "\r\n\r\nContents\r\nTake the argument name and negate it - is the intention clear?\r\nLook at the argument name - is it verb-y without an object?\r\nIs the argument a scalar adjective? Consider naming the scale.\r\nIs the argument truly binary? Still prefer enum and name the obvious/absence.\r\nMove shared strings across options into the argument name\r\n\r\nI’ve been having a blast reading through the Tidy design principles book lately - it’s packed with just the kind of stuff I needed to hear at this stage of my developer experience. And actually, I started writing packages in the post-{devtools}/R Packages era, so I wasn’t too surprised to find that my habits already align with many of the design principles advocated for in the book.1\r\nBut there was one pattern which took me a bit to fully wrap my head around (and be fully convinced by). It’s first introduced in the chapter “Enumerate possible options” which gives a pretty convincing example of the base R function rank(). rank() has a couple options for resolving ties between values which are exposed to the user via the ties.method argument. The default value of this argument is a vector that enumerates all the possible options, and the user’s choice of (or the lack of) an option is resolved through match.arg() and then the appropriate algorithm is called via a switch() statement.\r\nThis is all good and well, but the book takes it a step further in a later chapter “Prefer an enum, even if only two choices”, which outlines what I personally consider to be one of the more controversial (and newer2) strategies advocated for in the book. It’s a specific case of the “enumerate possible options” principle applied to boolean arguments, and is best understood with an example (of sort() vs. 
vctrs::vec_sort(), from the book):\r\n\r\n\r\n# Booolean options\r\nsort(x, decreasing = TRUE)\r\nsort(x, decreasing = FALSE)\r\n\r\n# Enumerated options\r\nvctrs::vec_sort(x, direction = \"desc\")\r\nvctrs::vec_sort(x, direction = \"asc\")\r\n\r\n\r\nThe main argument for this pattern is one of clarity. In the case of the example above, it is unclear from reading decreasing = FALSE whether that expresses “sort in the opposite of decreasing order (i.e., increasing/ascending)” or “do not sort in decreasing order (ex: leave it alone)”. The former is the correct interpretation, and this is expressed much clearer with direction = \"asc\", which contrasts with the other option direction = \"desc\".3\r\nI’ve never used this pattern for boolean options previously, but it’s been growing on me and I’m starting to get convinced. But in thinking through its implementation for refactoring code that I own and/or use, I got walled by the hardest problem in CS: naming things. A lot has been said on how to name things, but I’ve realized that the case of “turn booleans into enums” raises a whole different naming problem, one where you have to be precise about what’s being negated, the alternatives that are being contrasted, and the scale that the enums lie on.\r\nWhat follows are my somewhat half-baked, unstructured thoughts on some heuristics that I hope can be useful for determining when to apply the “enumerate possible options” principle for boolean options, and how to rename them in the refactoring.\r\nTake the argument name and negate it - is the intention clear?\r\nOne good litmus test for whether you should convert your boolean option into an enum is to take the argument name X and turn it into “X” and “not-X” - is the intended behavior expressed clearly in the context of the function? If, conceptually, the options are truly and unambiguously binary, then it should still make sense. 
But if the TRUE/FALSE options assume a very particular contrast which is difficult to recover from just reading “X” vs. “not-X”, consider using an enum for the two options.\r\nTo take sort() as an example again, imagine if we were to re-write it as:\r\n\r\n\r\nsort(option = \"decreasing\")\r\nsort(option = \"not-decreasing\")\r\n\r\n\r\nIf \"decreasing\" vs. \"not-decreasing\" is ambiguous, then maybe that’s a sign to consider ditching the boolean pattern and spell out the options more explicitly with e.g., direction = \"desc\" and direction = \"asc\", as vctrs::vec_sort() does. I also think this is a useful exercise because it reflects the user’s experience when encountering boolean options.\r\nLook at the argument name - is it verb-y without an object?\r\nLet’s take a bigger offender of this principle as an example: ggplot2::facet_grid(). facet_grid() is a function that I use all the time, and it has a couple boolean arguments which makes no immediate sense to me. Admittedly, I’ve never actually used them in practice, but from all my experience with {ggplot2} and facet_grid(), shouldn’t I be able to get at least some clues as to what they do from reading the arguments?4\r\n\r\n\r\nFilter(is.logical, formals(ggplot2::facet_grid))\r\n\r\n $shrink\r\n [1] TRUE\r\n \r\n $as.table\r\n [1] TRUE\r\n \r\n $drop\r\n [1] TRUE\r\n \r\n $margins\r\n [1] FALSE\r\n\r\nTake for example the shrink argument. Right off the bat it already runs into the problem where it’s not clear what we’re shrinking. I find this to be a general problem with boolean arguments: they’re often verbs with the object omitted (presumably to save keystrokes). Using the heuristic of negating the argument, we get “shrink” vs. “don’t shrink”, which not only repeats the problem of the ambiguity of negation as we saw with sort() previously, but also exposes how serious the problem of missing the object of the verb is.\r\nAt this point you may be wondering what exactly the shrink argument does at all. 
From the docs:\r\n\r\nIf TRUE, will shrink scales to fit output of statistics, not raw data. If FALSE, will be range of raw data before statistical summary.\r\n\r\nThe intended contrast seems to be one of “statistics” (default) vs. “raw data”, so these are obvious candidates for our enum refactoring. But something like shrink = c(\"statistics\", \"raw-data\") doesn’t quite cut it yet, because the object of shrinking is not the data, but the scales. So to be fully informative, the argument name should complete the verb phrase (i.e., include the object).\r\nCombining the observations from above, I think the following makes more sense:\r\n\r\n\r\n# Boolean options\r\nfacet_grid(shrink = TRUE)\r\nfacet_grid(shrink = FALSE)\r\n\r\n# Enumerated options\r\nfacet_grid(shrink_scales_to = \"statistics\")\r\nfacet_grid(shrink_scales_to = \"raw-data\")\r\n\r\n\r\nThis last point is a bit of a tangent, but after tinkering with the behavior of shrink more, I don’t think “shrink” is a particularly useful description here. I might actually prefer something more neutral like fit_scales_to.\r\nIs the argument a scalar adjective? Consider naming the scale.\r\nLoosely speaking, scalar (a.k.a. gradable) adjectives are adjectives that can be strengthened (or weakened) - English grammar can express this with the suffixes “-er” and “-est”. For example, “tall” is a scalar adjective because you can say “taller” and “tallest”, and scalar adjectives are called such because they lie on a scale (in this case, the scale of height). Note that the quality of an adjective as a scalar one is not so clear though, as you can “more X” or “most X” just about any adjective X (e.g., even true vs. 
false can lie on a scale of more true or more false) - what matters more is if saying something like “more X” makes sense in the context of where X is found (e.g., the context of the function).5 If so, you’re dealing with a scalar adjective.\r\nThis Linguistics 101 tangent is relevant here because I often see boolean arguments named after scalar adjectives, but I feel like in those cases it’s better to just name the scale itself (which in turn makes the switch to enum more natural).\r\nA contrived example would be if a function had a boolean argument called tall. To refactor this into an enum, we can rename the argument to the scale itself (height) and enumerate the two end points:\r\n\r\n\r\n# Boolean options\r\nfun(tall = TRUE)\r\nfun(tall = FALSE)\r\n\r\n# Enumerated options\r\nfun(height = \"tall\")\r\nfun(height = \"short\")\r\n\r\n\r\nA frequent offender of the enum principle in the wild is the verbose argument. verbose is an interesting case study because it suffers from the additional problem of there possibly being more than 2 options as the function matures. The book offers some strategies for remedying these kinds of problems after-the-fact, but I think a proactive solution is to name the argument verbosity (the name of the scale) with the possible options enumerated (see also a recent Mastodon thread that has great suggestions on this topic).\r\n\r\n\r\n# Boolean options\r\nfun(verbose = TRUE)\r\nfun(verbose = FALSE)\r\n\r\n# Enumerated options\r\nfun(verbosity = \"all\")\r\nfun(verbosity = \"none\")\r\n\r\n\r\nI like this strategy of “naming the scale” because it gives off the impression to users that the possible options are values that lie on the scale. In the example above, it could either be the extremes \"all\" or \"none\", but also possibly somewhere in between if the writer of the function chooses to introduce more granular settings later.\r\nIs the argument truly binary? 
Still prefer enum and name the obvious/absence.\r\nSometimes a boolean argument may encode a genuinely binary choice of a true/false, on/off, yes/no option. But refactoring the boolean options as enum may still offer some benefits. In those cases, I prefer the strategy of name the obvious/absence.\r\nSome cases for improvement are easier to spot than others. An easy case is something like the REML argument in lme4::lmer(). Without going into too much detail, when REML = TRUE (default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE. Instead, when REML = FALSE, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather betweem tje ise pf REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):\r\n\r\n\r\n# Boolean options\r\nlmer::lme4(REML = TRUE)\r\nlmer::lme4(REML = FALSE)\r\n\r\n# Enumerated options\r\nlmer::lme4(criterion = \"REML\")\r\nlmer::lme4(criterion = \"ML\")\r\n\r\n\r\nA somewhat harder cases is a true presence-or-absence kind of a situation, where setting the argument to true/false essentially boils down to triggering an if block inside the function. For example, say a function has an option to use an optimizer called “MyOptim”. This may be implemented as:\r\n\r\n\r\n# Boolean options\r\nfun(optimize = TRUE)\r\nfun(optimize = FALSE)\r\n\r\n\r\nEven if the absence of optimization is not nameable, you could just call that option something like \"none\" for the enum pattern, which makes the choices explicit:\r\n\r\n\r\n# Enumerated options\r\nfun(optimizer = \"MyOptim\")\r\nfun(optimizer = \"none\")\r\n\r\n\r\nOf course, the more difficult case is when the thing that’s being toggled isn’t really nameable. 
I think this is more often the case in practice, and may be the reason why there are many verb-y names for arguments with boolean options. Like, you wrote some code that optimizes something, but you have no name for it, so the argument that toggles it simple refers to its function like “should the function optimize?”.\r\nBut not all is lost. I think one way out of this would be to enumerate over placeholders, not necessarily names. So something like:\r\n\r\n\r\n# Enumerated options (placeholders)\r\nfun(optimizer = 1) # bespoke optimizer\r\nfun(optimizer = 0) # none\r\n\r\n\r\nThen the documentation can clarify what the placeholder values 0, 1, etc. represent in longer, paragraph form, to describe what they do without the pressure of having to name the options.6 It’s not pretty, but I don’t think there will ever be a pretty solution to this problem if you want to avoid naming things entirely.\r\nMove shared strings across options into the argument name\r\nThis one is simple and easily demonstrated with an example. Consider the matrix() function for constructing a matrix. It has an argument byrow which fills the matrix by column when FALSE (default) or by row when TRUE. The argument controls the margin of fill, so we could re-write it as a fill argument like so:\r\n\r\n\r\n# Boolean options\r\nmatrix(byrow = FALSE)\r\nmatrix(byrow = TRUE)\r\n\r\n# Enumerated options\r\nmatrix(fill = \"bycolumn\")\r\nmatrix(fill = \"byrow\")\r\n\r\n\r\nThe options \"bycolumn\" and \"byrow\" share the “by” string, so we could move that into the argument name:\r\n\r\n\r\nmatrix(fill_by = \"column\")\r\nmatrix(fill_by = \"row\")\r\n\r\n\r\nAt this point I was also wondering whether the enumerated options should have the shortened \"col\" or the full \"column\" name. 
At the moment I’m less decided about this, but note that given the partial matching behavior in match.arg(), you could get away with matrix(fill_by = \"col\") in both cases.\r\nAt least from the book’s examples, it looks like shortening is ok for the options. To repeat the vctrs::vec_sort() example from earlier:\r\n\r\n\r\nvctrs::vec_sort(x, direction = \"desc\") # vs. \"descending\"\r\nvctrs::vec_sort(x, direction = \"asc\") # vs. \"ascending\"\r\n\r\n\r\nI was actually kind of surprised by this when I first saw it, and I have mixed feelings especially for \"asc\" since that’s not very frequent as a shorthand for “ascending” (e.g., {dplyr} has desc() but not a asc() equivalent - see also the previous section on “naming the obvious”). So I feel like I’d prefer for this to be spelled out in full in the function, and users can still loosely do partial matching in practice.7\r\n\r\nThe fun part of reading the book for me is not necessarily about discovering new patterns, but about being able to put a name to them and think more critically about their pros and cons.↩︎\r\nTo quote the book: “… this is a pattern that we only discovered relatively recently”↩︎\r\nThe book describes the awkwardness of decreasing = FALSE as “feels like a double negative”, but I think this is just a general, pervasive problem of pragmatic ambiguity with negation, and this issue of “what exactly is being negated?” is actually one of my research topics! Negation is interpreted with respect to the relevant and accessible alternatives (which “desc” vs. “asc” establishes very well) - in turn, recovering the intended meaning of the negation is difficult deprived of that context (like in the case of “direction = TRUE/FALSE”). See: Alternative Semantics.↩︎\r\nTo pre-empt the preference for short argument names, the fact that users don’t reach for these arguments in everyday use of facet_grid() should loosen that constraint for short, easy-to-type names. 
IMO the “too much to type” complaint since time immemorial is already obviated by auto-complete, and should frankly just be ignored for the designing these kinds of esoteric arguments that only experienced users would reach for in very specific circumstances.↩︎\r\nTry this from the view point of both the developer and the user!↩︎\r\nIMO, {collapse} does a very good job at this (see ?TRA).↩︎\r\nOf course, the degree to which you’d encourage this should depend on how sure you are about the stability of the current set of enumerated options.↩︎\r\n",
+ "contents": "\r\n\r\nContents\r\nTake the argument name and negate it - is the intention clear?\r\nLook at the argument name - is it verb-y without an object?\r\nIs the argument a scalar adjective? Consider naming the scale.\r\nIs the argument truly binary? Still prefer enum and name the obvious/absence.\r\nMove shared strings across options into the argument name\r\n\r\nI’ve been having a blast reading through the Tidy design principles book lately - it’s packed with just the kind of stuff I needed to hear at this stage of my developer experience. And actually, I started writing packages in the post-{devtools}/R Packages era, so I wasn’t too surprised to find that my habits already align with many of the design principles advocated for in the book.1\r\nBut there was one pattern which took me a bit to fully wrap my head around (and be fully convinced by). It’s first introduced in the chapter “Enumerate possible options” which gives a pretty convincing example of the base R function rank(). rank() has a couple options for resolving ties between values which are exposed to the user via the ties.method argument. The default value of this argument is a vector that enumerates all the possible options, and the user’s choice of (or the lack of) an option is resolved through match.arg() and then the appropriate algorithm is called via a switch() statement.\r\nThis is all good and well, but the book takes it a step further in a later chapter “Prefer an enum, even if only two choices”, which outlines what I personally consider to be one of the more controversial (and newer2) strategies advocated for in the book. It’s a specific case of the “enumerate possible options” principle applied to boolean arguments, and is best understood with an example (of sort() vs. 
vctrs::vec_sort(), from the book):\r\n\r\n\r\n# Booolean options\r\nsort(x, decreasing = TRUE)\r\nsort(x, decreasing = FALSE)\r\n\r\n# Enumerated options\r\nvctrs::vec_sort(x, direction = \"desc\")\r\nvctrs::vec_sort(x, direction = \"asc\")\r\n\r\n\r\nThe main argument for this pattern is one of clarity. In the case of the example above, it is unclear from reading decreasing = FALSE whether that expresses “sort in the opposite of decreasing order (i.e., increasing/ascending)” or “do not sort in decreasing order (ex: leave it alone)”. The former is the correct interpretation, and this is expressed much clearer with direction = \"asc\", which contrasts with the other option direction = \"desc\".3\r\nI’ve never used this pattern for boolean options previously, but it’s been growing on me and I’m starting to get convinced. But in thinking through its implementation for refactoring code that I own and/or use, I got walled by the hardest problem in CS: naming things. A lot has been said on how to name things, but I’ve realized that the case of “turn booleans into enums” raises a whole different naming problem, one where you have to be precise about what’s being negated, the alternatives that are being contrasted, and the scale that the enums lie on.\r\nWhat follows are my somewhat half-baked, unstructured thoughts on some heuristics that I hope can be useful for determining when to apply the “enumerate possible options” principle for boolean options, and how to rename them in the refactoring.\r\nTake the argument name and negate it - is the intention clear?\r\nOne good litmus test for whether you should convert your boolean option into an enum is to take the argument name X and turn it into “X” and “not-X” - is the intended behavior expressed clearly in the context of the function? If, conceptually, the options are truly and unambiguously binary, then it should still make sense. 
But if the TRUE/FALSE options assume a very particular contrast which is difficult to recover from just reading “X” vs. “not-X”, consider using an enum for the two options.\r\nTo take sort() as an example again, imagine if we were to re-write it as:\r\n\r\n\r\nsort(option = \"decreasing\")\r\nsort(option = \"not-decreasing\")\r\n\r\n\r\nIf \"decreasing\" vs. \"not-decreasing\" is ambiguous, then maybe that’s a sign to consider ditching the boolean pattern and spell out the options more explicitly with e.g., direction = \"desc\" and direction = \"asc\", as vctrs::vec_sort() does. I also think this is a useful exercise because it reflects the user’s experience when encountering boolean options.\r\nLook at the argument name - is it verb-y without an object?\r\nLet’s take a bigger offender of this principle as an example: ggplot2::facet_grid(). facet_grid() is a function that I use all the time, and it has a couple boolean arguments which makes no immediate sense to me. Admittedly, I’ve never actually used them in practice, but from all my experience with {ggplot2} and facet_grid(), shouldn’t I be able to get at least some clues as to what they do from reading the arguments?4\r\n\r\n\r\nFilter(is.logical, formals(ggplot2::facet_grid))\r\n\r\n $shrink\r\n [1] TRUE\r\n \r\n $as.table\r\n [1] TRUE\r\n \r\n $drop\r\n [1] TRUE\r\n \r\n $margins\r\n [1] FALSE\r\n\r\nTake for example the shrink argument. Right off the bat it already runs into the problem where it’s not clear what we’re shrinking. I find this to be a general problem with boolean arguments: they’re often verbs with the object omitted (presumably to save keystrokes). Using the heuristic of negating the argument, we get “shrink” vs. “don’t shrink”, which not only repeats the problem of the ambiguity of negation as we saw with sort() previously, but also exposes how serious the problem of missing the object of the verb is.\r\nAt this point you may be wondering what exactly the shrink argument does at all. 
From the docs:\r\n\r\nIf TRUE, will shrink scales to fit output of statistics, not raw data. If FALSE, will be range of raw data before statistical summary.\r\n\r\nThe intended contrast seems to be one of “statistics” (default) vs. “raw data”, so these are obvious candidates for our enum refactoring. But something like shrink = c(\"statistics\", \"raw-data\") doesn’t quite cut it yet, because the object of shrinking is not the data, but the scales. So to be fully informative, the argument name should complete the verb phrase (i.e., include the object).\r\nCombining the observations from above, I think the following makes more sense:\r\n\r\n\r\n# Boolean options\r\nfacet_grid(shrink = TRUE)\r\nfacet_grid(shrink = FALSE)\r\n\r\n# Enumerated options\r\nfacet_grid(shrink_scales_to = \"statistics\")\r\nfacet_grid(shrink_scales_to = \"raw-data\")\r\n\r\n\r\nThis last point is a bit of a tangent, but after tinkering with the behavior of shrink more, I don’t think “shrink” is a particularly useful description here. I might actually prefer something more neutral like fit_scales_to.\r\nIs the argument a scalar adjective? Consider naming the scale.\r\nLoosely speaking, scalar (a.k.a. gradable) adjectives are adjectives that can be strengthened (or weakened) - English grammar can express this with the suffixes “-er” and “-est”. For example, “tall” is a scalar adjective because you can say “taller” and “tallest”, and scalar adjectives are called such because they lie on a scale (in this case, the scale of height). Note that the quality of an adjective as a scalar one is not so clear though, as you can “more X” or “most X” just about any adjective X (e.g., even true vs. 
false can lie on a scale of more true or more false) - what matters more is if saying something like “more X” makes sense in the context of where X is found (e.g., the context of the function).5 If so, you’re dealing with a scalar adjective.\r\nThis Linguistics 101 tangent is relevant here because I often see boolean arguments named after scalar adjectives, but I feel like in those cases it’s better to just name the scale itself (which in turn makes the switch to enum more natural).\r\nA contrived example would be if a function had a boolean argument called tall. To refactor this into an enum, we can rename the argument to the scale itself (height) and enumerate the two end points:\r\n\r\n\r\n# Boolean options\r\nfun(tall = TRUE)\r\nfun(tall = FALSE)\r\n\r\n# Enumerated options\r\nfun(height = \"tall\")\r\nfun(height = \"short\")\r\n\r\n\r\nA frequent offender of the enum principle in the wild is the verbose argument. verbose is an interesting case study because it suffers from the additional problem of there possibly being more than 2 options as the function matures. The book offers some strategies for remedying these kinds of problems after-the-fact, but I think a proactive solution is to name the argument verbosity (the name of the scale) with the possible options enumerated (see also a recent Mastodon thread that has great suggestions on this topic).\r\n\r\n\r\n# Boolean options\r\nfun(verbose = TRUE)\r\nfun(verbose = FALSE)\r\n\r\n# Enumerated options\r\nfun(verbosity = \"all\")\r\nfun(verbosity = \"none\")\r\n\r\n\r\nI like this strategy of “naming the scale” because it gives off the impression to users that the possible options are values that lie on the scale. In the example above, it could either be the extremes \"all\" or \"none\", but also possibly somewhere in between if the writer of the function chooses to introduce more granular settings later.\r\nIs the argument truly binary? 
Still prefer enum and name the obvious/absence.\r\nSometimes a boolean argument may encode a genuinely binary choice of a true/false, on/off, yes/no option. But refactoring the boolean options as enum may still offer some benefits. In those cases, I prefer the strategy of name the obvious/absence.\r\nSome cases for improvement are easier to spot than others. An easy case is something like the REML argument in lme4::lmer(). Without going into too much detail, when REML = TRUE (default), the model optimizes the REML (restricted/residualized maximum likelihood) criterion in finding the best fitting model. But it’s not like the model doesn’t use any criteria for goodness of fit when REML = FALSE. Instead, when REML = FALSE, the function uses a different criterion of ML (maximum likelihood). So the choice is not really between toggling REML on or off, but rather between the choice of REML vs. ML. The enum version lets us spell out the assumed default and make the choice explicit (again, with room for introducing other criteria in the future):\r\n\r\n\r\n# Boolean options\r\nlme4::lmer(REML = TRUE)\r\nlme4::lmer(REML = FALSE)\r\n\r\n# Enumerated options\r\nlme4::lmer(criterion = \"REML\")\r\nlme4::lmer(criterion = \"ML\")\r\n\r\n\r\nA somewhat harder case is a true presence-or-absence kind of a situation, where setting the argument to true/false essentially boils down to triggering an if block inside the function. For example, say a function has an option to use an optimizer called “MyOptim”. This may be implemented as:\r\n\r\n\r\n# Boolean options\r\nfun(optimize = TRUE)\r\nfun(optimize = FALSE)\r\n\r\n\r\nEven if the absence of optimization is not nameable, you could just call that option something like \"none\" for the enum pattern, which makes the choices explicit:\r\n\r\n\r\n# Enumerated options\r\nfun(optimizer = \"MyOptim\")\r\nfun(optimizer = \"none\")\r\n\r\n\r\nOf course, the more difficult case is when the thing that’s being toggled isn’t really nameable. 
I think this is more often the case in practice, and may be the reason why there are many verb-y names for arguments with boolean options. Like, you wrote some code that optimizes something, but you have no name for it, so the argument that toggles it simply refers to its function like “should the function optimize?”.\r\nBut not all is lost. I think one way out of this would be to enumerate over placeholders, not necessarily names. So something like:\r\n\r\n\r\n# Enumerated options (placeholders)\r\nfun(optimizer = 1) # bespoke optimizer\r\nfun(optimizer = 0) # none\r\n\r\n\r\nThen the documentation can clarify what the placeholder values 0, 1, etc. represent in longer, paragraph form, to describe what they do without the pressure of having to name the options.6 It’s not pretty, but I don’t think there will ever be a pretty solution to this problem if you want to avoid naming things entirely.\r\nMove shared strings across options into the argument name\r\nThis one is simple and easily demonstrated with an example. Consider the matrix() function for constructing a matrix. It has an argument byrow which fills the matrix by column when FALSE (default) or by row when TRUE. The argument controls the margin of fill, so we could re-write it as a fill argument like so:\r\n\r\n\r\n# Boolean options\r\nmatrix(byrow = FALSE)\r\nmatrix(byrow = TRUE)\r\n\r\n# Enumerated options\r\nmatrix(fill = \"bycolumn\")\r\nmatrix(fill = \"byrow\")\r\n\r\n\r\nThe options \"bycolumn\" and \"byrow\" share the “by” string, so we could move that into the argument name:\r\n\r\n\r\nmatrix(fill_by = \"column\")\r\nmatrix(fill_by = \"row\")\r\n\r\n\r\nAt this point I was also wondering whether the enumerated options should have the shortened \"col\" or the full \"column\" name. 
At the moment I’m less decided about this, but note that given the partial matching behavior in match.arg(), you could get away with matrix(fill_by = \"col\") in both cases.\r\nAt least from the book’s examples, it looks like shortening is ok for the options. To repeat the vctrs::vec_sort() example from earlier:\r\n\r\n\r\nvctrs::vec_sort(x, direction = \"desc\") # vs. \"descending\"\r\nvctrs::vec_sort(x, direction = \"asc\") # vs. \"ascending\"\r\n\r\n\r\nI was actually kind of surprised by this when I first saw it, and I have mixed feelings especially for \"asc\" since that’s not very frequent as a shorthand for “ascending” (e.g., {dplyr} has desc() but not an asc() equivalent - see also the previous section on “naming the obvious”). So I feel like I’d prefer for this to be spelled out in full in the function, and users can still loosely do partial matching in practice.7\r\n\r\nThe fun part of reading the book for me is not necessarily about discovering new patterns, but about being able to put a name to them and think more critically about their pros and cons.↩︎\r\nTo quote the book: “… this is a pattern that we only discovered relatively recently”↩︎\r\nThe book describes the awkwardness of decreasing = FALSE as “feels like a double negative”, but I think this is just a general, pervasive problem of pragmatic ambiguity with negation, and this issue of “what exactly is being negated?” is actually one of my research topics! Negation is interpreted with respect to the relevant and accessible alternatives (which “desc” vs. “asc” establishes very well) - in turn, recovering the intended meaning of the negation is difficult deprived of that context (like in the case of “direction = TRUE/FALSE”). See: Alternative Semantics.↩︎\r\nTo pre-empt the preference for short argument names, the fact that users don’t reach for these arguments in everyday use of facet_grid() should loosen that constraint for short, easy-to-type names. 
IMO the “too much to type” complaint since time immemorial is already obviated by auto-complete, and should frankly just be ignored when designing these kinds of esoteric arguments that only experienced users would reach for in very specific circumstances.↩︎\r\nTry this from the view point of both the developer and the user!↩︎\r\nIMO, {collapse} does a very good job at this (see ?TRA).↩︎\r\nOf course, the degree to which you’d encourage this should depend on how sure you are about the stability of the current set of enumerated options.↩︎\r\n",
"preview": {},
- "last_modified": "2024-07-21T19:18:03+09:00",
+ "last_modified": "2024-07-21T20:15:10+09:00",
"input_file": {}
},
{
diff --git a/docs/search.json b/docs/search.json
index 0eb94cb..f49a5ef 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -5,7 +5,7 @@
"title": "Blog Posts",
"author": [],
"contents": "\r\n\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:18:49+09:00"
+ "last_modified": "2024-07-21T20:17:21+09:00"
},
{
"path": "index.html",
@@ -13,21 +13,21 @@
"description": "Ph.D. Candidate in Linguistics",
"author": [],
"contents": "\r\n\r\n\r\n\r\n\r\n\r\n\r\n Education\r\n\r\n\r\nB.A. (hons.) Northwestern University (2016–20)\r\n\r\n\r\nPh.D. University of Pennsylvania (2020 ~)\r\n\r\n\r\n Interests\r\n\r\n\r\n(Computational) Psycholinguistics\r\n\r\n\r\nLanguage Acquisition\r\n\r\n\r\nSentence Processing\r\n\r\n\r\nProsody\r\n\r\n\r\nQuantitative Methods\r\n\r\n\r\n\r\n\r\n\r\n Methods:\r\n\r\nWeb-based experiments, eye-tracking, self-paced reading, corpus analysis\r\n\r\n\r\n\r\n Programming:\r\n\r\nR (fluent) | HTML/CSS, Javascript, Julia (proficient) | Python (coursework)\r\n\r\n\r\n\r\n\r\n\r\nI am a PhD candidate in Linguistics at the University of Pennsylvania, and a student affiliate of Penn MindCORE and the Language and Communication Sciences program. I am a psycholinguist broadly interested in experimental approaches to studying meaning, of various flavors. My advisor is Anna Papafragou and I am a member of the Language & Cognition Lab.\r\nI received my B.A. in Linguistics from Northwestern University, where I worked with Jennifer Cole, Masaya Yoshida, and Annette D’Onofrio. I also worked as a research assistant for the Language, Education, and Reading Neuroscience Lab. My thesis explored the role of prosodic focus in garden-path reanalysis.\r\nBeyond linguistics research, I have interests in data visualization, science communication, and the R programming language. I author packages in statistical computing and graphics (ex: ggtrace, jlmerclusterperm) and collaborate on other open-source software (ex: openalexR, pointblank). I also maintain a technical blog as a hobby and occasionally take on small statistical consulting projects.\r\n\r\n\r\n\r\n\r\ncontact me: yjchoe@sas.upenn.edu\r\n\r\n\r\n\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:18:51+09:00"
+ "last_modified": "2024-07-21T20:17:22+09:00"
},
{
"path": "news.html",
"title": "News",
"author": [],
"contents": "\r\n\r\n\r\nFor more of my personal news external/tangential to research\r\n2023\r\nAugust\r\nI was unfortunately not able to make it in person to JSM 2023 but have my pre-recorded talk has been uploaded!\r\nJune\r\nMy package jlmerclusterperm was published on CRAN!\r\nApril\r\nI was accepted to SMLP (Summer School on Statistical Methods for Linguistics and Psychology), to be held in September at the University of Potsdam, Germany! I will be joining the “Advanced methods in frequentist statistics with Julia” stream. Huge thanks to MindCORE for funding my travels to attend!\r\nJanuary\r\nI received the ASA Statistical Computing and Graphics student award for my paper Sublayer modularity in the Grammar of Graphics! I will be presenting my work at the 2023 Joint Statistical Meetings in Toronto in August.\r\n2022\r\nSeptember\r\nI was invited to a Korean data science podcast dataholic (데이터홀릭) to talk about my experience presenting at the RStudio and useR conferences! Part 1, Part 2\r\nAugust\r\nI led a workshop on IBEX and PCIbex with Nayoun Kim at the Seoul International Conference on Linguistics (SICOL 2022).\r\nJuly\r\nI attended my first in-person R conference at rstudio::conf(2022) and gave a talk on ggplot internals.\r\nJune\r\nI gave a talk on my package {ggtrace} at the useR! 2022 conference. I was awarded the diversity scholarship which covered my registration and workshop fees. My reflections\r\nI gave a talk at RLadies philly on using dplyr’s slice() function for row-relational operations.\r\n2021\r\nJuly\r\nMy tutorial on custom fonts in R was featured as a highlight on the R Weekly podcast!\r\nJune\r\nI gave a talk at RLadies philly on using icon fonts for data viz! I also wrote a follow-up blog post that goes deeper into font rendering in R.\r\nMay\r\nSnowGlobe, a project started in my undergrad, was featured in an article by the Northwestern University Library. 
We also had a workshop for SnowGlobe which drew participants from over a hundred universities!\r\nJanuary\r\nI joined Nayoun Kim for a workshop on experimental syntax conducted in Korean and held at Sungkyunkwan University (Korea). I helped design materials for a session on scripting online experiments with IBEX, including interactive slides made with R!\r\n2020\r\nNovember\r\nI joined designer Will Chase on his stream to talk about the psycholinguistics of speech production for a data viz project on Michael’s speech errors in The Office. It was a very cool and unique opportunity to bring my two interests together!\r\nOctober\r\nMy tutorial on {ggplot2} stat_*() functions was featured as a highlight on the R Weekly podcast, which curates weekly updates from the R community.\r\nI became a data science tutor at MindCORE to help researchers at Penn with data visualization and R programming.\r\nSeptember\r\nI have moved to Philadelphia to start my PhD in Linguistics at the University of Pennsylvania!\r\nJune\r\nI graduated from Northwestern University with a B.A. in Linguistics (with honors)! I was also elected into Phi Beta Kappa and appointed as the Senior Marshal for Linguistics.\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:18:56+09:00"
+ "last_modified": "2024-07-21T20:17:24+09:00"
},
{
"path": "research.html",
"title": "Research",
"author": [],
"contents": "\r\n\r\nContents\r\nPeer-reviewed Papers\r\nConference Talks\r\nConference Presentations\r\nWorkshops led\r\nGuest lectures\r\nResearch activities in FOSS\r\nPapers\r\nTalks\r\nSoftware\r\n\r\nService\r\nEditor\r\nReviewer\r\n\r\n\r\nLinks: Google Scholar, Github, OSF\r\nPeer-reviewed Papers\r\nJune Choe, and Anna Papafragou. (2023). The acquisition of subordinate nouns as pragmatic inference. Journal of Memory and Language, 132, 104432. DOI: https://doi.org/10.1016/j.jml.2023.104432. PDF OSF\r\nJune Choe, Yiran Chen, May Pik Yu Chan, Aini Li, Xin Gao, and Nicole Holliday. (2022). Language-specific Effects on Automatic Speech Recognition Errors for World Englishes. In Proceedings of the 29th International Conference on Computational Linguistics, 7177–7186.\r\nMay Pik Yu Chan, June Choe, Aini Li, Yiran Chen, Xin Gao, and Nicole Holliday. (2022). Training and typological bias in ASR performance for world Englishes. In Proceedings of Interspeech 2022, 1273-1277. DOI: 10.21437/Interspeech.2022-10869\r\nJune Choe, Masaya Yoshida, and Jennifer Cole. (2022). The role of prosodic focus in the reanalysis of garden path sentences: Depth of semantic processing impedes the revision of an erroneous local analysis. Glossa Psycholinguistics, 1(1). DOI: 10.5070/G601136\r\nJune Choe, and Anna Papafragou. (2022). The acquisition of subordinate nouns as pragmatic inference: Semantic alternatives modulate subordinate meanings. In Proceedings of the Annual Meeting of the Cognitive Science Society, 44, 2745-2752.\r\nSean McWeeny, Jinnie S. Choi, June Choe, Alexander LaTourette, Megan Y. Roberts, and Elizabeth S. Norton. (2022). Rapid automatized naming (RAN) as a kindergarten predictor of future reading in English: A systematic review and meta-analysis. Reading Research Quarterly, 57(4), 1187–1211. DOI: 10.1002/rrq.467\r\nConference Talks\r\nJune Choe. Distributional signatures of superordinate nouns. Talk at the 10th MACSIM conference. 6 April 2024. 
University of Maryland, College Park, MD.\r\nJune Choe. Sub-layer modularity in the Grammar of Graphics. Talk at the 2023 Joint Statistical Meetings, 5-10 August 2023. Toronto, Canada. American Statistical Association (ASA) student paper award in Statistical Computing and Graphics. Paper\r\nJune Choe. Persona-based social expectations in sentence processing and comprehension. Talk at the Language, Stereotypes & Social Cognition workshop, 22-23 May, 2023. University of Pennsylvania, PA.\r\nJune Choe, and Anna Papafragou. Lexical alternatives and the acquisition of subordinate nouns. Talk at the 47th Boston University Conference on Language Development (BUCLD), 3-6 November, 2022. Boston University, Boston, MA. Slides\r\nJune Choe, Yiran Chen, May Pik Yu Chan, Aini Li, Xin Gao and Nicole Holliday. (2022). Language-specific Effects on Automatic Speech Recognition Errors in American English. Talk at the 28th International Conference on Computational Linguistics (CoLing), 12-17 October, 2022. Gyeongju, South Korea. Slides\r\nMay Pik Yu Chan, June Choe, Aini Li, Yiran Chen, Xin Gao and Nicole Holliday. (2022). Training and typological bias in ASR performance for world Englishes. Talk at the 23rd Conference of the International Speech Communication Association (INTERSPEECH), 18-22 September, 2022. Incheon, South Korea.\r\nConference Presentations\r\nJune Choe, and Anna Papafragou. Distributional signatures of superordinate nouns. Poster presented at the 48th Boston University Conference on Language Development (BUCLD), 2-5 November, 2023. Boston University, Boston, MA. Abstract Poster\r\nJune Choe, and Anna Papafragou. Pragmatic underpinnings of the basic-level bias. Poster presented at the 48th Boston University Conference on Language Development (BUCLD), 2-5 November, 2023. Boston University, Boston, MA. Abstract Poster\r\nJune Choe and Anna Papafragou. Discourse effects on the acquisition of subordinate nouns. 
Poster presented at the 9th Mid-Atlantic Colloquium of Studies in Meaning (MACSIM), 15 April 2023. University of Pennsylvania, PA.\r\nJune Choe and Anna Papafragou. Discourse effects on the acquisition of subordinate nouns. Poster presented at the 36th Annual Conference on Human Sentence Processing, 9-11 March 2022. University of Pittsburgh, PA. Abstract Poster\r\nJune Choe, and Anna Papafragou. Acquisition of subordinate nouns as pragmatic inference: Semantic alternatives modulate subordinate meanings. Poster at the 2nd Experiments in Linguistic Meaning (ELM) conference, 18-20 May 2022. University of Pennsylvania, Philadelphia, PA.\r\nJune Choe, and Anna Papafragou. Beyond the basic level: Levels of informativeness and the acquisition of subordinate nouns. Poster at the 35th Annual Conference on Human Sentence Processing (HSP), 24-26 March 2022. University of California, Santa Cruz, CA.\r\nJune Choe, Jennifer Cole, and Masaya Yoshida. Prosodic Focus Strengthens Semantic Persistence. Poster at The 26th Architectures and Mechanisms for Language Processing (AMLaP), 3-5 September 2020. Potsdam, Germany. Abstract Video Slides\r\nJune Choe. Computer-assisted snowball search for meta-analysis research. Poster at The 2020 Undergraduate Research & Arts Exposition. 27-28 May 2020. Northwestern University, Evanston, IL. 2nd Place Poster Award. Abstract\r\nJune Choe. Social Information in Sentence Processing. Talk at The 2019 Undergraduate Research & Arts Exposition. 29 May 2019. Northwestern University, Evanston, IL. Abstract\r\nJune Choe, Shayne Sloggett, Masaya Yoshida and Annette D’Onofrio. Personae in syntactic processing: Socially-specific agents bias expectations of verb transitivity. Poster at The 32nd CUNY Conference on Human Sentence Processing. 29-31 March 2019. University of Colorado, Boulder, CO.\r\nD’Onofrio, Annette, June Choe and Masaya Yoshida. Personae in syntactic processing: Socially-specific agents bias expectations of verb transitivity. 
Poster at The 93rd Annual Meeting of the Linguistics Society of America. 3-6 January 2019. New York City, NY.\r\nWorkshops led\r\nIntroduction to mixed-effects models in Julia. Workshop at Penn MindCORE. 1 December 2023. Philadelphia, PA. Github Colab notebook\r\nExperimental syntax using IBEX/PCIBEX with Dr. Nayoun Kim. Workshop at the 2022 Seoul International Conference on Linguistics. 11-12 August 2022. Seoul, South Korea. PDF\r\nExperimental syntax using IBEX: a walkthrough with Dr. Nayoun Kim. 2021 BK Winter School-Workshop on Experimental Linguistics/Syntax at Sungkyunkwan University, 19-22 January 2021. Seoul, South Korea. PDF\r\nGuest lectures\r\nHard words and (syntactic) bootstrapping. LING 5750 “The Acquisition of Meaning”. Instructor: Dr. Anna Papafragou. Spring 2024, University of Pennsylvania.\r\nIntroduction to R for psychology research. PSYC 4997 “Senior Honors Seminar in Psychology”. Instructor: Dr. Coren Apicella. Spring 2024, University of Pennsylvania. Colab notebook\r\nModel fitting and diagnosis with MixedModels.jl in Julia. LING 5670 “Quantitative Study of Linguistic Variation”. Instructor: Dr. Meredith Tamminga. Fall 2023, University of Pennsylvania.\r\nSimulation-based power analysis for mixed-effects models. LING 5670 “Quantitative Study of Linguistic Variation”. Instructor: Dr. Meredith Tamminga. Spring 2023, University of Pennsylvania.\r\nResearch activities in FOSS\r\nPapers\r\nMassimo Aria, Trang Le, Corrado Cuccurullo, Alessandra Belfiore, and June Choe. (2024). openalexR: An R-tool for collecting bibliometric data from OpenAlex. The R Journal, 15(4), 166-179. Paper, Github\r\nJune Choe. (2022). Sub-layer modularity in the Grammar of Graphics. American Statistical Association (ASA) student paper award in Statistical Computing and Graphics. Paper, Github\r\nTalks\r\nJune Choe. Sub-layer modularity in the Grammar of Graphics. Talk at the 2023 Joint Statistical Meetings, 5-10 August 2023. Toronto, Canada.\r\nJune Choe. 
Fast cluster-based permutation test using mixed-effects models. Talk at the Integrated Language Science and Technology (ILST) seminar, 21 April 2023. University of Pennsylvania, PA.\r\nJune Choe. Cracking open ggplot internals with {ggtrace}. Talk at the 2022 RStudio Conference, 25-28 July 2022. Washington D.C. https://github.com/yjunechoe/ggtrace-rstudioconf2022\r\nJune Choe. Stepping into {ggplot2} internals with {ggtrace}. Talk at the 2022 useR! Conference, 20-23 June 2022. Vanderbilt University, TN. https://github.com/yjunechoe/ggtrace-user2022\r\nSoftware\r\nRich Iannone, June Choe, Mauricio Vargas Sepulveda. (2024). pointblank: Data Validation and Organization of Metadata for Local and Remote Tables. R package version 0.12.1. https://CRAN.R-project.org/package=pointblank. Github\r\nMassimo Aria, Corrado Cuccurullo, Trang Le, June Choe. (2024). openalexR: Getting Bibliographic Records from ‘OpenAlex’ Database Using ‘DSL’ API. R package version 1.2.3. https://CRAN.R-project.org/package=openalexR. Github\r\nJune Choe. (2024). jlmerclusterperm: Cluster-Based Permutation Analysis for Densely Sampled Time Data. R package version 1.1.3. https://cran.r-project.org/package=jlmerclusterperm. Github\r\nSean McWeeny, June Choe, & Elizabeth S. Norton. (2021). SnowGlobe: An Iterative Search Tool for Systematic Reviews and Meta-Analyses [Computer Software]. OSF\r\nService\r\nEditor\r\nPenn Working Papers in Linguistics (PWPL), Volume 30, Issue 1.\r\nReviewer\r\nLanguage Learning and Development\r\nJournal of Open Source Software\r\nProceedings of the Annual Meeting of the Cognitive Science Society\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:19:00+09:00"
+ "last_modified": "2024-07-21T20:17:25+09:00"
},
{
"path": "resources.html",
@@ -35,14 +35,14 @@
"description": "Mostly for R and data visualization\n",
"author": [],
"contents": "\r\n\r\nContents\r\nLinguistics\r\nData Visualization\r\nPackages and software\r\nTutorial Blog Posts\r\nBy others\r\n\r\nLinguistics\r\nScripting online experiments with IBEX (workshop slides & materials with Nayoun Kim)\r\nData Visualization\r\n{ggplot2} style guide and showcase - most recent version (2/10/2021)\r\nCracking open the internals of ggplot: A {ggtrace} showcase - slides\r\nPackages and software\r\n{ggtrace}: R package for exploring, debugging, and manipulating ggplot internals by exposing the underlying object-oriented system in functional programming terms.\r\n{penngradlings}: R package for the University of Pennsylvania Graduate Linguistics Society.\r\n{LingWER}: R package for linguistic analysis of Word Error Rate for evaluating transcriptions and other speech-to-text output, using a deterministic matrix-based search algorithm optimized for R.\r\n{gridAnnotate}: R package for interactively annotating figures from the plot pane, using {grid} graphical objects.\r\nSnowGlobe: A tool for meta-analysis research. Developed with Jinnie Choi, Sean McWeeny, and Elizabeth Norton, with funding from the Northwestern University Library. Currently under development but basic features are functional. Validation experiments and guides at OSF repo.\r\nTutorial Blog Posts\r\n{ggplot2} stat_*() functions [post]\r\nCustom fonts in R [post]\r\n{purrr} reduce() family [post1, post2]\r\nThe correlation parameter in {lme4} mixed effects models [post]\r\nShortcuts for common chain of {dplyr} functions [post]\r\nPlotting highly-customizable treemaps with {treemap} and {ggplot2} [post]\r\nBy others\r\nTutorials:\r\nA ggplot2 Tutorial for Beautiful Plotting in R by Cédric Scherer\r\nggplot2 Wizardry Hands-On by Cédric Scherer\r\nggplot2 workshop by Thomas Lin Pedersen\r\nBooks:\r\nR for Data Science by Hadley Wickham and Garrett Grolemund\r\nR Markdown: The Definitive Guide by Yihui Xie, J. J. 
Allaire, and Garrett Grolemund\r\nggplot2: elegant graphics for data analysis by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen\r\nFundamentals of Data Visualization by Claus O. Wilke\r\nEfficient R Programming by Colin Gillespie and Robin Lovelace\r\nAdvanced R by Hadley Wickham\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:19:09+09:00"
+ "last_modified": "2024-07-21T20:17:26+09:00"
},
{
"path": "software.html",
"title": "Software",
"author": [],
"contents": "\r\n\r\nContents\r\nggtrace\r\njlmerclusterperm\r\npointblank\r\nopenalexR\r\nggcolormeter\r\nddplot\r\nSnowglobe (retired)\r\n\r\nMain: Github profile, R-universe profile\r\nggtrace\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R\r\nLinks: Github, website, talks (useR! 2022, rstudio::conf 2022), paper\r\n\r\nProgrammatically explore, debug, and manipulate ggplot internals. Package {ggtrace} offers a low-level interface that extends base R capabilities of trace, as well as a family of workflow functions that make interactions with ggplot internals more accessible.\r\n\r\njlmerclusterperm\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R, Julia\r\nLinks: CRAN, Github, website\r\n\r\nAn implementation of fast cluster-based permutation analysis (CPA) for densely-sampled time data developed in Maris & Oostenveld (2007). Supports (generalized, mixed-effects) regression models for the calculation of timewise statistics. Provides both a wholesale and a piecemeal interface to the CPA procedure with an emphasis on interpretability and diagnostics. 
Integrates Julia libraries MixedModels.jl and GLM.jl for performance improvements, with additional functionalities for interfacing with Julia from ‘R’ powered by the JuliaConnectoR package.\r\n\r\npointblank\r\n\r\n\r\n\r\nRole: Author\r\nLanguage: R, HTML/CSS, Javascript\r\nLinks: Github, website\r\n\r\nData quality assessment and metadata reporting for data frames and database tables\r\n\r\nopenalexR\r\n\r\n\r\n\r\nRole: Contributor\r\nLanguage: R\r\nLinks: Github, website\r\n\r\nA set of tools to extract bibliographic content from the OpenAlex database using API https://docs.openalex.org.\r\n\r\nggcolormeter\r\nRole: Author\r\nLanguage: R\r\nLinks: Github\r\n\r\n{ggcolormeter} adds guide_colormeter(), a {ggplot2} color/fill legend guide extension in the style of a dashboard meter.\r\n\r\nddplot\r\nRole: Contributor\r\nLanguage: R, JavaScript\r\nLinks: Github, website\r\n\r\nCreate ‘D3’ based ‘SVG’ (‘Scalable Vector Graphics’) graphics using a simple ‘R’ API. The package aims to simplify the creation of many ‘SVG’ plot types using a straightforward ‘R’ API. The package relies on the ‘r2d3’ ‘R’ package and the ‘D3’ ‘JavaScript’ library. See https://rstudio.github.io/r2d3/ and https://d3js.org/ respectively.\r\n\r\nSnowglobe (retired)\r\nRole: Author\r\nLanguage: R, SQL\r\nLinks: Github, OSF, poster\r\n\r\nAn iterative search tool for systematic reviews and meta-analyses, implemented as a Shiny app. Retired due to the discontinuation of the Microsoft Academic Graph service in 2021. I now contribute to {openalexR}.\r\n\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:19:15+09:00"
+ "last_modified": "2024-07-21T20:17:28+09:00"
},
{
"path": "visualizations.html",
@@ -50,7 +50,7 @@
"description": "Select data visualizations",
"author": [],
"contents": "\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n",
- "last_modified": "2024-07-21T19:19:22+09:00"
+ "last_modified": "2024-07-21T20:17:30+09:00"
}
],
"collections": ["posts/posts.json"]
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index b75786e..9bc46af 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -30,7 +30,7 @@
https://yjunechoe.github.io/posts/2024-07-21-enumerate-possible-options/
- 2024-07-21T19:18:03+09:00
+ 2024-07-21T20:15:10+09:00
https://yjunechoe.github.io/posts/2024-06-09-ave-for-the-average/