Commit
Merge pull request #180 from tgerke/tg-edits
malcolmbarrett authored Sep 13, 2023
2 parents 75125a3 + 98234e2 commit 4d397ef
Showing 3 changed files with 13 additions and 11 deletions.
5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -24,6 +24,7 @@ Imports:
halfmoon (>= 0.0.0.9000),
here,
janitor,
kableExtra,
lubridate,
MatchIt,
propensity (>= 0.0.0.9000),
@@ -44,12 +45,12 @@ Suggests:
xml2
Remotes:
gadenbuie/grkstyle,
hadley/emo,
LucyMcGowan/tipr,
LucyMcGowan/touringplans,
malcolmbarrett/causalworkshop,
malcolmbarrett/halfmoon,
malcolmbarrett/propensity,
hadley/emo
malcolmbarrett/propensity
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
15 changes: 8 additions & 7 deletions chapters/chapter-02.qmd
@@ -34,7 +34,7 @@ Herodotus, the 5th century BC Greek author of *The Histories*, observed Egyptian
Many modern nets are also treated with insecticide, dating back to Russian soldiers in World War II [@nevill1996], although some people still use them as fishing nets [@gettleman2015].

It's easy to imagine a randomized trial that deals with this question: participants in a study are randomly assigned to use a bed net, and we follow them over time to see if there is a difference in malaria risk between groups.
Randomization is often the best way to estimate a causal effect of an intervention because it reduces the number of assumptions we need to make for that estimate to be valid (we discussed the assumptions we need to make for causal inference in @sec-causal-question).
Randomization is often the best way to estimate a causal effect of an intervention because it reduces the number of assumptions we need to make for that estimate to be valid (we will discuss these assumptions in @sec-assump).
In particular, randomization addresses confounding very well, accounting for confounders about which we may not even know.
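
A toy simulation makes this concrete. Here's a minimal sketch in Python (the book's code is in R) with a made-up `income` confounder: because assignment is random, the confounder ends up balanced between arms even though we never measure or adjust for it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A hypothetical confounder that could drive both exposure and outcome.
income = rng.normal(50, 10, n)

# Randomized assignment ignores income entirely...
treated = rng.integers(0, 2, n).astype(bool)

# ...so, on average, income is balanced between the arms by design.
diff_in_means = income[treated].mean() - income[~treated].mean()
print(abs(diff_in_means))  # small relative to the SD of 10
```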

Several landmark trials have studied the effects of bed net use on malaria risk, with key studies in the 1990s.
@@ -44,7 +44,7 @@ Still, a follow-up analysis of trials found that it has yet to impact the public

Trials have also been influential in determining the economics of bed net programs.
For instance, one trial compared free net distribution versus a cost-share program (where participants pay a subsidized fee for nets).
The study's authors found that net uptake was similar between the groups and that free net distribution---because it was easier to access--saved more lives, and was cheaper per life saved than the cost-sharing program [@cohen2010].
The study's authors found that net uptake was similar between the groups and that free net distribution --- because it was easier to access --- saved more lives, and was cheaper per life saved than the cost-sharing program [@cohen2010].

There are several reasons we might not be able to conduct a randomized trial, including ethics, cost, and time.
We have substantial, robust evidence in favor of bed net use.
@@ -96,7 +96,7 @@ In this particular data, [simulated by Dr. Andrew Heiss](https://evalsp21.classe
Because we're using simulated data, we'll have direct access to a variable that measures the likelihood of contracting malaria, something we likely wouldn't have in real life.
We'll stick with this measure because we know the actual effect size.
We'll use simulated data, `net_data`, from the {causalworkshop} package, which includes ten variables:
We'll use simulated data, `net_data`, from the {[causalworkshop](https://github.com/r-causal/causalworkshop)} package, which includes ten variables:

<!-- (TODO: move this to causaldata?) -->

@@ -434,12 +434,13 @@ That's more in line with their observed value of `net`, but there's still some p
## Diagnose our models

The goal of propensity score weighting is to weight the population of observations such that the distribution of confounders is balanced between the exposure groups.
Put another way, we are, in principle, removing the associational arrows between confounders and exposure in the DAG, so that the confounding paths no longer affect our estimates.
Here's the distribution of the propensity score by group, created with `geom_mirror_histogram()` from the halfmoon package, which helps assess balance in propensity score models (and visualize the pseudo-population the weights simulate):
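
As an aside, the ATE weights themselves have a simple closed form: 1/p for the exposed and 1/(1 - p) for the unexposed, where p is the propensity score. A minimal sketch in Python for illustration (the book's own analysis uses R packages like propensity):

```python
import numpy as np

def ate_weights(exposure, propensity):
    """Inverse-probability-of-treatment weights for the ATE:
    1/p for the exposed, 1/(1 - p) for the unexposed."""
    exposure = np.asarray(exposure, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return exposure / propensity + (1 - exposure) / (1 - propensity)

# Units that received an exposure their propensity score deems
# unlikely get up-weighted the most.
w = ate_weights([1, 0, 1], [0.8, 0.25, 0.5])  # 1/0.8, 1/0.75, 1/0.5
```

Weighting each observation this way is what produces the pseudo-population in the figures below.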

```{r}
#| label: fig-mirror-histogram-net-data-unweighted
#| fig.cap: >
#| A mirrored histogram of the propensity scores of those who used nets (top, blue) versus those who who did not use nets (bottom, light orange). The range of propensity scores is similar between groups, with those who used nets slightly to the left of those who didn't, but the shapes of the distribution are different.
#| A mirrored histogram of the propensity scores of those who used nets (top, blue) versus those who did not use nets (bottom, orange). The range of propensity scores is similar between groups, with those who used nets slightly to the left of those who didn't, but the shapes of the distribution are different.
library(halfmoon)
ggplot(net_data_wts, aes(.fitted)) +
geom_mirror_histogram(
@@ -455,7 +456,7 @@ The weighted propensity score creates a pseudo-population where the distribution
```{r}
#| label: fig-mirror-histogram-net-data-weighted
#| fig.cap: >
#| A mirrored histogram of the propensity scores of those who used nets (top, blue) versus those who who did not use nets (bottom, light orange). The shaded region represents the unweighted distribution, and the colored region represents the weighted distributions. The ATE weights up-weight the groups to be similar in range and shape of the distribution of propensity scores.
#| A mirrored histogram of the propensity scores of those who used nets (top, blue) versus those who did not use nets (bottom, orange). The shaded region represents the unweighted distribution, and the colored region represents the weighted distributions. The ATE weights up-weight the groups to be similar in range and shape of the distribution of propensity scores.
ggplot(net_data_wts, aes(.fitted)) +
geom_mirror_histogram(
aes(group = net),
@@ -481,14 +482,14 @@ Unfortunately, we still may have unmeasured confounding, which we'll discuss bel
Randomization is one causal inference technique that *does* deal with unmeasured confounding, one of the reasons it is so powerful.
:::

We might also want to know how well-balanced the groups are by confounder.
We might also want to know how well-balanced the groups are by each confounder.
One way to do this is to calculate the standardized mean differences (SMDs) for each confounder with and without weights.
We'll calculate the SMDs with `tidy_smd()` then plot them with `geom_love()`.
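
For reference, one common definition of the (optionally weighted) SMD is the difference in group means divided by a pooled standard deviation; a rough Python sketch for intuition (`tidy_smd()`'s exact conventions may differ):

```python
import numpy as np

def smd(x, treated, weights=None):
    """Standardized mean difference of covariate x between exposure
    groups, optionally weighted (e.g. with ATE weights)."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)

    def wmean(v, wt):
        return np.average(v, weights=wt)

    def wvar(v, wt):
        m = wmean(v, wt)
        return wmean((v - m) ** 2, wt)

    m1 = wmean(x[treated], w[treated])
    m0 = wmean(x[~treated], w[~treated])
    # Pool the two group variances for the denominator.
    sd = np.sqrt((wvar(x[treated], w[treated]) + wvar(x[~treated], w[~treated])) / 2)
    return (m1 - m0) / sd
```

An SMD near zero for a confounder after weighting is what "balanced" means in the love plot below.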

```{r}
#| label: fig-love-plot-net-data
#| fig.cap: >
#| A love plot representing the standardized mean differences (SMD) between exposure groups of three confounders: temperature, income, and health. Before weighting, there is considerable differences in the groups. After weighting, the confounders are much more balanced between groups.
#| A love plot representing the standardized mean differences (SMD) between exposure groups of three confounders: temperature, income, and health. Before weighting, there are considerable differences in the groups. After weighting, the confounders are much more balanced between groups.
plot_df <- tidy_smd(
net_data_wts,
c(income, health, temperature),
4 changes: 2 additions & 2 deletions chapters/chapter-03.qmd
@@ -4,7 +4,7 @@

## Potential Outcomes {#sec-potential}

Let's begin by thinking about the philosophical concept of a *potential outcome.* Prior to some "cause" occurring, for example receiving some exposure, the *potential outcomes* are all of the potential things that could occur depending on what you end up exposed to.
Let's begin by thinking about the philosophical concept of a *potential outcome.* Prior to some "cause" occurring, for example receiving some exposure, the *potential outcomes* are all of the potential things that could occur depending on what you are exposed to.
For simplicity, let's assume an exposure has two levels:

- $X=1$ if you are exposed
@@ -18,7 +18,7 @@ Under this simple scenario, there are two potential outcomes:
- $Y(0)$ the potential outcome if you are not exposed

Only *one* of these potential outcomes will actually be realized, the one corresponding to the exposure that actually occurred, and therefore only one is observable.
It is important to remember here that these exposures are defined at a particular instance in time, so only one can happen to any individual.
It is important to remember that these exposures are defined at a particular instance in time, so only one can happen to any individual.
In the case of a binary exposure, this leaves one potential outcome as *observable* and one *missing.* In fact, early causal inference methods were often framed as missing data problems; we need to make certain assumptions about the *missing counterfactuals*, the value of the potential outcome corresponding to the exposure(s) that did not occur.

Our causal effect of interest is often some difference in potential outcomes $Y(1) - Y(0)$, averaged over a particular population.
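
To make the "only one potential outcome is observable" point concrete, here is a small simulation sketch in Python (the book's code is in R): only because we generated the data do we get to see both $Y(1)$ and $Y(0)$ and compute the average effect directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulation lets us see BOTH potential outcomes, which is
# impossible with real data.
y0 = rng.normal(0, 1, n)  # potential outcome if unexposed, Y(0)
y1 = y0 + 2               # potential outcome if exposed, Y(1)

x = rng.integers(0, 2, n)          # exposure actually received
y_obs = np.where(x == 1, y1, y0)   # only one outcome is realized

ate = (y1 - y0).mean()  # average of Y(1) - Y(0); 2 by construction
```

With real data, `y1 - y0` is never computable for any individual, which is why we need assumptions about the missing counterfactuals.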
