Merge pull request #302 from r-causal/ch5-ch6-edits
Ch 5 & ch 6 edits
malcolmbarrett authored Dec 24, 2024
2 parents caf73a4 + 64a88f4 commit f3f880f
Showing 2 changed files with 82 additions and 11 deletions.
91 changes: 81 additions & 10 deletions chapters/04-dags.qmd
@@ -64,13 +64,13 @@ dag_data |>
)
```

- The type of causal diagrams we use are also called directed acyclic graphs (DAGs)[^05-dags-1].
+ The causal diagrams we use are also called directed acyclic graphs (DAGs)[^04-dags-1].
These graphs are directed because they include arrows going in a specific direction.
They're acyclic because they don't go in circles; a variable can't cause itself, for instance.
DAGs are used for various problems, but we're specifically concerned with *causal* DAGs.
This class of DAGs is sometimes called Structural Causal Models (SCMs) because they are a model of the causal structure of a question [@hernan2021; @Pearl_Glymour_Jewell_2021].

- [^05-dags-1]: An essential but rarely observed detail of DAGs is that dag is also an [affectionate Australian insult](https://en.wikipedia.org/wiki/Dag_(slang)) referring to the dung-caked fur of a sheep, a *daglock*.
+ [^04-dags-1]: An essential but rarely observed detail of DAGs is that dag is also an [affectionate Australian insult](https://en.wikipedia.org/wiki/Dag_(slang)) referring to the dung-caked fur of a sheep, a *daglock*.

DAGs depict causal relationships between variables.
Visually, variables appear as *nodes*, and the relationships between them as *edges*.
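
As a minimal sketch of these ideas (with hypothetical variables `x`, `y`, and `z`, not ones from this chapter):

```{r}
library(ggdag)

# A minimal DAG: x, y, and z are the nodes; each arrow is an edge.
# It's directed (every arrow points one way) and acyclic (no path
# ever loops back to where it started).
minimal_dag <- dagify(
  y ~ x,
  z ~ y
)

ggdag(minimal_dag) +
  theme_dag()
```
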
@@ -752,26 +752,97 @@ sim_data <- podcast_dag |>
sim_data
```

Since we have simulated this data, we know that this is a case where we can estimate the causal effect using a basic linear regression model.
- @fig-dag-sim shows a forest plot of the simulated data based on our DAG.
- Notice the model that only included the exposure resulted in a spurious effect (an estimate of -0.1 when we know the truth is 0).
- In contrast, the model that adjusted for the two variables as suggested by `ggdag_adjustment_set()` is not spurious (much closer to 0).
+ @fig-dag-sim shows a forest plot of estimates using the simulated data based on our DAG.
+ One estimate is unadjusted and the other is adjusted for `mood` and `prepared`.
+ Notice the unadjusted estimate resulted in a spurious effect (an estimate of -0.1 when we know the truth is 0).
+ In contrast, the estimate that adjusted for the two variables as suggested by `ggdag_adjustment_set()` is not spurious (it's much closer to 0).

```{r}
#| label: fig-dag-sim
#| fig-cap: "Forest plot of simulated data based on the DAG described in @fig-dag-podcast."
#| code-fold: true
library(broom)

## Model that does not close backdoor paths
unadjusted_model <- lm(exam ~ podcast, sim_data) |>
  tidy(conf.int = TRUE) |>
  filter(term == "podcast") |>
-   mutate(formula = "podcast")
+   mutate(formula = "unadjusted")

## Model that closes backdoor paths
adjusted_model <- lm(exam ~ podcast + mood + prepared, sim_data) |>
  tidy(conf.int = TRUE) |>
  filter(term == "podcast") |>
-   mutate(formula = "podcast + mood + prepared")
+   mutate(formula = "mood + prepared")

bind_rows(
  unadjusted_model,
  adjusted_model
) |>
  ggplot(aes(x = estimate, y = formula, xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, linewidth = 1, color = "grey80") +
  geom_pointrange(fatten = 3, size = 1) +
  theme_minimal(18) +
  labs(
    y = NULL,
    caption = "correct effect size: 0"
  )
```

Of course, we know we're working with the true DAG.
Let's say that, not knowing the true DAG (@fig-dag-podcast), we drew @fig-dag-podcast-wrong.

```{r}
#| label: fig-dag-podcast-wrong
#| fig-cap: "Proposed DAG to answer the question: Does listening to a comedy podcast the morning before an exam improve graduate students' test scores? This time, we proposed the wrong DAG."
#| fig-width: 4
#| fig-height: 4
#| warning: false
podcast_dag_wrong <- dagify(
  podcast ~ humor + prepared,
  exam ~ prepared,
  coords = time_ordered_coords(
    list(
      # time point 1
      c("prepared", "humor"),
      # time point 2
      "podcast",
      # time point 3
      "exam"
    )
  ),
  exposure = "podcast",
  outcome = "exam",
  labels = c(
    podcast = "podcast",
    exam = "exam score",
    humor = "humor",
    prepared = "prepared"
  )
)

ggdag(podcast_dag_wrong, use_labels = "label", text = FALSE) +
  theme_dag()
```

Since the DAG is wrong, it doesn't help us get the right answer.
It tells us we only need to adjust for `prepared`, but the proposed DAG omits a confounding pathway, so that adjustment set still leaves the relationship confounded.
Now, neither estimate is right.
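
To see which paths the wrong adjustment set leaves open, one option is to query the true DAG directly. Here's a sketch (assuming the `podcast_dag` object defined earlier in the chapter; `paths()` is from the dagitty package that ggdag builds on):

```{r}
library(dagitty)

# List the paths between exposure and outcome in the *true* DAG,
# conditioning only on `prepared`; any path still marked open
# is a source of remaining confounding
paths(podcast_dag, from = "podcast", to = "exam", Z = "prepared")
```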

```{r}
#| label: fig-dag-sim-wrong
#| fig-cap: "Forest plot of simulated data based on the DAG described in @fig-dag-podcast. However, we've analyzed it using the adjustment set from @fig-dag-podcast-wrong, giving us the wrong answer."
#| code-fold: true
library(broom)

## Model that does not close backdoor paths
unadjusted_model <- lm(exam ~ podcast, sim_data) |>
  tidy(conf.int = TRUE) |>
  filter(term == "podcast") |>
  mutate(formula = "unadjusted")

## Model that adjusts using the wrong DAG's adjustment set,
## which fails to close all backdoor paths
adjusted_model <- lm(exam ~ podcast + prepared, sim_data) |>
  tidy(conf.int = TRUE) |>
  filter(term == "podcast") |>
  mutate(formula = "prepared")

bind_rows(
  unadjusted_model,
@@ -1237,7 +1308,7 @@ That's a good thing: you now know where there is uncertainty in your DAG.
You can then examine the results from multiple plausible DAGs or address the uncertainty with sensitivity analyses.

If you have more than one candidate DAG, check their adjustment sets.
- If two DAGs have overlapping adjustment sets, focus on those sets; then, you can move forward in a way that satisfies the plausible assumptions you have.
+ If two DAGs have any adjustment sets that are identical between them, focus on those sets; then you can move forward in a way that satisfies the plausible assumptions you have.
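
For example, here's a minimal sketch with two hypothetical candidate DAGs that disagree only about the direction of the arrow between two covariates (`adjustmentSets()` is from the dagitty package that ggdag builds on):

```{r}
library(ggdag)
library(dagitty)

# Candidate DAG 1: b causes a
dag_a <- dagify(
  y ~ x + a + b,
  x ~ a + b,
  a ~ b,
  exposure = "x",
  outcome = "y"
)

# Candidate DAG 2: a causes b
dag_b <- dagify(
  y ~ x + a + b,
  x ~ a + b,
  b ~ a,
  exposure = "x",
  outcome = "y"
)

# Both DAGs yield the same minimal adjustment set, {a, b}, so
# adjusting for both variables satisfies either set of assumptions
adjustmentSets(dag_a)
adjustmentSets(dag_b)
```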

### Consider your question

@@ -1276,7 +1347,7 @@ It's tempting to visualize that relationship like this:
#| label: fig-feedback-loop
#| fig-width: 4.5
#| fig-height: 3.5
#| fig-cap: "A DAG representing the reciprocal relationship between A/C use and global temperature because of global warming. Feedback loops are useful mental shorthands to describe variables that impact each other over time compactly, but they are not true causal diagrams."
#| fig-cap: "A conceptual diagram representing the reciprocal relationship between A/C use and global temperature because of global warming. Feedback loops are useful mental shorthands to describe variables that impact each other over time compactly, but they are not true causal diagrams."
dagify(
  ac_use ~ global_temp,
  global_temp ~ ac_use,
2 changes: 1 addition & 1 deletion chapters/05-not-just-a-stats-problem.qmd
@@ -169,7 +169,7 @@ causal_quartet |>

Standardizing numeric variables to have a mean of 0 and standard deviation of 1, as implemented in `scale()`, is a common technique in statistics.
It's useful for a variety of reasons, but we chose to scale the variables here to emphasize the identical correlation between `covariate` and `exposure` in each dataset.
- If we didn't scale the variables, the correlation would be the same, but the plots would look different because their standard deviation are different.
+ If we didn't scale the variables, the correlation would be the same, but the plots would look different because their standard deviations are different.
For simple OLS, the beta coefficient is the covariance of the two variables divided by the variance of the exposure, which equals Pearson's correlation times the ratio of their standard deviations; scaling both variables to a standard deviation of 1 therefore makes the coefficient identical to Pearson's correlation.
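
As a quick sketch of that equivalence (with hypothetical simulated data, not the causal quartet):

```{r}
set.seed(1)

# Hypothetical data: an exposure correlated with a covariate
covariate <- rnorm(100)
exposure <- 0.5 * covariate + rnorm(100)

# With both variables scaled, the OLS slope equals Pearson's correlation
coef(lm(scale(exposure) ~ scale(covariate)))[[2]]
cor(exposure, covariate)
```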

@fig-causal_quartet_covariate_unscaled shows the unscaled relationship between `covariate` and `exposure`.
