PS Models Chapter #186

Merged: 2 commits, Sep 30, 2023
4 changes: 2 additions & 2 deletions _freeze/chapters/chapter-01/execute-results/html.json

Large diffs are not rendered by default.

Binary file modified _freeze/chapters/chapter-01/figure-html/fig-ft-chart-1.png
4 changes: 2 additions & 2 deletions _freeze/chapters/chapter-02/execute-results/html.json

Large diffs are not rendered by default.

Binary file modified _freeze/chapters/chapter-02/figure-html/fig-tip-coef-net-1.png
8 changes: 5 additions & 3 deletions _freeze/chapters/chapter-03/execute-results/html.json

Large diffs are not rendered by default.

8 changes: 3 additions & 5 deletions _freeze/chapters/chapter-08/execute-results/html.json
@@ -1,10 +1,8 @@
{
"hash": "1ca1c05cbaab44028148a78232fdfcbc",
"hash": "568bf0c2251ac748fd5b2fde89e14851",
"result": {
"markdown": "# Using the propensity score {#sec-using-ps}\n\n\n\n\n\n## Matching\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrnorm(5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.41966 0.04805 0.08088 0.25936 0.34247\n```\n\n\n:::\n:::\n\n\n## Weighting\n",
"supporting": [
"chapter-08_files/figure-html"
],
"markdown": "# Using the propensity score {#sec-using-ps}\n\n\n\n\n\nThe propensity score is a *balancing* tool -- we use it to help us make our exposure groups *exchangeable*. There are many ways to incorporate the propensity score into an analysis. Commonly used techniques include stratification (estimating the causal effect within propensity score stratum), matching, weighting, and direct covariate adjustment. In this section, we will focus on *matching* and *weighting*; other techniques will be discussed once we introduce the *outcome model*. Recall at this point in the book we are still in the *design* phase. We have not yet incorporated the outcome into our analysis at all. \n\n## Matching\n\nUltimately, we want the exposed and unexposed observations to be *exchangeable* with respect to the confounders we have proposed in our DAG (so we can use the observed effect for one to estimate the counterfactual for the other). One way to do this is to ensure that each observation in our analysis sample has at least one observation of the opposite exposure that has *match*ing values for each of these confounders. If we had a small number of binary confounders, for example, we might be able to construct an *exact match* for observations (and only include those for whom such a match exists), but as the number and continuity of confounders increases, exact matching becomes less feasible. This is where the propensity score, a summary measure of all of the confounders, comes in to play. \n\nLet's setup the data as we did in @sec-building-models.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(broom)\nlibrary(touringplans)\n\nseven_dwarfs_9 <- seven_dwarfs_train_2018 |> filter(wait_hour == 9)\n```\n:::\n\n\nWe can re-fit the propensity score using the `{MatchIt}` package, as below. Notice here the `matchit` function fit a logistic regression model for our propensity score, as we had in @sec-building-models. There were 60 days in 2018 where the Magic Kingdom had extra magic morning hours. For each of these 60 exposed days, `matchit` found a comparable unexposed day, by implementing a nearest-neighbor match using the constructed propensity score. Examining the output, we also see that the target estimand is an \"ATT\" (do not worry about this yet, we will discuss this and several other estimands in @sec-estimands).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(MatchIt)\nm <- matchit(\n park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n data = seven_dwarfs_9)\nm\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nA matchit object\n - method: 1:1 nearest neighbor matching without replacement\n - distance: Propensity score\n - estimated with logistic regression\n - number of obs.: 354 (original), 120 (matched)\n - target estimand: ATT\n - covariates: park_ticket_season, park_close, park_temperature_high\n```\n\n\n:::\n:::\n\nWe can use the `get_matches` function to create a data frame with the original variables that only consists of those who were matched. Notice here our sample size has been reduced from the original 354 days to 120. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatched_data <- get_matches(m)\nglimpse(matched_data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 120\nColumns: 18\n$ id <chr> \"5\", \"340\", \"12\", \"1…\n$ subclass <fct> 1, 1, 2, 2, 3, 3, 4,…\n$ weights <dbl> 1, 1, 1, 1, 1, 1, 1,…\n$ park_date <date> 2018-01-05, 2018-12…\n$ wait_hour <int> 9, 9, 9, 9, 9, 9, 9,…\n$ attraction_name <chr> \"Seven Dwarfs Mine T…\n$ wait_minutes_actual_avg <dbl> 33.0, 8.0, 114.0, 32…\n$ wait_minutes_posted_avg <dbl> 70.56, 80.62, 79.29,…\n$ attraction_park <chr> \"Magic Kingdom\", \"Ma…\n$ attraction_land <chr> \"Fantasyland\", \"Fant…\n$ park_open <time> 09:00:00, 09:00:00,…\n$ park_close <time> 24:00:00, 23:00:00,…\n$ park_extra_magic_morning <dbl> 1, 0, 1, 0, 1, 0, 1,…\n$ park_extra_magic_evening <dbl> 0, 0, 0, 0, 0, 0, 0,…\n$ park_ticket_season <chr> \"regular\", \"regular\"…\n$ park_temperature_average <dbl> 43.56, 57.61, 70.91,…\n$ park_temperature_high <dbl> 54.29, 65.44, 78.26,…\n$ distance <dbl> 0.18410, 0.18381, 0.…\n```\n\n\n:::\n:::\n\n\n## Weighting\n\nOne way to think about matching is as a crude \"weight\" where everyone who was matched gets a weight of 1 and everyone who was not matched gets a weight of 0 in the final sample. Another option is to allow this weight to be smooth, applying a weight to allow, on average, the covariates of interest to be balanced in the weighted population. To do this, we will construct a weight using the propensity score. There are many different weights that can be applied, depending on your target estimand of interest (see @sec-estimand for details). For this section, we will focus on the \"Average Treatment Effect\" weights, commonly referred to as an \"inverse probability weight\". The weight is constructed as follows, where each observation is weighted by the *inverse* of the probability of receiving the exposure they received. \n\n$$w_{ATE} = \\frac{X}{p} + \\frac{(1 - X)}{1 - p}$$\n\nFor example, if observation 1 had a very high likelihood of being exposed given their pre-exposure covariates ($p = 0.9$), but they in fact were *not* exposed, their weight would be 10 ($w_1 = 1 / (1 - 0.9)$). Likewise, if observation 2 had a very high likelihood of being exposed given their pre-exposure covariates ($p = 0.9$), and they *were* exposed, their weight would be 1.1 ($w_2 = 1 / 0.9$). Intuitively, we give *more* weight to observations who, based on their measured confounders, appear to have useful information for constructing a counterfactual -- we would have predicted that they were exposed and but by chance they were not, or vice-versa. 
The `{propensity}` package is useful for implementing propensity score weighting.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(propensity)\n\nseven_dwarfs_9_with_ps <-\n glm(\n park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n data = seven_dwarfs_9,\n family = binomial()\n ) |>\n augment(type.predict = \"response\", data = seven_dwarfs_9)\nseven_dwarfs_9_with_wt <- seven_dwarfs_9_with_ps |>\n mutate(w_ate = wt_ate(.fitted, park_extra_magic_morning)) \n```\n:::\n\n\n@tbl-df-wt shows the weights in the first column.\n\n\n::: {#tbl-df-wt .cell tbl-cap='The first six observations in the `seven_dwarfs_9_with_wt` dataset, including their propensity scores in the `.fitted` column and weight in the `w_ate` column.'}\n\n```{.r .cell-code}\nseven_dwarfs_9_with_wt |>\n select(\n w_ate,\n .fitted,\n park_date,\n park_extra_magic_morning,\n park_ticket_season,\n park_close,\n park_temperature_high\n ) |>\n head() |>\n knitr::kable()\n```\n\n::: {.cell-output-display}\n\n\n| w_ate| .fitted|park_date | park_extra_magic_morning|park_ticket_season |park_close | park_temperature_high|\n|-----:|-------:|:----------|------------------------:|:------------------|:----------|---------------------:|\n| 1.433| 0.3019|2018-01-01 | 0|peak |23:00:00 | 58.63|\n| 1.392| 0.2815|2018-01-02 | 0|peak |24:00:00 | 53.65|\n| 1.409| 0.2900|2018-01-03 | 0|peak |24:00:00 | 51.11|\n| 1.232| 0.1881|2018-01-04 | 0|regular |24:00:00 | 52.66|\n| 5.432| 0.1841|2018-01-05 | 1|regular |24:00:00 | 54.29|\n| 1.262| 0.2074|2018-01-06 | 0|regular |23:00:00 | 56.25|\n\n\n:::\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
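The weighting section added in this diff defines the ATE weight as $w_{ATE} = \frac{X}{p} + \frac{(1 - X)}{1 - p}$ and then computes it with `propensity::wt_ate()`. As a quick sanity check of that formula, here is a minimal sketch that computes the weights by hand, without the `{propensity}` package; the `p` and `x` values are illustrative and simply reproduce the two worked examples in the chapter text.

```r
# Hand-computed ATE ("inverse probability") weights: w = X/p + (1 - X)/(1 - p)
# p: fitted propensity score, x: observed exposure (1 = exposed, 0 = unexposed)
# Values are illustrative, chosen to match the worked examples in the chapter text.
p <- c(0.9, 0.9)
x <- c(0, 1)
w_ate <- x / p + (1 - x) / (1 - p)
w_ate
#> [1] 10.000000  1.111111
```

As in the chapter's examples, an unexposed observation with a high propensity score ($p = 0.9$) receives a large weight of 10, while an exposed observation with the same score receives a weight of about 1.1.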