diff --git a/_freeze/chapters/01-casual-to-causal/execute-results/html.json b/_freeze/chapters/01-casual-to-causal/execute-results/html.json new file mode 100644 index 0000000..c2e457f --- /dev/null +++ b/_freeze/chapters/01-casual-to-causal/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "f86458ded0dd32dee70ad7ff6157ed07", + "result": { + "markdown": "\\mainmatter\n\n# From casual to causal {#sec-causal-question}\n\n\n\n\n\n## Casual inference\n\nThe heart of causal analysis is the causal question; it dictates what data we analyze, how we analyze it, and to which populations our inferences apply.\nThis book, being applied in nature, deals primarily with the analysis stage of causal inference.\nRelative to the complexity of specifying a good causal question, the analysis stage is considerably more straightforward.\nIn the first six chapters of this book, we'll discuss what a causal question is, how to improve our questions, and consider some examples.\n\nCausal questions are part of a broader set of questions we can ask with statistical techniques related to the primary tasks of data science: description, prediction, and causal inference [@hernan2019].\nUnfortunately, these tasks are often muddled by the techniques we use (regression, for instance, is helpful for all three tasks) and how we talk about them.\nWhen researchers are interested in causal inference from non-randomized data, we often use euphemistic language like \"association\" instead of declaring our intent to estimate a causal effect [@Hernan2018].\n\nIn a recent study of the language of analyses in epidemiologic research, for instance, the most common root word describing the estimated effect was \"associate,\" but many researchers also felt that \"associate\" implied at least *some* causal effect (@fig-word-ranking) [@haber_causal_language].\nOnly around 1% of the studies analyzed used the root word \"cause\" at all.\nYet, a third of studies had action recommendations, and researchers rated 80% of these recommendations as having at least some causal implication.\nOften, these studies have stronger action recommendations (alluding to causal effects) than those implied by the description of the effect (root words like \"associate\" and \"compare\").\nDespite how many studies implied that the goal was causal inference, only about 4% used formal causal models like those discussed in this book.\nHowever, most discussed how such a cause might be justified by previous research or theory.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Rankings of causal strength of root words used by researchers. Root words with more Strong rankings have stronger causal implications than those with many None or Weak rankings. 
Data from Haber et al.](01-casual-to-causal_files/figure-html/fig-word-ranking-1.png){#fig-word-ranking width=672}\n:::\n:::\n\n\nInstead of clear questions with obvious assumptions and goals, we end up with \"Schrödinger's causal inference\":\n\n> Our results suggest that \"Schrödinger's causal inference,\" — where studies avoid stating (or even explicitly deny) an interest in estimating causal effects yet are otherwise embedded with causal intent, inference, implications, and recommendations — is common.\n>\n> --- @haber_causal_language\n\nThis approach is one instance of *casual* inference: making inferences without doing the necessary work to understand causal questions and deal with the assumptions around answering them.\n\n## Description, prediction, and explanation\n\nAn excellent first step to address this problem is recognizing that questions about description, prediction, and explanation are fundamentally different.\nData science in industry isn't quite as burdened by Schrödinger's causal inference as the academic sciences, but *casual* inference happens in a lot of other ways.\nFor instance, when a stakeholder asks for \"drivers\" of a particular event, what are they asking?\nFor a model to predict the event?\nFor a deeper understanding of what causes the event?\nIt's a vague request, but it smacks of causal interest to us; yet, many data scientists will be tempted to answer this question with a predictive model.\nWhen we're clear about our goals, we can use all three approaches more effectively (and, as we'll see, both descriptive analysis and prediction models are still helpful when the goal is to make causal inferences).\nMoreover, all three approaches are useful decision-making tools.\n\n### Description\n\nDescriptive analysis aims to describe the distribution of variables, often stratified by key variables of interest.\nA closely related idea is exploratory data analysis (EDA), although descriptive studies often have more explicit goals than those in EDA.\n\nDescriptive analyses are usually based on statistical summaries such as measures of centrality (means, medians) and spread (minimums, maximums, quartiles), but they also occasionally use techniques like regression modeling.\nThe goal of applying more advanced techniques like regression is different in descriptive analyses than in either predictive or causal studies.\n\"Adjusting\" for a variable in descriptive analyses means that we are removing its associational effect (and thus changing our question), *not* that we are controlling for confounding.\n\nIn epidemiology, a valuable concept for descriptive analyses is \"person, place, and time\" -- who has what disease, where, and when.\nThis concept is also a good template for descriptive analyses in other fields.\nUsually, we want to be clear about what population we're trying to describe, so we need to be as specific as possible.\nFor human health, it is critical to describe the people involved, the location, and the time period.\nIn other words, start from the first principles of generating understanding of your data, and describe your data accordingly.\n\n#### Examples\n\nCounting things is one of the most useful things we can do with data.\nEDA benefits both predictive and causal analyses, but descriptive analyses are valuable independent of the other analysis tasks.\nAsk any data scientist who thought they'd be developing complex machine learning models and found themselves spending most of their time on dashboards.\nUnderstanding the distributions of the data, particularly for key analysis goals (say, KPIs in industry or disease incidence in epidemiology), is critical for many types of decision-making.\n
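\nTo make this concrete, here is a minimal sketch of a descriptive summary in R. The data are simulated, and the names (`visits`, `region`, `kpi`) are invented for illustration; any dataset with a grouping variable and a measure of interest works the same way.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\n# simulated daily data; the numbers are made up for illustration\nset.seed(1)\nvisits <- tibble(\n  region = rep(c(\"north\", \"south\"), each = 100),\n  kpi = rnorm(200, mean = ifelse(region == \"north\", 10, 12))\n)\n\n# counts and simple summary statistics by group:\n# the workhorses of descriptive analysis\nvisits |>\n  group_by(region) |>\n  summarise(\n    n_days = n(),\n    mean_kpi = mean(kpi),\n    median_kpi = median(kpi),\n    min_kpi = min(kpi),\n    max_kpi = max(kpi)\n  )\n```\n:::\n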
\nOne of the best recent examples of descriptive analyses arose from the COVID-19 pandemic [@Fox2022].\nIn 2020, particularly in the early months of the pandemic, descriptive analyses were vital to understanding risk and allocating resources.\nSince the coronavirus is similar to other respiratory diseases, we had many public health tools to reduce risk (e.g., distancing and, later, face masks).\nDescriptive statistics of cases by region were vital for deciding local policies and the strength of those policies.\n\nA great example of a more complex descriptive analysis during the pandemic was an [ongoing report by the Financial Times of expected deaths vs. observed deaths](https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938) in various countries and regions[^3].\nWhile the calculation of expected deaths is slightly more sophisticated than most descriptive statistics, it provided a tremendous amount of information about current deaths without needing to untangle causal effects (e.g., were they due to COVID-19 directly? Inaccessible healthcare? Cardiovascular events post-COVID?).\nIn this (simplified) recreation of their plot from July 2020, you can see the staggering effect of the pandemic's early months.\n\n[^3]: John Burn-Murdoch was responsible for many of these presentations and gave a [fascinating talk on the subject](https://cloud.rstudio.com/resources/rstudioglobal-2021/reporting-on-and-visualising-the-pandemic/).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![2020 excess deaths vs. historical expected deaths from any cause. Data from the Financial Times.](01-casual-to-causal_files/figure-html/fig-ft-chart-1.png){#fig-ft-chart width=960}\n:::\n:::\n\n\nHere are some other great examples of descriptive analyses.\n\n- Deforestation around the world. Our World in Data [@owidforestsanddeforestation] is a data journalism organization that produces thoughtful, usually descriptive reports on various topics. In this report, they present data visualizations of both absolute change in forest coverage (forest transitions) and relative change (deforestation or reforestation), using basic statistics and forestry theory to present helpful information about the state of forests over time.\n- The prevalence of chlamydial and gonococcal infections [@Miller2004]. Measuring the prevalence of disease (how many people currently have a disease, usually expressed as a rate per number of people) is helpful for basic public health (resources, prevention, education) and scientific understanding. In this study, the authors conducted a complex survey meant to be representative of all high schools in the United States (the target population); they used survey weights to address a variety of factors related to their question, then calculated prevalence rates and other statistics. As we'll see, weights are helpful in causal inference for the same reason: targeting a particular population. That said, not all weighting techniques are causal in nature, and they were not here.\n- Estimating race and ethnicity-specific hysterectomy inequalities [@Gartner2020]. Descriptive techniques also help us understand disparities in areas like economics and epidemiology. In this study, the authors asked: Does the risk of hysterectomy differ by racial or ethnic background? Although the analysis is stratified by a key variable, it's still descriptive. 
Another interesting aspect of this paper is the authors' work ensuring the research answered questions about the right target population. Their analysis combined several data sources to better estimate the true population prevalence (instead of the prevalence among those in hospitals, as commonly presented). They also adjusted for the prevalence of hysterectomy, i.e., they calculated the incidence (new case) rate only among those who could actually have a hysterectomy (those who hadn't already had one).\n\n#### Validity\n\nThere are two critical validity issues in descriptive analyses: measurement error and sampling error.\n\nMeasurement error is when we have mismeasured one or more variables in some capacity.\nFor descriptive analyses, mismeasuring things means we may not get the answer to our question.\nHowever, the degree to which that is the case depends on both the severity of the measurement error and the question itself.\n\nSampling error is a more nuanced topic in descriptive analyses.\nIt's related to the population we're analyzing (who should the analysis produce descriptions about?) and uncertainty (how certain are we that the descriptions of those we have data for represent the population we're trying to describe?).\n\nThe population from which our data come and the population we're trying to describe must be the same for us to provide valid descriptions.\nConsider a dataset generated by an online survey.\nWho are the people who are answering these questions, and how do they relate to the people we want to describe?\nFor many analyses, the people who take the time to answer surveys are different from the people we want to describe, e.g., the group of people who fill out surveys may have a different distribution of variables than those who don't.\nResults from data like these are not technically biased because, outside of sample size-related uncertainty and measurement error, the descriptions are accurate---they're just not for the right group of people!\nIn other words, *we've gotten an answer to the wrong question*.\n\nNotably, sometimes our data represent the entire population (or close enough) that sampling error is irrelevant.\nConsider a company with data on every customer using their service.\nFor many analyses, this represents the entire population (current customers) about whom we want information.\nSimilarly, in countries with population-wide health registries, the data available for specific practical purposes is close enough to the entire population that no sampling is needed (although researchers might use sampling for simpler computations).\nIn these cases, there's not really such a thing as sampling-related uncertainty.\nAssuming everything is well-measured, the summary statistics we generate *are inherently unbiased and precise* because we have information from everyone in the population.\nOf course, in practice, we usually have some mixture of measurement error, missing data, and so on, even in the best of circumstances.\n\nOne crucial detail of descriptive analysis is that confounding bias, one of the chief concerns of this book, is undefined.\nThat's because confounding is a causal concern.\nDescriptive analyses cannot be confounded because they are a statistical description of relationships *as-is*, not the mechanisms behind those relationships.\n\n#### Relationship to causal inference\n\nHumans see patterns very well.\nPattern-finding is a helpful feature of our brains, but it can also lead down a slippery slope of inference when we're not working with data or methods that 
can allow us to do that validly.\nThe biggest thing to be cautious of when your goal is to describe is making the leap from description to causation (implicitly or explicitly).\n\nBut, of course, descriptive analysis is helpful when we *are* estimating causal effects.\nIt helps us understand the population we're working with and the distribution of the outcomes, exposures (variables we think might be causal), and confounders (variables we need to account for to get unbiased causal effects for the exposure).\nIt also helps us be sure that the data structure we're using matches the question we're trying to answer, as we'll see in [Chapter -@sec-data-causal].\nYou should always do descriptive analyses of your data when conducting causal research.\n\nFinally, as we'll see in [Chapter -@sec-trials-std], there are certain circumstances where we can make causal inferences with basic statistics.\nBe cautious about the distinction between the causal question and the descriptive component here, too: just because we're using the same calculation (e.g., a difference in means) doesn't mean that all descriptions you can generate are causal.\nWhether a descriptive analysis overlaps with a causal analysis is a function of the data and the question.\n\n### Prediction\n\nThe goal of prediction is to use data to make accurate predictions about variables, usually on new data.\nWhat this means depends on the question, the domain, and so on.\nPrediction models are used in just about every setting imaginable, from peer-reviewed clinical models to bespoke machine learning models embedded in consumer devices.\nEven large language models like the ones ChatGPT is based on are prediction models: they predict what a response to a prompt would look like.\n\nPredictive modeling generally uses a different workflow than the causal modeling workflow we'll present in this book.\nSince the goal of prediction is usually related to making predictions on new data, the workflow of this type of modeling focuses on maximizing predictive accuracy while retaining generalization to new data, a balance related to the bias-variance trade-off.\nIn practice, this means splitting your data into training sets (the part of the data you build your model on) and test sets (the part you assess your model with, a proxy for how it would perform on new data).\nUsually, data scientists also use cross-validation or other resampling techniques to further reduce the risk of overfitting the model to the training set.\n
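\nHere is a minimal sketch of that split-then-validate workflow, using the `{rsample}` package (which is not otherwise used in this chapter); `mtcars` stands in for whatever data you are modeling.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rsample)\n\nset.seed(1)\n# hold out 25% of the rows as a test set\nsplit <- initial_split(mtcars, prop = 0.75)\ntraining_set <- training(split)\ntest_set <- testing(split)\n\n# 5-fold cross-validation within the training set\n# to further guard against overfitting\nfolds <- vfold_cv(training_set, v = 5)\n```\n:::\n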
\nThere are many excellent texts on predictive modeling, and so we refer you to those for a deeper exploration of the goals and methods of these techniques [@kuhn2013a; @harrell2001; @Kuhn_Silge_2022; @James_Witten_Hastie_Tibshirani_2022].\n\n#### Examples\n\nPrediction is the most popular topic in data science, largely thanks to machine learning applications in industry.\nPrediction, of course, has a long history in statistics, and many models popular today have been used for decades in and outside academia.\n\nLet's look at an example of prediction about COVID-19 [^chapter-01-1].\nIn 2021, researchers published the ISARIC 4C Deterioration model, a clinical prognostic model for predicting severe adverse outcomes for acute COVID-19 [@Gupta2021].\nThe authors included a descriptive analysis to understand the population from which this model was developed, particularly the distribution of the outcome and candidate predictors.\nOne helpful aspect of this model is that it uses items commonly measured on day one of COVID-related hospitalization.\nThe authors built this model using cross-validation by region of the UK and then tested the model on data from a hold-out region.\nThe final model included eleven items, along with a description of their attributes, their relationships with the outcome, and so on.\nNotably, the authors used clinical domain knowledge to select candidate variables but did not fall into the temptation of interpreting the model coefficients as causal.\nWithout question, some of the predictive value of this model stems from the causal structure of the variables as they relate to the outcome, but the researchers had a different goal entirely for this model and stuck to it.\n\n[^chapter-01-1]: A natural model here is predicting cases, but infectious disease modeling is complex and usually uses techniques outside the usual predictive modeling workflow.\n\nHere are other good examples from the predictive space:\n\n- Some of the most exciting work in predictive modeling is in industry.\n Netflix regularly shares details on their modeling success and novel strategies in their [research blog](https://research.netflix.com/).\n They also recently published a paper reviewing their use of deep learning models for recommender systems (in this case, recommending shows and movies to users) [@steck2021].\n The authors explain their experimentation with models, the details of those models, and many of the challenges they faced, resulting in a practical guide on using such models.\n\n- In early 2020, researchers experienced with predictive and prognostic modeling in health research published a review of models for diagnosis and prognosis of COVID-19 [@Wynants2020].\n This review is interesting not just for its breadth but also for the astounding number of models that were rated as poor quality: \"\\[232\\] models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting.\" This research is also a [living review](https://www.covprecise.org/).\n\n#### Validity\n\nThe key measure of validity in prediction modeling is predictive accuracy, which can be measured in several ways, such as root mean squared error (RMSE), mean absolute error (MAE), area under the curve (AUC), and many more.\nA convenient detail about predictive modeling is that we can often assess if we're right, which is not true of descriptive statistics (for which we only have a subset of data) or causal inference (for which we don't know the true causal structure).\nWe aren't always able to assess against the truth, but it's almost always required for fitting the initial predictive model [^chapter-01-2].\n\n[^chapter-01-2]: We say model, singular, but usually data scientists fit many models for experimentation, and often the best prediction models are some combination of predictions from several models, called a stacked model.\n\nMeasurement error is also a concern for predictive modeling because we usually need accurate data for accurate predictions.\nInterestingly, measurement error and missingness can be informative in predictive settings.\nIn a causal setting, this might induce bias, but predictive models can consume that information with no issue.\nFor instance, in the famous Netflix Prize, winning models leveraged information about whether or not a customer rated a movie at all to improve recommendation systems.\n\nLike descriptive analysis, confounding is undefined for predictive modeling.\nA 
coefficient in a prediction model cannot be confounded; we only care about whether or not the variable provides predictive information, not whether that information is because of a causal relationship or something else.\n\n#### Relationship to causal inference\n\nThe single biggest risk in prediction is to assume that a given coefficient in a model has a causal interpretation.\nThere is a good chance that this isn't so.\nA model may predict well but may also have completely biased coefficients from a causal point of view.\nWe'll see more about this in @sec-pred-or-explain and the rest of the book.\n\nOften, people mistakenly use methods for selecting features (variables) for prediction models to select confounders in causal models.\nAside from their risk of overfitting, these methods are appropriate for prediction models but not for causal models.\nPrediction metrics cannot determine the causal structure of your question, and predictive value for the outcome does not make a variable a confounder.\nIn general, background knowledge (not prediction or associational statistics) should help you select variables for causal models [@robinsImpossible]; we'll discuss this process in detail in [Chapter -@sec-dags] and [Chapter -@sec-building-models].\n\nPrediction is nevertheless crucial to causal inference.\nFrom a philosophical perspective, we're comparing predictions from different *what if* scenarios: what would the outcome have been had one thing happened vs. another?\nWe'll spend much time on this subject, particularly in [Chapter -@sec-counterfactuals].\nWe'll also talk a lot about prediction from a practical perspective: just like in prediction and some description, we'll use modeling techniques to answer causal questions.\nTechniques like propensity score methods and g-computation use model predictions to answer causal questions, but the workflow for building and interpreting the models themselves is quite different.\n\n### Causal Inference\n\nThe goal of causal inference is to understand the impact that a variable, sometimes called an exposure, has on another variable, sometimes called an outcome.\n\"Exposure\" and \"outcome\" are the terms we'll use in this book to describe the causal relationship we're interested in.\nImportantly, our goal is to answer this question clearly and precisely.\nIn practice, this means using techniques like study design (e.g., a randomized trial) or statistical methods (like propensity scores) to calculate an unbiased effect of the exposure on the outcome.\n\nAs with prediction and description, it's better to start with a clear, precise question to get a clear, precise answer.\nIn statistics and data science, particularly as we swim through the ocean of data of the modern world, we often end up with an answer without a question (e.g., `42`).\nThis, of course, makes interpretation of the answer difficult.\nIn @sec-diag, we'll discuss the structure of causal questions.\nWe'll discuss philosophical and practical ways to sharpen our questions in [Chapter -@sec-counterfactuals] and [Chapter -@sec-trials-std].\n\n::: callout-note\n## Causal inference and explanation\n\nSome authors use the phrases \"causal inference\" and \"explanation\" interchangeably.\nWe're a little more cautious about that.\nCausal inference always has a relationship to explanation, but we can accurately estimate the effect of one thing on another without understanding how it happens.\n\nConsider John Snow, the so-called father of epidemiology.\nIn 1854, Snow famously investigated a cholera 
outbreak in London and identified that specific water sources were to blame for the disease.\nHe was right: contaminated water was a mechanism for cholera transmission.\nYet, he didn't have enough information to explain how: *Vibrio cholerae*, the bacterium responsible for cholera, wasn't identified until nearly thirty years later.\n:::\n\n#### Examples\n\nWe'll see many examples of causal inference in this book, but let's continue with an example related to COVID-19.\nAs the pandemic continued and tools like vaccines and anti-viral treatments became available, policies like universal masking also began to change.\nIn February 2022, the US state of Massachusetts rescinded a statewide policy that required universal masking in public schools [@Cowger2022].\nIn the greater Boston area, some school districts continued the policy while others discontinued it; those that discontinued it also did so at different times over the weeks following the policy change.\nThese differences in district policies over this period allowed researchers to study the impact of universal masking on COVID-19 cases.\nThe researchers included a descriptive analysis of the school districts to understand the distribution of factors related to COVID-19 and other determinants of health.\nTo estimate the effect of universal masking on cases, the authors used difference-in-differences, a technique common in policy-related causal inference.\nTheir design relaxes some of the assumptions needed for causal inference, but they also wisely controlled for potential confounders despite that advantage.\nThe authors found that districts that continued masking saw a drastically lower caseload than those that didn't; their analysis concluded that almost 12,000 additional cases occurred due to the policy change, nearly 30% of the cases in the districts during the 15 weeks of the study.\n\n\n\nHere are a few other interesting examples:\n\n- Netflix regularly uses causal inference in their work.\n In 2022, they published a [blog post summarizing some causal tasks](https://netflixtechblog.com/a-survey-of-causal-inference-applications-at-netflix-b62d25175e6f) they have engaged with.\n One interesting example is localization.\n Netflix, being worldwide, localizes content through subtitles and dubbing.\n Randomized experiments were a bad idea because they meant withholding content from users, so researchers at Netflix used several approaches to understand the value of localization while addressing potential confounding.\n One example is studying the impact of pandemic-related delays in dubbing.\n Researchers used synthetic controls to simulate the impact on viewership with and without these delays.\n Presumably, the timing of the pandemic-related delays was unrelated to many factors that would typically be related to dubbing processes, thus reducing some of the potential confounding.\n\n- The Tuskegee Study is one of modern history's most infamous examples of medical abuse.\n It is commonly pointed to as a source of distrust in the medical community among Black Americans.\n Health economics researchers used a variation of difference-in-differences techniques to assess the effect of the Tuskegee Study on distrust and life expectancy in older Black men [@Alsan2018].\n The results are important and disturbing: \"We find that the disclosure of the study in 1972 is correlated with increases in medical mistrust and mortality and decreases in both outpatient and 
inpatient physician interactions for older black men. Our estimates imply life expectancy at age 45 for black men fell by up to 1.5 years in response to the disclosure, accounting for approximately 35% of the 1980 life expectancy gap between black and white men and 25% of the gap between black men and women.\"\n\n#### Validity\n\nMaking valid causal inferences requires several assumptions that we'll discuss in @sec-assump.\nUnlike with prediction, we generally cannot confirm that our causal models are correct.\nIn other words, most assumptions we need to make are unverifiable.\nWe'll come back to this topic time and time again in the book---from the basics of these assumptions to practical decision-making to probing our models for problems.\n\n### Why isn't the right causal model just the best prediction model? {#sec-pred-or-explain}\n\nAt this point, you may wonder why the right causal model isn't just the best prediction model.\nIt makes sense that the two would be related: naturally, things that cause other things would be predictors.\nIt's causality all the way down, so any predictive information *is* related, in some capacity, to the causal structure of the thing we're predicting.\nDoesn't it stand to reason that a model that predicts well is causal, too?\nIt's true that *some* predictive models can be great causal models and vice versa.\nUnfortunately, this is not always the case; causal effects needn't predict particularly well, and good predictors needn't be causally unbiased [@shmueli2010a].\nThere is no way to know using data alone.\n\nLet's look at the causal perspective first because it's a bit simpler.\nConsider a model that is causally unbiased for an exposure but includes only variables related to the outcome *and* the exposure.\nIn other words, this model provides us with the correct answer for the exposure of interest but doesn't include other predictors of the outcome (which can sometimes be a good idea, as discussed in @sec-data-causal).\nIf an outcome has many causes, a model that accurately describes its relationship with the exposure likely won't predict the outcome very well.\nLikewise, if the true causal effect of the exposure on the outcome is small, it will bring little predictive value.\nIn other words, the predictive ability of a model, whether high or low, can't help us distinguish whether the model is giving us the correct answer.\nOf course, low predictive power might also indicate that a causal effect isn't of much use from an applied perspective, although that depends on several statistical factors.\n\nThere are two more complex reasons that predictive models won't always be unbiased causal models.\nFor the first reason, let's consider a model that is accurate from a causal perspective: it estimates effects on an outcome, and all of these effects are unbiased.\nEven in this ideal setting, you might get better predictions using a different model.\nThe reason has to do with the bias-variance trade-off in predictive modeling.\nWhen effects are small, data are noisy, predictors are highly correlated, or there's not much data, using a biased model like penalized regression might make sense.\nThese models intentionally introduce bias in favor of improved variance in out-of-data predictions.\nSince the goals of prediction and causal inference are different (accurate predictions, usually for out-of-data observations, vs. 
an unbiased effect), the best model for inference is not necessarily the best prediction model.\n\n\n\nSecondly, variables that are biased from a causal perspective often bring predictive power along with them.\nWe'll discuss which variables to include and *not* include in your models in @sec-dags, but let's consider a simple example.\nOne of the famous examples of confounded relationships is ice cream sales and crime in the summer.\n[Descriptively, ice cream sales and crime are related](https://slate.com/news-and-politics/2013/07/warm-weather-homicide-rates-when-ice-cream-sales-rise-homicides-rise-coincidence.html), but this relationship is confounded by weather, e.g., both ice cream sales and crime increase when it's warmer.\n(This is simplistic, of course, as weather itself doesn't cause crime, but it's on the causal pathway.)\n\nConsider a thought experiment where you are in a dark room.\nYour goal is to predict crime, but you don't know the weather or time of year.\nYou do, however, have information on ice cream sales.\nA model of crime with ice cream sales as a predictor would be biased from a causal perspective---ice cream sales do not cause crime, even though the model would show an effect---but would provide some predictive value to your crime model.\nThe reason for both of these conditions is the same: weather and ice cream sales are correlated, and so are weather and crime.\nIce cream sales can successfully, if imperfectly, serve as a proxy for weather.\nThat results in a biased effect estimate of the causal impact of ice cream sales on crime but a partially effective prediction of crime.\nOther variables that are invalid from a causal perspective, either because they are biased themselves or because they introduce bias into the causal effect estimate, often bring good predictive value, too.\nThus, predictive accuracy is not a good measure of causality; the short simulation sketch at the end of this section makes this concrete.\n\nA closely related idea is the *Table Two Fallacy*, so-called because, in health research papers, descriptive analyses are often presented in Table 1, and regression models are often presented in Table 2 [@Westreich2013].\nThe Table Two Fallacy is when a researcher presents coefficients for confounders and other non-exposure variables alongside the exposure effect, particularly when they interpret those coefficients as if they, too, were causal.\nThe problem is that in some situations, the model to estimate an unbiased effect of one variable may not be the same model to estimate an unbiased effect of another variable.\nIn other words, we can't interpret the effects of confounders as causal because they might *themselves* be confounded by another variable unrelated to the original exposure.\n\nDescriptive, predictive, and causal analyses will always contain some aspect of each other.\nA predictive model gains some of its predictive power from the causal structure of the outcome, and a causal model has some predictive power because it contains information about the outcome.\nHowever, the same model in the same data with different goals will have different usefulness depending on those goals.\n
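\nHere is a quick simulation sketch of the ice cream thought experiment. The variable names and effect sizes are invented for illustration: `weather` causes both `ice_cream_sales` and `crime`, and ice cream sales have no causal effect on crime.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\nset.seed(1)\nn <- 1000\nconfounded <- tibble(\n  weather = rnorm(n),\n  # weather drives both variables below;\n  # ice cream sales have no effect on crime\n  ice_cream_sales = weather + rnorm(n),\n  crime = weather + rnorm(n)\n)\n\n# ice cream sales predict crime because they proxy weather...\nlm(crime ~ ice_cream_sales, data = confounded) |>\n  coef()\n\n# ...but the coefficient is causally biased: adjusting for\n# weather reveals an ice cream effect near zero\nlm(crime ~ ice_cream_sales + weather, data = confounded) |>\n  coef()\n```\n:::\n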
\n\n## Diagramming a causal claim {#sec-diag}\n\nEach analysis task, whether descriptive, predictive, or inferential, should start with a clear, precise question.\nLet's diagram them to better understand the structure of causal questions (to which we'll now return our focus).\nDiagramming sentences is a grammatical method used to visually represent the structure of a sentence, occasionally taught in grammar school.\nIn this technique, sentences are deconstructed into their constituent parts, such as subjects, verbs, objects, and modifiers, and then displayed using a series of lines and symbols.\nThe arrangement of these elements on the diagram reflects their syntactic roles and how they interact within the sentence's overall structure.\nBy breaking down sentences into these visual representations, diagramming can help learners grasp the nuances of sentence construction, identify grammatical errors, and appreciate the intricate connections between words.\nWe can apply a similar idea to *causal claims*.\nHere is an example of how one might diagram a causal claim.\nWe've pulled out the *cause*, the *effect*, the *subject* (for whom?), and the *timing* (when?).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Example of diagramming a causal claim.](../images/sentence-diagram-1.png){#fig-diagram-1 width=2219}\n:::\n:::\n\n\nLet's start with a basic causal question: **Does smoking cause lung cancer?**\n\nThe causal claim here could be that *smoking causes lung cancer*.\n@fig-diagram-2 shows a potential diagram of this causal claim.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Diagram of the causal claim \"smoking causes lung cancer\".](../images/sentence-diagram-2.png){#fig-diagram-2 width=2219}\n:::\n:::\n\n\nLet's get more specific.\nA study was published in *JAMA* (the Journal of the American Medical Association) in 2005 titled \"Effect of Smoking Reduction on Lung Cancer Risk.\"\nThis study concluded: \"Among individuals who smoke 15 or more cigarettes per day, smoking reduction by 50% significantly reduces the risk of lung cancer\" [@godtfredsen2005effect].\nThe study describes the time frame studied as 5-10 years.\nLet's diagram this causal claim.\nHere, we assume that the eligibility criteria and the target population for the estimated causal effect are the same (individuals who smoke 15 or more cigarettes per day); this need not always be the case.\nIn @sec-estimands, we will discuss other potential target populations.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Example diagram of a more specific causal claim based on results from @godtfredsen2005effect.](../images/sentence-diagram-3.png){#fig-diagram-3 width=2222}\n:::\n:::\n\n\nTranslating this idea into asking good causal questions, we can map the following terms that you will see throughout this book to these diagrams: *exposure* (the cause), *outcome* (the effect), *eligibility criteria* (for whom?), *time zero* (when did the participants begin to be followed?), *target population* (who can we estimate an outcome effect for?), 
and *follow-up period* (when?).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Example diagram mapped to causal analysis terminology.](../images/sentence-diagram-4.png){#fig-diagram-4 width=2219}\n:::\n:::\n\n\nAsking good causal questions means we map the *question* to the observable *evidence*.\nLet's return to the smoking example.\nOur initial question was: *Does smoking cause lung cancer?*\nThe evidence in the study shows: *For people who smoke 15+ cigarettes a day, reducing smoking by 50% reduces the risk of lung cancer over 5-10 years*.\nDoes the answer match the question?\nNot quite.\nLet's update our question to match what the study actually showed: *For people who smoke 15+ cigarettes a day, does reducing smoking by 50% reduce the lung cancer risk over 5-10 years?*\nHoning this skill of asking answerable causal questions is essential, and we will discuss it throughout this book.\n", + "supporting": [ + "01-casual-to-causal_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/chapters/01-casual-to-causal/figure-html/fig-ft-chart-1.png b/_freeze/chapters/01-casual-to-causal/figure-html/fig-ft-chart-1.png new file mode 100644 index 0000000..d3c45d2 Binary files /dev/null and b/_freeze/chapters/01-casual-to-causal/figure-html/fig-ft-chart-1.png differ diff --git a/_freeze/chapters/01-casual-to-causal/figure-html/fig-word-ranking-1.png b/_freeze/chapters/01-casual-to-causal/figure-html/fig-word-ranking-1.png new file mode 100644 index 0000000..a5da5da Binary files /dev/null and b/_freeze/chapters/01-casual-to-causal/figure-html/fig-word-ranking-1.png differ diff --git a/_freeze/chapters/11-estimands/execute-results/html.json b/_freeze/chapters/11-estimands/execute-results/html.json index 97da39b..b071c51 100644 --- a/_freeze/chapters/11-estimands/execute-results/html.json +++ b/_freeze/chapters/11-estimands/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "057101694c7f228678755b6bfd2ac9ae", + "hash": "e70c4ba35279098afed83200894c75d3", "result": { - "markdown": "# Causal estimands {#sec-estimands}\n\n\n\n\n\n## Estimands, Estimators, Estimates\n\nWhen conducting an analysis to make causal inferences, we need to keep the causal question that we're interested in front and center.\nThe causal question helps guide not just the assumptions we need to make but also how we will go about answering the question.\nThis chapter deals with three ideas closely related to answering causal questions: the estimand, the estimator, and the estimate.\nThe **estimand** is the *target of interest*, the **estimator** is the method by which we approximate this estimand using data, and the **estimate** is the value we get when we plug our data into the estimator.\nYou can think of the **estimand** as a glossy picture of a beautiful cake we are trying to bake, the **estimator** as the recipe, and the **estimate** as the cake we pull out of our oven.\n\nSo far, we have been estimating the average treatment effect, the effect of the exposure of interest averaged across the whole population.\nThe **estimand** here is the expected value of the difference in potential outcomes across all individuals:\n\n$$E[Y(1) - Y(0)]$$\n\nThe **estimator** we use depends on the method we've chosen.\nFor example, in an A/B test or randomized controlled trial, our estimator could just be the average outcome among those who received exposure A minus the average outcome among those who 
received exposure B.\n\n$$\sum_{i=1}^{N}\left(\frac{Y_i\times X_i}{N_{\textrm{A}}} - \frac{Y_i\times (1-X_i)}{N_{\textrm{B}}}\right)$$\n\nwhere $X$ is just an indicator for exposure A ($X = 1$ for exposure A and $X = 0$ for exposure B), $N_A$ is the total number in group A, and $N_B$ is the total number in group B, such that $N_A + N_B = N$.\nLet's motivate this example a bit more.\nSuppose we have two different designs for a call-to-action (often abbreviated as CTA, a written directive for a marketing campaign) and want to assess whether they lead to a different average number of items a user purchases.\nWe can create an A/B test where we randomly assign a subset of users to see design A or design B.\nSuppose we have A/B testing data for 100 participants in a dataset called `ab`.\nHere is some code to simulate such a data set.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nset.seed(928)\nab <- tibble(\n # generate the exposure, x, from a binomial distribution\n # with the probability of receiving exposure A set to 0.5\n x = rbinom(100, 1, 0.5),\n # generate the \"true\" average treatment effect of 1\n # we use rnorm(100) to add a random error term that is normally\n # distributed with a mean of 0 and a standard deviation of 1\n y = x + rnorm(100)\n)\n```\n:::\n\n\nHere the exposure is `x`, and the outcome is `y`.\nThe true average treatment effect is 1, implying that on average seeing call-to-action A increases the number of items purchased by 1.\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nab\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 100 × 2\n       x      y\n   <int>  <dbl>\n 1     0 -0.280\n 2     1  1.87 \n 3     1  1.36 \n 4     0 -0.208\n 5     0  1.13 \n 6     1  0.352\n 7     1 -0.148\n 8     1  2.08 \n 9     0  2.09 \n10     0 -1.41 \n# ℹ 90 more rows\n```\n\n\n:::\n:::\n\n\nBelow are two ways to estimate this in R.\nUsing a formula approach, we calculate the difference in `y` between exposure values in the first example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nab |>\n summarise(\n n_a = sum(x),\n n_b = sum(1 - x),\n estimate = sum(\n (y * x) / n_a -\n y * (1 - x) / n_b\n )\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 3\n    n_a   n_b estimate\n  <int> <dbl>    <dbl>\n1    54    46     1.15\n```\n\n\n:::\n:::\n\n\nAlternatively, we can `group_by()` `x` and `summarize()` the means of `y` for each group, then pivot the results to take their difference.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nab |>\n group_by(x) |>\n summarise(avg_y = mean(y)) |>\n pivot_wider(\n names_from = x,\n values_from = avg_y,\n names_prefix = \"x_\"\n ) |>\n summarise(estimate = x_1 - x_0)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 1\n  estimate\n     <dbl>\n1     1.15\n```\n\n\n:::\n:::\n\n\nBecause $X$, the exposure, was randomly assigned, this estimator results in an unbiased estimate of our estimand of interest.\n\nWhat do we mean by unbiased, though?\nNotice that while the \"true\" causal effect is 1, this estimate is not precisely 1; it is 1.149.\nWhy the difference?\nThis A/B test included a sample of 100 participants, not the whole population.\nAs that sample size increases, our estimate will get closer to the truth.\nLet's try that.\nLet's simulate the data again but increase the sample size from 100 to 100,000:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(928)\nab <- tibble(\n x = rbinom(100000, 1, 0.5),\n y = x + rnorm(100000)\n)\n\nab |>\n summarise(\n n_a = sum(x),\n n_b = sum(1 - x),\n estimate = sum(\n (y * x) / n_a -\n y * (1 - x) / n_b\n )\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 3\n
    n_a   n_b estimate\n  <int> <dbl>    <dbl>\n1 49918 50082     1.01\n```\n\n\n:::\n:::\n\n\nNotice how the estimate is 1.01, much closer to the actual average treatment effect, 1.\n\nSuppose $X$ had not been randomly assigned.\nIn that case, we could use the pre-exposure covariates to estimate the conditional probability of $X$ (the propensity score) and incorporate this probability in a *weight* $(w_i)$ to estimate this causal effect.\nFor example, we could use the following *estimator* to *estimate* our average treatment effect *estimand*:\n\n$$\frac{\sum_{i=1}^NY_i\times X_i\times w_i}{\sum_{i=1}^NX_i\times w_i}-\frac{\sum_{i=1}^NY_i\times(1-X_i)\times w_i}{\sum_{i=1}^N(1-X_i)\times w_i}$$\n\n## Estimating treatment effects with specific targets in mind\n\nDepending on the study's goal, or the causal question, we may want to estimate different *estimands*.\nHere, we will outline the most common causal estimands, their target populations, the causal questions they may help answer, and the methods used to estimate them [@greifer2021choosing].\n\n### Average treatment effect\n\nA common estimand is the average treatment effect (ATE).\nThe target population is the *total sample* or population of interest.\nThe **estimand** here is the expected value of the difference in potential outcomes across all individuals:\n\n$$E[Y(1) - Y(0)]$$\n\nAn example research question is \"Should a policy be applied to all eligible patients?\" [@greifer2021choosing].\n\nMost randomized controlled trials are designed with the ATE as the target estimand.\nIn a non-randomized setting, you can estimate the ATE using full matching, where each observation in the exposed group is matched to at least one observation in the control group (and vice versa) without replacement.\nAlternatively, the following inverse probability weight will allow you to estimate the ATE using propensity score weighting:\n\n$$w_{ATE} = \frac{X}{p} + \frac{(1 - X)}{1 - p}$$\n\nIn other words, the weight is one over the propensity score for those in the exposure group and one over one minus the propensity score for those in the control group.\nIntuitively, this weight equals the inverse of the probability of being in the exposure group you were actually in.\n
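\nBefore reaching for helper functions, it can help to compute this weight directly. Here is a minimal sketch with made-up numbers; `p` stands in for a fitted propensity score and `x` for the observed exposure.\n\n::: {.cell}\n\n```{.r .cell-code}\n# hypothetical propensity scores and observed exposures\np <- c(0.2, 0.8, 0.5)\nx <- c(1, 0, 1)\n\n# w_ATE = X / p + (1 - X) / (1 - p):\n# 1 / p for the exposed, 1 / (1 - p) for the unexposed\nx / p + (1 - x) / (1 - p)\n#> [1] 5 5 2\n```\n:::\n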
\nLet's dig deeper into this causal estimand using the Touring Plans data.\nRecall that in @sec-using-ps, we examined the relationship between whether there were \"Extra Magic Hours\" in the morning and the average wait time for the Seven Dwarfs Mine Train the same day between 9 am and 10 am.\nLet's reconstruct our data set, `seven_dwarfs`, and fit the same propensity score model specified previously.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(broom)\nlibrary(touringplans)\n\nseven_dwarfs <- seven_dwarfs_train_2018 |>\n filter(wait_hour == 9) |>\n mutate(park_extra_magic_morning = factor(\n park_extra_magic_morning,\n labels = c(\"No Magic Hours\", \"Extra Magic Hours\")\n ))\n\nseven_dwarfs_with_ps <- glm(\n park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n data = seven_dwarfs,\n family = binomial()\n) |>\n augment(type.predict = \"response\", data = seven_dwarfs)\n```\n:::\n\n\nLet's examine a table of the variables of interest in this data frame.\nTo do so, we are going to use the `tbl_summary()` function from the `{gtsummary}` package.\n(We'll also use the `{labelled}` package to clean up the variable names for the table.)\n\n\n::: {#tbl-unweighted-gtsummary .cell tbl-cap='A descriptive table of Extra Magic Morning in the touringplans dataset. This table shows the distributions of these variables in the observed population.'}\n\n```{.r .cell-code}\nlibrary(gtsummary)\nlibrary(labelled)\nseven_dwarfs_with_ps <- seven_dwarfs_with_ps |>\n set_variable_labels(\n park_ticket_season = \"Ticket Season\",\n park_close = \"Close Time\",\n park_temperature_high = \"Historic High Temperature\"\n )\n\ntbl_summary(\n seven_dwarfs_with_ps,\n by = park_extra_magic_morning,\n include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n # add an overall column to the table\n add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
<table>\n<thead>\n<tr>\n<th>Characteristic</th>\n<th>No Magic Hours, N = 294<sup>1</sup></th>\n<th>Extra Magic Hours, N = 60<sup>1</sup></th>\n<th>Overall, N = 354<sup>1</sup></th>\n</tr>\n</thead>\n<tbody>\n<tr><td>Ticket Season</td><td></td><td></td><td></td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;peak</td><td>60 (20%)</td><td>18 (30%)</td><td>78 (22%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;regular</td><td>158 (54%)</td><td>35 (58%)</td><td>193 (55%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;value</td><td>76 (26%)</td><td>7 (12%)</td><td>83 (23%)</td></tr>\n<tr><td>Close Time</td><td></td><td></td><td></td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;16:30:00</td><td>1 (0.3%)</td><td>0 (0%)</td><td>1 (0.3%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;18:00:00</td><td>37 (13%)</td><td>18 (30%)</td><td>55 (16%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;20:00:00</td><td>18 (6.1%)</td><td>2 (3.3%)</td><td>20 (5.6%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;21:00:00</td><td>28 (9.5%)</td><td>0 (0%)</td><td>28 (7.9%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;22:00:00</td><td>91 (31%)</td><td>11 (18%)</td><td>102 (29%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;23:00:00</td><td>78 (27%)</td><td>11 (18%)</td><td>89 (25%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;24:00:00</td><td>40 (14%)</td><td>17 (28%)</td><td>57 (16%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;25:00:00</td><td>1 (0.3%)</td><td>1 (1.7%)</td><td>2 (0.6%)</td></tr>\n<tr><td>Historic High Temperature</td><td>84 (78, 89)</td><td>83 (76, 87)</td><td>84 (78, 89)</td></tr>\n</tbody>\n<tfoot>\n<tr><td colspan=\"4\"><sup>1</sup> n (%); Median (IQR)</td></tr>\n</tfoot>\n</table>\n
\n```\n\n:::\n:::\n\n\nFrom @tbl-unweighted-gtsummary, 294 days did not have extra magic hours in the morning, and 60 did.\nWe also see that 30% of the extra magic mornings were during peak season compared to 20% of the non-extra magic mornings.\nAdditionally, days with extra magic mornings were more likely to close at 6 pm (18:00:00) compared to non-magic hour mornings.\nThe median high temperature on days with extra magic hour mornings is slightly lower (83 degrees) compared to non-extra magic hour morning days.\n\nNow let's construct our propensity score weight to estimate the ATE.\nThe `{propensity}` package has helper functions to estimate weights.\nThe `wt_ate()` function will calculate ATE weights.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(propensity)\nseven_dwarfs_wts <- seven_dwarfs_with_ps |>\n mutate(w_ate = wt_ate(.fitted, park_extra_magic_morning))\n```\n:::\n\n\nLet's look at a distribution of these weights.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(x = w_ate)) +\n geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect (ATE) weights for whether or not a day had extra magic hours. ATE weights can range from 1 to infinity, so paying attention to the actual range of weights is important.](11-estimands_files/figure-html/fig-sd-ate-hist-1.png){#fig-sd-ate-hist width=672}\n:::\n:::\n\n\nMany weights in @fig-sd-ate-hist are close to 1 (the smallest possible value for an ATE weight), with a long tail leading to the largest weight (around 24).\nThis distribution highlights a potential problem with ATE weights.\nThey range from 1 to infinity; if your weights are too large in small samples, this can lead to bias in your estimates (known as finite sample bias).\n
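\nIt is also worth checking the realized range of the weights numerically; this one-line sketch uses the `seven_dwarfs_wts` data frame created above.\n\n::: {.cell}\n\n```{.r .cell-code}\n# inspect the realized range of the ATE weights directly\nsummary(seven_dwarfs_wts$w_ate)\n```\n:::\n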
\n::: {.callout-warning}\n## Finite sample bias\n\nWe know that not accounting for confounders can bias our causal estimates, but it turns out that even after accounting for all confounders, we may still get a biased estimate in finite samples.\nMany of the properties we tout in statistics rely on *large samples*---how \"large\" is defined can be opaque.\nLet's look at a quick simulation.\nHere, we have an exposure, $X$, an outcome, $Y$, and one confounder, $Z$.\nWe will simulate $Y$, which is only dependent on $Z$ (so the true treatment effect is 0), and $X$, which also depends on $Z$.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(928)\nn <- 100\nfinite_sample <- tibble(\n # z is normally distributed with a mean of 0 and a sd of 1\n z = rnorm(n),\n # x is defined from a probit selection model with normally distributed errors\n x = case_when(\n 0.5 + z + rnorm(n) > 0 ~ 1,\n TRUE ~ 0\n ),\n # y is continuous, dependent only on z, with normally distributed errors\n y = z + rnorm(n)\n)\n```\n:::\n\n\nIf we fit a propensity score model using the one confounder $Z$ and calculate the weighted estimator, we should get an unbiased result (which in this case would be 0).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## fit the propensity score model\nfinite_sample_wts <- glm(\n x ~ z,\n data = finite_sample,\n family = binomial(\"probit\")\n) |>\n augment(newdata = finite_sample, type.predict = \"response\") |>\n mutate(wts = wt_ate(.fitted, x))\n\nfinite_sample_wts |>\n summarize(\n effect = sum(y * x * wts) / sum(x * wts) -\n sum(y * (1 - x) * wts) / sum((1 - x) * wts)\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 1\n  effect\n   <dbl>\n1  0.197\n```\n\n\n:::\n:::\n\n\nOK: here, our effect of $X$ is pretty far from 0, although it's hard to know if this is really bias or something we are just seeing by chance in this particular simulated sample.\nTo explore the potential for finite sample bias, we can rerun this simulation many times at different sample sizes.\nLet's do that.\n\n\n::: {.cell hash='11-estimands_cache/html/sim-finite-sample-bias_fc62762d9112d2aa58d8f14393ce0072'}\n\n```{.r .cell-code}\nsim <- function(n) {\n ## create a simulated dataset\n finite_sample <- tibble(\n z = rnorm(n),\n x = case_when(\n 0.5 + z + rnorm(n) > 0 ~ 1,\n TRUE ~ 0\n ),\n y = z + rnorm(n)\n )\n finite_sample_wts <- glm(\n x ~ z,\n data = finite_sample,\n family = binomial(\"probit\")\n ) |>\n augment(newdata = finite_sample, type.predict = \"response\") |>\n mutate(wts = wt_ate(.fitted, x))\n bias <- finite_sample_wts |>\n summarize(\n effect = sum(y * x * wts) / sum(x * wts) -\n sum(y * (1 - x) * wts) / sum((1 - x) * wts)\n ) |>\n pull()\n tibble(\n n = n,\n bias = bias\n )\n}\n\n## Examine 6 different sample sizes, simulating each 1000 times\nset.seed(1)\nfinite_sample_sims <- map_df(\n rep(\n c(50, 100, 500, 1000, 5000, 10000),\n each = 1000\n ),\n sim\n)\n\nbias <- finite_sample_sims |>\n group_by(n) |>\n summarise(bias = mean(bias))\n```\n:::\n\n\nLet's plot our results:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(bias, aes(x = n, y = bias)) +\n geom_point() +\n geom_line() +\n geom_hline(yintercept = 0, lty = 2)\n```\n\n::: {.cell-output-display}\n![Finite sample bias present with ATE weights created using a correctly specified propensity score model, varying the sample size from n = 50 to n = 10,000](11-estimands_files/figure-html/fig-finite-sample-bias-1.png){#fig-finite-sample-bias width=672}\n:::\n:::\n\n\nThis is an example of finite sample bias.\nNotice that even when the sample size is quite large (5,000), we still see some bias away from the \"true\" effect of 0.\nIt isn't until the sample size is larger than 10,000 that we see this bias disappear.\n\nEstimands that utilize weights that are unbounded (i.e., weights that can theoretically be infinitely large) are more likely to suffer from finite sample bias.\nThe likelihood of falling into finite sample bias depends on:\\\n(1) the estimand you have chosen (i.e., are the weights bounded?)\\\n(2) the distribution of the covariates in the exposed and unexposed groups (i.e., is there good overlap? Regions of poor overlap, where positivity is nearly violated, are where weights can become too large)\\\n(3) the sample size.\n:::\n\nThe `{gtsummary}` package allows the creation of *weighted* tables, which can help us build an intuition for the pseudopopulation created by the weights.\nFirst, we must load the `{survey}` package and create a survey design object.\nThe `{survey}` package supports calculating statistics and models using weights.\nHistorically, many researchers applied techniques incorporating weights to survey analyses.\nEven though we are not doing a survey analysis, the same techniques are helpful for our propensity score weights.\n\n\n::: {#tbl-weighted-gtsummary .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect Weights. 
This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nlibrary(survey)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: package 'Matrix' was built under R version\n4.2.3\n```\n\n\n:::\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n ids = ~1,\n data = seven_dwarfs_wts,\n weights = ~w_ate\n)\ntbl_svysummary(\n seven_dwarfs_svy,\n by = park_extra_magic_morning,\n include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
<table>\n<thead>\n<tr>\n<th>Characteristic</th>\n<th>No Magic Hours, N = 354<sup>1</sup></th>\n<th>Extra Magic Hours, N = 357<sup>1</sup></th>\n<th>Overall, N = 711<sup>1</sup></th>\n</tr>\n</thead>\n<tbody>\n<tr><td>Ticket Season</td><td></td><td></td><td></td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;peak</td><td>78 (22%)</td><td>81 (23%)</td><td>160 (22%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;regular</td><td>193 (54%)</td><td>187 (52%)</td><td>380 (53%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;value</td><td>83 (23%)</td><td>89 (25%)</td><td>172 (24%)</td></tr>\n<tr><td>Close Time</td><td></td><td></td><td></td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;16:30:00</td><td>2 (0.4%)</td><td>0 (0%)</td><td>2 (0.2%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;18:00:00</td><td>50 (14%)</td><td>72 (20%)</td><td>122 (17%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;20:00:00</td><td>20 (5.6%)</td><td>19 (5.3%)</td><td>39 (5.5%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;21:00:00</td><td>31 (8.7%)</td><td>0 (0%)</td><td>31 (4.3%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;22:00:00</td><td>108 (30%)</td><td>86 (24%)</td><td>193 (27%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;23:00:00</td><td>95 (27%)</td><td>81 (23%)</td><td>176 (25%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;24:00:00</td><td>48 (14%)</td><td>94 (26%)</td><td>142 (20%)</td></tr>\n<tr><td>&nbsp;&nbsp;&nbsp;&nbsp;25:00:00</td><td>1 (0.3%)</td><td>6 (1.6%)</td><td>7 (1.0%)</td></tr>\n<tr><td>Historic High Temperature</td><td>84 (78, 89)</td><td>83 (78, 87)</td><td>84 (78, 88)</td></tr>\n</tbody>\n<tfoot>\n<tr><td colspan=\"4\"><sup>1</sup> n (%); Median (IQR)</td></tr>\n</tfoot>\n</table>\n
\n:::\n:::\n\n\nNotice in @tbl-weighted-gtsummary that the variables are more **balanced** between the groups.\nFor example, looking at the variable `park_ticket_season`, in @tbl-unweighted-gtsummary we see that a higher percentage of the days with extra magic hours in the mornings were during the peak season (30%) compared to the percent of days without extra magic hours in the morning (20%).\nIn @tbl-weighted-gtsummary this is balanced, with a weighted percent of 23% peak days in the extra magic morning group and 22% in the no extra magic morning group.\nThe weights change the variables' distribution by weighting each observation so that the two groups appear balanced.\nAlso, notice that the distribution of variables in @tbl-weighted-gtsummary now matches the Overall column of @tbl-unweighted-gtsummary.\nThat is what ATE weights do: they make the distribution of variables included in the propensity score for each exposure group look like the overall distribution across the whole sample.\n
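\nTo demystify what `wt_ate()` is doing, we can compute the same weights directly from the propensity score. This is a sketch, assuming the `.fitted` column holds the propensity score and that the exposure factor is labeled as in the tables above; if `wt_ate()` computes the standard inverse probability weight, `max_diff` should be effectively zero:\n\n```r\n# hand-compute the ATE weight: 1/p for exposed days, 1/(1 - p) for unexposed\nseven_dwarfs_wts |>\n  mutate(\n    x = as.numeric(park_extra_magic_morning == \"Extra Magic Hours\"),\n    w_ate_by_hand = x / .fitted + (1 - x) / (1 - .fitted)\n  ) |>\n  summarize(max_diff = max(abs(w_ate_by_hand - w_ate)))\n```\n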
\nLet's look at another visualization that can help us understand the weighted pseudopopulation created by the ATE weights: a mirrored histogram.\nWe will plot the distribution of the propensity score for the days in the exposed group (days with extra magic hours in the morning) on the top half of the plot; then, we will mirror the distribution of the propensity score for the days in the unexposed group below it.\nThis visualization allows us to compare the two distributions quickly.\nThen we will overlay weighted histograms to demonstrate how these distributions differ after incorporating the weights.\nThe `{halfmoon}` package includes a function `geom_mirror_histogram()` to assist with this visualization.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(halfmoon)\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_ate),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect (ATE) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-ate-1.png){#fig-sd-mirror-hist-ate width=672}\n:::\n:::\n\n\nWe can learn a few things from @fig-sd-mirror-hist-ate.\nFirst, we can see that there are more \"unexposed\" days---days without extra magic hours in the morning---compared to \"exposed\" in the observed population.\nWe can see this by examining the darker histograms, which show the distributions in the observed sample.\nWe can also see that the exposed days are upweighted more substantially, as evidenced by the light blue area at the top of the plot.\nAfter weighting, the two distributions look similar; the shape of the blue weighted distribution on top looks comparable to the orange weighted distribution below.\n\n### Average treatment effect among the treated\n\nThe target population to estimate the average treatment effect among the treated (ATT) is the *exposed* (treated) population.\nThis causal estimand conditions on those in the exposed group:\n\n$$E[Y(1) - Y(0) | X = 1]$$\n\nExample research questions where the ATT is of interest could include \"Should we stop our marketing campaign to those currently receiving it?\" or \"Should medical providers stop recommending treatment for those currently receiving it?\" [@greifer2021choosing]\n\nThe ATT is a common target estimand when using matching; here, all exposed observations are included and \"matched\" to control observations, some of which may be discarded.\nAlternatively, the ATT can be estimated via weighting.\nThe ATT weight is estimated by:\n\n$$w_{ATT} = X + \\frac{(1 - X)p}{1 - p}$$\n\nLet's add the ATT weights to the `seven_dwarfs_wts` data frame and look at their distribution.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseven_dwarfs_wts <- seven_dwarfs_wts |>\n  mutate(w_att = wt_att(.fitted, park_extra_magic_morning))\n\nggplot(seven_dwarfs_wts, aes(w_att)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect among the treated (ATT) weights. The range of the ATT weights in this example is more stable than that of the ATE weights: the range is much smaller.](11-estimands_files/figure-html/fig-sd-att-hist-1.png){#fig-sd-att-hist width=672}\n:::\n:::\n\n\nCompared to the ATE weights, the weights in @fig-sd-att-hist look more *stable*; the distribution of the weights is not heavily skewed, and the range is small, from zero to a bit over one, with many at precisely one.\nThese weights are exactly one for all days in the exposed group.\nTheoretically, for unexposed days these weights can range from zero to infinity.\nStill, because we have many more unexposed days than exposed days in this particular sample, the range is much smaller, meaning the vast majority of unexposed days are \"down-weighted\" to match the variable distribution of the exposed days.\n
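\nWe can verify the ATT weight formula by hand as well. As a quick sketch (again assuming `.fitted` is the propensity score and the exposure coding above), exposed days get a weight of exactly 1 and unexposed days get the odds of exposure, $p/(1-p)$:\n\n```r\n# hand-compute the ATT weight and compare it to wt_att()\nseven_dwarfs_wts |>\n  mutate(\n    x = as.numeric(park_extra_magic_morning == \"Extra Magic Hours\"),\n    w_att_by_hand = x + (1 - x) * .fitted / (1 - .fitted)\n  ) |>\n  summarize(max_diff = max(abs(w_att_by_hand - w_att)))\n```\n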
\nLet's take a look at the weighted table to see this a bit more clearly.\n\n\n::: {#tbl-weighted-att .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Treated Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_att\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: There were 5 warnings in `mutate()`.\nThe first warning was:\nℹ In argument: `df_stats = pmap(...)`.\nCaused by warning in `svymean.survey.design2()`:\n! Sample size greater than population size: are weights correctly scaled?\nℹ Run `dplyr::last_dplyr_warnings()` to see the 4 remaining warnings.\n```\n\n\n:::\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 60^1^ | Extra Magic Hours, N = 60^1^ | Overall, N = 120^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 18 (30%) | 18 (30%) | 36 (30%) |\n| regular | 35 (58%) | 35 (58%) | 70 (58%) |\n| value | 7 (12%) | 7 (12%) | 14 (12%) |\n| Close Time |  |  |  |\n| 16:30:00 | 1 (0.9%) | 0 (0%) | 1 (0.4%) |\n| 18:00:00 | 13 (21%) | 18 (30%) | 31 (26%) |\n| 20:00:00 | 2 (3.3%) | 2 (3.3%) | 4 (3.3%) |\n| 21:00:00 | 3 (4.6%) | 0 (0%) | 3 (2.3%) |\n| 22:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 23:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 24:00:00 | 8 (14%) | 17 (28%) | 25 (21%) |\n| 25:00:00 | 0 (0.3%) | 1 (1.7%) | 1 (1.0%) |\n| Historic High Temperature | 83 (74, 88) | 83 (75, 87) | 83 (75, 88) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nWe again achieve balance between groups, but the target values in @tbl-weighted-att differ from those in the previous tables.\nRecall that in our ATE weighted table (@tbl-weighted-gtsummary), when looking at the `park_ticket_season` variable, we saw the weighted percent in the peak season balance close to 22%, the overall percent of peak days in the observed sample.\nWe again see balance, but now the weighted percent of peak season days is 30% in both the exposed and unexposed groups, reflecting the percent in the unweighted exposed group.\nIf you compare @tbl-weighted-att to our unweighted table (@tbl-unweighted-gtsummary), you might notice that the columns are most similar to the exposed column from the unweighted table.\nThis is because the \"target\" population is the exposed group, the days with extra magic hours in the morning, so we're trying to make the comparison group (no magic hours) as similar to that group as possible.\n\nWe can again create a mirrored histogram to observe the weighted pseudopopulation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_att),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the treated (ATT) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-att-1.png){#fig-sd-mirror-hist-att width=672}\n:::\n:::\n\n\nSince every day in the exposed group has a weight of 1, the distribution of the propensity scores (the dark bars above 0 on the graph) in @fig-sd-mirror-hist-att overlaps exactly with the weighted distribution overlaid in blue.\nIn the unexposed population, we see that almost all observations are *down-weighted*; the dark orange distribution is smaller than the distribution of the propensity scores and more closely matches the exposed distribution above it.\nThe ATT is easier to estimate when there are many more observations in the unexposed group compared to the exposed one.\n\n### Average treatment effect among the unexposed\n\nThe target population to estimate the average treatment effect among the unexposed (control) (ATU / ATC) is the *unexposed* (control) population.\nThis causal estimand conditions on those in the unexposed group.\n\n$$E[Y(1) - Y(0) | X = 0]$$\n\nExample research questions where the ATU is of interest could include \"Should we send our marketing campaign to those not currently receiving it?\" or \"Should medical providers begin recommending treatment for those not currently receiving it?\" [@greifer2021choosing]\n\nThe ATU can be estimated via matching; here, all unexposed observations are included and \"matched\" to exposed observations, some of which may be discarded.\nAlternatively, the ATU can be estimated via weighting.\nThe ATU weight is estimated by:\n\n$$w_{ATU} = \\frac{X(1-p)}{p} + (1-X)$$\n\nLet's add the ATU weights to the `seven_dwarfs_wts` data frame and look at their distribution.\n
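\n\n::: {.cell}\n\n```{.r .cell-code}\nseven_dwarfs_wts <- seven_dwarfs_wts |>\n  mutate(w_atu = wt_atu(.fitted, park_extra_magic_morning))\n\nggplot(seven_dwarfs_wts, aes(w_atu)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect among the unexposed (ATU) weights. The range of the ATU weights in this example is very similar to that of the ATE weights.](11-estimands_files/figure-html/fig-sd-atu-hist-1.png){#fig-sd-atu-hist width=672}\n:::\n:::\n\n\nThe distribution of these weights looks very similar to what we saw with the ATE weights---many around 1, with a long tail.\nThese weights are 1 for all observations in the unexposed group; they can range from 0 to infinity for the exposed group.\nBecause we have more observations in our unexposed group, our exposed observations are upweighted to match their distribution.\n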
\n## Using matched data sets\n\nWhen fitting an outcome model on matched data sets, we can simply subset the original data to only those who were matched and then fit a model on these data as we would otherwise.\nFor example, re-performing the matching as we did in @sec-using-ps, we can extract the matched observations in a dataset called `matched_data` as follows.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(broom)\nlibrary(touringplans)\nlibrary(MatchIt)\n\nseven_dwarfs_9 <- seven_dwarfs_train_2018 |>\n  filter(wait_hour == 9)\n\nm <- matchit(\n  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n  data = seven_dwarfs_9\n)\nmatched_data <- get_matches(m)\n```\n:::\n
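\n\n`get_matches()` returns one row per matched unit. As an aside, `MatchIt::match.data()` is an alternative accessor; with the 1:1 nearest-neighbor matching without replacement used here, it returns essentially the same observations, along with matching weights and a subclass indicator. A sketch:\n\n```r\n# alternative accessor for the matched sample;\n# `weights` and `subclass` columns are added by MatchIt\nmatched_data_alt <- MatchIt::match.data(m)\nnrow(matched_data_alt)\n```\n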
\nWe can then fit the outcome model on this data.\nFor this analysis, we are interested in the impact of extra magic morning hours on the average posted wait time between 9 and 10 am.\nThe linear model below will estimate this in the matched cohort.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlm(wait_minutes_posted_avg ~ park_extra_magic_morning, data = matched_data) |>\n  tidy(conf.int = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 7\n  term   estimate std.error statistic  p.value conf.low\n  <chr>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>\n1 (Inte…    67.0       2.37     28.3  1.94e-54    62.3 \n2 park_…     7.87      3.35      2.35 2.04e- 2     1.24\n# ℹ 1 more variable: conf.high <dbl>\n```\n\n\n:::\n:::\n\n\nRecall that by default `{MatchIt}` estimates an average treatment effect among the treated.\nThis means that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10 am is 7.9 minutes (95% CI: 1.2-14.5).\n\n## Using weights in outcome models\n\nNow let's use propensity score weights to estimate this same estimand.\nWe will use the ATT weights so the analysis matches the matching analysis above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(propensity)\n\nseven_dwarfs_9_with_ps <-\n  glm(\n    park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n    data = seven_dwarfs_9,\n    family = binomial()\n  ) |>\n  augment(type.predict = \"response\", data = seven_dwarfs_9)\nseven_dwarfs_9_with_wt <- seven_dwarfs_9_with_ps |>\n  mutate(w_att = wt_att(.fitted, park_extra_magic_morning))\n```\n:::\n\n\nWe can fit a *weighted* outcome model by using the `weights` argument.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlm(\n  wait_minutes_posted_avg ~ park_extra_magic_morning,\n  data = seven_dwarfs_9_with_wt,\n  weights = w_att\n) |>\n  tidy()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n  term           estimate std.error statistic   p.value\n  <chr>             <dbl>     <dbl>     <dbl>     <dbl>\n1 (Intercept)       68.7       1.45     47.3  1.69e-154\n2 park_extra_ma…     6.23      2.05      3.03 2.62e-  3\n```\n\n\n:::\n:::\n\n\nUsing weighting, we estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10 am is 6.2 minutes.\nWhile this approach gives us the desired point estimate, the default `lm` output for the uncertainty (the standard errors and confidence intervals) is not correct.\n\n\n## Estimating uncertainty\n\nThere are three ways to estimate the uncertainty:\n\n1. A bootstrap\n2. A sandwich estimator that only takes into account the outcome model\n3. A sandwich estimator that takes into account the propensity score model\n\nThe first option can be computationally intensive, but should get you the correct estimates.\nThe second option is computationally the easiest, but tends to overestimate the variability.\nThere are not many current solutions in R (aside from coding it up yourself) for the third; however, the `{PSW}` package will do this.\n\n### The bootstrap\n\n1. Create a function to run your analysis once on a sample of your data:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfit_ipw <- function(split, ...) {\n  .df <- analysis(split)\n\n  # fit propensity score model on the bootstrapped sample\n  propensity_model <- glm(\n    park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n    data = .df,\n    family = binomial()\n  )\n\n  # calculate inverse probability weights\n  .df <- propensity_model |>\n    augment(type.predict = \"response\", data = .df) |>\n    mutate(wts = wt_att(\n      .fitted,\n      park_extra_magic_morning,\n      exposure_type = \"binary\"\n    ))\n\n  # fit ipw model on the bootstrapped sample\n  lm(\n    wait_minutes_posted_avg ~ park_extra_magic_morning,\n    data = .df,\n    weights = wts\n  ) |>\n    tidy()\n}\n```\n:::\n\n\n2. Use `{rsample}` to bootstrap our causal effect:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rsample)\n\n# fit ipw model to bootstrapped samples\nbootstrapped_seven_dwarfs <- bootstraps(\n  seven_dwarfs_9,\n  times = 1000,\n  apparent = TRUE\n)\n\nipw_results <- bootstrapped_seven_dwarfs |>\n  mutate(boot_fits = map(splits, fit_ipw))\n\nipw_results\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# Bootstrap sampling with apparent sample \n# A tibble: 1,001 × 3\n   splits          id            boot_fits       \n   <list>          <chr>         <list>          \n 1 <split [354/…]> Bootstrap0001 <tibble [2 × 5]>\n 2 <split [354/…]> Bootstrap0002 <tibble [2 × 5]>\n 3 <split [354/…]> Bootstrap0003 <tibble [2 × 5]>\n 4 <split [354/…]> Bootstrap0004 <tibble [2 × 5]>\n 5 <split [354/…]> Bootstrap0005 <tibble [2 × 5]>\n 6 <split [354/…]> Bootstrap0006 <tibble [2 × 5]>\n 7 <split [354/…]> Bootstrap0007 <tibble [2 × 5]>\n 8 <split [354/…]> Bootstrap0008 <tibble [2 × 5]>\n 9 <split [354/…]> Bootstrap0009 <tibble [2 × 5]>\n10 <split [354/…]> Bootstrap0010 <tibble [2 × 5]>\n# ℹ 991 more rows\n```\n\n\n:::\n:::\n\n\nLet's look at the results.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nipw_results |>\n  mutate(\n    estimate = map_dbl(\n      boot_fits,\n      \\(.fit) .fit |>\n        filter(term == \"park_extra_magic_morning\") |>\n        pull(estimate)\n    )\n  ) |>\n  ggplot(aes(estimate)) +\n  geom_histogram(bins = 30, fill = \"#D55E00FF\", color = \"white\", alpha = 0.8) +\n  theme_minimal()\n```\n\n::: {.cell-output-display}\n![A histogram of the bootstrapped estimates of the effect of extra magic morning hours on posted wait times.](11-estimands_files/figure-html/unnamed-chunk-27-1.png){width=672}\n:::\n:::\n\n\n3. Pull out the causal effect:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get t-based CIs\nboot_estimate <- int_t(ipw_results, boot_fits) |>\n  filter(term == \"park_extra_magic_morning\")\nboot_estimate\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 6\n  term          .lower .estimate .upper .alpha .method\n  <chr>          <dbl>     <dbl>  <dbl>  <dbl> <chr>  \n1 park_extra_m… -0.0732      6.86   11.6   0.05 studen…\n```\n\n\n:::\n:::\n\n\nWe estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10 am is 6.9 minutes, 95% CI (-0.1, 11.6).\n
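\n`int_t()` gives studentized (t-based) intervals. If you prefer simple percentile intervals, `{rsample}` also provides `int_pctl()`, which works on the same bootstrapped results; a sketch (the resulting interval will differ slightly from the t-based one):\n\n```r\n# percentile bootstrap interval for the same bootstrapped estimates\nint_pctl(ipw_results, boot_fits) |>\n  filter(term == \"park_extra_magic_morning\")\n```\n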
\n### The outcome model sandwich\n\nThere are two ways to get the sandwich estimator.\nThe first is to use the same weighted outcome model as above along with the `{sandwich}` package.\nUsing the `sandwich` function, we can get the robust estimate for the variance of the parameter of interest, as shown below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(sandwich)\nweighted_mod <- lm(\n  wait_minutes_posted_avg ~ park_extra_magic_morning,\n  data = seven_dwarfs_9_with_wt,\n  weights = w_att\n)\n\nsandwich(weighted_mod)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n                         (Intercept)\n(Intercept)                    1.488\npark_extra_magic_morning      -1.488\n                         park_extra_magic_morning\n(Intercept)                                -1.488\npark_extra_magic_morning                    8.727\n```\n\n\n:::\n:::\n\n\nHere, our robust variance estimate is 8.727.\nWe can then use this to construct a robust confidence interval.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrobust_var <- sandwich(weighted_mod)[2, 2]\npoint_est <- coef(weighted_mod)[2]\nlb <- point_est - 1.96 * sqrt(robust_var)\nub <- point_est + 1.96 * sqrt(robust_var)\nlb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\npark_extra_magic_morning \n                  0.4383 \n```\n\n\n:::\n\n```{.r .cell-code}\nub\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\npark_extra_magic_morning \n                   12.02 \n```\n\n\n:::\n:::\n\nWe estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10 am is 6.2 minutes, 95% CI (0.4, 12.0).\n
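\nEquivalently, rather than hand-coding the interval, we can combine the `{sandwich}` estimate with `lmtest::coeftest()`, which wires the robust variance into the usual coefficient table. A sketch (assuming the `{lmtest}` package is installed):\n\n```r\nlibrary(lmtest)\n\n# coefficient table using the robust (sandwich) variance estimate\ncoeftest(weighted_mod, vcov = sandwich)\n```\n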
\nAlternatively, we could fit the model using the `{survey}` package.\nTo do this, we need to create a design object, as we did when fitting weighted tables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(survey)\n\ndes <- svydesign(\n  ids = ~1,\n  weights = ~w_att,\n  data = seven_dwarfs_9_with_wt\n)\n```\n:::\n\n\nThen we can use `svyglm` to fit the outcome model.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsvyglm(wait_minutes_posted_avg ~ park_extra_magic_morning, des) |>\n  tidy(conf.int = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 7\n  term  estimate std.error statistic   p.value conf.low\n  <chr>    <dbl>     <dbl>     <dbl>     <dbl>    <dbl>\n1 (Int…    68.7       1.22     56.2  6.14e-178   66.3  \n2 park…     6.23      2.96      2.11 3.60e-  2    0.410\n# ℹ 1 more variable: conf.high <dbl>\n```\n\n\n:::\n:::\n\n### Sandwich estimator that takes into account the propensity score model\n\nThe correct sandwich estimator will also take into account the propensity score model.\nThe `{PSW}` package will allow us to do this.\nThis package has some quirks; for example, it doesn't work well with categorical variables, so we need to create dummy variables for `park_ticket_season` to pass into the model.\n*Actually, the code below isn't working because it seems there is a bug in the package. Stay tuned!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(PSW)\n\nseven_dwarfs_9 <- seven_dwarfs_9 |>\n  mutate(\n    park_ticket_season_regular = ifelse(park_ticket_season == \"regular\", 1, 0),\n    park_ticket_season_value = ifelse(park_ticket_season == \"value\", 1, 0)\n  )\npsw(\n  data = seven_dwarfs_9,\n  form.ps = \"park_extra_magic_morning ~ park_ticket_season_regular + park_ticket_season_value + park_close + park_temperature_high\",\n  weight = \"ATT\",\n  wt = TRUE,\n  out.var = \"wait_minutes_posted_avg\"\n)\n```\n:::\n", + "markdown": "# Causal estimands {#sec-estimands}\n\n\n\n\n\n## Estimands, Estimators, Estimates\n\nWhen analyzing data to make causal inferences, we need to keep the causal question we're interested in front and center.\nThe causal question helps guide not just the assumptions we need to make but also how we will go about answering the question.\nThis chapter deals with three ideas closely related to answering causal questions: the estimand, the estimator, and the estimate.\nThe **estimand** is the *target of interest*, the **estimator** is the method by which we approximate this estimand using data, and the **estimate** is the value we get when we plug our data into the estimator.\nYou can think of the **estimand** as a glossy picture of a beautiful cake we are trying to bake, the **estimator** as the recipe, and the **estimate** as the cake we pull out of our oven.\n\nSo far, we have been estimating the average treatment effect, the effect of the exposure of interest averaged across the whole population.\nThe **estimand** here is the expected value of the difference in potential outcomes across all individuals:\n\n$$E[Y(1) - Y(0)]$$\n\nThe **estimator** we use depends on the method we've chosen.\nFor example, in an A/B test or randomized controlled trial, our estimator could just be the average outcome among those who received exposure A minus the average outcome among those who received exposure B:\n\n$$\\frac{\\sum_{i=1}^{N}Y_i\\times X_i}{N_{\\textrm{A}}} - \\frac{\\sum_{i=1}^{N}Y_i\\times (1-X_i)}{N_{\\textrm{B}}}$$\n\nwhere $X$ is just an indicator for exposure A ($X = 1$ for exposure A and $X = 0$ for exposure B), $N_A$ is the total number in group A, and $N_B$ is the total number in group B such that $N_A + N_B = N$.\nLet's motivate this example a bit more.\nSuppose we have two different designs for a call-to-action (often abbreviated as CTA, a written directive for a marketing campaign) and want to assess whether they lead to a different average number of items a user purchases.\nWe can create an A/B test where we randomly assign a subset of users to see design A or design B.\nSuppose we have A/B testing data for 100 participants in a dataset called `ab`.\nHere is some code to simulate such a data set.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nset.seed(928)\nab <- tibble(\n  # generate the exposure, x, from a binomial distribution\n  # with the probability of exposure A being 0.5\n  x = rbinom(100, 1, 0.5),\n  # generate the \"true\" average treatment effect of 1\n  # we use rnorm(100) to add a random error term that is normally\n  # distributed with a mean of 0 and a standard deviation of 1\n  y = x + rnorm(100)\n)\n```\n:::\n\n\nHere the exposure is `x`, and the outcome is `y`.\nThe true average treatment effect is 1, implying that on average seeing call-to-action A increases the number of items purchased by 1.\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nab\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 100 × 2\n       x      y\n   <int>  <dbl>\n
 1     0 -0.280\n 2     1  1.87 \n 3     1  1.36 \n 4     0 -0.208\n 5     0  1.13 \n 6     1  0.352\n 7     1 -0.148\n 8     1  2.08 \n 9     0  2.09 \n10     0 -1.41 \n# ℹ 90 more rows\n```\n\n\n:::\n:::\n\n\nBelow are two ways to estimate this causal effect in R.\nIn the first example, using a formula approach, we calculate the difference in `y` between exposure values.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nab |>\n  summarise(\n    n_a = sum(x),\n    n_b = sum(1 - x),\n    estimate = sum(\n      (y * x) / n_a -\n        y * (1 - x) / n_b\n    )\n  )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 3\n    n_a   n_b estimate\n  <int> <dbl>    <dbl>\n1    54    46     1.15\n```\n\n\n:::\n:::\n\n\nAlternatively, we can `group_by()` `x` and `summarise()` the means of `y` for each group, then pivot the results to take their difference.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nab |>\n  group_by(x) |>\n  summarise(avg_y = mean(y)) |>\n  pivot_wider(\n    names_from = x,\n    values_from = avg_y,\n    names_prefix = \"x_\"\n  ) |>\n  summarise(estimate = x_1 - x_0)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 1\n  estimate\n     <dbl>\n1     1.15\n```\n\n\n:::\n:::\n\n\nBecause $X$, the exposure, was randomly assigned, this estimator results in an unbiased estimate of our estimand of interest.\n\nWhat do we mean by unbiased, though?\nNotice that while the \"true\" causal effect is 1, this estimate is not precisely 1; it is 1.149.\nWhy the difference?\nThis A/B test included a sample of 100 participants, not the whole population.\nAs the sample size increases, our estimate will get closer to the truth.\nLet's try that.\nLet's simulate the data again but increase the sample size from 100 to 100,000:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(928)\nab <- tibble(\n  x = rbinom(100000, 1, 0.5),\n  y = x + rnorm(100000)\n)\n\nab |>\n  summarise(\n    n_a = sum(x),\n    n_b = sum(1 - x),\n    estimate = sum(\n      (y * x) / n_a -\n        y * (1 - x) / n_b\n    )\n  )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 3\n    n_a   n_b estimate\n  <int> <dbl>    <dbl>\n1 49918 50082     1.01\n```\n\n\n:::\n:::\n\n\nNotice how the estimate is 1.01, much closer to the actual average treatment effect, 1.\n\nSuppose $X$ had not been randomly assigned.\nIn that case, we could use pre-exposure covariates to estimate the conditional probability of exposure given those covariates (the propensity score) and incorporate this probability in a *weight* $(w_i)$ to estimate the causal effect.\nFor example, we could use the following *estimator* to *estimate* our average treatment effect *estimand*:\n\n$$\\frac{\\sum_{i=1}^NY_i\\times X_i\\times w_i}{\\sum_{i=1}^NX_i\\times w_i}-\\frac{\\sum_{i=1}^NY_i\\times(1-X_i)\\times w_i}{\\sum_{i=1}^N(1-X_i)\\times w_i}$$\n
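\nThis weighted estimator is straightforward to code directly. Here is a minimal sketch using the simulated `ab` data with every weight set to 1 (so it reduces to the difference in means above); with confounding, $w_i$ would instead come from a propensity score model, exactly as in the simulations later in this chapter:\n\n```r\n# the weighted difference-in-means estimator, written out by hand\nab |>\n  mutate(w = 1) |> # placeholder weights; replace with propensity-based weights\n  summarise(\n    estimate = sum(y * x * w) / sum(x * w) -\n      sum(y * (1 - x) * w) / sum((1 - x) * w)\n  )\n```\n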
\n\n## Estimating treatment effects with specific targets in mind\n\nDepending on the study's goal, or the causal question, we may want to estimate different *estimands*.\nHere, we will outline the most common causal estimands, their target populations, the causal questions they may help answer, and the methods used to estimate them [@greifer2021choosing].\n\n### Average treatment effect\n\nA common estimand is the average treatment effect (ATE).\nThe target population is the *total sample* or population of interest.\nThe **estimand** here is the expected value of the difference in potential outcomes across all individuals:\n\n$$E[Y(1) - Y(0)]$$\n\nAn example research question is \"Should a policy be applied to all eligible patients?\" [@greifer2021choosing].\n\nMost randomized controlled trials are designed with the ATE as the target estimand.\nIn a non-randomized setting, you can estimate the ATE using full matching, where each observation in the exposed group is matched to at least one observation in the control group (and vice versa) without replacement.\nAlternatively, the following inverse probability weight will allow you to estimate the ATE using propensity score weighting:\n\n$$w_{ATE} = \\frac{X}{p} + \\frac{(1 - X)}{1 - p}$$\n\nIn other words, the weight is one over the propensity score for those in the exposed group and one over one minus the propensity score for those in the control group.\nIntuitively, this weight equates to the inverse of the probability of being in the exposure group you were actually in.\n\nLet's dig deeper into this causal estimand using the Touring Plans data.\nRecall that in @sec-using-ps, we examined the relationship between whether there were \"Extra Magic Hours\" in the morning and the average wait time for the Seven Dwarfs Mine Train the same day between 9 am and 10 am.\nLet's reconstruct our data set, `seven_dwarfs`, and fit the same propensity score model specified previously.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(broom)\nlibrary(touringplans)\n\nseven_dwarfs <- seven_dwarfs_train_2018 |>\n  filter(wait_hour == 9) |>\n  mutate(park_extra_magic_morning = factor(\n    park_extra_magic_morning,\n    labels = c(\"No Magic Hours\", \"Extra Magic Hours\")\n  ))\n\nseven_dwarfs_with_ps <- glm(\n  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,\n  data = seven_dwarfs,\n  family = binomial()\n) |>\n  augment(type.predict = \"response\", data = seven_dwarfs)\n```\n:::\n\n\nLet's examine a table of the variables of interest in this data frame.\nTo do so, we are going to use the `tbl_summary()` function from the `{gtsummary}` package.\n(We'll also use the `{labelled}` package to clean up the variable names for the table.)\n\n\n::: {#tbl-unweighted-gtsummary .cell tbl-cap='A descriptive table of Extra Magic Morning in the touringplans dataset. This table shows the distributions of these variables in the observed population.'}\n\n```{.r .cell-code}\nlibrary(gtsummary)\nlibrary(labelled)\nseven_dwarfs_with_ps <- seven_dwarfs_with_ps |>\n  set_variable_labels(\n    park_ticket_season = \"Ticket Season\",\n    park_close = \"Close Time\",\n    park_temperature_high = \"Historic High Temperature\"\n  )\n\ntbl_summary(\n  seven_dwarfs_with_ps,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  # add an overall column to the table\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 294^1^ | Extra Magic Hours, N = 60^1^ | Overall, N = 354^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 60 (20%) | 18 (30%) | 78 (22%) |\n| regular | 158 (54%) | 35 (58%) | 193 (55%) |\n| value | 76 (26%) | 7 (12%) | 83 (23%) |\n| Close Time |  |  |  |\n| 16:30:00 | 1 (0.3%) | 0 (0%) | 1 (0.3%) |\n| 18:00:00 | 37 (13%) | 18 (30%) | 55 (16%) |\n| 20:00:00 | 18 (6.1%) | 2 (3.3%) | 20 (5.6%) |\n| 21:00:00 | 28 (9.5%) | 0 (0%) | 28 (7.9%) |\n| 22:00:00 | 91 (31%) | 11 (18%) | 102 (29%) |\n| 23:00:00 | 78 (27%) | 11 (18%) | 89 (25%) |\n| 24:00:00 | 40 (14%) | 17 (28%) | 57 (16%) |\n| 25:00:00 | 1 (0.3%) | 1 (1.7%) | 2 (0.6%) |\n| Historic High Temperature | 84 (78, 89) | 83 (76, 87) | 84 (78, 89) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nFrom @tbl-unweighted-gtsummary, we see that 294 days did not have extra magic hours in the morning and 60 did.\nWe also see that 30% of the extra magic mornings were during peak season, compared to 20% of the non-extra magic mornings.\nAdditionally, days with extra magic mornings were more likely to close at 6 pm (18:00:00) compared to non-magic hour mornings.\nThe median high temperature on days with extra magic hour mornings was slightly lower (83 degrees) compared to non-extra magic hour morning days.\n\nNow let's construct our propensity score weight to estimate the ATE.\nThe `{propensity}` package has helper functions to estimate weights.\nThe `wt_ate()` function will calculate ATE weights.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(propensity)\nseven_dwarfs_wts <- seven_dwarfs_with_ps |>\n  mutate(w_ate = wt_ate(.fitted, park_extra_magic_morning))\n```\n:::\n\n\nLet's look at the distribution of these weights.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(x = w_ate)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect (ATE) weights for whether or not a day had extra magic hours. ATE weights can range from 1 to infinity, so paying attention to the actual range of weights is important.](11-estimands_files/figure-html/fig-sd-ate-hist-1.png){#fig-sd-ate-hist width=672}\n:::\n:::\n\n\nMany weights in @fig-sd-ate-hist are close to 1 (the smallest possible value for an ATE weight), with a long tail leading to the largest weight (around 24).\nThis distribution highlights a potential problem with ATE weights: they range from 1 to infinity, and if your weights are too large in small samples, this can lead to bias in your estimates (known as finite sample bias).\n\n::: {.callout-warning}\n## Finite sample bias\n\nWe know that not accounting for confounders can bias our causal estimates, but it turns out that even after accounting for all confounders, we may still get a biased estimate in finite samples.\nMany of the properties we tout in statistics rely on *large samples*---and how \"large\" is defined can be opaque.\nLet's look at a quick simulation.\nHere, we have an exposure, $X$, an outcome, $Y$, and one confounder, $Z$.\nWe will simulate $Y$, which is only dependent on $Z$ (so the true treatment effect is 0), and $X$, which also depends on $Z$.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(928)\nn <- 100\nfinite_sample <- tibble(\n  # z is normally distributed with a mean of 0 and sd of 1\n  z = rnorm(n),\n  # x is defined from a probit selection model with normally distributed errors\n  x = case_when(\n    0.5 + z + rnorm(n) > 0 ~ 1,\n    TRUE ~ 0\n  ),\n  # y is continuous, dependent only on z, with normally distributed errors\n  y = z + rnorm(n)\n)\n```\n:::\n\n\nIf we fit a propensity score model using the one confounder $Z$ and calculate the weighted estimator, we should get an unbiased result (which in this case would be 0).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## fit the propensity score model\nfinite_sample_wts <- glm(\n  x ~ z,\n  data = finite_sample,\n  family = binomial(\"probit\")\n) |>\n  augment(newdata = finite_sample, type.predict = \"response\") |>\n  mutate(wts = wt_ate(.fitted, x))\n\nfinite_sample_wts |>\n  summarize(\n    effect = sum(y * x * wts) / sum(x * wts) -\n      sum(y * (1 - x) * wts) / sum((1 - x) * wts)\n  )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 1\n  effect\n   <dbl>\n1  0.197\n```\n\n\n:::\n:::\n\n\nOk, here our effect of $X$ is pretty far from 0, although it's hard to know if this is really bias, or something we are just seeing by chance in this particular simulated sample.\n
To explore the potential for finite sample bias, we can rerun this simulation many times at different sample sizes.\nLet's do that.\n\n\n::: {.cell hash='11-estimands_cache/html/sim-finite-sample-bias_fc62762d9112d2aa58d8f14393ce0072'}\n\n```{.r .cell-code}\nsim <- function(n) {\n  ## create a simulated dataset\n  finite_sample <- tibble(\n    z = rnorm(n),\n    x = case_when(\n      0.5 + z + rnorm(n) > 0 ~ 1,\n      TRUE ~ 0\n    ),\n    y = z + rnorm(n)\n  )\n  finite_sample_wts <- glm(\n    x ~ z,\n    data = finite_sample,\n    family = binomial(\"probit\")\n  ) |>\n    augment(newdata = finite_sample, type.predict = \"response\") |>\n    mutate(wts = wt_ate(.fitted, x))\n  bias <- finite_sample_wts |>\n    summarize(\n      effect = sum(y * x * wts) / sum(x * wts) -\n        sum(y * (1 - x) * wts) / sum((1 - x) * wts)\n    ) |>\n    pull()\n  tibble(\n    n = n,\n    bias = bias\n  )\n}\n\n## Examine 6 different sample sizes, simulating each 1000 times\nset.seed(1)\nfinite_sample_sims <- map_df(\n  rep(\n    c(50, 100, 500, 1000, 5000, 10000),\n    each = 1000\n  ),\n  sim\n)\n\nbias <- finite_sample_sims |>\n  group_by(n) |>\n  summarise(bias = mean(bias))\n```\n:::\n\n\nLet's plot our results:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(bias, aes(x = n, y = bias)) +\n  geom_point() +\n  geom_line() +\n  geom_hline(yintercept = 0, lty = 2)\n```\n\n::: {.cell-output-display}\n![Finite sample bias present with ATE weights created using a correctly specified propensity score model, varying the sample size from n = 50 to n = 10,000](11-estimands_files/figure-html/fig-finite-sample-bias-1.png){#fig-finite-sample-bias width=672}\n:::\n:::\n\n\nThis is an example of finite sample bias.\nNotice that even when the sample size is quite large (5,000), we still see some bias away from the \"true\" effect of 0.\nIt isn't until the sample size approaches 10,000 that this bias effectively disappears.\n\nEstimands that use unbounded weights (i.e., weights that can theoretically be infinitely large) are more likely to suffer from finite sample bias.\nThe likelihood of running into finite sample bias depends on:\n\n1. the estimand you have chosen (i.e., are the weights bounded?)\n2. the distribution of the covariates in the exposed and unexposed groups (i.e., is there good overlap? Regions of poor overlap, where positivity is nearly violated, are where weights can become very large)\n3. the sample size.\n:::\n
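\nBecause ATE weights are unbounded, it is worth checking how extreme they actually get before relying on them. Here is a quick check on the Seven Dwarfs weights (a sketch; it assumes the `seven_dwarfs_wts` data frame with the `w_ate` column created above):\n\n```r\n# summarize how extreme the ATE weights are; a large maximum relative\n# to the rest of the distribution is a warning sign for finite sample bias\nseven_dwarfs_wts |>\n  summarize(\n    max_wt = max(w_ate),\n    # share of days carrying a weight above 10\n    prop_large = mean(w_ate > 10)\n  )\n```\n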
\nThe `{gtsummary}` package allows the creation of *weighted* tables, which can help us build an intuition for the pseudopopulation created by the weights.\nFirst, we must load the `{survey}` package and create a survey design object.\nThe `{survey}` package supports calculating statistics and models using weights.\nHistorically, many researchers applied techniques incorporating weights to survey analyses.\nEven though we are not doing a survey analysis, the same techniques are helpful for our propensity score weights.\n\n\n::: {#tbl-weighted-gtsummary .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nlibrary(survey)\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_ate\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 354^1^ | Extra Magic Hours, N = 357^1^ | Overall, N = 711^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 78 (22%) | 81 (23%) | 160 (22%) |\n| regular | 193 (54%) | 187 (52%) | 380 (53%) |\n| value | 83 (23%) | 89 (25%) | 172 (24%) |\n| Close Time |  |  |  |\n| 16:30:00 | 2 (0.4%) | 0 (0%) | 2 (0.2%) |\n| 18:00:00 | 50 (14%) | 72 (20%) | 122 (17%) |\n| 20:00:00 | 20 (5.6%) | 19 (5.3%) | 39 (5.5%) |\n| 21:00:00 | 31 (8.7%) | 0 (0%) | 31 (4.3%) |\n| 22:00:00 | 108 (30%) | 86 (24%) | 193 (27%) |\n| 23:00:00 | 95 (27%) | 81 (23%) | 176 (25%) |\n| 24:00:00 | 48 (14%) | 94 (26%) | 142 (20%) |\n| 25:00:00 | 1 (0.3%) | 6 (1.6%) | 7 (1.0%) |\n| Historic High Temperature | 84 (78, 89) | 83 (78, 87) | 84 (78, 88) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nNotice in @tbl-weighted-gtsummary that the variables are more **balanced** between the groups.\nFor example, looking at the variable `park_ticket_season`, in @tbl-unweighted-gtsummary we see that a higher percentage of the days with extra magic hours in the mornings were during the peak season (30%) compared to the percent of days without extra magic hours in the morning (20%).\nIn @tbl-weighted-gtsummary this is balanced, with a weighted percent of 23% peak days in the extra magic morning group and 22% in the no extra magic morning group.\nThe weights change the variables' distribution by weighting each observation so that the two groups appear balanced.\nAlso, notice that the distribution of variables in @tbl-weighted-gtsummary now matches the Overall column of @tbl-unweighted-gtsummary.\nThat is what ATE weights do: they make the distribution of variables included in the propensity score for each exposure group look like the overall distribution across the whole sample.\n
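\nBeyond eyeballing the weighted table, we can quantify balance directly. As a sketch (assuming the `{halfmoon}` package, which also provides balance helpers, and the columns created above), we can compute standardized mean differences with and without the ATE weights:\n\n```r\nlibrary(halfmoon)\n\n# standardized mean differences, unweighted and ATE-weighted;\n# values near 0 indicate good balance\ntidy_smd(\n  seven_dwarfs_wts,\n  c(park_ticket_season, park_close, park_temperature_high),\n  .group = park_extra_magic_morning,\n  .wts = w_ate\n)\n```\n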
\nLet's look at another visualization that can help us understand the weighted pseudopopulation created by the ATE weights: a mirrored histogram.\nWe will plot the distribution of the propensity score for the days in the exposed group (days with extra magic hours in the morning) on the top half of the plot; then, we will mirror the distribution of the propensity score for the days in the unexposed group below it.\nThis visualization allows us to compare the two distributions quickly.\nThen we will overlay weighted histograms to demonstrate how these distributions differ after incorporating the weights.\nThe `{halfmoon}` package includes a function `geom_mirror_histogram()` to assist with this visualization.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(halfmoon)\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_ate),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect (ATE) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-ate-1.png){#fig-sd-mirror-hist-ate width=672}\n:::\n:::\n\n\nWe can learn a few things from @fig-sd-mirror-hist-ate.\nFirst, we can see that there are more \"unexposed\" days---days without extra magic hours in the morning---compared to \"exposed\" in the observed population.\nWe can see this by examining the darker histograms, which show the distributions in the observed sample.\nWe can also see that the exposed days are upweighted more substantially, as evidenced by the light blue area at the top of the plot.\nAfter weighting, the two distributions look similar; the shape of the blue weighted distribution on top looks comparable to the orange weighted distribution below.\n\n### Average treatment effect among the treated\n\nThe target population to estimate the average treatment effect among the treated (ATT) is the *exposed* (treated) population.\nThis causal estimand conditions on those in the exposed group:\n\n$$E[Y(1) - Y(0) | X = 1]$$\n\nExample research questions where the ATT is of interest could include \"Should we stop our marketing campaign to those currently receiving it?\" or \"Should medical providers stop recommending treatment for those currently receiving it?\" [@greifer2021choosing]\n\nThe ATT is a common target estimand when using matching; here, all exposed observations are included and \"matched\" to control observations, some of which may be discarded.\nAlternatively, the ATT can be estimated via weighting.\nThe ATT weight is estimated by:\n\n$$w_{ATT} = X + \\frac{(1 - X)p}{1 - p}$$\n\nLet's add the ATT weights to the `seven_dwarfs_wts` data frame and look at their distribution.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseven_dwarfs_wts <- seven_dwarfs_wts |>\n  mutate(w_att = wt_att(.fitted, park_extra_magic_morning))\n\nggplot(seven_dwarfs_wts, aes(w_att)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect among the treated (ATT) weights. The range of the ATT weights in this example is more stable than that of the ATE weights: the range is much smaller.](11-estimands_files/figure-html/fig-sd-att-hist-1.png){#fig-sd-att-hist width=672}\n:::\n:::\n\n\nCompared to the ATE weights, the weights in @fig-sd-att-hist look more *stable*; the distribution of the weights is not heavily skewed, and the range is small, from zero to a bit over one, with many at precisely one.\nThese weights are exactly one for all days in the exposed group.\nTheoretically, for unexposed days these weights can range from zero to infinity.\nStill, because we have many more unexposed days than exposed days in this particular sample, the range is much smaller, meaning the vast majority of unexposed days are \"down-weighted\" to match the variable distribution of the exposed days.\nLet's take a look at the weighted table to see this a bit more clearly.\n
\n\n::: {#tbl-weighted-att .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Treated Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_att\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 60^1^ | Extra Magic Hours, N = 60^1^ | Overall, N = 120^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 18 (30%) | 18 (30%) | 36 (30%) |\n| regular | 35 (58%) | 35 (58%) | 70 (58%) |\n| value | 7 (12%) | 7 (12%) | 14 (12%) |\n| Close Time |  |  |  |\n| 16:30:00 | 1 (0.9%) | 0 (0%) | 1 (0.4%) |\n| 18:00:00 | 13 (21%) | 18 (30%) | 31 (26%) |\n| 20:00:00 | 2 (3.3%) | 2 (3.3%) | 4 (3.3%) |\n| 21:00:00 | 3 (4.6%) | 0 (0%) | 3 (2.3%) |\n| 22:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 23:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 24:00:00 | 8 (14%) | 17 (28%) | 25 (21%) |\n| 25:00:00 | 0 (0.3%) | 1 (1.7%) | 1 (1.0%) |\n| Historic High Temperature | 83 (74, 88) | 83 (75, 87) | 83 (75, 88) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nWe again achieve balance between groups, but the target values in @tbl-weighted-att differ from those in the previous tables.\nRecall that in our ATE weighted table (@tbl-weighted-gtsummary), when looking at the `park_ticket_season` variable, we saw the weighted percent in the peak season balance close to 22%, the overall percent of peak days in the observed sample.\nWe again see balance, but now the weighted percent of peak season days is 30% in both the exposed and unexposed groups, reflecting the percent in the unweighted exposed group.\nIf you compare @tbl-weighted-att to our unweighted table (@tbl-unweighted-gtsummary), you might notice that the columns are most similar to the exposed column from the unweighted table.\nThis is because the \"target\" population is the exposed group, the days with extra magic hours in the morning, so we're trying to make the comparison group (no magic hours) as similar to that group as possible.\n\nWe can again create a mirrored histogram to observe the weighted pseudopopulation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_att),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the treated (ATT) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-att-1.png){#fig-sd-mirror-hist-att width=672}\n:::\n:::\n\n\nSince every day in the exposed group has a weight of 1, the distribution of the propensity scores (the dark bars above 0 on the graph) in @fig-sd-mirror-hist-att overlaps exactly with the weighted distribution overlaid in blue.\nIn the unexposed population, we see that almost all observations are *down-weighted*; the dark orange distribution is smaller than the distribution of the propensity scores and more closely matches the exposed distribution above it.\nThe ATT is easier to estimate when there are many more observations in the unexposed group compared to the exposed one.\n\n### Average treatment effect among the unexposed\n\nThe target population to estimate the average treatment effect among the unexposed (control) (ATU / ATC) is the *unexposed* (control) population.\nThis causal estimand conditions on those in the unexposed group.\n\n$$E[Y(1) - Y(0) | X = 0]$$\n\nExample research questions where the ATU is of interest could include \"Should we send our marketing campaign to those not currently receiving it?\" or \"Should medical providers begin recommending treatment for those not currently receiving it?\" [@greifer2021choosing]\n\nThe ATU can be estimated via matching; here, all unexposed observations are included and \"matched\" to exposed observations, some of which may be discarded.\nAlternatively, the ATU can be estimated via weighting.\nThe ATU weight is estimated by:\n\n$$w_{ATU} = \\frac{X(1-p)}{p} + (1-X)$$\n
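\nAs with the other weights, this one is easy to verify by hand: unexposed days get a weight of exactly 1, and exposed days get the inverse odds of exposure, $(1-p)/p$. A quick sketch (assuming the same `.fitted` propensity score column and exposure coding as above):\n\n```r\n# hand-compute the ATU weight for a few randomly chosen days\nseven_dwarfs_wts |>\n  mutate(\n    x = as.numeric(park_extra_magic_morning == \"Extra Magic Hours\"),\n    w_atu_by_hand = x * (1 - .fitted) / .fitted + (1 - x)\n  ) |>\n  select(park_extra_magic_morning, .fitted, w_atu_by_hand) |>\n  slice_sample(n = 5)\n```\n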
\nLet's add the ATU weights to the `seven_dwarfs_wts` data frame and look at their distribution.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseven_dwarfs_wts <- seven_dwarfs_wts |>\n  mutate(w_atu = wt_atu(.fitted, park_extra_magic_morning))\n\nggplot(seven_dwarfs_wts, aes(w_atu)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect among the unexposed (ATU) weights. The range of the ATU weights in this example is very similar to that of the ATE weights.](11-estimands_files/figure-html/fig-sd-atu-hist-1.png){#fig-sd-atu-hist width=672}\n:::\n:::\n\n\nThe distribution of the weights in @fig-sd-atu-hist looks very similar to what we saw with the ATE weights---many around 1, with a long tail.\nThese weights are 1 for all observations in the unexposed group; they can range from 0 to infinity for the exposed group.\nBecause we have more observations in our unexposed group, our exposed observations are upweighted to match their distribution.\n\nNow let's take a look at the weighted table.\n\n\n::: {#tbl-weighted-atu .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Unexposed Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_atu\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 294^1^ | Extra Magic Hours, N = 297^1^ | Overall, N = 591^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 60 (20%) | 63 (21%) | 123 (21%) |\n| regular | 158 (54%) | 152 (51%) | 310 (52%) |\n| value | 76 (26%) | 82 (27%) | 158 (27%) |\n| Close Time |  |  |  |\n| 16:30:00 | 1 (0.3%) | 0 (0%) | 1 (0.2%) |\n| 18:00:00 | 37 (13%) | 54 (18%) | 91 (15%) |\n| 20:00:00 | 18 (6.1%) | 17 (5.7%) | 35 (5.9%) |\n| 21:00:00 | 28 (9.5%) | 0 (0%) | 28 (4.7%) |\n| 22:00:00 | 91 (31%) | 75 (25%) | 166 (28%) |\n| 23:00:00 | 78 (27%) | 70 (23%) | 148 (25%) |\n| 24:00:00 | 40 (14%) | 77 (26%) | 117 (20%) |\n| 25:00:00 | 1 (0.3%) | 5 (1.6%) | 6 (1.0%) |\n| Historic High Temperature | 84 (78, 89) | 83 (78, 87) | 84 (78, 88) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nNow the exposed column of @tbl-weighted-atu is weighted to match the unexposed column of the unweighted table.\n\nWe can again create a mirrored histogram to observe the weighted pseudopopulation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_atu),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the unexposed (ATU) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-atu-1.png){#fig-sd-mirror-hist-atu width=672}\n:::\n:::\n\n\nThe weights for the unexposed days are all 1, so the distribution of the propensity scores for this group exactly overlaps the weighted pseudopopulation in orange.\nThe blue bars indicate that the exposed population is largely upweighted to match the unexposed distribution.\n\n### Average treatment effect among the evenly matchable\n\nThe target population to estimate the average treatment effect among the evenly matchable (ATM) is the population of units deemed \"evenly matchable\" by some distance metric.\nThis causal estimand conditions on that group:\n\n$$E[Y(1) - Y(0) | M_d = 1]$$\n\nHere, $d$ denotes a distance measure, and $M_d = 1$ indicates that a unit is evenly matchable under that distance measure [@samuels2017aspects; @d2018improving].\nExample research questions where the ATM is of interest could include \"Should those at clinical equipoise receive the exposure?\" [@greifer2021choosing]\n\nThe ATM is a common target estimand when using matching.\nHere, some exposed observations are \"matched\" to control observations; however, some observations in both groups may be discarded.\nThis is often done via a \"caliper\", where observations are only matched if they fall within a pre-specified caliper distance of each other.\nAlternatively, the ATM can be estimated via weighting.\nThe ATM weight is estimated by:\n\n$$w_{ATM} = \\frac{\\min \\{p, 1-p\\}}{Xp + (1-X)(1-p)}$$\n
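\nIn R, the numerator is just `pmin(p, 1 - p)`, and the denominator is the probability of the exposure level actually received. A sketch of the hand computation (assuming the same columns as before):\n\n```r\n# hand-compute the ATM weight; pmin() takes the elementwise minimum\nseven_dwarfs_wts |>\n  mutate(\n    x = as.numeric(park_extra_magic_morning == \"Extra Magic Hours\"),\n    w_atm_by_hand = pmin(.fitted, 1 - .fitted) /\n      (x * .fitted + (1 - x) * (1 - .fitted))\n  ) |>\n  select(park_extra_magic_morning, .fitted, w_atm_by_hand) |>\n  slice_sample(n = 5)\n```\n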
\nLet's add the ATM weights to the `seven_dwarfs_wts` data frame and look at their distribution.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseven_dwarfs_wts <- seven_dwarfs_wts |>\n  mutate(w_atm = wt_atm(.fitted, park_extra_magic_morning))\n\nggplot(seven_dwarfs_wts, aes(w_atm)) +\n  geom_histogram(bins = 50)\n```\n\n::: {.cell-output-display}\n![A histogram of the average treatment effect among the evenly matchable (ATM) weights for whether or not a day had extra magic hours. ATM weights can range from 0 to 1, so they are always stable.](11-estimands_files/figure-html/fig-sd-atm-hist-1.png){#fig-sd-atm-hist width=672}\n:::\n:::\n\n\nThe distribution of the weights in @fig-sd-atm-hist looks very similar to what we saw with the ATT weights.\nThat is because, in this particular sample, there are many more unexposed observations than exposed ones.\nThese weights have the nice property that they range from 0 to 1, making them *always* stable, regardless of the imbalance between exposure groups.\n\nNow let's take a look at the weighted table.\n\n\n::: {#tbl-weighted-atm .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Evenly Matchable Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_atm\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n\n
| Characteristic | No Magic Hours, N = 60^1^ | Extra Magic Hours, N = 60^1^ | Overall, N = 120^1^ |\n|---|---|---|---|\n| Ticket Season |  |  |  |\n| peak | 18 (30%) | 18 (30%) | 36 (30%) |\n| regular | 35 (58%) | 35 (58%) | 70 (58%) |\n| value | 7 (12%) | 7 (12%) | 14 (12%) |\n| Close Time |  |  |  |\n| 16:30:00 | 1 (0.9%) | 0 (0%) | 1 (0.4%) |\n| 18:00:00 | 13 (21%) | 18 (30%) | 31 (26%) |\n| 20:00:00 | 2 (3.3%) | 2 (3.3%) | 4 (3.3%) |\n| 21:00:00 | 3 (4.6%) | 0 (0%) | 3 (2.3%) |\n| 22:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 23:00:00 | 17 (28%) | 11 (18%) | 28 (23%) |\n| 24:00:00 | 8 (14%) | 17 (28%) | 25 (21%) |\n| 25:00:00 | 0 (0.3%) | 1 (1.7%) | 1 (1.0%) |\n| Historic High Temperature | 83 (74, 88) | 83 (75, 87) | 83 (75, 88) |\n\n^1^ n (%); Median (IQR)\n
\n:::\n:::\n\n\nIn this particular sample, the ATM weights resemble the ATT weights, so this table looks similar to the ATT weighted table.\nThis similarity is not guaranteed, however, and the ATM pseudopopulation can look different from the overall, exposed, and unexposed unweighted populations.\nIt's therefore essential to examine the weighted table when using ATM weights to understand what population your inferences will ultimately apply to.\n\nWe can again create a mirrored histogram to observe the ATM weighted pseudopopulation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_atm),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the evenly matchable (ATM) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-atm-1.png){#fig-sd-mirror-hist-atm width=672}\n:::\n:::\n\n\nAgain, the ATM weights are bounded by 0 and 1, meaning that all observations will be *down-weighted*.\nThis removes the potential for finite sample bias due to large weights in small samples and has improved variance properties.\n\n### Average treatment effect among the overlap population\n\nThe target population to estimate the average treatment effect among the overlap population (ATO) is the overlap population---very similar to the \"evenly matchable\" population above.\nExample research questions where the ATO is of interest are the same as those for the ATM, such as \"Should those at clinical equipoise receive the exposure?\" [@greifer2021choosing]\n\nThese weights are again quite similar to the ATM weights but are slightly attenuated, yielding improved variance properties.\nThe ATO weight is estimated by:\n\n$$w_{ATO} = X(1-p) + (1-X)p$$\n
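\nThe ATO weight is even simpler to compute by hand: each observation is weighted by the probability of being in the *other* exposure group. A quick sketch:\n\n```r\n# hand-compute the ATO weight: 1 - p for exposed days, p for unexposed days\nseven_dwarfs_wts |>\n  mutate(\n    x = as.numeric(park_extra_magic_morning == \"Extra Magic Hours\"),\n    w_ato_by_hand = x * (1 - .fitted) + (1 - x) * .fitted\n  ) |>\n  select(park_extra_magic_morning, .fitted, w_ato_by_hand) |>\n  slice_sample(n = 5)\n```\n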
ATO weights also have improved variance properties compared to the ATM weights.](11-estimands_files/figure-html/fig-sd-ato-hist-1.png){#fig-sd-ato-hist width=672}\n:::\n:::\n\n\nLike the ATM weights, the ATO weights are bounded by 0 and 1, making them more stable than the ATE, ATT, and ATU weights, regardless of the imbalance between exposed and unexposed groups.\n\n::: {.callout-warning}\n## Finite sample bias - revisited\n\nLet's revisit our finite sample bias simulation, adding in the ATO weights to examine whether they are impacted by this potential for bias.\n\n\n::: {.cell hash='11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b'}\n\n```{.r .cell-code}\nsim <- function(n) {\n  ## create a simulated dataset\n  finite_sample <- tibble(\n    z = rnorm(n),\n    x = case_when(\n      0.5 + z + rnorm(n) > 0 ~ 1,\n      TRUE ~ 0\n    ),\n    y = z + rnorm(n)\n  )\n  finite_sample_wts <- glm(\n    x ~ z,\n    data = finite_sample,\n    family = binomial(\"probit\")\n  ) |>\n    augment(newdata = finite_sample, type.predict = \"response\") |>\n    mutate(\n      wts_ate = wt_ate(.fitted, x),\n      wts_ato = wt_ato(.fitted, x)\n    )\n  bias <- finite_sample_wts |>\n    summarize(\n      effect_ate = sum(y * x * wts_ate) / sum(x * wts_ate) -\n        sum(y * (1 - x) * wts_ate) / sum((1 - x) * wts_ate),\n      effect_ato = sum(y * x * wts_ato) / sum(x * wts_ato) -\n        sum(y * (1 - x) * wts_ato) / sum((1 - x) * wts_ato)\n    )\n  tibble(\n    n = n,\n    bias = c(bias$effect_ate, bias$effect_ato),\n    weight = c(\"ATE\", \"ATO\")\n  )\n}\n\n## Examine 6 different sample sizes, simulating each 1000 times\nset.seed(1)\nfinite_sample_sims <- map_df(\n  rep(\n    c(50, 100, 500, 1000, 5000, 10000),\n    each = 1000\n  ),\n  sim\n)\n\nbias <- finite_sample_sims |>\n  group_by(n, weight) |>\n  summarize(bias = mean(bias))\n```\n:::\n\n\nLet's plot our results:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(bias, aes(x = n, y = bias, color = weight)) +\n  geom_point() +\n  geom_line() +\n  geom_hline(yintercept = 0, lty = 2)\n```\n\n::: {.cell-output-display}\n![Finite sample bias with ATE weights (orange) and ATO weights (blue) created using a correctly specified propensity score model, varying the sample size from n = 50 to n = 10,000](11-estimands_files/figure-html/fig-finite-sample-bias-2-1.png){#fig-finite-sample-bias-2 width=672}\n:::\n:::\n\n\nNotice that the estimate using the ATO weights is nearly unbiased even in fairly small samples.\n:::\n\nNow let's take a look at the weighted table.\n\n\n::: {#tbl-weighted-ato .cell tbl-cap='A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Overlap Population Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.'}\n\n```{.r .cell-code}\nseven_dwarfs_svy <- svydesign(\n  ids = ~1,\n  data = seven_dwarfs_wts,\n  weights = ~w_ato\n)\ntbl_svysummary(\n  seven_dwarfs_svy,\n  by = park_extra_magic_morning,\n  include = c(park_ticket_season, park_close, park_temperature_high)\n) |>\n  add_overall(last = TRUE)\n```\n\n::: {.cell-output-display}\n
| Characteristic | No Magic Hours, N = 48¹ | Extra Magic Hours, N = 48¹ | Overall, N = 96¹ |\n|----------------|-------------------------|----------------------------|------------------|\n| **Ticket Season** | | | |\n| peak | 14 (29%) | 14 (29%) | 28 (29%) |\n| regular | 28 (58%) | 28 (58%) | 55 (58%) |\n| value | 6 (13%) | 6 (13%) | 13 (13%) |\n| **Close Time** | | | |\n| 16:30:00 | 0 (0.7%) | 0 (0%) | 0 (0.4%) |\n| 18:00:00 | 9 (19%) | 13 (27%) | 22 (23%) |\n| 20:00:00 | 2 (3.7%) | 2 (3.7%) | 4 (3.7%) |\n| 21:00:00 | 2 (5.1%) | 0 (0%) | 2 (2.5%) |\n| 22:00:00 | 14 (29%) | 9 (20%) | 23 (24%) |\n| 23:00:00 | 14 (28%) | 9 (19%) | 23 (24%) |\n| 24:00:00 | 7 (14%) | 14 (29%) | 21 (22%) |\n| 25:00:00 | 0 (0.3%) | 1 (1.7%) | 1 (1.0%) |\n| **Historic High Temperature** | 83 (75, 88) | 83 (76, 87) | 83 (75, 88) |\n\n¹ n (%); Median (IQR)
\n:::\n:::\n\n\nLike the ATM weights, the ATO pseudopopulation may not resemble the overall, exposed, or unexposed unweighted populations, so it is essential to examine these weighted tables to understand the population on which we will draw inferences.\n\nWe can again create a mirrored histogram to observe the ATO weighted pseudopopulation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +\n  geom_mirror_histogram(bins = 50) +\n  geom_mirror_histogram(\n    aes(fill = park_extra_magic_morning, weight = w_ato),\n    bins = 50,\n    alpha = 0.5\n  ) +\n  scale_y_continuous(labels = abs) +\n  labs(\n    x = \"propensity score\",\n    fill = \"Extra Magic Morning\"\n  )\n```\n\n::: {.cell-output-display}\n![A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the overlap population (ATO) weight.](11-estimands_files/figure-html/fig-sd-mirror-hist-ato-1.png){#fig-sd-mirror-hist-ato width=672}\n:::\n:::\n\n\n@fig-sd-mirror-hist-ato looks similar to the ATM pseudopopulation, with slight attenuation, as evidenced by the increased down-weighting in the exposed population.\nWe prefer using the ATO weights when the overlap (or evenly matchable) population is the target, as they have improved variance properties.\nAnother benefit is that any variables included in the propensity score model (if fit via logistic regression) will be perfectly balanced on the mean.\nWe will discuss this further in @sec-eval-ps-model.\n\n## Choosing estimands\n\nSeveral factors need to be considered when choosing an estimand for a given analysis.\nFirst and foremost, the research question at hand will drive this decision.\nFor example, returning to the Touring Plans data, suppose Disney was interested in assessing whether they should add extra magic hours on days that don't currently have them---here, the ATU will be the estimand of interest.\nOther considerations include the available sample size and the distribution of the exposed and unexposed observations (i.e., might finite sample bias be an issue?).\nBelow is a table summarizing the estimands and methods for estimating them (including R functions and example code), along with sample questions extended from Table 2 in @greifer2021choosing.\n\n| Estimand | Target population | Example Research Question | Matching Method | Weighting Method |\n|----------|--------------------|------------------------------------------------------------|---------------------------------------------|--------------------|\n| ATE | Full population | Should we decide whether to have extra magic hours *all* mornings to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Full matching | ATE weights |\n| | | Should a specific policy be applied to all eligible observations? | Fine stratification | `propensity::wt_ate()` |\n| | | | `MatchIt::matchit(…, estimand = \"ATE\")` | |\n| ATT | Exposed (treated) observations | Should we stop extra magic hours to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Pair matching without a caliper | ATT weights |\n| | | Should we stop our marketing campaign to those currently receiving it? | Full matching | `propensity::wt_att()` |\n| | | Should medical providers stop recommending treatment for those currently receiving it? | Fine stratification | |\n| | | | `MatchIt::matchit(…, estimand = \"ATT\")` | |\n| ATU | Unexposed (control) observations | Should we add extra magic hours for all days to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Pair matching without a caliper | ATU weights |\n| | | Should we extend our marketing campaign to those not receiving it? | Full matching | `propensity::wt_atu()` |\n| | | Should medical providers extend treatment to those not currently receiving it? | Fine stratification | |\n| | | | `MatchIt::matchit(…, estimand = \"ATC\")` | |\n| ATM | Evenly matchable | Are there some days we should change whether we are offering extra magic hours in order to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Caliper matching | ATM weights |\n| | | Is there an effect of the exposure for some observations? | `MatchIt::matchit(…, caliper = 0.1)` | `propensity::wt_atm()` |\n| | | Should those at clinical equipoise receive treatment? | Coarsened exact matching | |\n| | | | Cardinality matching | |\n| | | | `MatchIt::matchit(…, method = \"cardinality\")` | |\n| ATO | Overlap population | Same as ATM | | ATO weights |\n| | | | | `propensity::wt_ato()` |\n
", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua"
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-finite-sample-bias-2-1.png b/_freeze/chapters/11-estimands/figure-html/fig-finite-sample-bias-2-1.png
new file mode 100644
index 0000000..6abc919
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-finite-sample-bias-2-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-atm-hist-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-atm-hist-1.png
new file mode 100644
index 0000000..452b494
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-atm-hist-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-ato-hist-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-ato-hist-1.png
new file mode 100644
index 0000000..a6ec8f1
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-ato-hist-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-atu-hist-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-atu-hist-1.png
new file mode 100644
index 0000000..679c4ec
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-atu-hist-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atm-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atm-1.png
new file mode 100644
index 0000000..b1dc0d7
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atm-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-ato-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-ato-1.png
new file mode 100644
index 0000000..01ffcfb
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-ato-1.png differ
diff --git a/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atu-1.png b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atu-1.png
new file mode 100644
index 0000000..fc98cce
Binary files /dev/null and b/_freeze/chapters/11-estimands/figure-html/fig-sd-mirror-hist-atu-1.png differ
diff --git a/chapters/11-estimands.qmd b/chapters/11-estimands.qmd
index ddce50a..8758675 100644
--- a/chapters/11-estimands.qmd
+++ b/chapters/11-estimands.qmd
@@ -360,10 +360,13 @@ The `{survey}` package supports calculating statistics and models using weights.
Historically, many researchers applied techniques incorporating weights to survey analyses.
Even though we are not doing a survey analysis, the same techniques are helpful for our propensity score weights.
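+
+To build intuition for what these survey-weighted summaries do under the hood, note that a weighted mean is simply `sum(w * x) / sum(w)`.
+The sketch below is illustrative only (not run); it assumes the `seven_dwarfs_wts` data frame and its `w_ate` column from above.
+
+```{r}
+#| eval: false
+# A minimal sketch: the ATE-weighted mean of one covariate, computed by hand.
+seven_dwarfs_wts |>
+  summarize(
+    manual_mean = sum(w_ate * park_temperature_high) / sum(w_ate)
+  )
+```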
+
+
```{r}
#| label: tbl-weighted-gtsummary
#| tbl-cap: A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.
#| message: false
+#| warning: false
library(survey)
seven_dwarfs_svy <- svydesign(
  ids = ~1,
@@ -451,6 +454,7 @@ Let's take a look at the weighted table to see this a bit more clearly.
#| label: tbl-weighted-att
#| tbl-cap: A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Treated Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.
#| message: false
+#| warning: false
seven_dwarfs_svy <- svydesign(
  ids = ~1,
  data = seven_dwarfs_wts,
@@ -509,233 +513,337 @@ The ATU weight is estimated by:
$$w_{ATU} = \frac{X(1-p)}{p} + (1-X)$$
Let's add the ATU weights to the `seven_dwarfs_wts` data frame and look at their distribution.
-When fitting an outcome model on matched data sets, we can simply subset the original data to only those who were matched and then fit a model on these data as we would otherwise. For example, re-performing the matching as we did in @sec-using-ps, we can extract the matched observations in a dataset called `matched_data` as follows.
-
```{r}
-#| message: false
-#| warning: false
-library(broom)
-library(touringplans)
-library(MatchIt)
-
-seven_dwarfs_9 <- seven_dwarfs_train_2018 |>
-  filter(wait_hour == 9)
-
-m <- matchit(
-  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
-  data = seven_dwarfs_9
-)
-matched_data <- get_matches(m)
-```
-
-We can then fit the outcome model on this data. For this analysis, we are interested in the impact of extra magic morning hours on the average posted wait time between 9 and 10am. The linear model below will estimate this in the matched cohort.
+#| label: fig-sd-atu-hist
+#| fig.cap: >
+#|   A histogram of the average treatment effect among the unexposed (ATU) weights. The range of the ATU weights in this example is very similar to that of the ATE weights.
+seven_dwarfs_wts <- seven_dwarfs_wts |>
+  mutate(w_atu = wt_atu(.fitted, park_extra_magic_morning))
-```{r}
-lm(wait_minutes_posted_avg ~ park_extra_magic_morning, data = matched_data) |>
-  tidy(conf.int = TRUE)
+ggplot(seven_dwarfs_wts, aes(w_atu)) +
+  geom_histogram(bins = 50)
```
-Recall that by default `{MatchIt}` estimates an average treatment effect among the treated. This means among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10am is 7.9 minutes (95% CI: 1.2-14.5).
-
-## Using weights in outcome models
+The distribution of the weights in @fig-sd-atu-hist looks very similar to what we saw with the ATE weights---several around 1, with a long tail.
+These weights will be 1 for all observations in the unexposed group; they can range from 0 to infinity for the exposed group.
+Because we have more observations in our unexposed group, our exposed observations are upweighted to match their distribution.
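+
+As a gut check on the formula above, here is a sketch (not run) that builds the same weights "by hand"; `wt_atu()` is the supported interface, and the `ifelse()` version is purely illustrative.
+
+```{r}
+#| eval: false
+# Hand-rolled ATU weights from w_ATU = X(1 - p) / p + (1 - X):
+# exposed days get (1 - p) / p; unexposed days get 1.
+seven_dwarfs_wts |>
+  mutate(
+    w_atu_manual = ifelse(
+      park_extra_magic_morning == 1,
+      (1 - .fitted) / .fitted,
+      1
+    )
+  )
+```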
-Now let's use propensity score weights to estimate this same estimand. We will use the ATT weights so the analysis matches that for matching above.
+Now let's take a look at the weighted table.
```{r}
-#| message: false
+#| label: tbl-weighted-atu
+#| tbl-cap: A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Unexposed Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.
#| warning: false
-library(propensity)
-seven_dwarfs_9_with_ps <-
-  glm(
-    park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
-    data = seven_dwarfs_9,
-    family = binomial()
-  ) |>
-  augment(type.predict = "response", data = seven_dwarfs_9)
-seven_dwarfs_9_with_wt <- seven_dwarfs_9_with_ps |>
-  mutate(w_att = wt_att(.fitted, park_extra_magic_morning))
+seven_dwarfs_svy <- svydesign(
+  ids = ~1,
+  data = seven_dwarfs_wts,
+  weights = ~w_atu
+)
+tbl_svysummary(
+  seven_dwarfs_svy,
+  by = park_extra_magic_morning,
+  include = c(park_ticket_season, park_close, park_temperature_high)
+) |>
+  add_overall(last = TRUE)
```
-We can fit a *weighted* outcome model by using the `weights` argument.
+Now the exposed column of @tbl-weighted-atu is weighted to match the unexposed column of the unweighted table.
+
+We can again create a mirrored histogram to observe the weighted pseudopopulation.
```{r}
-lm(
-  wait_minutes_posted_avg ~ park_extra_magic_morning,
-  data = seven_dwarfs_9_with_wt,
-  weights = w_att
-) |>
-  tidy()
+#| label: fig-sd-mirror-hist-atu
+#| fig.cap: >
+#|   A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the unexposed (ATU) weight.
+ggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +
+  geom_mirror_histogram(bins = 50) +
+  geom_mirror_histogram(
+    aes(fill = park_extra_magic_morning, weight = w_atu),
+    bins = 50,
+    alpha = 0.5
+  ) +
+  scale_y_continuous(labels = abs) +
+  labs(
+    x = "propensity score",
+    fill = "Extra Magic Morning"
+  )
```
-Using weighting, we estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10am is 6.2 minutes.
-While this approach will get us the desired estimate for the point estimate, the default output using the `lm` function for the uncertainty (the standard errors and confidence intervals) are not correct.
+The weights for the unexposed days are all 1, so the distribution of the propensity scores for this group exactly overlaps the weighted pseudopopulation in orange.
+The blue bars indicate that the exposed population is largely upweighted to match the unexposed distribution.
+
+### Average treatment effect among the evenly matchable
-
-## Estimating uncertainty
+
+The target population for the average treatment effect among the evenly matchable (ATM) is the evenly matchable observations.
+This causal estimand conditions on those deemed "evenly matchable" by some distance metric:
-
-There are three ways to estimate the uncertainty:
+
+$$E[Y(1) - Y(0) | M_d = 1]$$
-
-1. A bootstrap
-2. A sandwich estimator that only takes into account the outcome model
-3. A sandwich estimator that takes into account the propensity score model
+
+Here, $M_d = 1$ indicates that an observation is evenly matchable under distance measure $d$ [@samuels2017aspects; @d2018improving]. Example research questions where the ATM is of interest could include "Should those at clinical equipoise receive the exposure?" [@greifer2021choosing]
-
-The first option can be computationally intensive, but should get you the correct estimates.
-The second option is computationally the easiest, but tends to overestimate the variability.
-There are not many current solutions in R (aside from coding it up yourself) for the third; however, the `{PSW}` package will do this.
+The ATM is a common target estimand when using matching.
+Here, some exposed observations are "matched" to control observations; however, some observations in both groups may be discarded.
+This is often done via a "caliper", where observations are only matched if they fall within a pre-specified caliper distance of each other.
+Alternatively, the ATM can be estimated via weighting.
+The ATM weight is estimated by:
-
-### The bootstrap
-
-1. Create a function to run your analysis once on a sample of your data
+
+$$w_{ATM} = \frac{\min \{p, 1-p\}}{Xp + (1-X)(1-p)}$$
+
+Let's add the ATM weights to the `seven_dwarfs_wts` data frame and look at their distribution.
```{r}
-fit_ipw <- function(split, ...) {
-  .df <- analysis(split)
-
-  # fit propensity score model
-  propensity_model <- glm(
-    park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
-    data = seven_dwarfs_9,
-    family = binomial()
-  )
+#| label: fig-sd-atm-hist
+#| fig.cap: >
+#|   A histogram of the average treatment effect among the evenly matchable (ATM) weights for whether or not a day had extra magic hours. ATM weights can range from 0 to 1, so they are always stable.
+seven_dwarfs_wts <- seven_dwarfs_wts |>
+  mutate(w_atm = wt_atm(.fitted, park_extra_magic_morning))
-
-  # calculate inverse probability weights
-  .df <- propensity_model |>
-    augment(type.predict = "response", data = .df) |>
-    mutate(wts = wt_att(
-      .fitted,
-      park_extra_magic_morning,
-      exposure_type = "binary"
-    ))
-
-  # fit correctly bootstrapped ipw model
-  lm(
-    wait_minutes_posted_avg ~ park_extra_magic_morning,
-    data = .df,
-    weights = wts
-  ) |>
-    tidy()
-}
+ggplot(seven_dwarfs_wts, aes(w_atm)) +
+  geom_histogram(bins = 50)
```
+The distribution of the weights in @fig-sd-atm-hist looks very similar to what we saw with the ATT weights.
+That is because, in this particular sample, there are many more unexposed observations compared to exposed ones.
+These weights have the nice property that they range from 0 to 1, making them *always* stable, regardless of the imbalance between exposure groups.
-
-2. Use {rsample} to bootstrap our causal effect
+
+Now let's take a look at the weighted table.
```{r}
-#| message: false
+#| label: tbl-weighted-atm
+#| tbl-cap: A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Evenly Matchable Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.
#| warning: false
-library(rsample)
-# fit ipw model to bootstrapped samples
-bootstrapped_seven_dwarfs <- bootstraps(
-  seven_dwarfs_9,
-  times = 1000,
-  apparent = TRUE
+seven_dwarfs_svy <- svydesign(
+  ids = ~1,
+  data = seven_dwarfs_wts,
+  weights = ~w_atm
)
-
-ipw_results <- bootstrapped_seven_dwarfs |>
-  mutate(boot_fits = map(splits, fit_ipw))
-
-ipw_results
+tbl_svysummary(
+  seven_dwarfs_svy,
+  by = park_extra_magic_morning,
+  include = c(park_ticket_season, park_close, park_temperature_high)
+) |>
+  add_overall(last = TRUE)
```
+In this particular sample, the ATM weights resemble the ATT weights, so this table looks similar to the ATT weighted table.
+This similarity is not guaranteed, and the ATM pseudopopulation can look different from the overall, exposed, and unexposed unweighted populations.
+When using ATM weights, it's thus essential to examine the weighted table to understand the population on which inferences will ultimately be drawn.
-
-Let's look at the results.
+
+We can again create a mirrored histogram to observe the ATM weighted pseudopopulation.
```{r}
-ipw_results |>
-  mutate(
-    estimate = map_dbl(
-      boot_fits,
-      \(.fit) .fit |>
-        filter(term == "park_extra_magic_morning") |>
-        pull(estimate)
-    )
-  ) |>
-  ggplot(aes(estimate)) +
-  geom_histogram(bins = 30, fill = "#D55E00FF", color = "white", alpha = 0.8) +
-  theme_minimal()
+#| label: fig-sd-mirror-hist-atm
+#| fig.cap: >
+#|   A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the evenly matchable (ATM) weight.
+ggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +
+  geom_mirror_histogram(bins = 50) +
+  geom_mirror_histogram(
+    aes(fill = park_extra_magic_morning, weight = w_atm),
+    bins = 50,
+    alpha = 0.5
+  ) +
+  scale_y_continuous(labels = abs) +
+  labs(
+    x = "propensity score",
+    fill = "Extra Magic Morning"
+  )
```
+Again, the ATM weights are bounded by 0 and 1, meaning that all observations will be *down-weighted*.
+This removes the potential for finite sample bias due to large weights in small samples and yields improved variance properties.
+
+### Average treatment effect among the overlap population
-
-3. Pull out the causal effect
+
+The target population for the average treatment effect among the overlap population (ATO) is the overlap population -- very similar to the "evenly matchable" population above.
+Example research questions where the ATO is of interest are the same as those for the ATM, such as "Should those at clinical equipoise receive the exposure?" [@greifer2021choosing]
+
+These weights are again similar to the ATM weights but are slightly attenuated, yielding improved variance properties.
+
+The ATO weight is estimated by:
+
+$$w_{ATO} = X(1-p) + (1-X)p$$
+
+Let's add the ATO weights to the `seven_dwarfs_wts` data frame and look at their distribution.
```{r}
-# get t-based CIs
-boot_estimate <- int_t(ipw_results, boot_fits) |>
-  filter(term == "park_extra_magic_morning")
-boot_estimate
+#| label: fig-sd-ato-hist
+#| fig.cap: >
+#|   A histogram of the average treatment effect among the overlap population (ATO) weights for whether or not a day had extra magic hours. Like ATM weights, ATO weights can range from 0 to 1, so they are always stable. ATO weights also have improved variance properties compared to the ATM weights.
+seven_dwarfs_wts <- seven_dwarfs_wts |>
+  mutate(w_ato = wt_ato(.fitted, park_extra_magic_morning))
+
+ggplot(seven_dwarfs_wts, aes(w_ato)) +
+  geom_histogram(bins = 50)
```
-We estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10am is `r round(boot_estimate$.estimate, 1)` minutes, 95% CI (`r round(boot_estimate$.lower, 1)`, `r round(boot_estimate$.upper, 1)`).
+Like the ATM weights, the ATO weights are bounded by 0 and 1, making them more stable than the ATE, ATT, and ATU weights, regardless of the imbalance between exposed and unexposed groups.
-
-### The outcome model sandwich
+
+::: {.callout-warning}
+## Finite sample bias - revisited
-
-There are two ways to get the sandwich estimator. The first is to use the same weighted outcome model as above along with the `{sandwich}` package.
-Using the `sandwich` function, we can get the robust estimate for the variance for the parameter of interest, as shown below.
+
+Let's revisit our finite sample bias simulation, adding in the ATO weights to examine whether they are impacted by this potential for bias.
```{r}
-#| message: false
+#| label: sim-finite-sample-bias-2
#| warning: false
-library(sandwich)
-weighted_mod <- lm(
-  wait_minutes_posted_avg ~ park_extra_magic_morning,
-  data = seven_dwarfs_9_with_wt,
-  weights = w_att
+#| message: false
+#| cache: true
+sim <- function(n) {
+  ## create a simulated dataset
+  finite_sample <- tibble(
+    z = rnorm(n),
+    x = case_when(
+      0.5 + z + rnorm(n) > 0 ~ 1,
+      TRUE ~ 0
+    ),
+    y = z + rnorm(n)
+  )
+  finite_sample_wts <- glm(
+    x ~ z,
+    data = finite_sample,
+    family = binomial("probit")
+  ) |>
+    augment(newdata = finite_sample, type.predict = "response") |>
+    mutate(
+      wts_ate = wt_ate(.fitted, x),
+      wts_ato = wt_ato(.fitted, x)
+    )
+  bias <- finite_sample_wts |>
+    summarize(
+      effect_ate = sum(y * x * wts_ate) / sum(x * wts_ate) -
+        sum(y * (1 - x) * wts_ate) / sum((1 - x) * wts_ate),
+      effect_ato = sum(y * x * wts_ato) / sum(x * wts_ato) -
+        sum(y * (1 - x) * wts_ato) / sum((1 - x) * wts_ato)
+    )
+  tibble(
+    n = n,
+    bias = c(bias$effect_ate, bias$effect_ato),
+    weight = c("ATE", "ATO")
+  )
+}
+
+## Examine 6 different sample sizes, simulating each 1000 times
+set.seed(1)
+finite_sample_sims <- map_df(
+  rep(
+    c(50, 100, 500, 1000, 5000, 10000),
+    each = 1000
+  ),
+  sim
)
-sandwich(weighted_mod)
+
+bias <- finite_sample_sims |>
+  group_by(n, weight) |>
+  summarize(bias = mean(bias))
```
-Here, our robust variance estimate is `r round(sandwich(weighted_mod)[2,2], 3)`. We can then use this to construct a robust confidence interval.
+Let's plot our results:
```{r}
-robust_var <- sandwich(weighted_mod)[2, 2]
-point_est <- coef(weighted_mod)[2]
-lb <- point_est - 1.96 * sqrt(robust_var)
-ub <- point_est + 1.96 * sqrt(robust_var)
-lb
-ub
+#| label: fig-finite-sample-bias-2
+#| fig-cap: "Finite sample bias with ATE weights (orange) and ATO weights (blue) created using a correctly specified propensity score model, varying the sample size from n = 50 to n = 10,000"
+ggplot(bias, aes(x = n, y = bias, color = weight)) +
+  geom_point() +
+  geom_line() +
+  geom_hline(yintercept = 0, lty = 2)
```
-We estimate that among days that have extra magic hours, the expected impact of having extra magic hours on the average posted wait time between 9 and 10am is `r round(point_est, 1)` minutes, 95% CI (`r round(lb, 1)`, `r round(ub, 1)`).
-Alternatively, we could fit the model using the `{survey}` package. To do this, we need to create a design object, like we did when fitting weighted tables.
+
+Notice that the estimate using the ATO weights is nearly unbiased even in fairly small samples.
+:::
+
+Now let's take a look at the weighted table.
```{r}
-#| message: false
+#| label: tbl-weighted-ato
+#| tbl-cap: A descriptive table of Extra Magic Morning hours weighted by the Average Treatment Effect among the Overlap Population Weights. This table shows the distributions of these variables in the pseudopopulation created by these weights.
#| warning: false
-library(survey)
-des <- svydesign(
+seven_dwarfs_svy <- svydesign(
  ids = ~1,
-  weights = ~w_att,
-  data = seven_dwarfs_9_with_wt
+  data = seven_dwarfs_wts,
+  weights = ~w_ato
)
+tbl_svysummary(
+  seven_dwarfs_svy,
+  by = park_extra_magic_morning,
+  include = c(park_ticket_season, park_close, park_temperature_high)
+) |>
+  add_overall(last = TRUE)
```
-Then we can use `svyglm` to fit the outcome model.
-
-```{r}
-svyglm(wait_minutes_posted_avg ~ park_extra_magic_morning, des) |>
-  tidy(conf.int = TRUE)
-```
+
+Like the ATM weights, the ATO pseudopopulation may not resemble the overall, exposed, or unexposed unweighted populations, so it is essential to examine these weighted tables to understand the population on which we will draw inferences.
-
-### Sandwich estimator that takes into account the propensity score model
+
+We can again create a mirrored histogram to observe the ATO weighted pseudopopulation.
-
-The correct sandwich estimator will also take into account the propensity score model. The `{PSW}` will allow us to do this. This package has some quirks, for example it doesn't work well with categorical variables, so we need to create dummy variables for `park_ticket_season` to pass into the model. *Actually, the code below isn't working because it seems there is a bug in the package. Stay tuned!*
```{r}
-#| eval: false
-library(PSW)
-
-seven_dwarfs_9 <- seven_dwarfs_9 |>
-  mutate(
-    park_ticket_season_regular = ifelse(park_ticket_season == "regular", 1, 0),
-    park_ticket_season_value = ifelse(park_ticket_season == "value", 1, 0)
-  )
-psw(
-  data = seven_dwarfs_9,
-  form.ps = "park_extra_magic_morning ~ park_ticket_season_regular + park_ticket_season_value + park_close + park_temperature_high",
-  weight = "ATT",
-  wt = TRUE,
-  out.var = "wait_minutes_posted_avg"
-)
+#| label: fig-sd-mirror-hist-ato
+#| fig.cap: >
+#|   A mirrored histogram of the distribution of propensity scores between exposure groups. The dark bars represent the unweighted distribution, and the colored bars represent the distribution weighted by the average treatment effect among the overlap population (ATO) weight.
+ggplot(seven_dwarfs_wts, aes(.fitted, group = park_extra_magic_morning)) +
+  geom_mirror_histogram(bins = 50) +
+  geom_mirror_histogram(
+    aes(fill = park_extra_magic_morning, weight = w_ato),
+    bins = 50,
+    alpha = 0.5
+  ) +
+  scale_y_continuous(labels = abs) +
+  labs(
+    x = "propensity score",
+    fill = "Extra Magic Morning"
+  )
```
+@fig-sd-mirror-hist-ato looks similar to the ATM pseudopopulation, with slight attenuation, as evidenced by the increased down-weighting in the exposed population.
+We prefer using the ATO weights when the overlap (or evenly matchable) population is the target, as they have improved variance properties.
+Another benefit is that any variables included in the propensity score model (if fit via logistic regression) will be perfectly balanced on the mean.
+We will discuss this further in @sec-eval-ps-model.
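+
+As a quick, informal check of this mean-balance property, the sketch below (not run) compares ATO-weighted covariate means across exposure groups; assuming the propensity score above was fit via logistic regression, the two group means should agree up to floating-point error.
+
+```{r}
+#| eval: false
+# ATO-weighted mean of a covariate from the propensity score model,
+# by exposure group; the group means should match.
+seven_dwarfs_wts |>
+  group_by(park_extra_magic_morning) |>
+  summarize(
+    weighted_mean_temp = sum(w_ato * park_temperature_high) / sum(w_ato)
+  )
+```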
+
+## Choosing estimands
+
+Several factors need to be considered when choosing an estimand for a given analysis.
+First and foremost, the research question at hand will drive this decision.
+For example, returning to the Touring Plans data, suppose Disney was interested in assessing whether they should add extra magic hours on days that don't currently have them---here, the ATU will be the estimand of interest.
+Other considerations include the available sample size and the distribution of the exposed and unexposed observations (i.e., might finite sample bias be an issue?).
+Below is a table summarizing the estimands and methods for estimating them (including R functions and example code), along with sample questions extended from Table 2 in @greifer2021choosing.
+
+| Estimand | Target population | Example Research Question | Matching Method | Weighting Method |
+|----------|--------------------|------------------------------------------------------------|---------------------------------------------|--------------------|
+| ATE | Full population | Should we decide whether to have extra magic hours *all* mornings to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Full matching | ATE weights |
+| | | Should a specific policy be applied to all eligible observations? | Fine stratification | `propensity::wt_ate()` |
+| | | | `MatchIt::matchit(…, estimand = "ATE")` | |
+| ATT | Exposed (treated) observations | Should we stop extra magic hours to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Pair matching without a caliper | ATT weights |
+| | | Should we stop our marketing campaign to those currently receiving it? | Full matching | `propensity::wt_att()` |
+| | | Should medical providers stop recommending treatment for those currently receiving it? | Fine stratification | |
+| | | | `MatchIt::matchit(…, estimand = "ATT")` | |
+| ATU | Unexposed (control) observations | Should we add extra magic hours for all days to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Pair matching without a caliper | ATU weights |
+| | | Should we extend our marketing campaign to those not receiving it? | Full matching | `propensity::wt_atu()` |
+| | | Should medical providers extend treatment to those not currently receiving it? | Fine stratification | |
+| | | | `MatchIt::matchit(…, estimand = "ATC")` | |
+| ATM | Evenly matchable | Are there some days we should change whether we are offering extra magic hours in order to change the wait time for Seven Dwarfs Mine Train between 5-6pm? | Caliper matching | ATM weights |
+| | | Is there an effect of the exposure for some observations? | `MatchIt::matchit(…, caliper = 0.1)` | `propensity::wt_atm()` |
+| | | Should those at clinical equipoise receive treatment? | Coarsened exact matching | |
+| | | | Cardinality matching | |
+| | | | `MatchIt::matchit(…, method = "cardinality")` | |
+| ATO | Overlap population | Same as ATM | | ATO weights |
+| | | | | `propensity::wt_ato()` |
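+
+For completeness, here is a sketch (not run) of the matching route referenced in the table: `{MatchIt}` with a caliper targets the "evenly matchable" population behind the ATM. The settings are hypothetical, and it assumes a `seven_dwarfs_9` data frame like the one used earlier in the chapter.
+
+```{r}
+#| eval: false
+library(MatchIt)
+# Caliper matching: pairs form only when propensity scores fall within
+# 0.1 standard deviations of each other (the default caliper scale).
+m <- matchit(
+  park_extra_magic_morning ~ park_ticket_season + park_close +
+    park_temperature_high,
+  data = seven_dwarfs_9,
+  caliper = 0.1
+)
+matched_data <- get_matches(m)
+```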
diff --git a/chapters/11-estimands_cache/html/__packages b/chapters/11-estimands_cache/html/__packages
index b5a0de5..a9a273e 100644
--- a/chapters/11-estimands_cache/html/__packages
+++ b/chapters/11-estimands_cache/html/__packages
@@ -1,10 +1,3 @@
-base
-methods
-datasets
-utils
-grDevices
-graphics
-stats
ggplot2
tidyverse
tibble
@@ -20,3 +13,7 @@
touringplans
gtsummary
labelled
propensity
+Matrix
+survival
+survey
+halfmoon
diff --git a/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.RData b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.RData
new file mode 100644
index 0000000..242bc02
Binary files /dev/null and b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.RData differ
diff --git a/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdb b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdb
new file mode 100644
index 0000000..2c722ea
Binary files /dev/null and b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdb differ
diff --git a/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdx b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdx
new file mode 100644
index 0000000..057fe3d
Binary files /dev/null and b/chapters/11-estimands_cache/html/sim-finite-sample-bias-2_879b7e35a694946c269801944ab8948b.rdx differ
diff --git a/chapters/11-estimands_files/figure-html/fig-finite-sample-bias-2-1.png b/chapters/11-estimands_files/figure-html/fig-finite-sample-bias-2-1.png
new file mode 100644
index 0000000..6abc919
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-finite-sample-bias-2-1.png differ
diff --git a/chapters/11-estimands_files/figure-html/fig-sd-atm-hist-1.png b/chapters/11-estimands_files/figure-html/fig-sd-atm-hist-1.png
new file mode 100644
index 0000000..452b494
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-sd-atm-hist-1.png differ
diff --git a/chapters/11-estimands_files/figure-html/fig-sd-ato-hist-1.png b/chapters/11-estimands_files/figure-html/fig-sd-ato-hist-1.png
new file mode 100644
index 0000000..a6ec8f1
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-sd-ato-hist-1.png differ
diff --git a/chapters/11-estimands_files/figure-html/fig-sd-atu-hist-1.png b/chapters/11-estimands_files/figure-html/fig-sd-atu-hist-1.png
new file mode 100644
index 0000000..679c4ec
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-sd-atu-hist-1.png differ
diff --git a/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-atm-1.png b/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-atm-1.png
new file mode 100644
index 0000000..b1dc0d7
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-atm-1.png differ
diff --git a/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-ato-1.png b/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-ato-1.png
new file mode 100644
index 0000000..01ffcfb
Binary files /dev/null and b/chapters/11-estimands_files/figure-html/fig-sd-mirror-hist-ato-1.png differ