class: center, middle, inverse, title-slide .title[ # 1: Potential Outcomes and Experiments ] .subtitle[ ## Quantitative Causal Inference ] .author[ ###
J. Seawright
] .institute[ ###
Northwestern Political Science
] .date[ ### April 3, 2025 ] --- class: center, middle <style type="text/css"> pre { max-height: 400px; overflow-y: auto; } pre[class] { max-height: 200px; } </style> --- ### Business - Presenting an estimator - Weekly lab assignments - Final grant proposal --- ### What makes an experiment work? --- ### Potential Outcomes - Let's build a mathematical model of one person participating in an experiment. --- ### Potential Outcomes - Suppose we name that person `\(i\)`. - The person we're interested in has an outcome of 1 when in the control group, and 0 when in the treatment group. --- ### Potential Outcomes - `\(y_{i,c} = 0\)` - `\(y_{i,t} = 1\)` --- ### The Potential Outcomes Framework - We are interested in the effects of a dichotomous treatment (i.e., independent variable): whether person got the treatment (t) or the control (c). - This variable can be written as `\(W_{i} = (t,c)\)`. --- ### The Potential Outcomes Framework - For a given case, `\(i\)`, we either observe `\(W_{i} = t\)` or `\(W_{i} = c\)`. If `\(W_{i} = t\)`, let us denote the value of the dependent variable as `\(y_{i,t}\)`. If `\(W_{i} = c\)`, let us denote the value of the dependent variable as `\(y_{i,c}\)` --- ### The Potential Outcomes Framework - The causal effect of `\(W\)` on `\(y\)` is: - `\(y_{i,t} - y_{i,c}\)` For hypothetical person `\(i\)` above, this effect is `\(1 - 0 = 1\)`. --- ### The Average Treatment Effect - Sometimes, we are interested in developing an estimate of the effect of `\(W\)` on `\(y\)` in some population `\(\Pi\)`, from which we have a random sample (or even the whole population) split randomly into treatment and control cases. - Here, interest focuses on the "average treatment effect": - `\(E(y_{i,t}) - E(y_{i,c})\)` --- ### The Average Treatment Effect What helps us estimate the average treatment effect well? --- ### Assignment Mechanisms - `\(\mathbb{W}\)` has a probability distribution. <!-- --> - `\(Pr(\mathbb{W} | \mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1})\)` `$$p_{i}(\mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1}) = \sum_{\mathbb{W}: W_{i} = 1} Pr(\mathbb{W} | \mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1})$$` `\(\mathbb{W}\)`, `\(\mathbb{X}\)`, `\(\mathbb{Y}_{0}\)`, and `\(\mathbb{Y}_{1}\)` take on joint values that are drawn from some describable set of possibilities. --- ### Individualistic Assignment 1. There exists a function `\(q()\)` that is bounded between 0 and 1, such that `\(p_{i}(\mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1}) = q(X_{i}, Y_{0,i}, Y_{1,i})\)` 2. `\(Pr(\mathbb{W} | \mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1})\)` is the product of those individual probabilities. --- ### Probabilistic Assignment 1. For all permissible values of `\(\mathbb{X}\)`, `\(\mathbb{Y}_{0}\)`, and `\(\mathbb{Y}_{1}\)`, `\(0 < p_{i}(\mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1}) < 1\)`. --- ### Unconfounded Assignment 1. `\(Pr(\mathbb{W} | \mathbb{X}, \mathbb{Y}_{0}, \mathbb{Y}_{1}) = Pr(\mathbb{W} | \mathbb{X}, \mathbb{Y}^{'}_{0}, \mathbb{Y}^{'}_{1})\)` --- ### Experiments and Causal Inference - Under individualistic, probabilistic, and unconfounded assignment, the set of cases where `\(W_{i} = t\)` produces a random sample from the population of `\(y_{t}\)`. Likewise, the set of cases where `\(W_{i} = c\)` produces a random sample from the population of `\(y_{c}\)`. Thus: - `\(E_{t}(y_{i,t}) = E(y_{i,t})\)` - `\(E_{c}(y_{i,c}) = E(y_{i,c})\)` - `\(E(y_{i,t}) - E(y_{i,c}) = E_{t}(y_{i,t}) - E_{c}(y_{i,c})\)` --- ``` r mean(peruemotions$outsidervote[peruemotions$simpletreat==1]) ``` ``` ## [1] 0.6092715 ``` ``` r mean(peruemotions$outsidervote[peruemotions$simpletreat==0]) ``` ``` ## [1] 0.4916388 ``` ``` r mean(peruemotions$outsidervote[peruemotions$simpletreat==1]) - mean(peruemotions$outsidervote[peruemotions$simpletreat==0]) ``` ``` ## [1] 0.1176327 ``` --- ``` r summ(lm(outsidervote ~ simpletreat, data=peruemotions)) ``` <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Observations </td> <td style="text-align:right;"> 450 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Dependent variable </td> <td style="text-align:right;"> outsidervote </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Type </td> <td style="text-align:right;"> OLS linear regression </td> </tr> </tbody> </table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> F(1,448) </td> <td style="text-align:right;"> 5.62 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> R² </td> <td style="text-align:right;"> 0.01 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Adj. R² </td> <td style="text-align:right;"> 0.01 </td> </tr> </tbody> </table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t val. </th> <th style="text-align:right;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> (Intercept) </td> <td style="text-align:right;"> 0.49 </td> <td style="text-align:right;"> 0.03 </td> <td style="text-align:right;"> 17.10 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> simpletreat </td> <td style="text-align:right;"> 0.12 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:right;"> 2.37 </td> <td style="text-align:right;"> 0.02 </td> </tr> </tbody> <tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> Standard errors: OLS</td></tr></tfoot> </table> --- ### Assumptions What do we need to assume in order to get a causal inference here? --- ### Randomization Inference - It is clearly important to test the null hypothesis that `\(E(y_{i,t}) = E(y_{i,c})\)` for all `\(i\)`. - If this hypothesis is true, then every case's treatment assignment is unrelated to its outcome. - Thus, under the null hypothesis, it is fine for us to reassign treatment at random --- the outcome won't change. --- ### Randomization Inference - Randomization inference involves: 1. Randomly reordering the treatment condition vector 2. Calculating the difference between the treatment and control group for the new (artificial) treatment condition vector 3. Storing the result somewhere 4. Repeating the whole process hundreds or thousands of times. --- ### Randomization Inference - If the null hypothesis is true, then the distribution of simulated differences in means is the sampling distribution from which the real difference in means was drawn. - Therefore, a good `\(P\)` value for our observed difference in means is the proportion of simulated differences in means that are at least as far from 0 as the real number. --- ### Assumptions When analyzing an experiment using randomization inference, we do *not* need to assume that: - we know and can measure all (or even any!) of the confounding variables for the relationship of interest. - causal effects are additive or linear. - causal effects are constant across cases. - errors are normally distributed or heteroskedastic. --- ### Assumptions When analyzing an experiment using randomization inference, we *do* need to assume that: - SUTVA (stable unit treatment value assumption) holds. - Experimental/psychological realism holds. --- ``` r library(ri2) ``` ``` ## Loading required package: randomizr ``` ``` ## Loading required package: estimatr ``` ``` r emotions_declaration <- declare_ra(N = 450, m = 151) emotions_table <- data_frame(Z = peruemotions$simpletreat, Y = peruemotions$outsidervote) ``` ``` ## Warning: `data_frame()` was deprecated in tibble 1.1.0. ## ℹ Please use `tibble()` instead. ## This warning is displayed once every 8 hours. ## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was ## generated. ``` --- ``` r ri2_emotionsresult <- conduct_ri( formula = Y ~ Z, declaration = emotions_declaration, sharp_hypothesis = 0, data = emotions_table ) ``` --- ``` r ri2_emotionsresult ``` ``` ## term estimate two_tailed_p_value ## 1 Z 0.1176327 0.016 ``` --- ### Balance Testing --- ``` r library(cobalt) ``` ``` ## cobalt (Version 4.5.5, Build Date: 2024-04-02) ``` ``` r peruemotionscovs <- subset(peruemotions, select = c(Cuzco, age)) bal.tab(peruemotionscovs, treat=peruemotions$simpletreat, thresholds = c(m = .1, v = 2)) ``` ``` ## Warning: Missing values exist in the covariates. Displayed values omit these ## observations. ``` ``` ## Note: `s.d.denom` not specified; assuming "pooled". ``` ``` ## Balance Measures ## Type Diff.Un M.Threshold.Un V.Ratio.Un V.Threshold.Un ## Cuzco Binary 0.0598 Balanced, <0.1 . ## age Contin. 0.0476 Balanced, <0.1 1.0738 Balanced, <2 ## age:<NA> Binary 0.0097 Balanced, <0.1 . ## ## Balance tally for mean differences ## count ## Balanced, <0.1 3 ## Not Balanced, >0.1 0 ## ## Variable with the greatest mean difference ## Variable Diff.Un M.Threshold.Un ## Cuzco 0.0598 Balanced, <0.1 ## ## Balance tally for variance ratios ## count ## Balanced, <2 1 ## Not Balanced, >2 0 ## ## Variable with the greatest variance ratio ## Variable V.Ratio.Un V.Threshold.Un ## age 1.0738 Balanced, <2 ## ## Sample sizes ## Control Treated ## All 299 151 ``` --- ### Encouragement Designs --- ### Regression in Experiments Recall that `\(W_{i}\)` as an indicator of treatment assignment in an experiment. A bivariate regression predicting the dependent variable based on `\(W\)` is algebraically the same as the difference in means we've looked at above, so it is necessarily an okay causal inference. --- ### Regression in Experiments Multivariate regression for experiments is *not* guaranteed to be unbiased in the way that bivariate regression is. Multivariate regression can be (even very badly) biased if: 1. Some control variable included in the model is caused by the treatment, or 2. The causal effect of interest is very heterogeneous across cases and the total number of cases is small --- ``` r summary(lm(outsidervote ~ simpletreat, data=peruemotions)) ``` ``` ## ## Call: ## lm(formula = outsidervote ~ simpletreat, data = peruemotions) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.6093 -0.4916 0.3907 0.5084 0.5084 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.49164 0.02874 17.104 <2e-16 *** ## simpletreat 0.11763 0.04962 2.371 0.0182 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.497 on 448 degrees of freedom ## Multiple R-squared: 0.01239, Adjusted R-squared: 0.01018 ## F-statistic: 5.62 on 1 and 448 DF, p-value: 0.01818 ``` --- ``` r summary(lm(outsidervote ~ simpletreat + risk, data=peruemotions)) ``` ``` ## ## Call: ## lm(formula = outsidervote ~ simpletreat + risk, data = peruemotions) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.04217 -0.34282 0.07292 0.34842 1.03717 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 1.0224427 0.0511549 19.987 <2e-16 *** ## simpletreat 0.0197234 0.0456917 0.432 0.666 ## risk -0.0105961 0.0009036 -11.726 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.4294 on 402 degrees of freedom ## (45 observations deleted due to missingness) ## Multiple R-squared: 0.2624, Adjusted R-squared: 0.2587 ## F-statistic: 71.49 on 2 and 402 DF, p-value: < 2.2e-16 ``` --- ### Causal Mediation Suppose we want to know the causal steps by which treatment affects `\(Y\)`. Let `\(M\)` be a hypothesized mediator, i.e., a variable caused by treatment that causes `\(Y\)`. Because `\(M\)` is affected by `\(W\)`, `\(M\)` is a sort of dependent variable. Let us denote two potential outcomes: `\(M_{i,t}\)` and `\(M_{i,c}\)`. `\(Y\)` depends on `\(W\)` and `\(M\)`, so there are now four potential outcomes on `\(Y\)`: `\(Y_{i}(t,M_{i,t})\)`, `\(Y_{i}(t,M_{i,c})\)`, `\(Y_{i}(c,M_{i,t})\)`, and `\(Y_{i}(c,M_{i,c})\)` --- ### Causal Mediation The two causal mediation effects for each case are: `$$\begin{aligned} \delta_{i,t} = Y_{i}(t,M_{i,t}) - Y_{i}(t,M_{i,c})\end{aligned}$$` `$$\begin{aligned} \delta_{i,c} = Y_{i}(c,M_{i,t}) - Y_{i}(c,M_{i,c})\end{aligned}$$` --- ### Causal Mediation To make inferences, we need the assumptions: `$$\begin{aligned} Y_{i}(t,M_{i,t}) \perp\!\!\!\perp W_{i} | X_{i} = x\end{aligned}$$` `$$\begin{aligned} M_{i,t} \perp\!\!\!\perp W_{i} | X_{i} = x\end{aligned}$$` `$$\begin{aligned} Y_{i}(t^{'},m) \perp\!\!\!\perp M_{i,t} | (W_{i} = t, X_{i} = x) \end{aligned}$$` --- ``` r library(mediation) ``` ``` ## Loading required package: MASS ``` ``` ## ## Attaching package: 'MASS' ``` ``` ## The following object is masked from 'package:dplyr': ## ## select ``` ``` ## Loading required package: Matrix ``` ``` ## Loading required package: mvtnorm ``` ``` ## ## Attaching package: 'mvtnorm' ``` ``` ## The following object is masked from 'package:jtools': ## ## standardize ``` ``` ## Loading required package: sandwich ``` ``` ## mediation: Causal Mediation Analysis ## Version: 4.5.0 ``` --- ``` r peruemotionsmed <- with(peruemotions, na.omit(data.frame(risk=risk, outsidervote = outsidervote, simpletreat=simpletreat, Cuzco=Cuzco, age=age))) perumed.lm1 <- lm(risk ~ simpletreat, data=peruemotionsmed) perumed.lm2 <- lm(outsidervote ~ risk + simpletreat + Cuzco + age, data=peruemotionsmed) perumed.out <- mediate(perumed.lm1, perumed.lm2, treat="simpletreat", mediator = "risk") ``` --- ``` r summary(perumed.out) ``` ``` ## ## Causal Mediation Analysis ## ## Quasi-Bayesian Confidence Intervals ## ## Estimate 95% CI Lower 95% CI Upper p-value ## ACME 0.0813 0.0299 0.13 0.002 ** ## ADE 0.0172 -0.0761 0.11 0.716 ## Total Effect 0.0986 -0.0116 0.20 0.088 . ## Prop. Mediated 0.7696 -2.9748 4.26 0.090 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Sample Size Used: 398 ## ## ## Simulations: 1000 ``` --- ``` r plot(perumed.out) ``` <img src="1experiments_files/figure-html/unnamed-chunk-13-1.png" width="70%" /> --- ``` r perumedsens.out <- medsens(perumed.out, rho.by = 0.1, effect.type = "indirect", sims = 100) ``` --- ``` r plot(perumedsens.out) ``` <img src="1experiments_files/figure-html/unnamed-chunk-15-1.png" width="65%" /> --- ### Heterogeneity --- ``` r summary(lm(outsidervote ~ simpletreat + Cuzco + simpletreat:Cuzco + age + simpletreat:age, data=peruemotions)) ``` ``` ## ## Call: ## lm(formula = outsidervote ~ simpletreat + Cuzco + simpletreat:Cuzco + ## age + simpletreat:age, data = peruemotions) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.6325 -0.5234 0.3705 0.4750 0.5903 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.5045928 0.0875077 5.766 1.55e-08 *** ## simpletreat 0.1957848 0.1525120 1.284 0.1999 ## Cuzco -0.1121455 0.0639902 -1.753 0.0804 . ## age 0.0007842 0.0027246 0.288 0.7736 ## simpletreat:Cuzco 0.0924688 0.1065091 0.868 0.3858 ## simpletreat:age -0.0038673 0.0045507 -0.850 0.3959 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.4981 on 432 degrees of freedom ## (12 observations deleted due to missingness) ## Multiple R-squared: 0.01819, Adjusted R-squared: 0.006829 ## F-statistic: 1.601 on 5 and 432 DF, p-value: 0.1585 ``` --- ### Population Inference - Unrepresentative samples - Let `\(Z_{i}\)` be a dichotomous variable that represents whether a case is in the experimental sample --- ### Population Inference Assume: `$$\begin{aligned} f(Y_{i,1} - Y_{i,0} | Z_{i}, X_{i}) = f(Y_{i,1} - Y_{i,0} | X_{i})\end{aligned}$$` --- ### Population Inference Assume, for all possible `\(X^{*}\)`: `$$\begin{aligned} P(Z_{i} = 1 | X_{i} = X^{*}) > 0\end{aligned}$$` --- ### Population Inference O'Muircheartaigh and Hedges propose: 1. Let `\(\mathbf{x}\)` be the collection of all observed combination of values `\(X^{*}\)`. 2. Let `\(T(x)\)` be the sample average of `\(Y_{i,1} - Y_{i,0}\)` across all `\(i\)` such that `\(X_{i} = x\)`. 3. Let `\(p(x)\)` be the proportion of the population with `\(X_{i} = x\)`. --- ### Population Inference O'Muircheartaigh and Hedges propose: 4. `\(PATE \approx \sum_{x \in \mathbf{x}}(p(x) T(x))\)` --- ``` r library(devtools) ``` ``` ## Loading required package: usethis ``` ``` r if (!require("generalize")) devtools::install_github('benjamin-ackerman/generalize') ``` ``` ## Loading required package: generalize ``` ``` r library(generalize) ?assess ``` ``` ## starting httpd help server ... ``` ``` ## done ``` --- ``` r ?generalize ``` --- <img src="images/Bush1.png" width="90%" /> --- <img src="images/Bush2.png" width="100%" /> --- <img src="images/Bush3.png" width="100%" /> --- <img src="images/Cheema1.png" width="100%" /> --- <img src="images/Cheema2.png" width="100%" /> --- <img src="images/Cheema3.png" width="100%" /> --- <img src="images/Cheema4.png" width="100%" />