class: center, middle, inverse, title-slide .title[ # 4: Instrumental Variables ] .subtitle[ ## Quantitative Causal Inference ] .author[ ###
Jaye Seawright
] .institute[ ###
Northwestern Political Science
] .date[ ### April 23 and 28, 2026 ] --- class: center, middle <style type="text/css"> pre { max-height: 400px; overflow-y: auto; } pre[class] { max-height: 200px; } </style> ### Today's Plan 1. Why Instrumental Variables (IV)? 2. The IV estimator and its assumptions 3. Two-stage least squares 4. Diagnostics 5. Heterogeneity and LATE interpretation 6. Applications and limitations --- ### Endogeneity in OLS - `\(E(\mathbf{u} | \mathbf{X}) = 0\)`? - As you'll recall, `\(E(\hat{\mathbf{\beta}}) = \mathbf{\beta} + (\mathbf{X}^{T} \mathbf{X})^{-1} E(\mathbf{X}^{T} \mathbf{u})\)`. So, if `\(E(\mathbf{X}^{T} \mathbf{u}) = \mathbf{\nu} \neq 0\)`, then `\(E(\hat{\mathbf{\beta}} - \mathbf{\beta}) = (\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{\nu} \neq 0\)`. --- ### Consequences of Endogeneity - When `\(\mathbf{X}\)` is endogenous, our estimate `\(\hat{\mathbf{\beta}}\)` will be a mixture of the desired relationship between `\(\mathbf{X}\)` and `\(\mathbf{y}\)` *and* the nuisance relationship between `\(\mathbf{X}\)` and `\(\mathbf{u}\)`. --- ### How Can Endogeneity Arise? - Omitted explanatory variables - Measurement error on the right-hand side of the model - Simultaneity between the right- and left-hand sides of the model - etc. --- ### What to Do When Endogeneity Is a Problem? 1. Give up. 2. Try to change the model by including all omitted relevant variables. 3. Find an instrument. 4. Find other data. --- ### Instrumental Variables - Suppose the model is: `\(\mathbf{y} = \mathbf{W} \mathbf{\gamma} + \mathbf{x} \beta + \mathbf{\epsilon}\)`. The `\(\mathbf{W}\)` variables are exogenous, but the `\(\mathbf{x}\)` variable is endogenous.
- Now, assume that there exists a variable `\(z\)` that *doesn't* belong in the regression model, with the following two characteristics: - `\(cov(\mathbf{z}, \mathbf{x}) \neq 0\)` - `\(E(\mathbf{z}^{T} \mathbf{\epsilon}) = 0\)` --- ### Instrumental Variables If these conditions are met (doesn't belong in the regression, correlated with `\(\mathbf{x}\)`, no connection with `\(\mathbf{\epsilon}\)`), then `\(\mathbf{z}\)` meets the mathematical definition of an *instrument*. --- ### Three Core Assumptions 1. Relevance: `\(\mathrm{Cov}(Z,X) \neq 0\)` 2. Exclusion restriction: `\(Z\)` affects `\(Y\)` only through `\(X\)` 3. Independence / exogeneity: `\(Z\)` is as good as randomly assigned (or at least uncorrelated with unobservables) --- ### Relevance Failure <img src="4instrumentalvariables_files/figure-html/unnamed-chunk-2-1.png" width="70%" /> --- ### Exclusion Restriction Failure <img src="4instrumentalvariables_files/figure-html/unnamed-chunk-3-1.png" width="70%" /> --- ### Exogeneity Failure <img src="4instrumentalvariables_files/figure-html/unnamed-chunk-4-1.png" width="70%" /> --- ### DAGs and Finding Instruments If we can specify our causal structure well in advance, mathematical graph theory can help identify which variables are instruments.
--- ``` r library(dagitty) hypotheticalinstruments.dag <- dagitty( "dag { Polarization -> DemocraticErosion ElitePower -> DemocraticErosion Corruption -> Polarization Corruption -> DemocraticErosion SocialMedia -> Polarization PrimaryElections -> Polarization PrimaryElections -> ElitePower EconomicInequality -> ElitePower EconomicInequality -> Polarization Polarization [exposure] DemocraticErosion [outcome]}" ) ``` --- ``` r plot( hypotheticalinstruments.dag ) ``` <img src="4instrumentalvariables_files/figure-html/unnamed-chunk-6-1.png" width="50%" /> --- ``` r instrumentalVariables(hypotheticalinstruments.dag) ``` ``` ## EconomicInequality | ElitePower ## PrimaryElections | ElitePower ## SocialMedia ``` --- ### Bivariate IV - Let's momentarily consider a bivariate regression, `\(\mathbf{y} = \mathbf{x} \beta + \mathbf{\epsilon}\)`, with instrument `\(\mathbf{z}\)`. - The OLS estimate of `\(\beta\)` is `\((\mathbf{x}^{T}\mathbf{x})^{-1} \mathbf{x}^{T}\mathbf{y}\)`. - Consider instead the IV estimate of `\(\beta\)`: `\((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T}\mathbf{y}\)`. - `\(E(\hat{\beta}_{IV}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T}\mathbf{y}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} [\mathbf{x} \beta + \mathbf{\epsilon}])\)` - `\(E(\hat{\beta}_{IV}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{x} \beta) + E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{\epsilon}) = \beta + 0\)` --- ### Instrumental Variables - Now let's consider a multivariate regression, `\(\mathbf{Y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}\)`, with some `\(t \leq k\)` of the `\(\mathbf{X}\)` variables endogenous, and with `\(t\)` instruments `\(\mathbf{z}_{1} \ldots \mathbf{z}_{t}\)`. --- ### Multivariate IV - The OLS estimate of `\(\mathbf{\beta}\)` is `\((\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{y}\)`.
- Form the matrix `\(\mathbf{Z}\)`, containing the `\(t\)` instruments, as well as the `\(k - t\)` exogenous elements from `\(\mathbf{X}\)`. - The IV estimate of `\(\mathbf{\beta}\)` is: `\((\mathbf{Z}^{T}\mathbf{X})^{-1} \mathbf{Z}^{T}\mathbf{y}\)`. --- ### Multivariate IV - As in the bivariate situation, given the IV assumptions, the IV estimator eliminates the problem of endogeneity. - This estimator only works if the number of instruments is exactly equal to the number of endogenous variables. --- ### A Brief History of Instrumental Variables - The method of instrumental variables has surprisingly deep roots in econometrics. - **Philip Wright (1928)** is credited with the first explicit use of IV in his book *The Tariff on Animal and Vegetable Oils*. - Wright faced a classic simultaneity problem: estimating supply and demand curves for flaxseed oil. - Price and quantity are jointly determined. - OLS would give a mixture of supply and demand elasticities. - His solution: use exogenous shifters of supply (e.g., weather) to trace out the demand curve, and shifters of demand (e.g., tariff changes) to trace out the supply curve. --- ### Wright's Insight - Wright realized that if you have a variable that shifts one curve but not the other, you can identify the other curve's parameters. - This is exactly the modern IV intuition: an instrument `\(Z\)` affects `\(X\)` (e.g., quantity supplied) but has no direct effect on `\(Y\)` (e.g., price) except through `\(X\)`. - Wright's work remained largely unknown for decades; it was rediscovered by econometricians in the 1970s and 1980s. - Today, IV is one of the most widely used methods for causal inference with observational data. 
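---

### Checking the Bivariate IV Estimator by Simulation

The algebra above says `\((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T}\mathbf{y}\)` recovers `\(\beta\)` while OLS does not. A minimal sketch in simulated data (all names and parameter values here are invented for illustration):

``` r
set.seed(406)
n <- 10000
z <- rnorm(n)              # instrument: exogenous, shifts x only
u <- rnorm(n)              # structural error term
x <- 0.5*z + u + rnorm(n)  # x is endogenous: cov(x, u) > 0
y <- 2*x + u               # true beta = 2

# OLS is inflated by the x-u correlation
coef(lm(y ~ x))["x"]

# The bivariate IV estimator, written as a covariance ratio
cov(z, y) / cov(z, x)
```

The IV ratio lands close to the true `\(\beta = 2\)`, while OLS settles noticeably above it.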
--- ### Wright's Flaxseed Oil Example: A Simple Simulation ``` r # Simulate a simplified supply-demand system with an instrument set.seed(2026) n <- 1000 # Instrument: weather shock (shifts supply, not demand) weather <- rnorm(n) # Demand shifter: income affects P directly but is unobserved in the analysis income <- rnorm(n) # Equilibrium quantity: weather (the supply shifter) moves Q but not P directly Q <- 5 + 0.2*income - 0.3*weather + rnorm(n) # Structural demand equation: true slope on Q is -0.4; income lands in the error P <- 8 - 0.4*Q + 0.3*income + rnorm(n) # OLS of P on Q (tries to estimate demand, but income confounds it) summary(lm(P ~ Q)) ``` ``` ## ## Call: ## lm(formula = P ~ Q) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.7809 -0.6878 0.0094 0.6924 3.5134 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 7.71724 0.15514 49.74 <2e-16 *** ## Q -0.34427 0.03059 -11.26 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.025 on 998 degrees of freedom ## Multiple R-squared: 0.1126, Adjusted R-squared: 0.1117 ## F-statistic: 126.7 on 1 and 998 DF, p-value: < 2.2e-16 ``` ``` r # IV using weather as instrument for Q library(ivreg) summary(ivreg(P ~ Q | weather)) ``` ``` ## ## Call: ## ivreg(formula = P ~ Q | weather) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.79041 -0.68276 0.00907 0.69759 3.49947 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 7.6264 0.5172 14.745 < 2e-16 *** ## Q -0.3260 0.1041 -3.132 0.00179 ** ## ## Diagnostic tests: ## df1 df2 statistic p-value ## Weak instruments 1 998 94.402 <2e-16 *** ## Wu-Hausman 1 997 0.034 0.854 ## Sargan 0 NA NA NA ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 1.025 on 998 degrees of freedom ## Multiple R-Squared: 0.1123, Adjusted R-squared: 0.1114 ## Wald test: 9.81 on 1 and 998 DF, p-value: 0.001786 ``` - OLS gives a biased estimate of the demand elasticity (price-quantity relationship). - IV using weather (a supply shifter) recovers the demand curve. --- ### Intuition: Two-Stage Least Squares IV can be thought of as a two-step process: 1. **First stage**: Regress the endogenous X on the instrument Z (and controls). This isolates the exogenous variation in X—the part predicted by Z. 2. **Second stage**: Regress Y on the predicted values from the first stage. This "purified" X is no longer correlated with the error term, giving us consistent estimates. The algebra on the next slide shows why this is equivalent to the IV estimator we already derived. --- ### 2SLS - Let's partition the independent variables into two matrices, `\(\mathbf{W}\)`, which has the `\(k - t\)` exogenous variables in the model of `\(\mathbf{y}\)`, and `\(\mathbf{X}\)`, which has the `\(t\)` endogenous variables. - So the `\(\mathbf{Z}\)` matrix is the `\(\mathbf{W}\)` matrix with `\(t\)` extra columns containing the instruments. --- ### 2SLS - Suppose we regress each column of the `\(\mathbf{X}\)` matrix on the matrix `\(\mathbf{Z}\)` and form the fitted values. - `\(\hat{\mathbf{X}} = \mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X}\)` - Now use `\(\hat{\mathbf{X}}\)` in the place of `\(\mathbf{X}\)` in the OLS regression formula. 
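---

### Wright's Example by Hand

The `ivreg()` estimate above can be reproduced manually, previewing the two-stage logic developed on the coming slides. (This sketch reuses the simulated `weather`, `Q`, and `P` objects from the previous chunk; `Q_hat` is a name introduced here.)

``` r
# Closed-form bivariate IV: cov(z, y) / cov(z, x)
cov(weather, P) / cov(weather, Q)

# Equivalent two-stage procedure:
# Stage 1: regress the endogenous Q on the instrument
Q_hat <- fitted(lm(Q ~ weather))
# Stage 2: regress P on the stage-1 fitted values
coef(lm(P ~ Q_hat))["Q_hat"]
```

Both give the same point estimate as the `Q` coefficient from `ivreg()` (about -0.33). The naive second-stage standard errors from `lm()` are *not* correct, however — one reason to use dedicated IV software.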
--- ### 2SLS $$ `\begin{split} \hat{\mathbf{\beta}}_{IV} = & (\mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X})^{-1} \\ & \mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{y} = \\ & (\mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X})^{-1} \\ & \mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{y} = \\ & (\mathbf{Z}^{T}\mathbf{X})^{-1} \mathbf{Z}^{T}\mathbf{y} \end{split}` $$ - The instrumental variables estimator gives the same coefficient estimates as running an OLS regression using `\(\hat{\mathbf{X}}\)` as predicted by `\(\mathbf{Z}\)` in the place of `\(\mathbf{X}\)`. --- ### Variance in Instrumental Variables - `\(\hat{\mathbf{X}}\)` is a random variable, so the normal OLS standard errors will underestimate uncertainty when using IV. - Instead, the correct estimate of the standard errors of the coefficient estimates in IV is: - `\(\hat{V} (\hat{\mathbf{\beta}}_{IV}) = \hat{\sigma}^{2} (\mathbf{Z}^{T} \mathbf{X})^{-1} \mathbf{Z}^{T} \mathbf{Z} (\mathbf{X}^{T} \mathbf{Z})^{-1}\)` --- ### Examples of Proposed Instruments - Suppose we're interested in the relationship between education and some political variable. - One proposed instrument for education, due to David Card (1995), is residential proximity to a college or university. - A second proposed instrument for education, due to Angrist and Krueger (1991), is quarter of birth. - A third instrument, from Nguyen et al. (2016), involves genetic risk score for years of schooling. --- ### Examples of Proposed Instruments - Suppose our focus is on the relationship between economic performance and civil war in agricultural countries. - Miguel, Satyanath, and Sergenti (2004) suggest using rainfall as an instrument for economic performance.
--- ``` r library(haven) mss_repdata_1_ <- read_dta("https://github.com/jnseawright/PS406/raw/main/data/mss_repdata%20(1).dta") ``` --- ``` r library(ivreg) migueliv <- ivreg(any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l+ y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, data=mss_repdata_1_) summary(migueliv) ``` ``` ## ## Call: ## ivreg(formula = any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ## ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l + ## y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, ## data = mss_repdata_1_) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.0098 -0.3114 -0.1342 0.3796 2.0431 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.438746 0.137120 -3.200 0.00143 ** ## gdp_g -0.528454 1.517953 -0.348 0.72784 ## gdp_g_l -2.076062 1.781017 -1.166 0.24413 ## y_0 -0.042668 0.020714 -2.060 0.03977 * ## polity2l 0.002769 0.003220 0.860 0.39005 ## ethfrac 0.225661 0.090639 2.490 0.01301 * ## relfrac -0.236262 0.103205 -2.289 0.02235 * ## Oil 0.043934 0.056533 0.777 0.43733 ## lpopl1 0.067683 0.017231 3.928 9.38e-05 *** ## lmtnest 0.077338 0.014966 5.168 3.06e-07 *** ## ## Diagnostic tests: ## df1 df2 statistic p-value ## Weak instruments (gdp_g) 2 733 8.646 0.000194 *** ## Weak instruments (gdp_g_l) 2 733 5.943 0.002752 ** ## Wu-Hausman 2 731 0.744 0.475485 ## Sargan 0 NA NA NA ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.4421 on 733 degrees of freedom ## Multiple R-Squared: 0.01679, Adjusted R-squared: 0.004723 ## Wald test: 10.27 on 9 and 733 DF, p-value: 5.189e-15 ``` --- ### Reading IV Regression Output Key elements to examine: 1. **Coefficient on endogenous variable**: `gdp_g` = -0.53 — the point estimate is negative, but with a standard error of 1.52 it is far from statistically significant 2. **Standard errors**: the conventional SEs shown here ignore within-country correlation; clustered SEs (next slides) shift inference in both directions — `gdp_g_l` becomes significant while several controls lose significance 3. **First-stage statistics**: the weak-instruments statistics in the diagnostics block (8.65 and 5.94) gauge instrument strength — both fall below the rule-of-thumb threshold of 10 discussed later 4. **Model fit**: IV `\(R^2\)` can be negative—don't interpret it as in OLS --- ``` r library(lmtest) library(sandwich) ``` --- ``` r coeftest(migueliv, vcov = vcovCL(migueliv, cluster = ~country_name)) ``` ``` ## ## t test of coefficients: ## ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.4387459 0.3532897 -1.2419 0.21468 ## gdp_g -0.5284537 1.4250511 -0.3708 0.71087 ## gdp_g_l -2.0760619 1.0241329 -2.0271 0.04301 * ## y_0 -0.0426678 0.0483408 -0.8826 0.37772 ## polity2l 0.0027692 0.0044092 0.6281 0.53016 ## ethfrac 0.2256606 0.2757338 0.8184 0.41339 ## relfrac -0.2362620 0.2397070 -0.9856 0.32464 ## Oil 0.0439336 0.2123598 0.2069 0.83616 ## lpopl1 0.0676828 0.0498531 1.3576 0.17499 ## lmtnest 0.0773375 0.0385422 2.0066 0.04516 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- ### What's Wrong with Weak Instruments? - For an IV estimate of a regression with only one independent variable and only one instrument, the IV estimator is: `\((\mathbf{z}^{T} \mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{y}\)`, which is the same as `\(cov(\mathbf{z}, \mathbf{y})/cov(\mathbf{z}, \mathbf{x})\)`. --- ### What's Wrong with Weak Instruments? - The `\(cov(\mathbf{z}, \mathbf{y})\)` may be thought of as a combination of three components: - the direct effect of `\(\mathbf{z}\)` on `\(\mathbf{y}\)`, - the indirect effect of `\(\mathbf{z}\)` on `\(\mathbf{y}\)` via `\(\mathbf{x}\)`, - and any correlation between `\(\mathbf{z}\)` and `\(\mathbf{u}\)`. --- ### What's Wrong with Weak Instruments? - If `\(cov(\mathbf{z}, \mathbf{x})\)` is big, then a moderate amount of contamination of `\(cov(\mathbf{z}, \mathbf{y})\)` with undesirable information will have only a small effect on the estimate.
- If `\(cov(\mathbf{z}, \mathbf{x})\)` is very small, then even a small amount of contamination of `\(cov(\mathbf{z}, \mathbf{y})\)` with undesirable information will lead to serious bias in the estimate. --- ``` r summary(migueliv) ``` ``` ## ## Call: ## ivreg(formula = any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ## ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l + ## y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, ## data = mss_repdata_1_) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.0098 -0.3114 -0.1342 0.3796 2.0431 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.438746 0.137120 -3.200 0.00143 ** ## gdp_g -0.528454 1.517953 -0.348 0.72784 ## gdp_g_l -2.076062 1.781017 -1.166 0.24413 ## y_0 -0.042668 0.020714 -2.060 0.03977 * ## polity2l 0.002769 0.003220 0.860 0.39005 ## ethfrac 0.225661 0.090639 2.490 0.01301 * ## relfrac -0.236262 0.103205 -2.289 0.02235 * ## Oil 0.043934 0.056533 0.777 0.43733 ## lpopl1 0.067683 0.017231 3.928 9.38e-05 *** ## lmtnest 0.077338 0.014966 5.168 3.06e-07 *** ## ## Diagnostic tests: ## df1 df2 statistic p-value ## Weak instruments (gdp_g) 2 733 8.646 0.000194 *** ## Weak instruments (gdp_g_l) 2 733 5.943 0.002752 ** ## Wu-Hausman 2 731 0.744 0.475485 ## Sargan 0 NA NA NA ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.4421 on 733 degrees of freedom ## Multiple R-Squared: 0.01679, Adjusted R-squared: 0.004723 ## Wald test: 10.27 on 9 and 733 DF, p-value: 5.189e-15 ``` --- ### What if the exclusion restriction fails? The bias formula for instrumental variables is: `$$\mathrm{plim} \hat{\beta}_{IV} = \beta + \frac{\mathrm{Cov}(Z,u)/\mathrm{Var}(Z)}{\mathrm{Cov}(Z,X)/\mathrm{Var}(Z)}$$` --- ### What if the exclusion restriction fails? - Even a small `\(\mathrm{Cov}(Z,u)\)` can cause large bias if `\(\mathrm{Cov}(Z,X)\)` is small (weak instruments). 
- If `\(\mathrm{Cov}(Z,X)\)` is large but `\(\mathrm{Cov}(Z,u)\)` is also large (invalid instrument) we can also end up with large bias. --- ``` r miguellm2 <- lm(any_prio ~ gdp_g + gdp_g_l +GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest + year:country_name, data=mss_repdata_1_) summary(miguellm2) ``` ``` ## ## Call: ## lm(formula = any_prio ~ gdp_g + gdp_g_l + GPCP_g + GPCP_g_l + ## y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest + ## year:country_name, data = mss_repdata_1_) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.92987 -0.12503 -0.02391 0.08826 1.07344 ## ## Coefficients: ## Estimate Std. Error t value ## (Intercept) -1.240e+02 2.551e+01 -4.861 ## gdp_g -4.108e-01 1.620e-01 -2.536 ## gdp_g_l -8.588e-02 1.574e-01 -0.546 ## GPCP_g -2.773e-02 5.904e-02 -0.470 ## GPCP_g_l -1.315e-01 5.968e-02 -2.204 ## y_0 1.216e+01 4.717e+00 2.577 ## polity2l -4.581e-03 2.999e-03 -1.528 ## ethfrac 8.706e+01 2.119e+01 4.108 ## relfrac 7.141e+01 2.700e+01 2.645 ## Oil -2.507e-02 1.285e-01 -0.195 ## lpopl1 5.051e-02 3.375e-01 0.150 ## lmtnest 2.545e+00 3.513e+00 0.724 ## year:country_nameAngola -7.233e-04 9.982e-03 -0.072 ## year:country_nameBenin 1.192e-02 1.037e-02 1.150 ## year:country_nameBotswana 1.147e-02 1.019e-02 1.126 ## year:country_nameBurkina Faso 8.857e-03 1.097e-02 0.808 ## year:country_nameBurundi 3.250e-02 1.112e-02 2.921 ## year:country_nameCameroon -1.018e-02 1.014e-02 -1.004 ## year:country_nameCentral African Republic -2.671e-03 1.023e-02 -0.261 ## year:country_nameChad -2.422e-03 1.002e-02 -0.242 ## year:country_nameCongo 4.287e-03 9.966e-03 0.430 ## year:country_nameDjibouti 1.642e-02 1.079e-02 1.522 ## year:country_nameEthiopia 2.734e-03 1.078e-02 0.254 ## year:country_nameGabon -1.396e-02 1.164e-02 -1.200 ## year:country_nameGambia 1.752e-02 1.073e-02 1.633 ## year:country_nameGhana 2.581e-04 1.101e-02 0.023 ## year:country_nameGuinea 1.332e-02 1.066e-02 1.249 ## year:country_nameGuinea-Bissau 4.076e-03 
1.038e-02 0.393 ## year:country_nameIvory Coast -9.641e-03 1.001e-02 -0.964 ## year:country_nameKenya -9.574e-03 1.023e-02 -0.936 ## year:country_nameLesotho 2.928e-02 1.056e-02 2.772 ## year:country_nameLiberia -3.840e-03 1.008e-02 -0.381 ## year:country_nameMadagascar 2.880e-02 1.095e-02 2.630 ## year:country_nameMalawi 6.579e-03 9.954e-03 0.661 ## year:country_nameMali 1.777e-02 1.141e-02 1.557 ## year:country_nameMauritania 4.175e-02 1.195e-02 3.495 ## year:country_nameMozambique 3.336e-03 1.003e-02 0.333 ## year:country_nameNamibia 2.909e-03 1.051e-02 0.277 ## year:country_nameNiger 1.295e-02 1.048e-02 1.236 ## year:country_nameNigeria -6.873e-03 1.022e-02 -0.672 ## year:country_nameRwanda 2.860e-02 1.065e-02 2.685 ## year:country_nameSenegal 1.812e-02 1.112e-02 1.629 ## year:country_nameSierra Leone 7.694e-04 9.732e-03 0.079 ## year:country_nameSomalia 5.019e-02 1.219e-02 4.117 ## year:country_nameSouth Africa -1.563e-02 1.090e-02 -1.433 ## year:country_nameSudan 6.611e-03 1.016e-02 0.651 ## year:country_nameSwaziland 4.874e-03 9.863e-03 0.494 ## year:country_nameTanzania, United Republic of -8.465e-03 1.069e-02 -0.792 ## year:country_nameTogo 1.040e-02 1.048e-02 0.993 ## year:country_nameUganda -9.301e-03 1.038e-02 -0.896 ## year:country_nameZaire -6.586e-03 1.058e-02 -0.623 ## year:country_nameZambia 3.918e-03 1.033e-02 0.379 ## year:country_nameZimbabwe 1.104e-02 9.847e-03 1.121 ## Pr(>|t|) ## (Intercept) 1.45e-06 *** ## gdp_g 0.011436 * ## gdp_g_l 0.585406 ## GPCP_g 0.638763 ## GPCP_g_l 0.027882 * ## y_0 0.010172 * ## polity2l 0.127025 ## ethfrac 4.47e-05 *** ## relfrac 0.008363 ** ## Oil 0.845362 ## lpopl1 0.881061 ## lmtnest 0.469024 ## year:country_nameAngola 0.942250 ## year:country_nameBenin 0.250668 ## year:country_nameBotswana 0.260520 ## year:country_nameBurkina Faso 0.419562 ## year:country_nameBurundi 0.003601 ** ## year:country_nameCameroon 0.315773 ## year:country_nameCentral African Republic 0.794020 ## year:country_nameChad 0.809158 ## 
year:country_nameCongo 0.667215 ## year:country_nameDjibouti 0.128539 ## year:country_nameEthiopia 0.799902 ## year:country_nameGabon 0.230639 ## year:country_nameGambia 0.102895 ## year:country_nameGhana 0.981304 ## year:country_nameGuinea 0.212049 ## year:country_nameGuinea-Bissau 0.694642 ## year:country_nameIvory Coast 0.335632 ## year:country_nameKenya 0.349712 ## year:country_nameLesotho 0.005715 ** ## year:country_nameLiberia 0.703479 ## year:country_nameMadagascar 0.008726 ** ## year:country_nameMalawi 0.508872 ## year:country_nameMali 0.119819 ## year:country_nameMauritania 0.000505 *** ## year:country_nameMozambique 0.739436 ## year:country_nameNamibia 0.782099 ## year:country_nameNiger 0.216902 ## year:country_nameNigeria 0.501709 ## year:country_nameRwanda 0.007420 ** ## year:country_nameSenegal 0.103800 ## year:country_nameSierra Leone 0.937016 ## year:country_nameSomalia 4.31e-05 *** ## year:country_nameSouth Africa 0.152185 ## year:country_nameSudan 0.515574 ## year:country_nameSwaziland 0.621312 ## year:country_nameTanzania, United Republic of 0.428587 ## year:country_nameTogo 0.321189 ## year:country_nameUganda 0.370670 ## year:country_nameZaire 0.533638 ## year:country_nameZambia 0.704701 ## year:country_nameZimbabwe 0.262781 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.2992 on 690 degrees of freedom ## Multiple R-squared: 0.576, Adjusted R-squared: 0.5441 ## F-statistic: 18.03 on 52 and 690 DF, p-value: < 2.2e-16 ``` --- ### Diagnostics in Instrumental Variables - After estimating an IV model, we must diagnose whether the instrument(s) are valid and strong enough. - Four key diagnostic families: 1. **First‑stage diagnostics**: relevance of instruments. 2. **Overidentification tests**: validity when instruments > endogenous variables. 3. **Weak instrument robust tests**: inference that remains valid with weak instruments. 4. **Endogeneity tests**: whether IV is actually needed. 
- Each addresses a different threat to IV consistency. --- ### First‑Stage Diagnostics: Relevance - The first stage is the regression of the endogenous variable(s) on all instruments (and exogenous covariates). - For a single endogenous regressor, the **first‑stage F‑statistic** tests whether the instruments jointly have explanatory power. - A common rule of thumb (Stock & Yogo, 2005): F < 10 indicates weak instruments → IV bias can be large. - Also useful: **partial R²** – the share of variation in the endogenous variable explained by the instruments after controlling for exogenous variables. --- ``` r # First-stage regression for the Miguel et al. example # Endogenous: gdp_g (growth), instrument: GPCP_g (rainfall) # Include all exogenous controls as in the original model first_stage <- lm(gdp_g ~ GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, data = mss_repdata_1_) # F-statistic for the instrument (GPCP_g) – use linearHypothesis from car package library(car) f_test <- linearHypothesis(first_stage, c("GPCP_g = 0")) f_test ``` ``` ## ## Linear hypothesis test: ## GPCP_g = 0 ## ## Model 1: restricted model ## Model 2: gdp_g ~ GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + ## Oil + lpopl1 + lmtnest ## ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 734 3.6941 ## 2 733 3.6124 1 0.081616 16.561 5.224e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- ``` r # Partial R²: proportion of variance explained by instruments after controls # Full model R² r2_full <- summary(first_stage)$r.squared # Model without the instrument first_stage_noZ <- lm(gdp_g ~ GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, data = mss_repdata_1_) r2_noZ <- summary(first_stage_noZ)$r.squared # Partial R² partial_r2 <- r2_full - r2_noZ partial_r2 ``` ``` ## [1] 0.0220067 ``` - The F‑statistic on the instrument is 16.56 with a p‑value of 5.2e-05.
- This F is well above 10, suggesting the instrument is reasonably strong. (But recall: strength alone does not guarantee validity.) --- ### Overidentification Tests - When the number of instruments (`\(m\)`) exceeds the number of endogenous variables (`\(k\)`), we have **overidentification**. - The extra instruments allow us to test the joint validity of all instruments (the exclusion restriction) – under the assumption that at least `\(k\)` instruments are valid. - Common tests: - **Sargan test** (homoskedastic errors) - **Hansen J test** (heteroskedasticity‑robust) - Null hypothesis: all instruments are valid (i.e., uncorrelated with the error term). - Rejection implies at least one instrument is invalid, but the test cannot tell us which one. --- ``` r # The ivreg package can produce Sargan (or Hansen) test with summary() summary(migueliv, diagnostics = TRUE) ``` ``` ## ## Call: ## ivreg(formula = any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ## ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l + ## y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, ## data = mss_repdata_1_) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.0098 -0.3114 -0.1342 0.3796 2.0431 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.438746 0.137120 -3.200 0.00143 ** ## gdp_g -0.528454 1.517953 -0.348 0.72784 ## gdp_g_l -2.076062 1.781017 -1.166 0.24413 ## y_0 -0.042668 0.020714 -2.060 0.03977 * ## polity2l 0.002769 0.003220 0.860 0.39005 ## ethfrac 0.225661 0.090639 2.490 0.01301 * ## relfrac -0.236262 0.103205 -2.289 0.02235 * ## Oil 0.043934 0.056533 0.777 0.43733 ## lpopl1 0.067683 0.017231 3.928 9.38e-05 *** ## lmtnest 0.077338 0.014966 5.168 3.06e-07 *** ## ## Diagnostic tests: ## df1 df2 statistic p-value ## Weak instruments (gdp_g) 2 733 8.646 0.000194 *** ## Weak instruments (gdp_g_l) 2 733 5.943 0.002752 ** ## Wu-Hausman 2 731 0.744 0.475485 ## Sargan 0 NA NA NA ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 0.4421 on 733 degrees of freedom ## Multiple R-Squared: 0.01679, Adjusted R-squared: 0.004723 ## Wald test: 10.27 on 9 and 733 DF, p-value: 5.189e-15 ``` - Note the **Sargan test** row: it is reported as NA because this model is exactly identified — two instruments for two endogenous regressors leave zero overidentifying restrictions, so no overidentification test can be computed. - In an overidentified model, a Sargan p > 0.05 would mean we do **not** reject the null of instrument validity — but remember the test's limitations: low power, and it assumes at least `\(k\)` instruments are valid. --- ### Weak Instrument Robust Tests - If instruments are weak, conventional IV standard errors and confidence intervals can be misleading – coverage rates can be far from nominal. - **Weak‑instrument‑robust tests** provide inference that remains valid regardless of instrument strength. - Popular choices: - **Anderson–Rubin (AR) test**: tests the structural parameter `\(\beta\)` by examining the reduced form. It is robust to weak instruments but can have low power when many instruments are weak. - **Conditional Likelihood Ratio (CLR) test** (Moreira, 2003): often more powerful than AR. - In R, the `ivreg` package does not implement these directly, but the `AER::ivreg` function (same as `ivreg`) can be used with the `diagnostics = TRUE` option, which includes a weak‑instrument test (Cragg–Donald F) and an AR confidence interval if requested.
--- ``` r # For AR confidence intervals, we can use the ivreg package with the `AR` option # This requires a model with one endogenous variable and possibly multiple instruments library(ivreg) ar_ci <- confint(migueliv, type = "AR") ar_ci ``` ``` ## 2.5 % 97.5 % ## (Intercept) -0.707941172 -0.169550681 ## gdp_g -3.508508001 2.451600616 ## gdp_g_l -5.572563560 1.420439858 ## y_0 -0.083334265 -0.002001351 ## polity2l -0.003552018 0.009090406 ## ethfrac 0.047716701 0.403604533 ## relfrac -0.438873981 -0.033650092 ## Oil -0.067051355 0.154918592 ## lpopl1 0.033854607 0.101511022 ## lmtnest 0.047956027 0.106719055 ``` - The 95% confidence interval reported for `gdp_g` is `[-3.509, 2.452]` — extremely wide, consistent with the imprecise first stage. - Caution: this interval is numerically almost identical to the conventional Wald interval (`\(-0.528 \pm 1.96 \times 1.518\)`), and this model has two endogenous regressors; verify that a genuinely weak‑instrument‑robust interval is being computed before relying on it. --- ### Testing for Endogeneity - If a variable is actually exogenous, OLS is more efficient than IV. Testing for endogeneity helps decide whether IV is necessary. - **Hausman test** (original): compares OLS and IV estimates. Under the null of exogeneity, both are consistent but OLS is efficient; under the alternative, only IV is consistent. Large differences suggest endogeneity. - **Durbin–Wu–Hausman (DWH) test** (regression‑based form): easier to implement and robust to heteroskedasticity. - Important caveat: the test requires that the instruments are valid. If instruments are invalid, the test can be misleading.
--- ``` r # Regression-based Durbin-Wu-Hausman test # Step 1: First stage residuals first_stage_resid <- resid(first_stage) # Step 2: Include residuals in the structural model dwh_model <- lm(any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest + first_stage_resid, data = mss_repdata_1_) # Test significance of first_stage_resid summary(dwh_model)$coefficients["first_stage_resid", ] ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## 0.3282907 1.4464948 0.2269560 0.8205213 ``` - The coefficient on the first‑stage residual is 0.3283 with p = 0.8205. A significant residual would indicate that OLS is inconsistent – i.e., `gdp_g` is endogenous. - Here, the residual is not significant (p > 0.05), suggesting that OLS might actually be acceptable for this variable. But this test hinges on instrument validity. --- ### Interpreting the Endogeneity Test The DWH test fails to reject the null that `gdp_g` is exogenous (p = 0.82). **This does NOT mean**: - The instrument is valid (that's a separate assumption) - IV is unnecessary (the test has low power) **This DOES mean**: - OLS and IV estimates are not statistically distinguishable - If you believe the instrument is valid, OLS might be preferred for efficiency - But the wide IV confidence intervals still counsel caution --- ### Summary: IV Diagnostics Checklist | Diagnostic | Purpose | R implementation | Caution | |------------|---------|------------------|---------| | First‑stage F | Check instrument strength | `linearHypothesis()` in first‑stage lm | F > 10 is only a rule of thumb | | Overidentification (Sargan/Hansen) | Test instrument validity | `summary(ivreg, diagnostics = TRUE)` | Assumes at least one instrument valid; low power | | Weak‑instrument robust CI | Valid inference under weak IV | `confint(..., type = "AR")` | May be wide; choose test based on power | | Endogeneity test (DWH) | Check if IV is needed | Regression of residuals | Requires valid instruments | - Always report first‑stage F
and, if overidentified, an overidentification test.
- Consider weak‑instrument robust inference if F is low.
- Interpret all tests with caution: they are diagnostic tools, not definitive proof.

---

### Encouragement Designs in Experiments

- Intent-to-treat analysis
- Use the treatment assignment as an instrument, the actual treatment received as the treatment variable, and the outcome as normal.

---

``` r
library(readr)
peruemotions <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/peruemotions.csv")
```

---

``` r
summary(lm(outsidervote~simpletreat, data=peruemotions))
```

```
## 
## Call:
## lm(formula = outsidervote ~ simpletreat, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6093 -0.4916  0.3907  0.5084  0.5084 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.49164    0.02874  17.104   <2e-16 ***
## simpletreat  0.11763    0.04962   2.371   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.497 on 448 degrees of freedom
## Multiple R-squared:  0.01239, Adjusted R-squared:  0.01018 
## F-statistic:  5.62 on 1 and 448 DF,  p-value: 0.01818
```

---

``` r
summary(lm(outsidervote~enojado, data=peruemotions))
```

```
## 
## Call:
## lm(formula = outsidervote ~ enojado, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6905 -0.5147  0.3095  0.4853  0.4853 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.51471    0.02463   20.90   <2e-16 ***
## enojado      0.17577    0.08062    2.18   0.0298 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1 ## ## Residual standard error: 0.4975 on 448 degrees of freedom ## Multiple R-squared: 0.0105, Adjusted R-squared: 0.00829 ## F-statistic: 4.753 on 1 and 448 DF, p-value: 0.02976 ``` --- ``` r summary(ivreg(outsidervote~enojado|simpletreat,data=peruemotions)) ``` ``` ## ## Call: ## ivreg(formula = outsidervote ~ enojado | simpletreat, data = peruemotions) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.0804 -0.3716 -0.3716 0.6284 0.6284 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.37162 0.09586 3.877 0.000122 *** ## enojado 1.70882 0.96993 1.762 0.078788 . ## ## Diagnostic tests: ## df1 df2 statistic p-value ## Weak instruments 1 448 5.664 0.0177 * ## Wu-Hausman 1 447 4.608 0.0324 * ## Sargan 0 NA NA NA ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.6688 on 448 degrees of freedom ## Multiple R-Squared: -0.7881, Adjusted R-squared: -0.7921 ## Wald test: 3.104 on 1 and 448 DF, p-value: 0.07879 ``` --- ``` r summary(lm(outsidervote~enojado+simpletreat, data=peruemotions)) ``` ``` ## ## Call: ## lm(formula = outsidervote ~ enojado + simpletreat, data = peruemotions) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.7439 -0.4807 0.3630 0.5193 0.5193 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.48066 0.02921 16.453 <2e-16 *** ## enojado 0.15639 0.08081 1.935 0.0536 . ## simpletreat 0.10687 0.04978 2.147 0.0324 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.4955 on 447 degrees of freedom ## Multiple R-squared: 0.0206, Adjusted R-squared: 0.01621 ## F-statistic: 4.7 on 2 and 447 DF, p-value: 0.009551 ``` --- ### LATE: What Does IV Actually Estimate? - In an encouragement design (e.g., randomized treatment assignment with imperfect compliance), the instrument $Z$ is random assignment, but the treatment $X$ is what subjects actually receive. 
- IV does **not** estimate the average treatment effect (ATE) for the whole population—unless certain strong assumptions hold. - Instead, under reasonable assumptions, IV estimates the **Local Average Treatment Effect (LATE)**—the effect for a specific subgroup: **compliers**. --- ### Four Compliance Types - Imagine a binary treatment $X$ and a binary instrument $Z$ (e.g., encouragement). - Each individual falls into one of four latent groups based on how they respond to the instrument: | Type | Behavior | |----------------|------------------------------------------------------------| | **Compliers** | $X=1$ when $Z=1$, $X=0$ when $Z=0$ (do as encouraged) | | **Always‑takers** | $X=1$ regardless of $Z$ | | **Never‑takers** | $X=0$ regardless of $Z$ | | **Defiers** | $X=0$ when $Z=1$, $X=1$ when $Z=0$ (do opposite) | - In most encouragement designs, we cannot observe an individual's type directly—we only see one potential treatment status. --- ### Key Assumptions for LATE 1. **Independence**: $Z$ is as good as randomly assigned (unconfoundedness). 2. **Exclusion**: $Z$ affects the outcome $Y$ only through the treatment $X$ (no direct effect). 3. **First stage**: $Z$ has a nonzero average effect on $X$ (relevance). 4. **Monotonicity**: No defiers exist (or the proportion of defiers is zero). *In a one‑sided encouragement design (e.g., only encouragement can increase treatment), monotonicity is automatically satisfied.* Under these assumptions, the IV estimand (Wald estimator) equals the **average treatment effect for compliers** (LATE). 
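---

### Sketch: Simulating the Compliance Types

A quick simulation (hypothetical data, not from the course datasets) makes the identification result concrete: with no defiers, the ratio of ITT to first stage recovers the complier effect, not the population-average effect.

``` r
# Simulated encouragement design with compliers, always-takers, never-takers.
set.seed(7)
n <- 100000
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.4, 0.3, 0.3))          # monotonicity: no defiers
z <- rbinom(n, 1, 0.5)                            # randomized encouragement
x <- ifelse(type == "always", 1,
            ifelse(type == "never", 0, z))        # compliers follow z
tau <- ifelse(type == "complier", 2,
              ifelse(type == "always", 5, 0))     # heterogeneous effects
y <- tau * x + rnorm(n)

itt <- mean(y[z == 1]) - mean(y[z == 0])          # intention-to-treat
fs  <- mean(x[z == 1]) - mean(x[z == 0])          # first stage: share of compliers
itt / fs                                          # close to 2, the complier effect,
                                                  # not the population mean of tau (2.3)
```

Note that always-takers (effect 5) and never-takers (effect 0) are invisible to the instrument, so the IV ratio lands on the complier effect alone.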
--- ### The Wald Estimator = ITT / Compliance Rate - The **Intention‑to‑Treat (ITT)** effect: the effect of being assigned to encouragement on the outcome: $$ITT = E[Y | Z=1] - E[Y | Z=0]$$ - The **Compliance rate** (first stage): the effect of encouragement on treatment uptake: $$FS = E[X | Z=1] - E[X | Z=0]$$ - The Wald (IV) estimator is the ratio: $$\hat{\beta}_{IV} = \frac{ITT}{FS}$$ - Intuition: The ITT is diluted by non‑compliers. Scaling by the compliance rate recovers the effect for those who actually comply with encouragement. --- ### Why LATE? A Decomposition - Under monotonicity and exclusion, the ITT can be written as: $$ITT = \text{(Effect on compliers)} \times \text{(Proportion compliers)}$$ because always‑takers and never‑takers are unaffected by $Z$ (exclusion) and defiers are absent. - Hence: $$\text{LATE} = \frac{ITT}{\text{Proportion compliers}} = \frac{ITT}{FS}$$ - This shows that IV isolates the treatment effect **only for compliers**—not for always‑takers or never‑takers. --- ### Limitations of LATE - The LATE may **not generalize** to the entire population: - Always‑takers might have different treatment effects than compliers. - Never‑takers might also differ. - The instrument defines the subpopulation of compliers. Different instruments can identify different LATEs. - External validity: we cannot automatically extrapolate the LATE to other settings or populations without additional assumptions. - Policy relevance: if a policy targets the same compliers identified by the instrument, LATE is directly useful. Otherwise, caution is needed. --- ### Applying LATE: The Peru Emotions Experiment Let's work through a concrete example to see how LATE works in practice. We have: - **Random assignment** to a treatment that primes anger (`simpletreat`) - **Actual anger** (`enojado`) is only partially caused by the treatment - **Outcome**: voting for an outsider candidate (`outsidervote`) We'll compute each component step by step. 
--- ### Illustration: Peru Emotions Experiment - Recall the Peru emotions example: - **Instrument $Z$**: `simpletreat` (randomized encouragement) - **Treatment $X$**: `enojado` (anger emotion, possibly endogenous) - **Outcome $Y$**: `outsidervote` (voting for an outsider candidate) - We suspect that `enojado` is endogenous (unobserved confounders like personality traits). The random assignment `simpletreat` is a potential instrument. --- ``` r # Load data if not already done # peruemotions <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/peruemotions.csv") # Keep only complete cases for simplicity (as in earlier slide) # Note: We drop missing values for simplicity. In practice, consider multiple imputation or other methods. peruemotionstrim <- na.omit(data.frame( enojado = peruemotions$enojado, outsidervote = peruemotions$outsidervote, simpletreat = peruemotions$simpletreat, Cuzco = peruemotions$Cuzco, age = peruemotions$age )) ``` --- ### Step 1: Compute the First Stage (Compliance Rate) ``` r # First stage: effect of instrument on treatment fs <- lm(enojado ~ simpletreat + Cuzco + age, data = peruemotionstrim) summary(fs) ``` ``` ## ## Call: ## lm(formula = enojado ~ simpletreat + Cuzco + age, data = peruemotionstrim) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.17130 -0.10300 -0.09608 -0.04607 1.00316 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.151607 0.042344 3.580 0.000382 *** ## simpletreat 0.068305 0.029371 2.326 0.020501 * ## Cuzco -0.035445 0.029606 -1.197 0.231876 ## age -0.002210 0.001263 -1.750 0.080903 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1
## 
## Residual standard error: 0.2893 on 434 degrees of freedom
## Multiple R-squared:  0.02289, Adjusted R-squared:  0.01614 
## F-statistic:  3.39 on 3 and 434 DF,  p-value: 0.01804
```

``` r
# Extract the coefficient on simpletreat (compliance rate)
compliance_rate <- coef(fs)["simpletreat"]
compliance_rate
```

```
## simpletreat 
##  0.06830501
```

- The estimated compliance rate is 0.068. Being in the treatment group increases the probability of feeling anger by about 6.8 percentage points.

---

### Step 2: Compute the Intention‑to‑Treat (Reduced Form)

``` r
# Reduced form: effect of instrument on outcome
itt <- lm(outsidervote ~ simpletreat + Cuzco + age, data = peruemotionstrim)
summary(itt)
```

```
## 
## Call:
## lm(formula = outsidervote ~ simpletreat + Cuzco + age, data = peruemotionstrim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6314 -0.5216  0.3693  0.4769  0.5837 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.5402185  0.0728637   7.414 6.48e-13 ***
## simpletreat  0.1075342  0.0505405   2.128   0.0339 *  
## Cuzco       -0.0755792  0.0509454  -1.484   0.1387    
## age         -0.0007429  0.0021733  -0.342   0.7326    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4977 on 434 degrees of freedom
## Multiple R-squared:  0.01515, Adjusted R-squared:  0.008339 
## F-statistic: 2.225 on 3 and 434 DF,  p-value: 0.08462
```

``` r
# ITT coefficient
itt_effect <- coef(itt)["simpletreat"]
itt_effect
```

```
## simpletreat 
##   0.1075342
```

- The ITT effect is 0.108. Being assigned to the treatment group increases outsider voting by about 10.8 percentage points (p = 0.034, significant at the 5% level).

---

### Step 3: Compute the Wald (IV) Estimate Manually

``` r
# Wald estimator = ITT / compliance rate
wald_iv <- itt_effect / compliance_rate
wald_iv
```

```
## simpletreat 
##    1.574324
```

- The manual IV estimate is 1.574.
This is the estimated effect for compliers. Taken literally, feeling anger raises the probability of outsider voting by 157 percentage points, an impossible value for a probability; the point estimate is this extreme because the first stage is weak (0.068) and the estimate is very imprecise.

---

### Step 4: Compare with `ivreg` Output

``` r
# Run ivreg
library(ivreg)
iv_fit <- ivreg(outsidervote ~ enojado + Cuzco + age | simpletreat + Cuzco + age,
                data = peruemotionstrim)
summary(iv_fit)
```

```
## 
## Call:
## ivreg(formula = outsidervote ~ enojado + Cuzco + age | simpletreat + 
##     Cuzco + age, data = peruemotionstrim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0072 -0.3754 -0.3502  0.6301  0.6580 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.301540   0.189017   1.595    0.111
## enojado     1.574324   0.959493   1.641    0.102
## Cuzco      -0.019777   0.072614  -0.272    0.785
## age         0.002736   0.003503   0.781    0.435
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   1 434     5.408  0.0205 *
## Wu-Hausman         1 433     3.698  0.0551 .
## Sargan             0  NA        NA      NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6454 on 434 degrees of freedom
## Multiple R-Squared: -0.6561, Adjusted R-squared: -0.6675 
## Wald test: 1.323 on 3 and 434 DF,  p-value: 0.2663
```

``` r
# Extract IV coefficient for enojado
iv_coef <- coef(iv_fit)["enojado"]
iv_coef
```

```
##  enojado 
## 1.574324
```

- The `ivreg` coefficient on `enojado` is 1.574, matching the manual Wald estimate exactly. (With one endogenous variable, one instrument, and the same exogenous covariates in both stages, the 2SLS coefficient is precisely the ratio of the reduced‑form to the first‑stage coefficient.)

---

### Step 5: Interpret as LATE

- Under monotonicity (no defiers) and exclusion (simpletreat affects voting only through anger), the IV estimate of 1.574 is the **average treatment effect of anger on outsider voting for compliers**.
- Who are the compliers? People who become angry **only if** they are in the treatment group (i.e., their anger is triggered by the experimental prime). They are not always angry, nor never angry—they respond to encouragement.
- This effect may differ from the effect on always‑takers (who would be angry regardless) or never‑takers (who never become angry). Thus, we cannot automatically generalize to the whole population.

---

### Summary: LATE in a Nutshell

- IV with a binary instrument and binary treatment, under monotonicity, identifies the **LATE for compliers**.
- The Wald estimator is the ratio of ITT to first stage.
- Always report:
  - First‑stage strength (F‑statistic, compliance rate)
  - Interpretation of the complier group
  - Caveats about external validity
- In the Peru example, the complier group is those whose anger is activated by the treatment prime. The estimated effect is large (1.57) but very imprecisely estimated.

---

### Beyond LATE: Estimating the ATE

- LATE is useful but limited: it only gives the treatment effect for compliers.
- Often we want the **Average Treatment Effect (ATE)** for the entire population.
- Aronow and Carnegie (2013) show that under certain assumptions, we can recover the ATE by reweighting using the **compliance score**.
- This approach is implemented in the `icsw` package (Inverse Compliance Score Weighting).

---

### What Is the Compliance Score?

- The **compliance score** `\(\pi_i\)` is the probability that unit `\(i\)`'s treatment status would be higher under encouragement than under control: `$$\pi_i = P(X_i(1) > X_i(0) | \mathbf{W}_i)$$` where `\(X_i(1)\)` is potential treatment when `\(Z=1\)`, `\(X_i(0)\)` when `\(Z=0\)`, and `\(\mathbf{W}_i\)` are covariates.
- For binary treatment and binary instrument, the compliance score is simply the probability of being a **complier**, conditional on covariates.
- Intuition: Units with high compliance scores are likely compliers; those with low scores are likely always‑takers or never‑takers.

---

### How Does Reweighting Estimate the ATE?
- Aronow and Carnegie show that if we weight every observation by the inverse of its estimated compliance score, we can recover the ATE.
- Intuition: Weighting by `\(1/\hat{\pi}_i\)` upweights units who are unlikely to comply, so that the (weighted) complier population comes to resemble the full population—effectively undoing the LATE restriction.
- The estimator requires:
  1. A model for the compliance score (e.g., probit or logit).
  2. The exclusion restriction and monotonicity (same as LATE).
  3. Correct specification of the compliance score model.

---

### Implementation in R: `icsw` Package

- The `icsw` package provides functions for inverse compliance score weighting.
- Key function: `icsw.tsls()` (two‑stage least squares with inverse compliance score weighting).
- Steps:
  1. Estimate compliance scores from a model of treatment on instrument and covariates.
  2. Construct weights: `\(w_i = 1 / \hat{\pi}_i\)` for every unit.
  3. Run weighted IV or weighted OLS (depending on the estimator).
- Bootstrap is used for standard errors (because compliance scores are estimated).
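---

### Sketch: Compliance Scores by Hand

The steps above can be sketched directly on simulated data (`icsw` automates and refines this; treat the code as illustrative, not as the package's internals). Under monotonicity, the compliance score equals the gap between treatment uptake rates under encouragement and under control, conditional on covariates.

``` r
# Hypothetical encouragement data: covariate w, instrument z, treatment x
set.seed(1)
n <- 2000
df <- data.frame(w = rnorm(n))
df$z <- rbinom(n, 1, 0.5)
df$x <- rbinom(n, 1, plogis(-1 + 1.5 * df$z + 0.5 * df$w))

# Under monotonicity: pi(w) = P(X = 1 | Z = 1, w) - P(X = 1 | Z = 0, w)
fit1 <- glm(x ~ w, family = binomial, data = subset(df, z == 1))
fit0 <- glm(x ~ w, family = binomial, data = subset(df, z == 0))
pi_hat <- predict(fit1, newdata = df, type = "response") -
          predict(fit0, newdata = df, type = "response")

pi_hat <- pmax(pi_hat, 0.05)   # trim to enforce overlap
df$w_icsw <- 1 / pi_hat        # inverse compliance score weight for every unit
summary(df$w_icsw)
# These weights would then be passed to a weighted 2SLS
# (e.g., via the weights argument of ivreg) to target the ATE.
```

Trimming small compliance scores is one simple way to enforce the overlap assumption; it trades a little bias for stability.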
---

### Application: Peru Emotions Example

``` r
# Reload the trimmed data if needed
peruemotionstrim <- na.omit(data.frame(
  enojado = peruemotions$enojado,
  outsidervote = peruemotions$outsidervote,
  simpletreat = peruemotions$simpletreat,
  Cuzco = peruemotions$Cuzco,
  age = peruemotions$age
))

# Load icsw (install if needed)
# packageurl <- "http://cran.r-project.org/src/contrib/Archive/icsw/icsw_1.0.0.tar.gz"
# install.packages(packageurl, repos=NULL, type="source")
library(icsw)
```

---

``` r
# Estimate ATE using inverse compliance score weighting
# D = treatment (enojado), Y = outcome (outsidervote)
# Z = instrument (simpletreat), X = covariates (including intercept)
# W = covariates for compliance score model (can be same as X)
exp.reweight <- with(peruemotionstrim,
                     icsw.tsls(D = enojado,
                               Y = outsidervote,
                               Z = simpletreat,
                               X = cbind(1, Cuzco),   # Exogenous in outcome model
                               W = cbind(Cuzco, age), # Covariates for compliance score
                               R = 100))              # Bootstrap replications
```

---

### Interpreting the Results

``` r
# View the ATE estimate and bootstrapped SE
exp.reweight$coefficients
```

```
##                   Cuzco          D 
##  0.5612050 -0.1694386 -0.5144364
```

``` r
exp.reweight$coefs.se.boot
```

```
##                    Cuzco           D 
##   0.2391921   2.4784942 285.3254077
```

- The ATE estimate for the effect of anger on outsider voting (the `D` coefficient) is -0.514, with a bootstrapped standard error of 285.3—essentially uninformative.
- Compare with the LATE estimate from IV: 1.5743.
- The ATE point estimate differs sharply from the LATE (it even flips sign), though the enormous bootstrap SE means the two cannot be statistically distinguished. This could reflect:
  - Always‑takers and never‑takers having different effects than compliers (possibly opposite in sign).
  - Sampling variability (check confidence intervals).
  - Model misspecification in the compliance score.

---

### Comparison: LATE vs. ATE

| Estimand | Estimate | Interpretation |
|----------|----------|----------------|
| LATE (IV) | 1.5743 | Effect for compliers—those whose anger is activated by the prime. |
| ATE (ICSW) | -0.5144 | Effect for the entire population, assuming the compliance score model is correct. |

- The difference suggests possible treatment effect heterogeneity: compliers may respond differently than always‑takers/never‑takers.
- But we must interpret with caution: the ATE estimate relies on correct specification of the compliance score model.

---

### Assumptions and Limitations of ICSW

- **Same as LATE**: exclusion, monotonicity, independence.
- **Additional assumptions**:
  - Correctly specified model for the compliance score.
  - Overlap: compliance scores bounded away from 0 and 1 (no units with `\(\pi_i = 0\)` or `\(1\)`).
  - Consistency of the compliance score estimator.
- **Practical limitations**:
  - Can be sensitive to model choice.
  - Bootstrap standard errors may be unstable with small samples.
  - Not yet a standard part of the applied toolkit (use with caution and robustness checks).

---

### When Might You Use ICSW?

- When you have a strong reason to believe that the LATE is not generalizable and you want to estimate the population ATE.
- As a robustness check to see if the LATE and ATE differ substantially—suggesting treatment effect heterogeneity.
- In settings with rich covariates that can reliably predict compliance.

---

### Common Mistake 1: Lagged Dependent Variables as Instruments

- A tempting but usually invalid instrument: using a lag of an endogenous variable as an instrument for itself.
- Example: In a panel model with `\(Y_{it} = \beta X_{it} + \gamma Y_{i,t-1} + \epsilon_{it}\)`, using `\(X_{i,t-1}\)` as an instrument for `\(X_{it}\)`.
- Why this fails:
  - If `\(X\)` is serially correlated (it usually is), then `\(X_{i,t-1}\)` is correlated with past shocks, which may persist and affect current `\(Y\)`.
  - The exclusion restriction is violated unless the lag is truly exogenous.
- **When might it work?** In models with strict exogeneity and no serial correlation in errors—rare in practice.

---

### Common Mistake 2: "Many Weak Instruments" Bias

- Using many weak instruments can actually **increase** bias in 2SLS.
- With many instruments, the first stage overfits, and 2SLS converges to OLS (which is biased). - Rule of thumb: the number of instruments should be small relative to the sample size, and first-stage F should be large. - If you have many instruments, consider: - LIML (Limited Information Maximum Likelihood) - Jackknife IV (JIVE) - Weak-instrument robust inference (AR, CLR) --- ### Common Mistake 3: Forgetting to Cluster Standard Errors - If the instrument varies at a higher level than the unit of analysis (e.g., state-level policy instrument with individual-level outcomes), standard errors must be clustered at the instrument level. - Example: Using state-level minimum wage as an instrument for individual wages—errors are correlated within states. - Failure to cluster can lead to severely understated standard errors and false positives. --- ``` r # In ivreg, clustering is not built-in; use lmtest and sandwich library(lmtest) library(sandwich) # Suppose we have clustered data (e.g., by country in MSS example) # coeftest(migueliv, vcov = vcovCL, cluster = ~country_name) # Always think about the level at which treatment/instrument varies! ``` --- ### Common Mistake 4: Interpreting IV as ATE Without Justification - IV estimates LATE, not ATE, unless: - Treatment effects are constant across units. - There are no always‑takers or never‑takers (unlikely). - The instrument affects everyone (i.e., compliance rate = 1). - Researchers often slip into language like "the effect of X on Y" without specifying "for compliers." - Be precise: "Among those whose treatment status is changed by the instrument, the estimated effect is ..." 
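---

### Sketch: Many Weak Instruments in Action

The many‑weak‑instruments problem from two slides back can be seen in a quick simulation (hypothetical data): with dozens of nearly irrelevant instruments, the overfit first stage pulls 2SLS toward the biased OLS estimate.

``` r
# True effect of x on y is 0.5; u confounds both.
set.seed(123)
n <- 500; k <- 50
Z <- matrix(rnorm(n * k), n, k)      # 50 instruments, only the first is relevant
u <- rnorm(n)
x <- 0.2 * Z[, 1] + u + rnorm(n)
y <- 0.5 * x + u + rnorm(n)

# (Coefficients only; SEs from this two-step shortcut are not valid.)
coef(lm(y ~ x))["x"]                 # OLS: biased well above 0.5
xhat_many <- fitted(lm(x ~ Z))       # overfit first stage, all 50 instruments
coef(lm(y ~ xhat_many))["xhat_many"] # 2SLS: pulled toward the OLS estimate
xhat_one <- fitted(lm(x ~ Z[, 1]))   # just the one relevant instrument
coef(lm(y ~ xhat_one))["xhat_one"]   # noisier, but far less biased toward OLS
```

Dropping the 49 irrelevant instruments costs nothing in identification and removes most of the overfitting bias—one reason to prefer a few strong instruments over many weak ones.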
---

### Summary: IV Pitfalls to Avoid

| Mistake | Why It's Problematic | Better Approach |
|---------|----------------------|-----------------|
| Lagged DV as instrument | Usually violates exclusion | Use external instruments or GMM |
| Many weak instruments | Increases bias | LIML, JIVE, weak-IV robust tests |
| Ignoring clustering | SEs too small | Cluster at instrument level |
| Interpreting as ATE | Overgeneralization | Be precise: LATE for compliers |

- IV is powerful but requires careful justification and diagnostics.
- When in doubt, remember Wright's original insight: find a genuine exogenous shifter.

---

### When Might 2SLS or IV Be a Good Idea?

- If there's a true randomization, or something close to it, in the world that you can take advantage of -- but there still might be downsides.
- As a Hausman test/robustness check when OLS may be biased, to see whether your most important results hold up under some alternative assumptions.
- Never without attention to assumptions.
- If a reviewer demands it of you? (Even then, it can make things worse unless it's intellectually justified.)