4: Instrumental Variables

class: center, middle, inverse, title-slide

.title[
# 4: Instrumental Variables
]
.subtitle[
## Quantitative Causal Inference
]
.author[
### <large>Jaye Seawright</large>
]
.institute[
### <small>Northwestern Political Science</small>
]
.date[
### April 23 and 28, 2026
]

---

class: center, middle

pre[class] {
  max-height: 200px;
}
</style>

###Today's Plan

1.  Why Instrumental Variables (IV)?
2.  The IV estimator and its assumptions
3.  Two-stage least squares
4.  Diagnostics
5.  Heterogeneity and LATE interpretation
6.  Applications and limitations

---
### Endogeneity in OLS

-   `$E(\mathbf{u} | \mathbf{X}) = 0$`?

-   As you'll recall,
    `$E(\hat{\mathbf{\beta}}) = \beta + (\mathbf{X}^{T} \mathbf{X})^{-1} E(\mathbf{X}^{T} \mathbf{u})$`.
    So, if `$E(\mathbf{X}^{T} \mathbf{u}) = \mathbf{\nu} \neq 0$`, then
    `$E(\hat{\mathbf{\beta}} - \mathbf{\beta}) = (\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{\nu} \neq 0$`.

---
### Consequences of Endogeneity

-   When `$\mathbf{X}$` is endogenous, then our estimates of
    `$\hat{\mathbf{\beta}}$` will be a mixture of the desired relationship
    between `$\mathbf{X}$` and `$\mathbf{y}$` *and* the nuisance
    relationship between `$\mathbf{X}$` and `$\mathbf{u}$`.

---
### How Can Endogeneity Arise?

-   Omitted explanatory variables
-   Measurement error on the right-hand side of the model
-   Simultaneity between the right- and left-hand sides of the model
-   etc.

---
### What to Do When Endogeneity Is a Problem?

1.  Give up.

2.  Try to change the model by including all omitted relevant variables.

3.  Find an instrument.

4.  Find other data.

---
### Instrumental Variables

-   Suppose the model is:
    `$\mathbf{y} = \mathbf{W} \mathbf{\gamma} + \mathbf{x} \beta + \mathbf{\epsilon}$`.
    The `$\mathbf{W}$` variables are exogenous, but the `$\mathbf{x}$`
    variable is endogenous.

-   Now, assume that there exists a variable `$z$` that *doesn't* belong
    in the regression model, with the following two characteristics:

-   `$cov(\mathbf{z}^{T} \mathbf{x}) \neq 0$`

-   `$E(\mathbf{z}^{T} \mathbf{\epsilon}) = 0$`

---
### Instrumental Variables

If these conditions are met (doesn't belong in the regression, related
linearly with `$\mathbf{x}$`, no connection with `$\mathbf{\epsilon})$`,
then `$\mathbf{z}$` meets the mathematical definition of an *instrument*.

---
###Three Core Assumptions

1.  Relevance: `$\mathrm{Cov}(Z,X) \neq 0$`
2.  Exclusion restriction: `$Z$` affects `$Y$` only through `$X$`
3.  Independence / exogeneity: `$Z$` is as good as randomly assigned (or at least uncorrelated with unobservables)

---
### Assumptions Aren't Created Equal

- We can typically just test relevance.
- Independence can sometimes have partial or full justification.
- The exclusion restriction is usually exceptionally challenging.

---
### Relevance Failure

---
### Exclusion Restriction Failure
<img src="4instrumentalvariables_files/figure-html/unnamed-chunk-3-1.png" width="70%" />

---
### Exogeneity Failure
<img src="4instrumentalvariables_files/figure-html/unnamed-chunk-4-1.png" width="70%" />

---
### DAGs and Finding Instruments

If we can specify our causal structure well in advance, mathematical graph theory can help identify which variables are instruments.
---

``` r
library(dagitty)
hypotheticalinstruments.dag <- dagitty( "dag { Polarization -> DemocraticErosion ElitePower -> DemocraticErosion Corruption -> Polarization Corruption -> DemocraticErosion SocialMedia -> Polarization PrimaryElections -> Polarization PrimaryElections -> ElitePower EconomicInequality -> ElitePower EconomicInequality -> Polarization Polarization [exposure]
DemocraticErosion [outcome]}" )
```

---

``` r
plot( hypotheticalinstruments.dag )
```

<img src="4instrumentalvariables_files/figure-html/unnamed-chunk-6-1.png" width="50%" />
---

``` r
instrumentalVariables(hypotheticalinstruments.dag)
```

```
##  EconomicInequality |  ElitePower
##  PrimaryElections |  ElitePower
##  SocialMedia
```

---

Obviously, the output from a DAG relies on the causal structure put into the DAG!

---
### Bivariate IV

-   Let's momentarily consider a bivariate regression,
    `$\mathbf{y} = \mathbf{x} \beta + \mathbf{\epsilon}$`, with instrument
    `$\mathbf{z}$`.

-   The OLS estimate of `$\beta$` is
    `$(\mathbf{x}^{T}\mathbf{x})^{-1} \mathbf{x}^{T}\mathbf{y}$`.

---
### Bivariate IV

-   Consider instead the IV estimate of `$\beta$`:
    `$(\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T}\mathbf{y}$`.

-   `$E(\hat{\beta}_{IV}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T}\mathbf{y})$`

-   `$E(\hat{\beta}_{IV}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} (\mathbf{x} \beta + \mathbf{\epsilon}))$`

-   `$E(\hat{\beta}_{IV}) = E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{x} \beta) + E((\mathbf{z}^{T}\mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{\epsilon}) = \beta + 0$`

---
### Instrumental Variables

-   Now let's consider a multivariate regression,
    `$\mathbf{Y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}$`, with
    some `$t \leq k$` of the `$\mathbf{X}$` variables endogenous, and with
    `$t$` instruments `$\mathbf{z}_{1} \ldots \mathbf{z}_{t}$`.

---
### Multivariate IV

-   The OLS estimate of `$\mathbf{\beta}$` is
    `$(\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{y}$`.

-   Form the matrix `$\mathbf{Z}$`, containing the `$t$` instruments, as
    well as the `$k - t$` exogenous elements from `$\mathbf{X}$`.

-   The IV estimate of `$\mathbf{\beta}$` is:
    `$(\mathbf{Z}^{T}\mathbf{X})^{-1} \mathbf{Z}^{T}\mathbf{y}$`.

---
### Multivariate IV

-   As in the bivariate situation, given the IV assumptions, the IV
    estimator eliminates the problem of endogeneity.

-   This estimator only works if the number of instruments is exactly
    equal to the number of endogenous variables.

---
### A Brief History of Instrumental Variables

-   The method of instrumental variables has surprisingly deep roots in econometrics.

-   **Philip Wright (1928)** is credited with the first explicit use of IV in his book *The Tariff on Animal and Vegetable Oils*.

-   Wright faced a classic simultaneity problem: estimating supply and demand curves for flaxseed oil.

---
### A Brief History of Instrumental Variables

-   Price and quantity are jointly determined.

-   OLS would give a mixture of supply and demand elasticities.

-   Wright's solution: use exogenous shifters of supply (e.g., weather) to trace out the demand curve, and shifters of demand (e.g., tariff changes) to trace out the supply curve.

---
### Wright's Insight

-   Wright realized that if you have a variable that shifts one curve but not the other, you can identify the other curve's parameters.

-   This is exactly the modern IV intuition: an instrument `$Z$` affects `$X$` (e.g., quantity supplied) but has no direct effect on `$Y$` (e.g., price) except through `$X$`.

---
### Wright's Insight

-   Wright's work remained largely unknown for decades; it was rediscovered by econometricians in the 1970s and 1980s.

-   Today, IV is one of the most widely used methods for causal inference with observational data.

### Wright's Flaxseed Oil Example: A Simple Simulation

``` r
# Simulate a supply-demand system with an instrument
set.seed(2026)
n <- 1000

# Instrument: weather shock (shifts supply only)
weather <- rnorm(n)

# Unobserved demand shifter (income shifts demand but NOT supply)
# This is the confounder that makes OLS biased
income <- rnorm(n)

# Structural equations:
# Demand: P = 10 - 0.5*Q + 0.3*income + e_d  (income shifts demand curve)
# Supply:  P = 2  + 0.8*Q - 0.4*weather + e_s (weather shifts supply curve)
#
# Solve for equilibrium Q and P:
# 10 - 0.5*Q + 0.3*income + e_d = 2 + 0.8*Q - 0.4*weather + e_s
# => 1.3*Q = 8 + 0.3*income + 0.4*weather + (e_d - e_s)
# => Q = (8 + 0.3*income + 0.4*weather + u) / 1.3

e_d <- rnorm(n)
e_s <- rnorm(n)

Q <- (8 + 0.3*income + 0.4*weather + (e_d - e_s)) / 1.3
# Equilibrium P from the demand curve (true demand slope = -0.5)
P <- 10 - 0.5*Q + 0.3*income + e_d
```

---

``` r
# OLS of P on Q (biased: income is unobserved, correlated with both Q and P)
summary(lm(P ~ Q))
```

```
## 
## Call:
## lm(formula = P ~ Q)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60393 -0.50202  0.00584  0.52191  2.34754 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.17201    0.12916  47.786  < 2e-16 ***
## Q            0.11832    0.02071   5.714 1.46e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7515 on 998 degrees of freedom
## Multiple R-squared:  0.03168,	Adjusted R-squared:  0.0307 
## F-statistic: 32.65 on 1 and 998 DF,  p-value: 1.459e-08
```

---

``` r
# IV using weather as instrument for Q
# Weather satisfies exclusion: it enters Q (via supply) but not P directly
library(ivreg)
summary(ivreg(P ~ Q | weather))
```

```
## 
## Call:
## ivreg(formula = P ~ Q | weather)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.09368 -0.74977 -0.00313  0.75565  2.90942 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.2898     0.6837  15.050  < 2e-16 ***
## Q            -0.5534     0.1114  -4.968 7.95e-07 ***
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 998     76.28  <2e-16 ***
## Wu-Hausman         1 997     87.37  <2e-16 ***
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.077 on 998 degrees of freedom
## Multiple R-Squared: -0.9891,	Adjusted R-squared: -0.9911 
## Wald test: 24.68 on 1 and 998 DF,  p-value: 7.953e-07
```

---
### Wright's Flaxseed Oil Example: A Simple Simulation

-   OLS gives a biased estimate of the demand elasticity (price-quantity relationship).
-   IV using weather (a supply shifter) recovers the demand curve.

---
### Intuition: Two-Stage Least Squares

IV can be thought of as a two-step process:

1. **First stage**: Regress the endogenous X on the instrument Z (and controls). This isolates the exogenous variation in X—the part predicted by Z.
2. **Second stage**: Regress Y on the predicted values from the first stage.

This "purified" X is no longer correlated with the error term, giving us consistent estimates.

The algebra on the next slide shows why this is equivalent to the IV estimator we already derived.

---
### 2SLS

-   Let's partition the independent variables into two matrices,
    `$\mathbf{W}$`, which has the `$k - t$` exogenous variables in the model
    of `$\mathbf{y}$`, and `$\mathbf{X}$`, which has the `$t$` endogenous
    variables.

-   So the `$\mathbf{Z}$` matrix is the `$\mathbf{W}$` matrix with `$t$` extra
    columns containing the instruments.

---
### 2SLS

-   Suppose we regress each column of the `$\mathbf{X}$` matrix on the
    matrix `$\mathbf{Z}$` and form the fitted values.

-   `$\hat{\mathbf{X}} = \mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X}$`

-   Now use `$\hat{\mathbf{X}}$` in the place of `$\mathbf{X}$` in the OLS
    regression formula.

---
### 2SLS

$$
`\begin{split}
\hat{\mathbf{\beta}}_{IV} = & (\mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X})^{-1} \\
& \mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{y} = \\
& (\mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{X})^{-1} \\ & \mathbf{X}^{T}\mathbf{Z} (\mathbf{Z}^{T} \mathbf{Z})^{-1} \mathbf{Z}^{T} \mathbf{y} = \\
& (\mathbf{Z}^{T}\mathbf{X})^{-1} \mathbf{Z}^{T}\mathbf{y}
\end{split}`
$$

-   The instrumental variables estimator gives the same coefficient
    estimates as running an OLS regression using `$\hat{\mathbf{X}}$` as predicted by `$\mathbf{Z}$` in
    the place of `$\mathbf{X}$`.

---
### Variance in Instrumental Variables

-   `$\hat{\mathbf{X}}$` is a random variable, so the normal OLS standard
    errors will underestimate uncertainty when using IV.

-   Instead, the correct estimate of the standard errors of the
    coefficient estimates in IV is:

-   `$\hat{V} (\hat{\mathbf{\beta}}_{IV}) = \hat{\sigma}^{2} (\mathbf{Z}^{T} \mathbf{X})^{-1} \mathbf{Z}^{T} \mathbf{Z} (\mathbf{X}^{T} \mathbf{Z})^{-1}$`

---
### Examples of Proposed Instruments

-   Suppose we're interested in the relationship between education and
    some political variable.

-   One proposed instrument for education, due to David Card (1995),
        is residential proximity to a college or university.

-   A second proposed instrument for education, due to Angrist and
        Krueger (1991) is month of birth.

-   A third instrument, from Nguyen et al. (2016), involves genetic
        risk score for years of schooling.

---
### Examples of Proposed Instruments

-   Suppose our focus is on the relationship between economic
    performance and civil war in agricultural countries.

-   Miguel, Satyanath, Sergenti, E. (2004) suggest using rainfall as
        an instrument for economic performance.

---

``` r
library(haven)
mss_repdata_1_ <- read_dta("https://github.com/jnseawright/PS406/raw/main/data/mss_repdata%20(1).dta")
```

---

``` r
library(ivreg)
migueliv <- ivreg(any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l+ y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, data=mss_repdata_1_)
summary(migueliv)
```

```
## 
## Call:
## ivreg(formula = any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + 
##     ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l + 
##     y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, 
##     data = mss_repdata_1_)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0098 -0.3114 -0.1342  0.3796  2.0431 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.438746   0.137120  -3.200  0.00143 ** 
## gdp_g       -0.528454   1.517953  -0.348  0.72784    
## gdp_g_l     -2.076062   1.781017  -1.166  0.24413    
## y_0         -0.042668   0.020714  -2.060  0.03977 *  
## polity2l     0.002769   0.003220   0.860  0.39005    
## ethfrac      0.225661   0.090639   2.490  0.01301 *  
## relfrac     -0.236262   0.103205  -2.289  0.02235 *  
## Oil          0.043934   0.056533   0.777  0.43733    
## lpopl1       0.067683   0.017231   3.928 9.38e-05 ***
## lmtnest      0.077338   0.014966   5.168 3.06e-07 ***
## 
## Diagnostic tests:
##                            df1 df2 statistic  p-value    
## Weak instruments (gdp_g)     2 733     8.646 0.000194 ***
## Weak instruments (gdp_g_l)   2 733     5.943 0.002752 ** 
## Wu-Hausman                   2 731     0.744 0.475485    
## Sargan                       0  NA        NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4421 on 733 degrees of freedom
## Multiple R-Squared: 0.01679,	Adjusted R-squared: 0.004723 
## Wald test: 10.27 on 9 and 733 DF,  p-value: 5.189e-15
```

---
### Reading IV Regression Output

Key elements to examine:

1\. **Coefficient on endogenous variable**: `gdp_g` = -0.053 — a 1 percentage point increase in growth reduces conflict probability by 5.3 points

2\. **Standard errors**: We have very large standard errors and aren't getting significant results. This is probably due to country clustering.

---
### Reading IV Regression Output

3\. **First-stage statistics** (not shown here but available via `summary(..., diagnostics=TRUE)`): Check instrument strength

4\. **Model fit**: IV `$R^2$` can be negative—don't interpret as usual

---

``` r
library(lmtest)
library(sandwich)
```

---

``` r
coeftest(migueliv, vcov = vcovCL(migueliv, cluster = ~country_name))
```

```
## 
## t test of coefficients:
## 
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.4387459  0.3532897 -1.2419  0.21468  
## gdp_g       -0.5284537  1.4250511 -0.3708  0.71087  
## gdp_g_l     -2.0760619  1.0241329 -2.0271  0.04301 *
## y_0         -0.0426678  0.0483408 -0.8826  0.37772  
## polity2l     0.0027692  0.0044092  0.6281  0.53016  
## ethfrac      0.2256606  0.2757338  0.8184  0.41339  
## relfrac     -0.2362620  0.2397070 -0.9856  0.32464  
## Oil          0.0439336  0.2123598  0.2069  0.83616  
## lpopl1       0.0676828  0.0498531  1.3576  0.17499  
## lmtnest      0.0773375  0.0385422  2.0066  0.04516 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
### What's Wrong with Weak Instruments?

-   For an IV estimate of a regression with only one independent
    variable and only one instrument, the IV estimator is:
    `$(\mathbf{z}^{T} \mathbf{x})^{-1} \mathbf{z}^{T} \mathbf{y}$`, which
    is the same as
    `$cov(\mathbf{z}, \mathbf{y})/cov(\mathbf{z}, \mathbf{x})$`.

---
### What's Wrong with Weak Instruments?

-   The `$cov(\mathbf{z}, \mathbf{y})$` may be thought of as a combination
    of three components:

-   the direct effect of `$\mathbf{z}$` on `$\mathbf{y}$`,

-   the indirect effect of `$\mathbf{z}$` on `$\mathbf{y}$` via
        `$\mathbf{x}$`,

-   and any correlation between `$\mathbf{z}$` and `$\mathbf{u}$`.

---
### What's Wrong with Weak Instruments?

-   If `$cov(\mathbf{z}, \mathbf{x})$` is big, then a moderate amount of
    contamination of `$cov(\mathbf{z}, \mathbf{y})$` with undesirable
    information will have only a small effect on the estimate.

-   If `$cov(\mathbf{z}, \mathbf{x})$` is very small, then even a small
    amount of contamination of `$cov(\mathbf{z}, \mathbf{y})$` with
    undesirable information will lead to serious bias in the estimate.

---

``` r
summary(migueliv)
```

---
### What if the exclusion restriction fails?

The bias formula for instrumental variables is:

`$$\mathrm{plim} \hat{\beta}_{IV} = \beta + \frac{\mathrm{Cov}(Z,u)/\mathrm{Var}(Z)}{\mathrm{Cov}(Z,X)/\mathrm{Var}(Z)}$$`
---
### What if the exclusion restriction fails?

- Even a small `$\mathrm{Cov}(Z,u)$` can cause large bias if `$\mathrm{Cov}(Z,X)$` is small (weak instruments).

- If `$\mathrm{Cov}(Z,X)$` is large but `$\mathrm{Cov}(Z,u)$` is also large (invalid instrument) we can also end up with large bias.

---
### Probing the Exclusion Restriction: A Falsification Check

The exclusion restriction requires that rainfall affects conflict only through its effect on economic growth — not directly.
-  We can probe this by asking: does rainfall predict conflict even after we control for growth itself?
The regression below includes both rainfall and growth, plus country-specific time trends to absorb slow-moving country characteristics. If the exclusion restriction holds, the rainfall coefficients (GPCP_g, GPCP_g_l) should be small and statistically insignificant.

---

``` r
miguellm2 <- lm(any_prio ~ gdp_g + gdp_g_l +GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest + year:country_name, data=mss_repdata_1_)

coef(summary(miguellm2))[1:5, ]
```

```
##                  Estimate  Std. Error    t value     Pr(>|t|)
## (Intercept) -124.01197898 25.51197476 -4.8609322 1.447644e-06
## gdp_g         -0.41076801  0.16198183 -2.5358895 1.143572e-02
## gdp_g_l       -0.08587699  0.15735236 -0.5457623 5.854057e-01
## GPCP_g        -0.02772879  0.05904314 -0.4696362 6.387633e-01
## GPCP_g_l      -0.13151155  0.05968035 -2.2035987 2.788218e-02
```

---

What the Falsification Check Shows

The coefficient on `GPCP_g` (current rainfall) in this model is small in magnitude insignificant at conventional levels, consistent with the exclusion restriction holding. This isn't equally true for `GPCP_g_l`, which has a moderately negative coefficient, and which is significant at the 0.05 level.

-   This may tend to undermine the exclusion restriction.

---
### Diagnostics in Instrumental Variables

-   After estimating an IV model, we must diagnose whether the instrument(s) are valid and strong enough.

---
### Diagnostics in Instrumental Variables

-   Four key diagnostic families:

1.  **First‑stage diagnostics**: relevance of instruments.

2.  **Overidentification tests**: validity when instruments > endogenous variables.

3.  **Weak instrument robust tests**: inference that remains valid with weak instruments.

4.  **Endogeneity tests**: whether IV is actually needed.

---

### First‑Stage Diagnostics: Relevance

-   The first stage is the regression of the endogenous variable(s) on all instruments (and exogenous covariates).

-   For a single endogenous regressor, the **first‑stage F‑statistic** tests whether the instruments jointly have explanatory power.

---

### First‑Stage Diagnostics: Relevance

-   A common rule of thumb (Stock & Yogo, 2005): F < 10 indicates weak instruments → IV bias can be large.

-   Also useful: **partial R²** – the share of variation in the endogenous variable explained by the instruments after controlling for exogenous variables.

---

``` r
# First-stage regression for the Miguel et al. example
# Endogenous: gdp_g (growth), instrument: GPCP_g (rainfall)
# Include all exogenous controls as in the original model

first_stage <- lm(gdp_g ~ GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, 
                  data = mss_repdata_1_)

# F-statistic for the instrument (GPCP_g) – use linearHypothesis from car package
library(car)
f_test <- linearHypothesis(first_stage, c("GPCP_g = 0"))
f_test
```

```
## 
## Linear hypothesis test:
## GPCP_g = 0
## 
## Model 1: restricted model
## Model 2: gdp_g ~ GPCP_g + GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + 
##     Oil + lpopl1 + lmtnest
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    734 3.6941                                  
## 2    733 3.6124  1  0.081616 16.561 5.224e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

``` r
# Partial R²: proportion of variance explained by instruments after controls
# Full model R²
r2_full <- summary(first_stage)$r.squared

# Model without the instrument
first_stage_noZ <- lm(gdp_g ~ GPCP_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, 
                      data = mss_repdata_1_)
r2_noZ <- summary(first_stage_noZ)$r.squared

# Partial R²
partial_r2 <- r2_full - r2_noZ
partial_r2
```

```
## [1] 0.0220067
```

---

-   The F‑statistic on the instrument is 16.56 with a p‑value of 10^{-4}.

-   This F is well above 10, suggesting the instruments are reasonably strong. (But recall: strength alone does not guarantee validity.)

-   There were weak-instrument test results reported in the main IV table above, in which both instruments were individually significant. This means they may be weak. This is a bit ambiguous, and we may want to treat them with caution.

---

### Overidentification Tests

-   When the number of instruments (`$m$`) exceeds the number of endogenous variables (`$k$`), we have **overidentification**.

-   The extra instruments allow us to test the joint validity of all instruments (the exclusion restriction) – under the assumption that at least `$k$` instruments are valid.

---

### Overidentification Tests

-   Common tests:
    -   **Sargan test** (homoskedastic errors)
    -   **Hansen J test** (heteroskedasticity‑robust)

-   Null hypothesis: all instruments are valid (i.e., uncorrelated with the error term).

-   Rejection implies at least one instrument is invalid, but the test cannot tell us which one.

---

``` r
# The ivreg package can produce Sargan (or Hansen) test with summary()
summary(migueliv, diagnostics = TRUE)
```

---

-   The diagnostic output includes:
    -   **Sargan test**: statistic NA, p‑value NA.

-   With exactly the same number of instruments as endogenous variables, the Sargan test returns NA because it isn't defined. Even if it were available, we would have to remember the test’s limitations: low power, and it assumes at least one instrument is valid.

---

### Weak Instrument Robust Tests

-   If instruments are weak, conventional IV standard errors and confidence intervals can be misleading – coverage rates can be far from nominal.

-   **Weak‑instrument‑robust tests** provide inference that remains valid regardless of instrument strength.

---

### Weak Instrument Robust Tests

-   Popular choices:
    -   **Anderson–Rubin (AR) test**: tests the structural parameter `$\beta$` by examining the reduced form. It is robust to weak instruments but can have low power when many instruments are weak.
    -   **Conditional Likelihood Ratio (CLR) test** (Moreira, 2003): often more powerful than AR.

---

### Weak Instrument Robust Tests

-   In R, the `ivreg` package does not implement these directly, but the `AER::ivreg` function can be used with the `diagnostics = TRUE` option, which includes a weak‑instrument test (Cragg–Donald F) and an AR confidence interval if requested.

---

``` r
# For AR confidence intervals, we can use the AER package with the `AR` option
library(AER)
miguelweakinstiv <- AER::ivreg(any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest | GPCP_g + GPCP_g_l+ y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest, data=mss_repdata_1_, diagnostics=TRUE)

ar_ci <- confint(miguelweakinstiv, type = "AR")
ar_ci
```

```
##                    2.5 %       97.5 %
## (Intercept) -0.707941172 -0.169550681
## gdp_g       -3.508508001  2.451600616
## gdp_g_l     -5.572563560  1.420439858
## y_0         -0.083334265 -0.002001351
## polity2l    -0.003552018  0.009090406
## ethfrac      0.047716701  0.403604533
## relfrac     -0.438873981 -0.033650092
## Oil         -0.067051355  0.154918592
## lpopl1       0.033854607  0.101511022
## lmtnest      0.047956027  0.106719055
```

---

-   The Anderson–Rubin 95% confidence interval for `gdp_g` is `[-3.509, 2.452]`. This interval is valid even if instruments are weak.

-   Compare with the conventional IV confidence interval: -3.509 to 2.452. The AR interval is slightly wider when instruments are weak.

---

### Testing for Endogeneity

-   If a variable is actually exogenous, OLS is more efficient than IV. Testing for endogeneity helps decide whether IV is necessary.

-   **Hausman test** (original): compares OLS and IV estimates. Under the null of exogeneity, both are consistent but OLS is efficient; under the alternative, only IV is consistent. Large differences suggest endogeneity.

---

### Testing for Endogeneity

-   **Durbin–Wu–Hausman (DWH) test** (regression‑based form): easier to implement and robust to heteroskedasticity.

-   Important caveat: the test requires that the instruments are valid. If instruments are invalid, the test can be misleading.

---

``` r
# Regression-based Durbin-Wu-Hausman test
# Step 1: First stage residuals
first_stage_resid <- resid(first_stage)

# Step 2: Include residuals in the structural model
dwh_model <- lm(any_prio ~ gdp_g + gdp_g_l + y_0 + polity2l + ethfrac + relfrac + Oil + lpopl1 + lmtnest + first_stage_resid, 
                data = mss_repdata_1_)

# Test significance of first_stage_resid
summary(dwh_model)$coefficients["first_stage_resid", ]
```

```
##   Estimate Std. Error    t value   Pr(>|t|) 
##  0.3282907  1.4464948  0.2269560  0.8205213
```

---

-   The coefficient on the first‑stage residual is 0.3283 with p = 0.8205. A significant residual indicates that OLS would be inconsistent – i.e., `gdp_g` is endogenous.

-   Here, the residual is not significant (p > 0.05), suggesting that OLS might actually be acceptable for this variable. But this test hinges on instrument validity.

---
### Interpreting the Endogeneity Test

The DWH test fails to reject the null that `gdp_g` is exogenous (p = 0.82).

**This does NOT mean**: 
- The instrument is valid (that's a separate assumption)
- IV is unnecessary (the test has low power)

---
### Interpreting the Endogeneity Test

**This DOES mean**:
- OLS and IV estimates are not statistically distinguishable
- If you believe the instrument is valid, OLS might be preferred for efficiency
- But the reduced form results suggest caution

---
### Summary: IV Diagnostics Checklist

| Diagnostic | Purpose | R implementation |
|------------|---------|------------------|
| First‑stage F | Check instrument strength | `linearHypothesis()` in first‑stage lm | 
| Overidentification (Sargan/Hansen) | Test instrument validity | `summary(ivreg, diagnostics = TRUE)` |

---
### Summary: IV Diagnostics Checklist

| Diagnostic | Purpose | R implementation |
|------------|---------|------------------|
| Weak‑instrument robust CI | Valid inference under weak IV | `confint(..., type = "AR")` | 
| Endogeneity test (DWH) | Check if IV is needed | Regression of residuals |

---
### Summary: IV Diagnostics Checklist

-   Always report first‑stage F and, if overidentified, an overidentification test.
-   Consider weak‑instrument robust inference if F is low.
-   Interpret all tests with caution: they are diagnostic tools, not definitive proof.

---
### Encouragement Designs in Experiments

-   Intent-to-treat analysis

-   Use the treatment assignment as an instrument, the actual treatment
    received as the treatment variable, and the outcome as normal.

---

``` r
library(readr)
peruemotions <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/peruemotions.csv")
```

---

``` r
summary(lm(outsidervote~simpletreat, data=peruemotions))
```

```
## 
## Call:
## lm(formula = outsidervote ~ simpletreat, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6093 -0.4916  0.3907  0.5084  0.5084 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.49164    0.02874  17.104   <2e-16 ***
## simpletreat  0.11763    0.04962   2.371   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.497 on 448 degrees of freedom
## Multiple R-squared:  0.01239,	Adjusted R-squared:  0.01018 
## F-statistic:  5.62 on 1 and 448 DF,  p-value: 0.01818
```

---

``` r
summary(lm(outsidervote~enojado, data=peruemotions))
```

```
## 
## Call:
## lm(formula = outsidervote ~ enojado, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6905 -0.5147  0.3095  0.4853  0.4853 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.51471    0.02463   20.90   <2e-16 ***
## enojado      0.17577    0.08062    2.18   0.0298 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4975 on 448 degrees of freedom
## Multiple R-squared:  0.0105,	Adjusted R-squared:  0.00829 
## F-statistic: 4.753 on 1 and 448 DF,  p-value: 0.02976
```

---

``` r
summary(ivreg(outsidervote~enojado|simpletreat,data=peruemotions))
```

```
## 
## Call:
## ivreg(formula = outsidervote ~ enojado | simpletreat, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0804 -0.3716 -0.3716  0.6284  0.6284 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.37162    0.09586   3.877 0.000122 ***
## enojado      1.70882    0.96993   1.762 0.078788 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6688 on 448 degrees of freedom
## Multiple R-Squared: -0.7881,	Adjusted R-squared: -0.7921 
## Wald test: 3.104 on 1 and 448 DF,  p-value: 0.07879
```

---

``` r
summary(lm(outsidervote~enojado+simpletreat, data=peruemotions))
```

```
## 
## Call:
## lm(formula = outsidervote ~ enojado + simpletreat, data = peruemotions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7439 -0.4807  0.3630  0.5193  0.5193 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.48066    0.02921  16.453   <2e-16 ***
## enojado      0.15639    0.08081   1.935   0.0536 .  
## simpletreat  0.10687    0.04978   2.147   0.0324 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4955 on 447 degrees of freedom
## Multiple R-squared:  0.0206,	Adjusted R-squared:  0.01621 
## F-statistic:   4.7 on 2 and 447 DF,  p-value: 0.009551
```

---
### LATE: What Does IV Actually Estimate?

-   In an encouragement design (e.g., randomized treatment assignment with imperfect compliance), the instrument `$Z$` is random assignment, but the treatment `$X$` is what subjects actually receive.

-   IV does **not** estimate the average treatment effect (ATE) for the whole population—unless certain strong assumptions hold.

-   Instead, under reasonable assumptions, IV estimates the **Local Average Treatment Effect (LATE)**—the effect for a specific subgroup: **compliers**.

---
### Four Compliance Types

-   Imagine a binary treatment `$X$` and a binary instrument `$Z$` (e.g., encouragement).

-   Each individual falls into one of four latent groups based on how they respond to the instrument:

---
### Four Compliance Types

| Type           | Behavior                                                   |
|----------------|------------------------------------------------------------|
| **Compliers**  | `$X=1$` when `$Z=1$`, `$X=0$` when `$Z=0$` (do as encouraged)     |
| **Always‑takers** | `$X=1$` regardless of `$Z$`                                |
| **Never‑takers**  | `$X=0$` regardless of `$Z$`                                |
| **Defiers**    | `$X=0$` when `$Z=1$`, `$X=1$` when `$Z=0$` (do opposite)          |

-   In most encouragement designs, we cannot observe an individual's type directly—we only see one potential treatment status.

---
### Key Assumptions for LATE

1.  **Independence**: `$Z$` is as good as randomly assigned (unconfoundedness).
2.  **Exclusion**: `$Z$` affects the outcome `$Y$` only through the treatment `$X$` (no direct effect).
3.  **First stage**: `$Z$` has a nonzero average effect on `$X$` (relevance).
4.  **Monotonicity**: No defiers exist (or the proportion of defiers is zero).  
    *In a one‑sided encouragement design (e.g., only encouragement can increase treatment), monotonicity is automatically satisfied.*

---
### When Is Monotonicity Plausible?

- Monotonicity requires that the instrument moves everyone in the same direction — no unit does the opposite of what the instrument "pushes."

This is automatically satisfied when:

- The instrument can only increase (or only decrease) treatment — one-sided design (E.g., a lottery that offers treatment to winners; losers can't access it)

- But in most observational IV applications, monotonicity is a substantive assumption that requires justification, not just assertion.

---
### Monotonicity in the MSS Rainfall Example

Does negative rainfall growth always reduce GDP growth in sub-Saharan Africa?

- For rain-fed agricultural economies heavily dependent on staple crops, the answer is plausibly yes — drought hurts output

- But consider: some countries might benefit from low rainfall (reduced flooding, pest control, different crop mixes), or have irrigation infrastructure that inverts the relationship

- A country with a large export sector in water-intensive mining might respond differently than a subsistence-farming economy

---

The monotonicity question here: Is there any country for which more rainfall would reduce GDP growth? If yes, that country is a defier, and the Wald estimator is no longer LATE for compliers — it's a weighted mixture that can be hard to interpret.

**Practical guidance: Look for heterogeneity in the first stage. If the first-stage coefficient on rainfall flips sign in meaningful subsamples (e.g., by region, income level, or crop type), monotonicity may be violated.**

---
### Key Assumptions for LATE

Under these assumptions, the IV estimand (Wald estimator) equals the **average treatment effect for compliers** (LATE).

---
### The Wald Estimator = ITT / Compliance Rate

-   The **Intention‑to‑Treat (ITT)** effect: the effect of being assigned to encouragement on the outcome:
    `$$ITT = E[Y | Z=1] - E[Y | Z=0]$$`

-   The **Compliance rate** (first stage): the effect of encouragement on treatment uptake:
    `$$FS = E[X | Z=1] - E[X | Z=0]$$`

---
### The Wald Estimator = ITT / Compliance Rate

-   The Wald (IV) estimator is the ratio:
    `$$\hat{\beta}_{IV} = \frac{ITT}{FS}$$`

-   Intuition: The ITT is diluted by non‑compliers. Scaling by the compliance rate recovers the effect for those who actually comply with encouragement.

---
### Why LATE? A Decomposition

-   Under monotonicity and exclusion, the ITT can be written as:

`$$\small ITT = \text{(Effect on compliers)} \times \text{(Proportion compliers)}$$`

because always‑takers and never‑takers are unaffected by `$Z$` (exclusion) and defiers are absent.

---
### Why LATE? A Decomposition

-   Hence:
    
`$$\small\text{LATE} = \frac{ITT}{\text{Proportion compliers}} = \frac{ITT}{FS}$$`

-   This shows that IV isolates the treatment effect **only for compliers**—not for always‑takers or never‑takers.

---
### Limitations of LATE

-   The LATE may **not generalize** to the entire population:
    -   Always‑takers might have different treatment effects than compliers.
    -   Never‑takers might also differ.

-   The instrument defines the subpopulation of compliers. Different instruments can identify different LATEs.

---
### Limitations of LATE

-   External validity: we cannot automatically extrapolate the LATE to other settings or populations without additional assumptions.

-   Policy relevance: if a policy targets the same compliers identified by the instrument, LATE is directly useful. Otherwise, caution is needed.

---
### Applying LATE: The Peru Emotions Experiment

Let's work through a concrete example to see how LATE works in practice.

We have:
- **Random assignment** to a treatment that primes anger (`simpletreat`)
- **Actual anger** (`enojado`) is only partially caused by the treatment
- **Outcome**: voting for an outsider candidate (`outsidervote`)

We'll compute each component step by step.

---
### Illustration: Peru Emotions Experiment

-   Recall the Peru emotions example:
    -   **Instrument `$Z$`**: `simpletreat` (randomized encouragement)
    -   **Treatment `$X$`**: `enojado` (anger emotion, possibly endogenous)
    -   **Outcome `$Y$`**: `outsidervote` (voting for an outsider candidate)

-   We suspect that `enojado` is endogenous (unobserved confounders like personality traits). The random assignment `simpletreat` is a potential instrument.

---

``` r
# Load data if not already done
# peruemotions <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/peruemotions.csv")

# Keep only complete cases for simplicity (as in earlier slide)
# Note: We drop missing values for simplicity. In practice, consider multiple imputation or other methods.
peruemotionstrim <- na.omit(data.frame(
  enojado = peruemotions$enojado,
  outsidervote = peruemotions$outsidervote,
  simpletreat = peruemotions$simpletreat,
  Cuzco = peruemotions$Cuzco,
  age = peruemotions$age
))
```

---
### Step 1: Compute the First Stage (Compliance Rate)

``` r
# First stage: effect of instrument on treatment
fs <- lm(enojado ~ simpletreat + Cuzco + age, data = peruemotionstrim)
summary(fs)
```

```
## 
## Call:
## lm(formula = enojado ~ simpletreat + Cuzco + age, data = peruemotionstrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17130 -0.10300 -0.09608 -0.04607  1.00316 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.151607   0.042344   3.580 0.000382 ***
## simpletreat  0.068305   0.029371   2.326 0.020501 *  
## Cuzco       -0.035445   0.029606  -1.197 0.231876    
## age         -0.002210   0.001263  -1.750 0.080903 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2893 on 434 degrees of freedom
## Multiple R-squared:  0.02289,	Adjusted R-squared:  0.01614 
## F-statistic:  3.39 on 3 and 434 DF,  p-value: 0.01804
```

``` r
# Extract the coefficient on simpletreat (compliance rate)
compliance_rate <- coef(fs)["simpletreat"]
compliance_rate
```

```
## simpletreat 
##  0.06830501
```

-   The estimated compliance rate is 0.068.  
    Being in the treatment group increases the probability of feeling anger by about 6.8 percentage points.

---
### Step 2: Compute the Intention‑to‑Treat (Reduced Form)

``` r
# Reduced form: effect of instrument on outcome
itt <- lm(outsidervote ~ simpletreat + Cuzco + age, data = peruemotionstrim)
summary(itt)
```

```
## 
## Call:
## lm(formula = outsidervote ~ simpletreat + Cuzco + age, data = peruemotionstrim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6314 -0.5216  0.3693  0.4769  0.5837 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.5402185  0.0728637   7.414 6.48e-13 ***
## simpletreat  0.1075342  0.0505405   2.128   0.0339 *  
## Cuzco       -0.0755792  0.0509454  -1.484   0.1387    
## age         -0.0007429  0.0021733  -0.342   0.7326    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4977 on 434 degrees of freedom
## Multiple R-squared:  0.01515,	Adjusted R-squared:  0.008339 
## F-statistic: 2.225 on 3 and 434 DF,  p-value: 0.08462
```

``` r
# ITT coefficient
itt_effect <- coef(itt)["simpletreat"]
itt_effect
```

```
## simpletreat 
##   0.1075342
```

-   The ITT effect is 0.108.  
    Being assigned to the treatment group increases outsider voting by about 0.6 percentage points (not statistically significant? we'll check p-value later).

---
### Step 3: Compute the Wald (IV) Estimate Manually

``` r
# Wald estimator = ITT / compliance rate
wald_iv <- itt_effect / compliance_rate
wald_iv
```

```
## simpletreat 
##    1.574324
```

-   The manual IV estimate is 1.574.  
    This suggests that for compliers, feeling anger increases the probability of outsider voting by about 157 percentage points. This is obviously too high.
-   The small compliance rate (0.068) means we're dividing the ITT by a small number, inflating the point estimate. That same weak first stage produces the large standard error.
---
### Step 4: Compare with `ivreg` Output

``` r
# Run ivreg
library(ivreg)
iv_fit <- ivreg(outsidervote ~ enojado + Cuzco + age | simpletreat + Cuzco + age, 
                data = peruemotionstrim)
summary(iv_fit)
```

```
## 
## Call:
## ivreg(formula = outsidervote ~ enojado + Cuzco + age | simpletreat + 
##     Cuzco + age, data = peruemotionstrim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0072 -0.3754 -0.3502  0.6301  0.6580 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.301540   0.189017   1.595    0.111
## enojado      1.574324   0.959493   1.641    0.102
## Cuzco       -0.019777   0.072614  -0.272    0.785
## age          0.002736   0.003503   0.781    0.435
## 
## Residual standard error: 0.6454 on 434 degrees of freedom
## Multiple R-Squared: -0.6561,	Adjusted R-squared: -0.6675 
## Wald test: 1.323 on 3 and 434 DF,  p-value: 0.2663
```

``` r
# Extract IV coefficient for enojado
iv_coef <- coef(iv_fit)["enojado"]
iv_coef
```

```
##  enojado 
## 1.574324
```

---

-   The `ivreg` coefficient on `enojado` is 1.574, matching the manual Wald estimate.  
    (Small differences may arise due to covariate adjustment; here we included Cuzco and age in both stages, which is correct.)
-   We can see that there is indeed a very large standard error, suggesting lots of uncertainty and not necessarily a causal effect actually over 100%.

---
### Step 5: Interpret as LATE

-   Under monotonicity (no defiers) and exclusion (simpletreat affects voting only through anger), the IV estimate of 1.57 is the **average treatment effect of anger on outsider voting for compliers**.

---
### Step 5: Interpret as LATE

-   Who are the compliers?  
    People who become angry **only if** they are in the treatment group (i.e., their anger is triggered by the experimental prime). They are not always angry, nor never angry—they respond to encouragement.

-   This effect may differ from the effect on always‑takers (who would be angry regardless) or never‑takers (who never become angry). Thus, we cannot automatically generalize to the whole population.

---
### Summary: LATE in a Nutshell

-   IV with a binary instrument and binary treatment, under monotonicity, identifies the **LATE for compliers**.

-   The Wald estimator is the ratio of ITT to first stage.

---
### Summary: LATE in a Nutshell

-   Always report:
    -   First‑stage strength (F‑statistic, compliance rate)
    -   Interpretation of the complier group
    -   Caveats about external validity

---
### Beyond LATE: Estimating the ATE

-   LATE is useful but limited: it only gives the treatment effect for compliers.

-   Often we want the **Average Treatment Effect (ATE)** for the entire population.

---
### Beyond LATE: Estimating the ATE

-   Aronow and Carnegie (2013) show that under certain assumptions, we can recover the ATE by reweighting using the **compliance score**.

-   This approach is implemented in the `icsw` package (Inverse Compliance Score Weighting).

---
### What Is the Compliance Score?

-   The **compliance score** `$\pi_i$` is the probability that unit `$i$`'s treatment status would be higher under encouragement than under control:

`$$\pi_i = P(X_i(1) > X_i(0) | \mathbf{W}_i)$$`

where `$X_i(1)$` is potential treatment when `$Z=1$`, `$X_i(0)$` when `$Z=0$`, and `$\mathbf{W}_i$` are covariates.

---
### What Is the Compliance Score?

-   For binary treatment and binary instrument, the compliance score is simply the probability of being a **complier**, conditional on covariates.

-   Intuition: Units with high compliance scores are likely compliers; those with low scores are likely always‑takers or never‑takers.

---
### How Does Reweighting Estimate the ATE?

-   Aronow and Carnegie show that if we weight observations by the inverse of the compliance score (for the treated) and by the inverse of `$(1 - \pi_i)$` (for the controls), we can recover the ATE.

-   Intuition: Reweighting creates a pseudo‑population where everyone looks like a complier in terms of their compliance propensity—effectively undoing the LATE restriction.

---
### How Does Reweighting Estimate the ATE?

-   The estimator requires:
    1.  A model for the compliance score (e.g., probit or logit).
    2.  The exclusion restriction and monotonicity (same as LATE).
    3.  No defiers (monotonicity) and correct specification of the compliance score model.

---
### Implementation in R: `icsw` Package

-   The `icsw` package provides functions for inverse compliance score weighting.

-   Key function: `icsw.tsls()` (two‑stage least squares with inverse compliance score weighting).

---
### Implementation in R: `icsw` Package

-   Steps:
    1.  Estimate compliance scores from a model of treatment on instrument and covariates.
    2.  Construct weights: `$w_i = \frac{Z_i}{\hat{\pi}_i} + \frac{1-Z_i}{1-\hat{\pi}_i}$`.
    3.  Run weighted IV or weighted OLS (depending on the estimator).

-   Bootstrap is used for standard errors (because compliance scores are estimated).

---
### Application: Peru Emotions Example

``` r
# Reload the trimmed data if needed
peruemotionstrim <- na.omit(data.frame(
  enojado = peruemotions$enojado,
  outsidervote = peruemotions$outsidervote,
  simpletreat = peruemotions$simpletreat,
  Cuzco = peruemotions$Cuzco,
  age = peruemotions$age
))

# Load icsw (install if needed)
# packageurl <- "http://cran.r-project.org/src/contrib/Archive/icsw/icsw_1.0.0.tar.gz"
# install.packages(packageurl, repos=NULL, type="source")
library(icsw)
```

---

``` r
# Estimate ATE using inverse compliance score weighting
# D = treatment (enojado), Y = outcome (outsidervote)
# Z = instrument (simpletreat), X = covariates (including intercept)
# W = covariates for compliance score model (can be same as X)

exp.reweight <- with(peruemotionstrim, 
                     icsw.tsls(D = enojado, 
                               Y = outsidervote, 
                               Z = simpletreat, 
                               X = cbind(1, Cuzco),      # Exogenous in outcome model
                               W = cbind(Cuzco, age),    # Covariates for compliance score
                               R = 100))                  # Bootstrap replications
```

---
### Interpreting the Results

``` r
# View the ATE estimate and bootstrapped SE
exp.reweight$coefficients
```

```
##                 Cuzco          D 
##  0.5612050 -0.1694386 -0.5144364
```

``` r
exp.reweight$coefs.se.boot
```

```
##                   Cuzco           D 
##   0.2391921   2.4784942 285.3254077
```

---
### Interpreting the Results

-   The ATE estimate for the effect of anger on outsider voting is -0.514.

-   Compare with the LATE estimate from IV: 1.5743.

---

-   The ATE is pretty different from the LATE. This could mean:
    -   Always‑takers and never‑takers have different effects than compliers (or opposite signs).
    -   Sampling variability (a real possibility given the enormous standard error).
    -   Model misspecification in the compliance score.

---
### Comparison: LATE vs. ATE

| Estimand | Estimate | Interpretation |
|----------|----------|----------------|
| LATE (IV) | 1.5743 | Effect for compliers—those whose anger is activated by the prime. |
| ATE (ICSW) | -0.5144 | Effect for the entire population, assuming the compliance score model is correct. |

---
### Comparison: LATE vs. ATE

-   The difference suggests possible treatment effect heterogeneity: compliers may respond differently than always‑takers/never‑takers.

-   But we must interpret with caution: the ATE estimate relies on correct specification of the compliance score model.

---
### Assumptions and Limitations of ICSW

-   **Same as LATE**: exclusion, monotonicity, independence.
-   **Additional assumptions**:
    -   Correctly specified model for the compliance score.
    -   Overlap: compliance scores bounded away from 0 and 1 (no units with `$\pi_i = 0$` or `$1$`).
    -   Consistency of the compliance score estimator.

---
### Assumptions and Limitations of ICSW

-   **Practical limitations**:
    -   Can be sensitive to model choice.
    -   Bootstrap standard errors may be unstable with small samples.
    -   Not yet a standard part of the applied toolkit (use with caution and robustness checks).

---
### When Might You Use ICSW?

-   When you have a strong reason to believe that the LATE is not generalizable and you want to estimate the population ATE.

-   As a robustness check to see if the LATE and ATE differ substantially—suggesting treatment effect heterogeneity.

-   In settings with rich covariates that can reliably predict compliance.

---
### Common Mistake 1: Lagged Dependent Variables as Instruments

-   A tempting but usually invalid instrument: using a lag of an endogenous variable as an instrument for itself.

-   Example: In a panel model with `$Y_{it} = \beta X_{it} + \gamma Y_{i,t-1} + \epsilon_{it}$`, using `$X_{i,t-1}$` as an instrument for `$X_{it}$`.

---
### Common Mistake 1: Lagged Dependent Variables as Instruments

-   Why this fails:
    -   If `$X$` is serially correlated (it usually is), then `$X_{i,t-1}$` is correlated with past shocks, which may persist and affect current `$Y$`.
    -   The exclusion restriction is violated unless the lag is truly exogenous.

-   **When might it work?** In models with strict exogeneity and no serial correlation in errors—rare in practice.

---
### Common Mistake 2: "Many Weak Instruments" Bias

-   Using many weak instruments can actually **increase** bias in 2SLS.

-   With many instruments, the first stage overfits, and 2SLS converges to OLS (which is biased).

---
### Common Mistake 2: "Many Weak Instruments" Bias

-   Rule of thumb: the number of instruments should be small relative to the sample size, and first-stage F should be large.

-   If you have many instruments, consider:
    -   LIML (Limited Information Maximum Likelihood)
    -   Jackknife IV (JIVE)
    -   Weak-instrument robust inference (AR, CLR)

---
### Common Mistake 3: Forgetting to Cluster Standard Errors

-   If the instrument varies at a higher level than the unit of analysis (e.g., state-level policy instrument with individual-level outcomes), standard errors must be clustered at the instrument level.

-   Example: Using state-level minimum wage as an instrument for individual wages—errors are correlated within states.

-   Failure to cluster can lead to severely understated standard errors and false positives.

---

``` r
# In ivreg, clustering is not built-in; use lmtest and sandwich
library(lmtest)
library(sandwich)

# Suppose we have clustered data (e.g., by country in MSS example)
# coeftest(migueliv, vcov = vcovCL, cluster = ~country_name)

# Always think about the level at which treatment/instrument varies!
```

---
### Common Mistake 4: Interpreting IV as ATE Without Justification

-   IV estimates LATE, not ATE, unless:
    -   Treatment effects are constant across units.
    -   There are no always‑takers or never‑takers (unlikely).
    -   The instrument affects everyone (i.e., compliance rate = 1).

-   Researchers often slip into language like "the effect of X on Y" without specifying "for compliers."

---
### Common Mistake 4: Interpreting IV as ATE Without Justification

-   Be precise: "Among those whose treatment status is changed by the instrument, the estimated effect is ..."

---
### Summary: IV Pitfalls to Avoid

| Mistake | Why It's Problematic | Better Approach |
|---------|----------------------|-----------------|
| Lagged DV as instrument | Usually violates exclusion | Use external instruments or GMM |
| Many weak instruments | Increases bias | LIML, JIVE, weak-IV robust tests |

---
### Summary: IV Pitfalls to Avoid

| Mistake | Why It's Problematic | Better Approach |
|---------|----------------------|-----------------|
| Ignoring clustering | SEs too small | Cluster at instrument level |
| Interpreting as ATE | Overgeneralization | Be precise: LATE for compliers |

-   IV is powerful but requires careful justification and diagnostics.
-   When in doubt, remember Wright's original insight: find a genuine exogenous shifter.

---
### When Might 2SLS or IV Be a Good Idea?

-   If there's a true randomization or something close in the world that you can take
    advantage of -- but there still might be down-sides.

-   As a Hausmann test/robustness check when OLS may be biased, to see if your most important results hold up under some alternative assumptions.

-   Never without attention to assumptions.

-   IV is most compelling when the instrument reflects genuine exogenous variation in the world, not when it's chosen instrumentally (pun intended) to satisfy a reviewer.