Instrumental Variables

.title[
# Instrumental Variables
]
.subtitle[
## PS 312
]
.author[
### Jaye Seawright
]
.date[
### 2026-04-29
]

---

## Today's Roadmap

1. **Hook & Activation:** The logic of instrumental variables  
2. **Concept Introduction:** The IV model, relevance, and exclusion  
3. **Instrument on Trial:** Does rainfall pass the test?  
4. **Diagnostics & Assumptions:** Weak instruments and overidentification  
5. **Instruments Brainstorm:** Inventing instruments that might work in political science  
6. **Core Graded Activity:** Write your paragraph for the TA  
7. **Wrap‑Up:** Cheat sheet for IV designs

**Goal:** Move from "I've heard of IV" to "I can design, run, and critically evaluate an instrumental‑variables analysis in R."

---

# 1. Hook & Activation  
### The Logic of Instrumental Variables

---

## Scenario: Returns to Education

You want to know the causal effect of an extra year of schooling on wages. You run a regression:

`$$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i$$`

- Why might `$\beta_1$` be biased? (Hint: Ability is unobserved and correlated with both education and wages.)
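
One way to make the hint concrete is to assume, purely for illustration, that ability is the only omitted variable and enters wages linearly, so that `$u_i = \gamma \text{Ability}_i + \varepsilon_i$` with `$\varepsilon_i$` unrelated to education. Under that assumption the OLS slope converges to:

`$$\text{plim}\, \hat{\beta}_1^{OLS} = \beta_1 + \gamma \, \frac{\text{Cov}(\text{Education}_i, \text{Ability}_i)}{\text{Var}(\text{Education}_i)}$$`

If ability raises wages (`$\gamma > 0$`) and more able people get more schooling (positive covariance), the second term is positive and OLS overstates the return to education.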

If we could find a variable that **affects education** but **has no direct effect on wages** (other than through education), we could use it as an **instrument**. A famous example is **quarter of birth**: because of school‑entry cutoffs and compulsory‑attendance ages, people born in different quarters are required to stay in school for slightly different lengths of time, but birth quarter should not directly affect wages decades later.

---

## The IV Intuition

1. **Instrumental variable (Z):** A variable that is correlated with the endogenous regressor (`X`) but uncorrelated with the error term (`u`).  
2. **First stage:** Regress `X` on `Z` (and any exogenous covariates) to get the predicted values `$\hat{X}$`.  
3. **Second stage:** Regress `Y` on `$\hat{X}$` (and the covariates). The coefficient on `$\hat{X}$` is the **IV estimate**.

The IV estimator isolates variation in `X` that is *as good as random* (driven by `Z`), purging the bias from unobserved confounders.
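
The two stages can be sketched by hand on simulated data. Everything in this block is invented for illustration (the variable names, the coefficients, the sample size); the true effect of `x` on `y` is set to 2 so the estimates can be compared against a known answer.

``` r
# Simulated illustration: all numbers below are invented for this sketch.
set.seed(312)
n <- 10000

z       <- rnorm(n)   # instrument: shifts x, unrelated to ability
ability <- rnorm(n)   # unobserved confounder
x <- 1 * z + 1 * ability + rnorm(n)   # endogenous regressor
y <- 2 * x + 3 * ability + rnorm(n)   # true causal effect of x is 2

# Naive OLS: biased because ability sits in the error term
ols_est <- unname(coef(lm(y ~ x))["x"])

# First stage: keep only the part of x that is driven by z
x_hat <- fitted(lm(x ~ z))

# Second stage: regress y on the predicted values
iv_est <- unname(coef(lm(y ~ x_hat))["x_hat"])

# With a single instrument, 2SLS equals the ratio of covariances
wald_est <- cov(z, y) / cov(z, x)

round(c(OLS = ols_est, TwoSLS = iv_est, Wald = wald_est), 3)
```

The manual second stage gives the right point estimate but the wrong standard errors, because `lm()` computes residuals from `x_hat` rather than from the observed `x`. For inference, use a dedicated 2SLS routine.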

---

# 2. Concept Introduction  
### The IV Model, Relevance, and Exclusion

---

## The Two IV Assumptions

1. **Relevance:** The instrument must be strongly correlated with the endogenous regressor.  
   - *Check:* First‑stage F‑statistic > 10 (rule of thumb).
2. **Exclusion restriction:** The instrument must affect the outcome **only through** the endogenous regressor.  
   - *Cannot be formally tested*—must be defended theoretically.
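
For a single instrument, and writing the model as `$Y_i = \beta_0 + \beta_1 X_i + u_i$` (notation assumed here to mirror the wage equation), the two assumptions can be stated as:

`$$\text{Relevance: } \text{Cov}(Z_i, X_i) \neq 0 \qquad \text{Exclusion: } \text{Cov}(Z_i, u_i) = 0$$`

Taking the covariance of both sides of the model with `$Z_i$` gives `$\text{Cov}(Z_i, Y_i) = \beta_1 \text{Cov}(Z_i, X_i) + \text{Cov}(Z_i, u_i)$`. Exclusion removes the last term and relevance allows the division, which yields:

`$$\beta_1 = \frac{\text{Cov}(Z_i, Y_i)}{\text{Cov}(Z_i, X_i)}$$`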

For our activity, we will use **election‑day rainfall and snowfall** as instruments for **voter turnout**. Do these instruments satisfy the assumptions?

---

## The 2SLS Estimator in R

We will use the `ivreg()` function from the `AER` package. The formula syntax is:

`y ~ x1 + x2 | z1 + z2 + x2`

- `y` is the outcome.
- `x1` is the **endogenous** regressor (turnout).
- `x2` are **exogenous** covariates (included in both stages).
- `z1` and `z2` are the **instruments** (rain, snow).

`ivreg()` automatically performs two‑stage least squares and can report diagnostics with `summary(..., diagnostics = TRUE)`.
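
A small sketch of the formula syntax on simulated data follows. The data and coefficients are invented; the variable names simply reuse the generic `y`, `x1`, `x2`, `z1`, `z2` from the formula above. It also shows the update form `. - x1 + z1 + z2` described in the `AER` documentation, which avoids retyping the exogenous covariates.

``` r
library(AER)  # provides ivreg()

# Simulated data; variable names mirror the generic formula on the slide.
set.seed(312)
n  <- 2000
x2 <- rnorm(n)                  # exogenous covariate
z1 <- rnorm(n)                  # instrument 1
z2 <- rnorm(n)                  # instrument 2
confounder <- rnorm(n)          # unobserved; makes x1 endogenous
x1 <- 0.8 * z1 + 0.5 * z2 + 0.3 * x2 + confounder + rnorm(n)
y  <- 1.5 * x1 - 1.0 * x2 + 2 * confounder + rnorm(n)
sim <- data.frame(y, x1, x2, z1, z2)

# Explicit form: x2 appears on both sides of the bar
fit_explicit <- ivreg(y ~ x1 + x2 | z1 + z2 + x2, data = sim)

# Update form: "." means "all regressors", then drop x1 and add instruments
fit_update <- ivreg(y ~ x1 + x2 | . - x1 + z1 + z2, data = sim)

# Both specifications describe the same model
all.equal(coef(fit_explicit), coef(fit_update))
coef(fit_explicit)
```

Forgetting to repeat `x2` after the bar in the explicit form is a common mistake: `ivreg()` would then treat `x2` as endogenous as well.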

---

# 3. Instrument on Trial  
### Does Rainfall Pass the Test?

---

## The Setup

You've seen the code that loads the data and runs the IV. Now we put the instruments—`Rain` and `Snow`—**on trial**.

**The evidence:** The R output from the analysis (OLS, IV, first‑stage, Sargan test).

---

**Procedure:**
1. (3 min) We need volunteers to act as the prosecution and the defense. Each side will have a few minutes to prepare and to meet with the instructor.
2. (5 min) Defense presents first; Prosecution rebuts.
3. (2 min) Class votes: **Valid** or **Invalid**?

---

## Loading the Data

The dataset `rainfallelections.csv` contains county‑level election returns, battleground status, and weather for U.S. presidential elections from 1948 to 2000.

``` r
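library(tidyverse)  # read_csv(), %>%, select(), filter(), glimpse()
library(AER)        # ivreg(), used in the exhibits that follow

iv_data <- read_csv("data/rainfallelections.csv")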

# Keep relevant variables and remove missing
iv_clean <- iv_data %>%
  select(Year, State, County, Turnout, battleground, Rain, Snow) %>%
  filter(!is.na(Turnout), !is.na(battleground), !is.na(Rain), !is.na(Snow))

glimpse(iv_clean)
```

```
## Rows: 12,378
## Columns: 7
## $ Year         <dbl> 1988, 1992, 1996, 2000, 1988, 1992, 1996, 2000, 1988, 199…
## $ State        <chr> "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "A…
## $ County       <chr> "AUTAUGA", "AUTAUGA", "AUTAUGA", "AUTAUGA", "BALDWIN", "B…
## $ Turnout      <dbl> 52.04409, 61.34813, 55.16013, 56.39563, 53.34472, 58.9024…
## $ battleground <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Rain         <dbl> 0.14, 0.00, 0.00, 0.57, 0.00, 0.02, 0.07, 0.46, 0.01, 0.0…
## $ Snow         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
```

---

## Exhibit A: The OLS Regression

``` r
ols_model <- lm(battleground ~ Turnout, data = iv_clean)
summary(ols_model)
```

```
## 
## Call:
## lm(formula = battleground ~ Turnout, data = iv_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4328 -0.3155 -0.2908  0.6680  0.8033 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4505992  0.0233417  19.304  < 2e-16 ***
## Turnout     -0.0025390  0.0004046  -6.276 3.59e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4603 on 12376 degrees of freedom
## Multiple R-squared:  0.003172,	Adjusted R-squared:  0.003092 
## F-statistic: 39.39 on 1 and 12376 DF,  p-value: 3.595e-10
```

---

## Exhibit B: The IV Regression (2SLS)

``` r
iv_model <- ivreg(battleground ~ Turnout | Rain + Snow,
                  data = iv_clean)
summary(iv_model, diagnostics = TRUE)
```

```
## 
## Call:
## ivreg(formula = battleground ~ Turnout | Rain + Snow, data = iv_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6051 -0.3280 -0.2694  0.6331  0.9530 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.647310   0.100455   6.444 1.21e-10 ***
## Turnout     -0.006003   0.001768  -3.396 0.000685 ***
## 
## Diagnostic tests:
##                    df1   df2 statistic p-value    
## Weak instruments     2 12375    344.19  <2e-16 ***
## Wu-Hausman           1 12375      4.08  0.0434 *  
## Sargan               1    NA    241.42  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4617 on 12376 degrees of freedom
## Multiple R-Squared: -0.002734,	Adjusted R-squared: -0.002815 
## Wald test: 11.53 on 1 and 12376 DF,  p-value: 0.0006853
```
---

**Defense highlights:**
- First‑stage F‑statistic > 10 → instruments are **relevant**.
- Wu‑Hausman test significant → turnout is indeed endogenous; IV is justified.

**Prosecution highlights:**
- Sargan test p‑value is small → **overidentifying restrictions rejected**; at least one instrument may be invalid.
- The coefficient changed quite a bit from OLS, which could be a sign of weak‑instrument bias even if F > 10.

---

## Exhibit C: The First Stage (Relevance)

``` r
first_stage <- lm(Turnout ~ Rain + Snow, data = iv_clean)
summary(first_stage)
```

```
## 
## Call:
## lm(formula = Turnout ~ Rain + Snow, data = iv_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.139  -6.835  -0.079   6.485  43.876 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  56.1241     0.1035 542.243   <2e-16 ***
## Rain          0.7670     0.3777   2.031   0.0423 *  
## Snow          5.0335     0.1954  25.760   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.955 on 12375 degrees of freedom
## Multiple R-squared:  0.0527,	Adjusted R-squared:  0.05254 
## F-statistic: 344.2 on 2 and 12375 DF,  p-value: < 2.2e-16
```

---

## Exhibit D: The Exclusion Restriction (Conceptual)

*No table for this one—it's about theory.*

**Defense needs to argue that...:**
> Rainfall on Election Day affects battleground status **only** by reducing turnout.

**Prosecution needs to argue that...:**
> Weather could affect long‑term political competitiveness in some way other than by reducing turnout. (Ideally, provide one or more examples of such an effect.)

---

## Your Verdict

After hearing both sides, **raise your hand**:

- **Valid:** The instruments are credible.
- **Invalid:** The instruments fail the relevance or exclusion requirement.

---

## Why This Matters for Your Project

In your own analysis (using `GOPVoteShare` as the outcome), you will face the **exact same debate**. You must decide whether to trust the IV results. The tools you just used—first‑stage F, Sargan test, theoretical reasoning—are what you'll apply to defend your own conclusions.

**Remember:** No instrument is perfect. The goal is to be **transparent** about the threats and to show that your results are robust to reasonable challenges.

---

# 4. Diagnostics & Assumptions  
### Weak Instruments and Overidentification

---

## Weak Instrument Test

Weak instruments bias the 2SLS estimate toward the OLS estimate and make its standard errors and confidence intervals unreliable. The rule of thumb: **first‑stage F‑statistic > 10**.

We can examine the first stage directly (as in Exhibit C) or extract the F‑statistic from the `ivreg` diagnostics.

``` r
# Extract first‑stage F‑statistic from ivreg object
summary(iv_model, diagnostics = TRUE)$diagnostics["Weak instruments", "statistic"]
```

```
## [1] 344.1884
```

If this value is less than 10, the instruments are **weak** and the IV estimates should be interpreted with caution.
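
In Exhibit C the first stage has no covariates, so the overall regression F (344.2) and the weak‑instruments diagnostic coincide. Once covariates or fixed effects enter the first stage, the relevant number is the F‑statistic for the instruments alone. A minimal sketch on simulated data (all names and coefficients are invented for illustration) computes that partial F by comparing nested first‑stage models:

``` r
set.seed(312)
n <- 3000
covariate <- rnorm(n)                       # exogenous control
z1 <- rnorm(n)                              # strong instrument
z2 <- rnorm(n)                              # much weaker instrument
x  <- 0.5 * z1 + 0.05 * z2 + 1.0 * covariate + rnorm(n)

# First stage with and without the excluded instruments
fs_full       <- lm(x ~ z1 + z2 + covariate)
fs_restricted <- lm(x ~ covariate)

# Partial F-test: do z1 and z2 jointly predict x, holding the covariate fixed?
partial_f <- anova(fs_restricted, fs_full)$F[2]

# Overall regression F also credits the covariate's explanatory power
overall_f <- unname(summary(fs_full)$fstatistic["value"])

c(partial_F = partial_f, overall_F = overall_f)
```

Reading the overall F from `summary()` of a first stage with controls would overstate instrument strength in this setup, because the covariate explains much of `x` on its own.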

---

## Overidentification Test (Sargan)

When we have **more instruments than endogenous regressors**, we can test whether the instruments are uncorrelated with the second‑stage error term.

``` r
summary(iv_model, diagnostics = TRUE)$diagnostics["Sargan", "p-value"]
```

```
## [1] 1.928659e-54
```

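- **Null hypothesis:** All instruments are valid, that is, uncorrelated with the second‑stage error term.
- A small p‑value (< 0.05) suggests that at least one instrument may be invalid, though the test cannot say which one.
- A large p‑value is not proof of validity: the test only checks whether the instruments agree with one another, so it has little power when all of them are invalid in the same way.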
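
To see what the test is doing, here is a hand computation of one textbook form of the Sargan statistic on simulated data. The data are invented, and `z2` is deliberately built to violate exclusion by entering `y` directly, so a rejection is the expected outcome in this sketch.

``` r
set.seed(312)
n  <- 5000
z1 <- rnorm(n)                          # valid instrument
z2 <- rnorm(n)                          # invalid: also enters y directly
confounder <- rnorm(n)
x <- z1 + z2 + confounder + rnorm(n)
y <- 2 * x + 0.5 * z2 + confounder + rnorm(n)

# 2SLS point estimates from the manual two-stage procedure
x_hat <- fitted(lm(x ~ z1 + z2))
b     <- unname(coef(lm(y ~ x_hat)))

# Structural residuals use the observed x, not x_hat
u_hat <- y - b[1] - b[2] * x

# Sargan statistic: n * R^2 from regressing the residuals on all instruments
aux         <- lm(u_hat ~ z1 + z2)
sargan_stat <- n * summary(aux)$r.squared
sargan_df   <- 2 - 1                    # instruments minus endogenous regressors
sargan_p    <- pchisq(sargan_stat, df = sargan_df, lower.tail = FALSE)

c(statistic = sargan_stat, p_value = sargan_p)
```

The degrees of freedom equal the number of surplus instruments, which is why the statistic is unavailable when the model is exactly identified (one instrument for one endogenous regressor).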

---

## Robustness Checks

Do the results change if we use only one instrument (`Rain`)? What if we add controls like year fixed effects or state fixed effects?

``` r
# IV with only rain
iv_rain <- ivreg(battleground ~ Turnout | Rain, data = iv_clean)
summary(iv_rain, diagnostics = TRUE)
```

```
## 
## Call:
## ivreg(formula = battleground ~ Turnout | Rain, data = iv_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6.18378 -0.96637  0.03676  0.99365  7.02417 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.41550    1.70568  -4.348 1.39e-05 ***
## Turnout      0.13599    0.03004   4.527 6.03e-06 ***
## 
## Diagnostic tests:
##                    df1   df2 statistic  p-value    
## Weak instruments     1 12376     23.56 1.23e-06 ***
## Wu-Hausman           1 12375    227.28  < 2e-16 ***
## Sargan               0    NA        NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.49 on 12376 degrees of freedom
## Multiple R-Squared: -9.441,	Adjusted R-squared: -9.442 
## Wald test:  20.5 on 1 and 12376 DF,  p-value: 6.029e-06
```

``` r
# With state fixed effects
iv_fe <- ivreg(battleground ~ Turnout + factor(State) | Rain + Snow + factor(State),
               data = iv_clean)
summary(iv_fe, diagnostics = TRUE)
```

```
## 
## Call:
## ivreg(formula = battleground ~ Turnout + factor(State) | Rain + 
##     Snow + factor(State), data = iv_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6.49020 -0.62691  0.03098  0.65917  5.42817 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -7.02956    1.63083  -4.310 1.64e-05 ***
## Turnout                      0.13532    0.03137   4.314 1.62e-05 ***
## factor(State)ARIZONA         0.36824    0.15708   2.344 0.019083 *  
## factor(State)ARKANSAS        0.37436    0.09382   3.990 6.65e-05 ***
## factor(State)CALIFORNIA     -0.31592    0.21165  -1.493 0.135552    
## factor(State)COLORADO       -1.34480    0.43763  -3.073 0.002125 ** 
## factor(State)CONNECTICUT    -1.30275    0.36117  -3.607 0.000311 ***
## factor(State)DELAWARE       -0.17606    0.31518  -0.559 0.576438    
## factor(State)FLORIDA         0.59620    0.09451   6.308 2.92e-10 ***
## factor(State)GEORGIA         1.69311    0.28713   5.897 3.81e-09 ***
## factor(State)IDAHO          -1.83457    0.43752  -4.193 2.77e-05 ***
## factor(State)ILLINOIS       -0.86483    0.27152  -3.185 0.001450 ** 
## factor(State)INDIANA        -0.40575    0.12681  -3.200 0.001380 ** 
## factor(State)IOWA           -1.06112    0.31527  -3.366 0.000766 ***
## factor(State)KANSAS         -1.52298    0.36263  -4.200 2.69e-05 ***
## factor(State)KENTUCKY        0.67536    0.09055   7.458 9.37e-14 ***
## factor(State)LOUISIANA      -0.54375    0.25906  -2.099 0.035840 *  
## factor(State)MAINE          -1.66374    0.52278  -3.182 0.001464 ** 
## factor(State)MARYLAND        0.12205    0.12912   0.945 0.344559    
## factor(State)MASSACHUSETTS  -1.42457    0.36507  -3.902 9.58e-05 ***
## factor(State)MICHIGAN       -0.42281    0.28545  -1.481 0.138573    
## factor(State)MINNESOTA      -1.99231    0.52688  -3.781 0.000157 ***
## factor(State)MISSISSIPPI    -0.16845    0.09556  -1.763 0.077950 .  
## factor(State)MISSOURI       -0.07383    0.20765  -0.356 0.722182    
## factor(State)MONTANA        -2.38652    0.61866  -3.858 0.000115 ***
## factor(State)NEBRASKA       -1.69028    0.40091  -4.216 2.50e-05 ***
## factor(State)NEVADA         -0.40208    0.20865  -1.927 0.053993 .  
## factor(State)NEW HAMPSHIRE  -0.69276    0.32967  -2.101 0.035630 *  
## factor(State)NEW JERSEY      0.14863    0.19229   0.773 0.439561    
## factor(State)NEW MEXICO     -0.53853    0.31996  -1.683 0.092372 .  
## factor(State)NEW YORK       -0.35987    0.16940  -2.124 0.033658 *  
## factor(State)NORTH CAROLINA  0.94337    0.13249   7.120 1.14e-12 ***
## factor(State)NORTH DAKOTA   -2.04490    0.48393  -4.226 2.40e-05 ***
## factor(State)OHIO           -0.03526    0.20101  -0.175 0.860779    
## factor(State)OKLAHOMA       -0.78865    0.20310  -3.883 0.000104 ***
## factor(State)OREGON         -1.40650    0.42100  -3.341 0.000838 ***
## factor(State)PENNSYLVANIA    0.96751    0.10447   9.261  < 2e-16 ***
## factor(State)RHODE ISLAND   -0.82478    0.31118  -2.650 0.008048 ** 
## factor(State)SOUTH CAROLINA  0.97478    0.24768   3.936 8.34e-05 ***
## factor(State)SOUTH DAKOTA   -2.43195    0.57119  -4.258 2.08e-05 ***
## factor(State)TENNESSEE       1.13702    0.17013   6.683 2.44e-11 ***
## factor(State)TEXAS          -0.01199    0.09476  -0.127 0.899287    
## factor(State)UTAH           -1.84149    0.44281  -4.159 3.22e-05 ***
## factor(State)VERMONT        -1.58280    0.39855  -3.971 7.19e-05 ***
## factor(State)VIRGINIA        0.04395    0.07980   0.551 0.581780    
## factor(State)WASHINGTON     -0.81333    0.32259  -2.521 0.011707 *  
## factor(State)WEST VIRGINIA   0.56074    0.12031   4.661 3.18e-06 ***
## factor(State)WISCONSIN      -1.23416    0.41193  -2.996 0.002741 ** 
## factor(State)WYOMING        -1.63522    0.40009  -4.087 4.39e-05 ***
## 
## Diagnostic tests:
##                    df1   df2 statistic  p-value    
## Weak instruments     2 12328    10.475 2.85e-05 ***
## Wu-Hausman           1 12328   141.359  < 2e-16 ***
## Sargan               1    NA     7.825  0.00515 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.059 on 12329 degrees of freedom
## Multiple R-Squared: -4.258,	Adjusted R-squared: -4.278 
## Wald test: 15.73 on 48 and 12329 DF,  p-value: < 2.2e-16
```
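
The year fixed effects mentioned at the top of this slide follow the same pattern as the state version. The sketch below runs on simulated stand‑in data that only borrows the column names of `iv_clean`, so its estimates are meaningless; it is meant to show the formula, not a result.

``` r
library(AER)  # provides ivreg()

# Stand-in data with the same column names as iv_clean; values are simulated.
set.seed(312)
n <- 2000
fake <- data.frame(
  Year = sample(c(1988, 1992, 1996, 2000), n, replace = TRUE),
  Rain = rexp(n, rate = 5),
  Snow = rexp(n, rate = 10)
)
fake$Turnout      <- 55 + 0.5 * (fake$Year - 1988) - 4 * fake$Rain -
                     6 * fake$Snow + rnorm(n, sd = 5)
fake$battleground <- rbinom(n, size = 1, prob = 0.3)

# Year dummies are exogenous controls, so they appear on both sides of the bar
iv_year_fe <- ivreg(
  battleground ~ Turnout + factor(Year) | Rain + Snow + factor(Year),
  data = fake
)
summary(iv_year_fe, diagnostics = TRUE)
```

To try this on the course data, replace `fake` with `iv_clean` and compare the weak‑instruments and Sargan rows with the state fixed effects output.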

---

# 5. Instruments Brainstorm  
### Inventing Instruments That Might Work in Political Science

---

## The Challenge

Finding a valid instrument is **hard**. The best instruments come from **institutional quirks, natural experiments, or random assignment** that is plausibly exogenous.

In small groups, take **5 minutes** to brainstorm an instrument for one of the following research questions:

---

## Some Classic Examples (For Inspiration)

---

## Share Out

Each group will share their **best instrument idea** and defend why it satisfies:

1. **Relevance** – Strongly correlated with the endogenous regressor.  
2. **Exclusion** – Affects the outcome *only through* that regressor.

The class will vote on the **most creative but credible instrument**.

---

# 6. Core Graded Activity  
### Write Your Paragraph for the TA

---

## Instructions

**By the end of class today, email your TA a short paragraph that includes:**

1. Your **research question** (one sentence).  
2. A brief description of the **IV design** (endogenous regressor, instruments, outcome).  
3. The **key result** (the IV coefficient from `ivreg()`) and how it differs from the OLS estimate.  
4. An assessment of the **instrument strength** (first‑stage F‑statistic) and what it implies.  
5. **One critical reflection** on the validity of rainfall/snowfall as instruments (exclusion restriction).

---

## Example Paragraph (for a different question)

> *Our group asks: Does education increase wages? We use quarter of birth as an instrument for years of schooling. The OLS estimate suggests a 10% return per year of education, but the IV estimate is only 7% (p < 0.01). The first‑stage F‑statistic is 12.3, above the rule‑of‑thumb threshold of 10, suggesting the instrument is adequately strong. The exclusion restriction requires that birth quarter affects wages only through schooling; while this is plausible, critics note that birth quarter may correlate with season‑of‑birth effects on health. Overall, the IV results support a causal effect of education, albeit smaller than OLS suggests.*

---

## Reminders

- One submission per student.

---

# 7. Wrap‑Up

> **Single most important rule for IV:** Always report the first‑stage F‑statistic. A weak instrument can be worse than no instrument at all.