3: Natural Experiments: Conceptual Introduction

class: center, middle, inverse, title-slide

.title[
# 3: Natural Experiments: Conceptual Introduction
]
.subtitle[
## Quantitative Causal Inference
]
.author[
### <large>Jaye Seawright</large>
]
.institute[
### <small>Northwestern Political Science</small>
]
.date[
### April 16 and 21, 2026
]

---

class: center, middle

<style type="text/css">
pre[class] {
  max-height: 200px;
}
</style>

### What Is a Natural Experiment?

- A **natural experiment** is an observational study in which the treatment (or exposure) of interest is assigned by **factors outside the researcher’s control**, in a way that mimics random assignment.

- The researcher does **not** manipulate the treatment – it occurs naturally, due to:
  - Policy changes,
  - Geographic or historical circumstances,
  - Random shocks (weather, lotteries, etc.),
  - Institutional rules (e.g., cutoffs).

---
### Natural Experiments vs. True Experiments

| Feature | True Experiment | Natural Experiment |
|--------|-----------------|---------------------|
| Treatment assigned by | Researcher | Nature, policy, accident |
| Randomization | Controlled by researcher | "As‑if random" – must be argued |
| Setting | Often lab or field | Real‑world |

---
### Natural Experiments vs. True Experiments

| Feature | True Experiment | Natural Experiment |
|--------|-----------------|---------------------|
| Internal validity | High (by design) | Requires strong assumptions |
| External validity | May be limited | Often high (real‑world context) |

---

- In a true experiment, randomization ensures that treatment and control groups are comparable **on average**.
- In a natural experiment, we must **defend** the claim that treatment is as‑if randomly assigned.

---
### Key Features of a Natural Experiment

1. **Exogenous variation**: The source of variation in treatment is unrelated to potential outcomes (conditional on observables).

2. **No researcher manipulation**: The treatment arises from real‑world processes – policy discontinuities, weather, lotteries, historical events.

3. **Real‑world context**: Results often generalize better than lab experiments, but assumptions are harder to verify.

---
### Why Are Natural Experiments Valuable?

- **Ethics**: We cannot randomly assign people to war, colonization, or unemployment.
- **Feasibility**: Many interesting treatments (e.g., living in a democracy) are impossible to randomize.
- **External validity**: Natural experiments study real populations in real settings.
- They provide a bridge between observational studies and experiments, using **design** rather than statistical adjustment to identify causal effects.

---
### The Core Assumption: As‑If Random

The central claim of any natural experiment is that treatment assignment is **"as‑if random"** – i.e., independent of potential outcomes, at least after conditioning on observable covariates.

This assumption is **not testable directly**; we can only provide indirect evidence:
- Balance tests (compare covariates across treatment and control).
- Placebo tests (check for effects where none should exist).
- Institutional knowledge (explain why assignment is haphazard).
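
---

The first two checks listed above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not an analysis of any real study: the covariate `age`, the outcome `y`, and the pre-treatment placebo outcome `y_pre` are all hypothetical, and treatment is randomized by construction.

``` r
set.seed(406)
n <- 1000
# Hypothetical pre-treatment covariate and an as-if random treatment
age <- rnorm(n, mean = 40, sd = 10)
treat <- rbinom(n, size = 1, prob = 0.5)
# Outcome depends on treatment (true effect = 2) and on age
y <- 1 + 2 * treat + 0.1 * age + rnorm(n)
# Placebo outcome measured before treatment, so the true effect is zero
y_pre <- 0.1 * age + rnorm(n)

# Balance test: does the covariate differ across treatment groups?
balance <- t.test(age ~ treat)
# Placebo test: does treatment appear to "affect" the pre-treatment outcome?
placebo <- lm(y_pre ~ treat)

balance$p.value
coef(summary(placebo))["treat", ]
```

Passing both checks is consistent with as-if random assignment but does not prove it, since imbalance on unobserved variables would go undetected.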

---
### Five Types of Natural Experiments

We will explore five common designs:

1. **Classic natural experiment**: A single treatment is as‑if randomly assigned (e.g., Snow’s cholera study, draft lottery).
2. **Instrumental‑variables natural experiment**: An as‑if random variable affects treatment, but not outcome directly (e.g., settler mortality as instrument for institutions).
3. **Regression‑discontinuity design**: Treatment is determined by a cutoff on a continuous variable (e.g., Maimonides' rule).

---
### Five Types of Natural Experiments

4\.  **Difference-in-differences design**: Treatment changes at unpredictable or arbitrary times in some units but not in others (e.g., minimum-wage study).

5\.  **Synthetic-control design**: Control cases are weighted to estimate the trajectory the treated case would have followed over time had it remained untreated (e.g., Brexit study).
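
---

As a preview of the difference-in-differences logic, here is a minimal two-group, two-period sketch on simulated data. All names (`did`, `treated`, `post`) and all numbers are hypothetical, and the simulation builds in parallel trends by assumption.

``` r
set.seed(406)
n <- 500  # units per group
# Hypothetical two-group, two-period panel
did <- expand.grid(id = 1:(2 * n), post = 0:1)
did$treated <- as.integer(did$id > n)
# Group gap of 3, common time trend of 1, true treatment effect of 2
did$y <- 5 + 3 * did$treated + 1 * did$post +
  2 * did$treated * did$post + rnorm(nrow(did))

# Difference-in-differences from the four cell means
m <- tapply(did$y, list(did$treated, did$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# The same estimate is the interaction coefficient in a regression
did_lm <- lm(y ~ treated * post, data = did)
c(means = did_means, regression = unname(coef(did_lm)["treated:post"]))
```

The group gap and the common time trend both difference out, which is why the design needs as-if random timing rather than as-if random levels.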

---
### Classic Natural Experiment

1.  "Nature" randomizes the treatment.

    -   No discretion is involved in assigning treatments, or the
        relevant information is unavailable or unused.

2.  Randomized treatment has the same effect as non-randomized treatment
    would have.

---
### Snow: the Most Famous Natural Experiment
<img src="images/jonsnow.png" width="100%" style="display: block; margin: auto;" />

---
### Snow: the Most Famous Natural Experiment
<img src="images/johnsnow.png" width="100%" style="display: block; margin: auto;" />

---
### Snow: the Most Famous Natural Experiment
<img src="images/snowpubandpump.png" width="50%" style="display: block; margin: auto;" />

---
### Snow on Cholera
<img src="images/snowcrossdistrict.jpg" width="40%" style="display: block; margin: auto;" />

---
"Although the facts shown in the above table afford very strong evidence
of the powerful influence which the drinking of water containing the
sewage of a town exerts over the spread of cholera, when that disease is
present, yet the question does not end here; for the intermixing of the
water supply of the Southwark and Vauxhall Company with that of the
Lambeth Company, over an extensive part of London, admitted of the
subject being sifted in such a way as to yield the most incontrovertible
proof on one side or the other."

---
"In the sub-districts enumerated in the above table as being supplied by
both Companies, the mixing of the supply is of the most intimate kind.
The pipes of each Company go down all the streets, and into nearly all
the courts and alleys. A few houses are supplied by one Company and a
few by the other, according to the decision of the owner or occupier at
that time when the Water Companies were in active competition. In many
cases a single house has a supply different from that on either side.
Each company supplies both rich and poor, both large houses and small;
there is no difference either in the condition or occupation of the
persons receiving the water of the different Companies."

---


``` r
library(readr)  # provides read_csv()
snowtable8 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/snowtable8.csv")
# Regress overall deaths on water supplier
snowlm <- lm(deathsOverall ~ supplier, data=snowtable8)
```
---

``` r
summary(snowlm)
```

```
## 
## Call:
## lm(formula = deathsOverall ~ supplier, data = snowtable8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -69.333 -16.896  -1.625  16.812  54.667 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           4.50      14.93   0.302 0.765175    
## supplierSouthwarkVauxhall            65.83      17.23   3.820 0.000651 ***
## supplierSouthwarkVauxhall_Lambeth    36.25      16.69   2.172 0.038139 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.85 on 29 degrees of freedom
## Multiple R-squared:  0.3575,	Adjusted R-squared:  0.3132 
## F-statistic:  8.07 on 2 and 29 DF,  p-value: 0.001636
```

---

``` r
snowlm2 <- lm(I(log(pop1851))~ supplier, data=snowtable8)
summary(snowlm2)
```

```
## 
## Call:
## lm(formula = I(log(pop1851)) ~ supplier, data = snowtable8)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90380 -0.16029 -0.00353  0.28874  0.80617 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         8.3014     0.1886  44.019  < 2e-16 ***
## supplierSouthwarkVauxhall           1.1731     0.2178   5.387 8.65e-06 ***
## supplierSouthwarkVauxhall_Lambeth   1.5134     0.2108   7.178 6.70e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3772 on 29 degrees of freedom
## Multiple R-squared:  0.6402,	Adjusted R-squared:  0.6153 
## F-statistic:  25.8 on 2 and 29 DF,  p-value: 3.661e-07
```

---
### Vietnam War Draft Lottery

---
<img src="images/draftrandom.JPG" width="100%" style="display: block; margin: auto;" />

---

``` r
draft1970 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1970.csv")
```
---

``` r
boxplot(rank~month, data=draft1970)
```

---

``` r
draftlm <- lm(rank ~ day, data=draft1970)
```
---

``` r
library(jtools)  # provides summ()
summ(draftlm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Observations </td>
   <td style="text-align:right;"> 366 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Dependent variable </td>
   <td style="text-align:right;"> rank </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Type </td>
   <td style="text-align:right;"> OLS linear regression </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> F(1,364) </td>
   <td style="text-align:right;"> 19.54 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> R² </td>
   <td style="text-align:right;"> 0.05 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Adj. R² </td>
   <td style="text-align:right;"> 0.05 </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t val. </th>
   <th style="text-align:right;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> (Intercept) </td>
   <td style="text-align:right;"> 224.91 </td>
   <td style="text-align:right;"> 10.81 </td>
   <td style="text-align:right;"> 20.80 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> day </td>
   <td style="text-align:right;"> -0.23 </td>
   <td style="text-align:right;"> 0.05 </td>
   <td style="text-align:right;"> -4.42 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Standard errors: OLS</td></tr></tfoot>
</table>

---

What does this result tell us about the assumption that draft numbers were randomly assigned?
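
---

One way to approach the question above is to ask how large a slope chance alone would produce in a fair lottery. The sketch below is a simulation under stated assumptions, not a reanalysis of the data: it treats a fair lottery as a random permutation of the ranks 1 to 366, and `slope_once()` and `sim_slopes` are illustrative names.

``` r
set.seed(1970)
n_days <- 366
day <- 1:n_days

# One simulated fair lottery: draft ranks are a random permutation of 1..366
slope_once <- function() {
  rank <- sample(n_days)
  unname(coef(lm(rank ~ day))["day"])
}

# Repeat many times to see how large a slope chance alone produces
sim_slopes <- replicate(2000, slope_once())

# Share of fair lotteries with a slope at least as extreme as the -0.23 reported above
mean(abs(sim_slopes) >= 0.23)
quantile(sim_slopes, c(0.025, 0.975))
```

If almost no simulated lotteries produce a slope that large, the observed relationship between birthday and rank is hard to reconcile with a properly randomized draw.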

---

``` r
draft1971 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1971.csv")
```
---

``` r
boxplot(rank~month, data=draft1971)
```

---

``` r
draft71lm <- lm(rank ~ day, data=draft1971)
```
---

``` r
summ(draft71lm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Observations </td>
   <td style="text-align:right;"> 365 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Dependent variable </td>
   <td style="text-align:right;"> rank </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Type </td>
   <td style="text-align:right;"> OLS linear regression </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> F(1,363) </td>
   <td style="text-align:right;"> 0.07 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> R² </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Adj. R² </td>
   <td style="text-align:right;"> -0.00 </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t val. </th>
   <th style="text-align:right;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> (Intercept) </td>
   <td style="text-align:right;"> 180.39 </td>
   <td style="text-align:right;"> 11.08 </td>
   <td style="text-align:right;"> 16.28 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> day </td>
   <td style="text-align:right;"> 0.01 </td>
   <td style="text-align:right;"> 0.05 </td>
   <td style="text-align:right;"> 0.27 </td>
   <td style="text-align:right;"> 0.79 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Standard errors: OLS</td></tr></tfoot>
</table>

---

Were the randomization problems fixed?

---
<img src="images/angrist.JPG" width="85%" style="display: block; margin: auto;" />

---
### Lottery Winners and Political Attitudes

---
<img src="images/Doherty1.png" width="60%" style="display: block; margin: auto;" />

---
<img src="images/Doherty2.png" width="80%" style="display: block; margin: auto;" />

---
<img src="images/Doherty3.png" width="100%" style="display: block; margin: auto;" />
---

Do we think that randomized earnings have the same causal effect as nonrandomized earnings?

---

If not, this may be a SUTVA violation; different individuals may be receiving causally different versions of the treatment.

---
### Classic Natural Experiments: Summary

**The Big Idea**: Treatment is **"as‑if randomly assigned"** by nature, policy, or accident.

**In Potential Outcomes Terms**:

`$$(Y_i(1), Y_i(0)) \perp\!\!\!\perp D_i \mid X_i$$`

- Treatment assignment $D_i$ is independent of potential outcomes, **conditional on covariates** $X_i$ (if needed).
- This is exactly the **unconfoundedness (selection on observables)** assumption from Week 2.
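
---

To see why this assumption licenses a simple comparison of means, consider the case with no covariates. The first equality below replaces observed outcomes with the potential outcome each group actually reveals (this uses SUTVA); the second uses independence to drop the conditioning on `$D_i$`:

`$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$$`

`$$= E[Y_i(1)] - E[Y_i(0)] = \text{ATE}$$`

With covariates, the same argument applies within each level of `$X_i$`.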

---

### What This Means

- If the natural experiment is credible, we don't need to control for many (or any) covariates – the "as‑if random" assignment does the work.
- In practice, we often check balance on pre‑treatment covariates to support the claim.
- The estimand is the **Average Treatment Effect (ATE)** – the same as in a randomized experiment.

---

### Key Conditions for a Valid Classic Natural Experiment

1. **Ignorability / Unconfoundedness**: Treatment is independent of potential outcomes (given covariates, if any).

2. **Overlap**: For every level of covariates, there are both treated and control units.

3. **SUTVA**: No interference, no hidden treatment variations (as in experiments).

---

### Takeaway

When nature randomizes, we can analyze the data **as if it came from an experiment** – using difference in means, regression with few controls, or randomization inference.

The burden of proof is on the researcher to convince us that the assignment really is **as‑if random**.
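
---

The analysis options named above can be sketched on simulated data. This is an illustration under stated assumptions: the treatment `d` is randomized by construction, the true effect of 1.5 is made up, and the randomization-inference step tests the sharp null of no effect for any unit.

``` r
set.seed(406)
n <- 400
# Hypothetical as-if random binary treatment with a true effect of 1.5
d <- rbinom(n, size = 1, prob = 0.5)
y <- 10 + 1.5 * d + rnorm(n, sd = 2)

# Difference in means
diff_means <- mean(y[d == 1]) - mean(y[d == 0])

# Randomization inference: reshuffle treatment labels to build the
# distribution of the estimate under the sharp null of no effect
perm_diffs <- replicate(5000, {
  d_perm <- sample(d)
  mean(y[d_perm == 1]) - mean(y[d_perm == 0])
})
p_ri <- mean(abs(perm_diffs) >= abs(diff_means))

# Bivariate regression gives the same point estimate as the difference in means
ols <- unname(coef(lm(y ~ d))["d"])
c(diff_means = diff_means, ols = ols, p_ri = p_ri)
```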

---
### IV Natural Experiment

1.  "Nature" randomizes a cause of the treatment.

    -   Call the treatment `$X$`.

    -   Call the randomized cause of the treatment `$Z$`.

2.  `$Z$` affects the outcome, `$Y$`, only through its effect on `$X$`.

3.  Treatment caused by the randomized cause has the same effect as
    treatment with any other cause would have.
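
---

The three conditions above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a template for real analysis: the confounder `u`, the coefficients, and the constant true effect of 2 are all hypothetical, and the exclusion restriction holds by construction.

``` r
set.seed(406)
n <- 5000
u <- rnorm(n)                      # unobserved confounder
z <- rbinom(n, 1, 0.5)             # as-if random cause of the treatment
x <- 0.8 * z + u + rnorm(n)        # treatment depends on z and on u
y <- 2 * x + 3 * u + rnorm(n)      # true effect of x is 2; z enters only via x

# Naive OLS is biased because u drives both x and y
ols <- unname(coef(lm(y ~ x))["x"])

# Wald / IV estimate: reduced form divided by first stage
iv_wald <- cov(z, y) / cov(z, x)

# Two-stage least squares by hand gives the same point estimate
x_hat <- fitted(lm(x ~ z))
iv_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(ols = ols, iv_wald = iv_wald, iv_2sls = iv_2sls)
```

The hand-rolled second stage reproduces the IV point estimate, but its reported standard errors are not valid; a dedicated IV routine should be used for inference.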

---
### IV Natural Experiment: Colonial Origins of Development

- **Causal question**: Do institutions cause economic development?
- **Problem**: Institutions are endogenous – countries with better institutions may differ in unobserved ways (culture, geography, human capital).
- **Proposed solution**: Use **settler mortality** as an instrument for institutions.

---
### The Acemoglu, Johnson & Robinson (2001) IV Setup

![](3naturalexperiments_files/figure-html/ajr_dag-1.png)

---

- **Instrument** (`$Z$`): Settler mortality rates among European colonists in the 17th–19th centuries.
- **Treatment** (`$D$`): Quality of institutions today (e.g., protection against expropriation).
- **Outcome** (`$Y$`): Log GDP per capita today.
- **Confounder** (`$U$`): Unobserved factors like culture, geography, human capital.

---
### The Logic: Why Settler Mortality?

AJR's argument:

1. **Where settlers faced high mortality**, they did not settle permanently. Instead, they set up **extractive institutions**, designed to transfer resources to the colonizing power, which persisted over time.
2. **Where settlers faced low mortality**, they settled in large numbers and established **institutions protecting property rights** – again, persisting to the present.
3. Settler mortality in the 1600s–1800s might be **exogenous** to modern economic development, except through its effect on institutions.

---
### How the IV Assumptions Are Met (or Argued)

| Assumption | Meaning | In the AJR Setting |
|------------|---------|---------------------|
| **Relevance** | `$Z$` predicts `$D$` | High settler mortality → extractive institutions → weaker property rights today. First stage is strong. |

---
### How the IV Assumptions Are Met (or Argued)

| Assumption | Meaning | In the AJR Setting |
|------------|---------|---------------------|
| **Exclusion** | `$Z$` affects `$Y$` only through `$D$` | Settler mortality centuries ago has no direct effect on modern GDP, except via the institutions it shaped. (Must defend: no effect through culture, disease environment, etc.) |

---
### How the IV Assumptions Are Met (or Argued)

| Assumption | Meaning | In the AJR Setting |
|------------|---------|---------------------|
| **Independence** | `$Z$` is as‑good‑as randomly assigned | Settler mortality is argued to be a historical accident, uncorrelated with other determinants of development, possibly only after conditioning on covariates. |

---
### How the IV Assumptions Are Met (or Argued)

| Assumption | Meaning | In the AJR Setting |
|------------|---------|---------------------|
| **Monotonicity** | No defiers | Implicit: Higher mortality doesn't lead to better institutions in some places and worse in others – the effect is consistent in direction. |

---
### The Exclusion Restriction: The Key Challenge

The exclusion restriction is the most debated assumption in AJR:

- Could high settler mortality affect development **through other channels**?
  - **Disease environment**: High mortality areas may still have malaria, etc., affecting health and productivity today.
  - **Culture**: Different settlement patterns may have created different cultural legacies.
  - **Human capital**: Settlers brought skills; extractive colonies did not.

---

AJR address these concerns by:
- Controlling for contemporary malaria risk.
- Using other instruments (e.g., population density in 1500).
- Showing robustness across specifications.

---
### What Does the IV Estimate?

Under these assumptions, the IV estimate identifies the **Local Average Treatment Effect (LATE)** for *compliers* – countries whose institutions were changed by settler mortality conditions.

These are countries that:
- Would have had good institutions if mortality had been low.
- Would have had bad institutions if mortality had been high.
- Would *not* have had the same institutions regardless of mortality.

---

This is exactly the set of countries where colonial settlement patterns mattered – arguably the relevant group for testing the institutional theory.
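
---

The complier logic can be checked in a simulation where each unit's type is known. This sketch simplifies the setting to a binary instrument and a binary treatment, which is an assumption made for illustration (settler mortality is continuous); the type shares and effect sizes are hypothetical.

``` r
set.seed(406)
n <- 20000
# Hypothetical principal strata: compliers, always-takers, never-takers (no defiers)
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.25, 0.25))
z <- rbinom(n, 1, 0.5)                       # as-if random binary instrument
d <- ifelse(type == "always", 1,
            ifelse(type == "never", 0, z))   # compliers are treated iff z = 1
# Effects differ by type: 2 for compliers, 5 for always-takers, 0 for never-takers
effect <- ifelse(type == "complier", 2, ifelse(type == "always", 5, 0))
y <- 1 + effect * d + rnorm(n)

# Wald estimator: effect of z on y divided by effect of z on d
itt_y <- mean(y[z == 1]) - mean(y[z == 0])
itt_d <- mean(d[z == 1]) - mean(d[z == 0])
late <- itt_y / itt_d
late
```

The estimate should land near the complier effect of 2, not near the always-taker effect, because only compliers change treatment status when the instrument changes.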

---

### Why This Is a Famous IV Natural Experiment

- The instrument is **historical** – clearly pre-dating modern outcomes.
- The logic is **theoretically grounded** in theories of colonialism and institutional persistence.
- The assumptions are **explicitly stated and debated** – a model of transparent causal inference.
- It spawned an entire literature on institutions and development, and remains a standard teaching example for IV.

---
### RDD

1.  There is an assignment variable, `$Z$`.

2.  Cases are assigned to treatment if and only if `$Z$` is greater than a
    predetermined threshold value, `$T$`.

3.  There are enough cases that many have values of `$Z$` just
    above and just below `$T$`.
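
---

A sharp regression-discontinuity estimate under these three conditions can be sketched on simulated data. Everything here is hypothetical: the cutoff of 0, the bandwidth `h`, and the true jump of 2 are chosen for illustration, and the bandwidth is fixed by hand rather than selected by a data-driven rule.

``` r
set.seed(406)
n <- 2000
threshold <- 0                                   # hypothetical cutoff T
z <- runif(n, min = -1, max = 1)                 # assignment variable Z
d <- as.integer(z > threshold)                   # treated if and only if Z > T
y <- 1 + 0.5 * z + 2 * d + rnorm(n, sd = 0.5)    # true jump at the cutoff is 2
rd <- data.frame(y = y, d = d, zc = z - threshold)

# Local linear regression within a bandwidth h of the cutoff,
# allowing a different slope on each side
h <- 0.25
rd_local <- rd[abs(rd$zc) <= h, ]
rd_fit <- lm(y ~ d * zc, data = rd_local)

# The coefficient on d is the estimated jump in y at the cutoff
rd_est <- unname(coef(rd_fit)["d"])
rd_est
```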