class: center, middle, inverse, title-slide

.title[
# 4: Natural Experiments
]
.subtitle[
## Quantitative Causal Inference
]
.author[
### <large>J. Seawright</large>
]
.institute[
### <small>Northwestern Political Science</small>
]
.date[
### April 24, 2025
]

---

class: center, middle

<style>
pre[class] {
  max-height: 200px;
}
</style>

---
### Typology of Natural Experiments

-   Classic Natural Experiment

-   Instrumental Variables-Type Natural Experiment

-   Regression-Discontinuity Design

---
### Key Ideas for Natural Experiments

The cause of the cause, and the cause of the cause of the cause

---
### Classic Natural Experiment

1.  "Nature" randomizes the treatment.

    -   No discretion is involved in assigning treatments, or the
        relevant information is unavailable or unused.

2.  Randomized treatment has the same effect as non-randomized treatment
    would have.
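
---

The first condition can be illustrated with a small simulation. This is a sketch on made-up data, not an analysis from the course: the confounder `u`, the effect size of 2, and all variable names are assumptions chosen for illustration. It compares a difference in means when units select into treatment with one where a coin flip assigns it.

``` r
set.seed(406)
n <- 10000

# Hypothetical unobserved confounder that raises both uptake and the outcome
u <- rnorm(n)
true_effect <- 2

# (a) Self-selected treatment: units with high u are more likely to be treated
d_selected <- rbinom(n, 1, plogis(1.5 * u))
y_selected <- true_effect * d_selected + 3 * u + rnorm(n)

# (b) "Nature" assigns treatment: a coin flip unrelated to u
d_random <- rbinom(n, 1, 0.5)
y_random <- true_effect * d_random + 3 * u + rnorm(n)

# Difference in means (slope on the treatment dummy) under each process
est_selected <- unname(coef(lm(y_selected ~ d_selected))[2])
est_random   <- unname(coef(lm(y_random ~ d_random))[2])

round(c(self_selected = est_selected, as_if_random = est_random), 2)
```

Under these assumptions the as-if-random estimate should land near the simulated effect, while the self-selected estimate absorbs the influence of `u`.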

---
### Snow: the Most Famous Natural Experiment
<img src="images/jonsnow.png" width="100%" style="display: block; margin: auto;" />

---
### Snow: the Most Famous Natural Experiment
<img src="images/johnsnow.png" width="100%" style="display: block; margin: auto;" />

---
### Snow: the Most Famous Natural Experiment
<img src="images/snowpubandpump.png" width="50%" style="display: block; margin: auto;" />

---
### Snow on Cholera
<img src="images/snowcrossdistrict.jpg" width="40%" style="display: block; margin: auto;" />

---
"Although the facts shown in the above table afford very strong evidence
of the powerful influence which the drinking of water containing the
sewage of a town exerts over the spread of cholera, when that disease is
present, yet the question does not end here; for the intermixing of the
water supply of the Southwark and Vauxhall Company with that of the
Lambeth Company, over an extensive part of London, admitted of the
subject being sifted in such a way as to yield the most incontrovertible
proof on one side or the other."

---
"In the sub-districts enumerated in the above table as being supplied by
both Companies, the mixing of the supply is of the most intimate kind.
The pipes of each Company go down all the streets, and into nearly all
the courts and alleys. A few houses are supplied by one Company and a
few by the other, according to the decision of the owner or occupier at
that time when the Water Companies were in active competition. In many
cases a single house has a supply different from that on either side.
Each company supplies both rich and poor, both large houses and small;
there is no difference either in the condition or occupation of the
persons receiving the water of the different Companies."

---

``` r
library(readr)  # provides read_csv()
snowtable8 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/snowtable8.csv")
```

```
## Rows: 32 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): subDistrict, supplier, lambethdegree, last3imputed, district
## dbl (7): pop1851, deathsOverall, deathsSouthwark, deathsLambeth, deathsPump,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

``` r
snowlm <- lm(deathsOverall ~ supplier, data=snowtable8)
```
---

``` r
summary(snowlm)
```

```
## 
## Call:
## lm(formula = deathsOverall ~ supplier, data = snowtable8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -69.333 -16.896  -1.625  16.812  54.667 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           4.50      14.93   0.302 0.765175    
## supplierSouthwarkVauxhall            65.83      17.23   3.820 0.000651 ***
## supplierSouthwarkVauxhall_Lambeth    36.25      16.69   2.172 0.038139 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.85 on 29 degrees of freedom
## Multiple R-squared:  0.3575,	Adjusted R-squared:  0.3132 
## F-statistic:  8.07 on 2 and 29 DF,  p-value: 0.001636
```

---

``` r
snowlm2 <- lm(I(log(pop1851))~ supplier, data=snowtable8)
summary(snowlm2)
```

```
## 
## Call:
## lm(formula = I(log(pop1851)) ~ supplier, data = snowtable8)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90380 -0.16029 -0.00353  0.28874  0.80617 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         8.3014     0.1886  44.019  < 2e-16 ***
## supplierSouthwarkVauxhall           1.1731     0.2178   5.387 8.65e-06 ***
## supplierSouthwarkVauxhall_Lambeth   1.5134     0.2108   7.178 6.70e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3772 on 29 degrees of freedom
## Multiple R-squared:  0.6402,	Adjusted R-squared:  0.6153 
## F-statistic:  25.8 on 2 and 29 DF,  p-value: 3.661e-07
```

---
### Brady and McNulty on Costs of Voting

---
<img src="images/bradymcnulty.JPG" width="100%" style="display: block; margin: auto;" />

---
### Vietnam War Draft Lottery

---
<img src="images/draftrandom.JPG" width="100%" style="display: block; margin: auto;" />

---

``` r
draft1970 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1970.csv")
```

```
## Rows: 366 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): day, rank, month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```
---

``` r
boxplot(rank~month, data=draft1970)
```

---

``` r
draftlm <- lm(rank ~ day, data=draft1970)
```
---

``` r
library(jtools)  # provides summ()
summ(draftlm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Observations </td>
   <td style="text-align:right;"> 366 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Dependent variable </td>
   <td style="text-align:right;"> rank </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Type </td>
   <td style="text-align:right;"> OLS linear regression </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> F(1,364) </td>
   <td style="text-align:right;"> 19.54 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> R² </td>
   <td style="text-align:right;"> 0.05 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Adj. R² </td>
   <td style="text-align:right;"> 0.05 </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t val. </th>
   <th style="text-align:right;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> (Intercept) </td>
   <td style="text-align:right;"> 224.91 </td>
   <td style="text-align:right;"> 10.81 </td>
   <td style="text-align:right;"> 20.80 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> day </td>
   <td style="text-align:right;"> -0.23 </td>
   <td style="text-align:right;"> 0.05 </td>
   <td style="text-align:right;"> -4.42 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Standard errors: OLS</td></tr></tfoot>
</table>
---

``` r
draft1971 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1971.csv")
```

```
## Rows: 365 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): day, rank, month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```
---

``` r
boxplot(rank~month, data=draft1971)
```

---

``` r
draft71lm <- lm(rank ~ day, data=draft1971)
```
---

``` r
summ(draft71lm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Observations </td>
   <td style="text-align:right;"> 365 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Dependent variable </td>
   <td style="text-align:right;"> rank </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Type </td>
   <td style="text-align:right;"> OLS linear regression </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> F(1,363) </td>
   <td style="text-align:right;"> 0.07 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> R² </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Adj. R² </td>
   <td style="text-align:right;"> -0.00 </td>
  </tr>
</tbody>
</table> <table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Est. </th>
   <th style="text-align:right;"> S.E. </th>
   <th style="text-align:right;"> t val. </th>
   <th style="text-align:right;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> (Intercept) </td>
   <td style="text-align:right;"> 180.39 </td>
   <td style="text-align:right;"> 11.08 </td>
   <td style="text-align:right;"> 16.28 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> day </td>
   <td style="text-align:right;"> 0.01 </td>
   <td style="text-align:right;"> 0.05 </td>
   <td style="text-align:right;"> 0.27 </td>
   <td style="text-align:right;"> 0.79 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Standard errors: OLS</td></tr></tfoot>
</table>
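
---

One way to read the two regressions above is against what a fair lottery would produce. The sketch below simulates fair lotteries (every ordering of the 366 days equally likely) and asks how often the slope of rank on day is as far from zero as the -0.23 reported for 1970. The number of simulations and the object names are arbitrary choices made for illustration.

``` r
set.seed(1970)
days <- 1:366

# Slope of rank on day for one simulated fair lottery
fair_slope <- function() {
  rank <- sample(length(days))     # every ordering equally likely
  cov(days, rank) / var(days)      # bivariate OLS slope
}

null_slopes <- replicate(2000, fair_slope())

# Slope reported above for the 1970 lottery
observed_1970 <- -0.23

# Share of simulated fair lotteries with a slope at least this far from zero
p_sim <- mean(abs(null_slopes) >= abs(observed_1970))
c(sd_null = sd(null_slopes), p_sim = p_sim)
```

The spread of the simulated slopes should be close to the OLS standard error of 0.05 shown in the tables, so a slope of -0.23 sits far out in the tail of what fair lotteries generate.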

---
<img src="images/angrist.JPG" width="85%" style="display: block; margin: auto;" />

---
### Lottery Winners and Political Attitudes

---
<img src="images/Doherty1.png" width="60%" style="display: block; margin: auto;" />

---
<img src="images/Doherty2.png" width="80%" style="display: block; margin: auto;" />

---
<img src="images/Doherty3.png" width="100%" style="display: block; margin: auto;" />

---
### IV Natural Experiment

1.  "Nature" randomizes a cause of the treatment.

    -   Call the treatment `\(X\)`.

    -   Call the randomized cause of the treatment `\(Z\)`.

2.  `\(Z\)` affects `\(Y\)` only through its effect on `\(X\)`.

3.  Treatment caused by the randomized cause has the same effect as
    treatment with any other cause would have.
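
---

The three conditions can be sketched in a simulation. Everything here is assumed for illustration: the confounder `u`, the coefficients, and a constant treatment effect of 1.5 (which makes condition 3 hold by construction). The IV estimate is computed as the reduced form divided by the first stage, the Wald estimator for a single instrument.

``` r
set.seed(406)
n <- 20000

z <- rbinom(n, 1, 0.5)            # randomized cause of the treatment
u <- rnorm(n)                     # hypothetical unobserved confounder
x <- 0.8 * z + u + rnorm(n)       # treatment: moved by z, confounded by u
true_effect <- 1.5
y <- true_effect * x + 2 * u + rnorm(n)  # z enters y only through x

# Naive regression of y on x is biased by u
ols_est <- unname(coef(lm(y ~ x))[2])

# Wald / IV estimator: reduced form divided by first stage
first_stage  <- unname(coef(lm(x ~ z))[2])
reduced_form <- unname(coef(lm(y ~ z))[2])
iv_est <- reduced_form / first_stage

round(c(ols = ols_est, first_stage = first_stage, iv = iv_est), 2)
```

In this setup the OLS slope overstates the effect because `u` drives both `x` and `y`, while the ratio estimate should fall near the simulated value. In applied work a dedicated IV routine would also supply correct standard errors, which this sketch does not compute.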

---
### Colonialism and Development

<img src="images/settlers1.png" width="520" height="100%" style="display: block; margin: auto;" />
---
### Colonialism and Development

<img src="images/settlers2.png" width="968" height="100%" style="display: block; margin: auto;" />
---
### Vietnam Draft Lottery and Returns to Education

---
<img src="images/Angrist2.png" width="50%" style="display: block; margin: auto;" />

---
<img src="images/Pierskalla.JPG" height="100%" style="display: block; margin: auto;" />

---
<img src="images/PierskallaInstrumentArgument.JPG" height="100%" style="display: block; margin: auto;" />

---
<img src="images/PierskallaInstrumentFootnote.JPG" height="100%" style="display: block; margin: auto;" />

---
<img src="images/PierskallaInstrumentResults.JPG" height="100%" style="display: block; margin: auto;" />

---
### Regression-Discontinuity Design (RDD)

1.  There is an assignment variable, `\(Z\)`.

2.  Cases are assigned to treatment if and only if `\(Z\)` is greater than a
    predetermined threshold value, `\(T\)`.

3.  There are enough cases that many have values of `\(Z\)` just
    above and just below `\(T\)`.
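
---

A minimal sketch of a sharp design that satisfies these three conditions, on simulated data. The threshold of 0, the jump of 1, the bandwidth `h`, and the variable names are all assumptions for illustration. The estimate is the coefficient on treatment in a local linear regression that allows a different slope on each side of the threshold.

``` r
set.seed(406)
n <- 5000
threshold <- 0                        # T
z <- runif(n, -1, 1)                  # assignment variable
treated <- as.numeric(z > threshold)  # sharp rule: treated iff z > T
true_jump <- 1
y <- 0.5 + 2 * z + true_jump * treated + rnorm(n, sd = 0.5)

# Keep only cases within a bandwidth h of the threshold
h <- 0.25
near <- abs(z - threshold) <= h

# Local linear regression with separate slopes on each side
rdd_fit <- lm(y ~ treated * I(z - threshold), subset = near)

rdd_est <- unname(coef(rdd_fit)["treated"])
c(n_in_window = sum(near), rdd_est = round(rdd_est, 2))
```

The bandwidth is picked by hand here; a narrower window leans less on the linear approximation but leaves fewer cases, which is why the third condition matters.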