class: center, middle, inverse, title-slide

.title[
# 4: Natural Experiments
]
.subtitle[
## Quantitative Causal Inference
]
.author[
### J. Seawright
]
.institute[
### Northwestern Political Science
]
.date[
### April 24, 2025
]

---
class: center, middle

<style type="text/css">
pre {
  max-height: 400px;
  overflow-y: auto;
}

pre[class] {
  max-height: 200px;
}
</style>

---

### Typology of Natural Experiments

- Classic Natural Experiment
- Instrumental Variables-Type Natural Experiment
- Regression-Discontinuity Design

---

### Key Ideas for Natural Experiments

The cause of the cause, and the cause of the cause of the cause

---

### Classic Natural Experiment

1. "Nature" randomizes the treatment.
    - No discretion is involved in assigning treatments, or the relevant information is unavailable or unused.

2. Randomized treatment has the same effect as non-randomized treatment would have.

---

### Snow: the Most Famous Natural Experiment

<img src="images/jonsnow.png" width="100%" style="display: block; margin: auto;" />

---

### Snow: the Most Famous Natural Experiment

<img src="images/johnsnow.png" width="100%" style="display: block; margin: auto;" />

---

### Snow: the Most Famous Natural Experiment

<img src="images/snowpubandpump.png" width="50%" style="display: block; margin: auto;" />

---

### Snow on Cholera

<img src="images/snowcrossdistrict.jpg" width="40%" style="display: block; margin: auto;" />

---

"Although the facts shown in the above table afford very strong evidence of the powerful influence which the drinking of water containing the sewage of a town exerts over the spread of cholera, when that disease is present, yet the question does not end here; for the intermixing of the water supply of the Southwark and Vauxhall Company with that of the Lambeth Company, over an extensive part of London, admitted of the subject being sifted in such a way as to yield the most incontrovertible proof on one side or the other."

---

"In the sub-districts enumerated in the above table as being supplied by both Companies, the mixing of the supply is of the most intimate kind. The pipes of each Company go down all the streets, and into nearly all the courts and alleys. A few houses are supplied by one Company and a few by the other, according to the decision of the owner or occupier at that time when the Water Companies were in active competition. In many cases a single house has a supply different from that on either side. Each company supplies both rich and poor, both large houses and small; there is no difference either in the condition or occupation of the persons receiving the water of the different Companies."

---

<img src="images/snowwithindistrict.jpg" width="40%" style="display: block; margin: auto;" />

---

``` r
library(readr)  # provides read_csv()

# Sub-district data with deaths, 1851 population, and water supplier
snowtable8 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/snowtable8.csv")
```

```
## Rows: 32 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): subDistrict, supplier, lambethdegree, last3imputed, district
## dbl (7): pop1851, deathsOverall, deathsSouthwark, deathsLambeth, deathsPump,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

``` r
# Regress overall deaths on water supplier
snowlm <- lm(deathsOverall ~ supplier, data=snowtable8)
```

---

``` r
summary(snowlm)
```

```
##
## Call:
## lm(formula = deathsOverall ~ supplier, data = snowtable8)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -69.333 -16.896  -1.625  16.812  54.667
##
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)                           4.50      14.93   0.302 0.765175
## supplierSouthwarkVauxhall            65.83      17.23   3.820 0.000651 ***
## supplierSouthwarkVauxhall_Lambeth    36.25      16.69   2.172 0.038139 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.85 on 29 degrees of freedom
## Multiple R-squared:  0.3575, Adjusted R-squared:  0.3132
## F-statistic:  8.07 on 2 and 29 DF,  p-value: 0.001636
```

---

``` r
# Compare log 1851 population across supplier categories
snowlm2 <- lm(I(log(pop1851))~ supplier, data=snowtable8)
summary(snowlm2)
```

```
##
## Call:
## lm(formula = I(log(pop1851)) ~ supplier, data = snowtable8)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.90380 -0.16029 -0.00353  0.28874  0.80617
##
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         8.3014     0.1886  44.019  < 2e-16 ***
## supplierSouthwarkVauxhall           1.1731     0.2178   5.387 8.65e-06 ***
## supplierSouthwarkVauxhall_Lambeth   1.5134     0.2108   7.178 6.70e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3772 on 29 degrees of freedom
## Multiple R-squared:  0.6402, Adjusted R-squared:  0.6153
## F-statistic:  25.8 on 2 and 29 DF,  p-value: 3.661e-07
```

---

### Brady and McNulty on Costs of Voting

---

<img src="images/bradymcnulty.JPG" width="100%" style="display: block; margin: auto;" />

---

### Vietnam War Draft Lottery

---

<img src="images/draftrandom.JPG" width="100%" style="display: block; margin: auto;" />

---

``` r
draft1970 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1970.csv")
```

```
## Rows: 366 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): day, rank, month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

``` r
# Distribution of lottery rank by birth month
boxplot(rank~month, data=draft1970)
```

<img src="4naturalexperiments_files/figure-html/unnamed-chunk-13-1.png" width="70%" style="display: block; margin: auto;" />

---

``` r
# Regress lottery rank on day of the year
draftlm <- lm(rank ~ day, data=draft1970)
```

---

``` r
library(jtools)  # provides summ()

summ(draftlm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Observations </td> <td style="text-align:right;"> 366 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Dependent variable </td> <td style="text-align:right;"> rank </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Type </td> <td style="text-align:right;"> OLS linear regression </td> </tr> </tbody> </table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> F(1,364) </td> <td style="text-align:right;"> 19.54 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> R² </td> <td style="text-align:right;"> 0.05 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Adj. R² </td> <td style="text-align:right;"> 0.05 </td> </tr> </tbody> </table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t val. </th> <th style="text-align:right;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> (Intercept) </td> <td style="text-align:right;"> 224.91 </td> <td style="text-align:right;"> 10.81 </td> <td style="text-align:right;"> 20.80 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> day </td> <td style="text-align:right;"> -0.23 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:right;"> -4.42 </td> <td style="text-align:right;"> 0.00 </td> </tr> </tbody> <tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> Standard errors: OLS</td></tr></tfoot> </table>

---

``` r
draft1971 <- read_csv("https://github.com/jnseawright/PS406/raw/main/data/draft1971.csv")
```

```
## Rows: 365 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): day, rank, month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

``` r
# Same check for the 1971 lottery
boxplot(rank~month, data=draft1971)
```

<img src="4naturalexperiments_files/figure-html/unnamed-chunk-17-1.png" width="70%" style="display: block; margin: auto;" />

---

``` r
draft71lm <- lm(rank ~ day, data=draft1971)
```

---

``` r
summ(draft71lm)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Observations </td> <td style="text-align:right;"> 365 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Dependent variable </td> <td style="text-align:right;"> rank </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Type </td> <td style="text-align:right;"> OLS linear regression </td> </tr> </tbody> </table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> F(1,363) </td> <td style="text-align:right;"> 0.07 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> R² </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Adj. R² </td> <td style="text-align:right;"> -0.00 </td> </tr> </tbody> </table>
<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> S.E. </th> <th style="text-align:right;"> t val. </th> <th style="text-align:right;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> (Intercept) </td> <td style="text-align:right;"> 180.39 </td> <td style="text-align:right;"> 11.08 </td> <td style="text-align:right;"> 16.28 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> day </td> <td style="text-align:right;"> 0.01 </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:right;"> 0.27 </td> <td style="text-align:right;"> 0.79 </td> </tr> </tbody> <tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> Standard errors: OLS</td></tr></tfoot> </table>

---

<img src="images/angrist.JPG" width="85%" style="display: block; margin: auto;" />

---

### Lottery Winners and Political Attitudes

---

<img src="images/Doherty1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="images/Doherty2.png" width="80%" style="display: block; margin: auto;" />

---

<img src="images/Doherty3.png" width="100%" style="display: block; margin: auto;" />

---

### IV Natural Experiment

1. "Nature" randomizes a cause of the treatment.
    - Call the treatment `\(X\)`.
    - Call the randomized cause of the treatment `\(Z\)`.

2. `\(Z\)` only affects `\(Y\)` through its effects on `\(X\)`.

3. Treatment caused by the randomized cause has the same effect as treatment with any other cause would have.
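
---

A minimal simulated sketch of the three conditions above, not drawn from any of the studies in these slides. The variable names, effect sizes, sample size, and seed are all assumptions made for illustration; the true effect of `\(X\)` on `\(Y\)` is set to 2, and an unobserved confounder `u` is added so that a naive regression is biased.

```r
# Simulated IV natural experiment (all names and numbers are illustrative assumptions)
set.seed(406)
n <- 5000
z <- rbinom(n, 1, 0.5)             # Z: randomized cause of the treatment
u <- rnorm(n)                      # unobserved confounder of X and Y
x <- 0.8 * z + 0.6 * u + rnorm(n)  # X: treatment, partly driven by Z
y <- 2 * x + 1.5 * u + rnorm(n)    # Y: outcome; Z enters only through X

# Naive regression of Y on X is confounded by u
naive <- unname(coef(lm(y ~ x))["x"])

# Wald / IV estimate: reduced form divided by first stage
first_stage  <- unname(coef(lm(x ~ z))["z"])   # effect of Z on X
reduced_form <- unname(coef(lm(y ~ z))["z"])   # effect of Z on Y
iv_wald <- reduced_form / first_stage

# The same estimate written as a ratio of covariances
iv_cov <- cov(z, y) / cov(z, x)

round(c(naive = naive, iv_wald = iv_wald, iv_cov = iv_cov), 2)
```

Because `u` drives both `x` and `y`, the naive slope should sit well above 2, while the ratio of the reduced form to the first stage should land near it.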
---

### Colonialism and Development

<img src="images/settlers1.png" width="520" height="100%" style="display: block; margin: auto;" />

---

### Colonialism and Development

<img src="images/settlers2.png" width="968" height="100%" style="display: block; margin: auto;" />

---

### Colonialism and Development

<img src="images/settlers3.png" width="948" height="100%" style="display: block; margin: auto;" />

---

### Colonialism and Development

<img src="images/settlers4.png" width="80%" style="display: block; margin: auto;" />

---

### Vietnam Draft Lottery and Returns to Education

<img src="images/Angrist1.png" width="80%" style="display: block; margin: auto;" />

---

<img src="images/Angrist2.png" width="50%" style="display: block; margin: auto;" />

---

<img src="images/Pierskalla.JPG" height="100%" style="display: block; margin: auto;" />

---

<img src="images/PierskallaInstrumentArgument.JPG" height="100%" style="display: block; margin: auto;" />

---

<img src="images/PierskallaInstrumentFootnote.JPG" height="100%" style="display: block; margin: auto;" />

---

<img src="images/PierskallaInstrumentResults.JPG" height="100%" style="display: block; margin: auto;" />

---

### RDD

1. There is an assignment variable, `\(Z\)`.

2. Cases are assigned to treatment if and only if `\(Z\)` is greater than a predetermined threshold value, `\(T\)`.

3. Enough cases have values of `\(Z\)` just above and just below `\(T\)`.

---

### Example: Maimonides' Rule

> "The number of pupils assigned to each teacher is twenty-five. If
> there are fifty, we appoint two teachers. If there are forty, we
> appoint an assistant, at the expense of the town." (Baba Bathra,
> Chapter II, page 21a; translated by Epstein 1976: 214)

---

### Example: Maimonides' Rule

> "Twenty-five children may be put in charge of one teacher. If the
> number in the class exceeds twenty-five but is not more than forty, he
> should have an assistant to help with the instruction. If there are
> more than forty, two teachers must be appointed." (Maimonides, given
> in Hyamson 1937: 58b)

---

### Example: Maimonides' Rule

- Maimonides' Rule is used to determine class sizes in Israel.
- Angrist and Lavy (1999) use this to carry out an RDD analysis of the effects of class size on educational outcomes.

---

### Example: Maimonides' Rule

<img src="images/maimonverb.jpeg" width="65%" style="display: block; margin: auto;" />

---

### Example: Maimonides' Rule

<img src="images/maimonmath.jpeg" width="65%" style="display: block; margin: auto;" />

---

### Example: Maimonides' Rule

If the RDD is a success, then the groups just above and just below the threshold on `\(Z\)` should be balanced, both in the number of cases and in any measured background variables.

1. It may be a good idea to do a balance test between treatment and control cases within a bandwidth around the threshold.

2. At the very least, look at a histogram of cases to check that there are about the same number of cases just above and just below the threshold.

---

<img src="images/Angrist2019title.PNG" width="85%" style="display: block; margin: auto;" />

---

<img src="images/Angrist2019-1.PNG" width="85%" style="display: block; margin: auto;" />

---

<img src="images/Angrist2019-2.PNG" width="85%" style="display: block; margin: auto;" />

---

<img src="images/Broockman.JPG" width="65%" style="display: block; margin: auto;" />

---

<img src="images/BroockmanRDDVisual.JPG" width="65%" style="display: block; margin: auto;" />

---

<img src="images/BroockmanSuccess.JPG" width="65%" style="display: block; margin: auto;" />

---

<img src="images/BroockmanFailure.JPG" width="65%" style="display: block; margin: auto;" />

---

### RDD

RDD isn't a good idea if:

- Actors are aware of the discontinuity and adjust their behavior to end up on one side of it.
- The assignment variable is so coarsely measured or so sparsely distributed that the cases nearest to the threshold are not close to each other.
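
---

A minimal simulated sketch of a sharp RDD, including the case-count and balance checks described earlier. It is not taken from Angrist and Lavy or any other study in these slides: the variable names, the bandwidth `h`, the seed, and the true jump of 2 at the threshold are all assumptions made for illustration.

```r
# Simulated sharp RDD (names, bandwidth, and numbers are illustrative assumptions)
set.seed(406)
n <- 4000
z <- runif(n, -1, 1)              # Z: assignment variable
threshold <- 0                    # T: predetermined threshold
d <- as.numeric(z > threshold)    # treated if and only if Z > T
w <- rnorm(n)                     # background covariate, unrelated to treatment
y <- 1 + 0.5 * z + 2 * d + 0.3 * w + rnorm(n)

h <- 0.25                         # bandwidth around the threshold
near <- abs(z - threshold) <= h   # cases close to the threshold

# Count check: similar numbers of cases just below and just above T
counts <- table(side = ifelse(d[near] == 1, "above", "below"))

# Balance check: the covariate should not jump at T
balance_fit <- lm(w ~ d + z, subset = near)
balance_p <- summary(balance_fit)$coefficients["d", "Pr(>|t|)"]

# Local linear regression with separate slopes on each side of T;
# the coefficient on d estimates the jump in Y at the threshold
zc <- z - threshold
rdd_fit <- lm(y ~ d * zc, subset = near)
rdd_est <- unname(coef(rdd_fit)["d"])

counts
balance_p
rdd_est
```

In real data, a lopsided `counts` table or a covariate that jumps at the threshold would be a warning sign of the sorting problem in the first bullet above.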