class: center, middle, inverse, title-slide

.title[
# 9: Correlation and Regression
]
.subtitle[
## Empirical Methods
]
.author[
### J. Seawright
]
.institute[
### Northwestern Political Science
]
.date[
### Oct. 14, 2025
]

---
class: center, middle

<style type="text/css">
pre {
  max-height: 400px;
  overflow-y: auto;
}
pre[class] {
  max-height: 200px;
}
</style>

---

We have two problems:

1. What if our independent variable is quantitative?

2. What if we have multiple control variables?

---

### Regression Analysis

- Regression analysis is a tool that fits a line to a scatter plot.

---

Remember from your studies that the algebraic formula for a line is: `\(Y = mX + b\)`.

---

### Regression Analysis

When we pivot from geometry to regression, there are two important changes:

1. Instead of `\(m\)`, we usually call the slope `\(\beta\)` or `\(b\)`, and instead of `\(b\)`, we usually call the intercept `\(\alpha\)`, `\(a\)`, or sometimes `\(\beta_0\)`.

2. We add a term at the end of the equation for error/noise above and below the line. This term is called `\(e_i\)` or `\(\epsilon_i\)`.

---

- How do we get numbers for `\(\alpha\)` and `\(\beta\)` in regression analysis?

- Data plus calculus! We choose the values that make the sum of squared errors as small as possible.

---

``` r
anes2024 <- read.csv("https://raw.githubusercontent.com/jnseawright/ps210/refs/heads/main/Data/anes2024.csv")

# Negative codes mark missing responses in the ANES; recode them to NA
anes2024$transbathroom <- ifelse(anes2024$V241372x < 0, NA, anes2024$V241372x)

table(anes2024$transbathroom)
```

```
## 
##    1    2    3    4    5    6    7 
##  779  459  131 1317  102  383 2059 
```

---

``` r
# Same recode for the carbon regulations item
anes2024$carbonregulations <- ifelse(anes2024$V242324x < 0, NA, anes2024$V242324x)

table(anes2024$carbonregulations)
```

```
## 
##    1    2    3    4    5    6    7 
## 1738  928  200 1182   89  263  306 
```

---

``` r
library(dplyr)  # provides the %>% pipe used below
library(ggplot2)
library(hrbrthemes)

transcarbonplot <- anes2024 %>%
  ggplot(aes(x=transbathroom,y=carbonregulations)) +
  geom_point( color="#69b3a2") +
  theme_ipsum()
```

---

``` r
transcarbonplot
```

```
## Warning: Removed 845 rows containing missing values or values outside the scale range
## (`geom_point()`).
```

<img src="Regression_files/figure-html/unnamed-chunk-5-1.png" width="55%" style="display: block; margin: auto;" />

---

``` r
transcarbonjitterplot <- anes2024 %>%
  ggplot(aes(x=transbathroom,y=carbonregulations)) +
  geom_point( color="#69b3a2") +
  geom_jitter() +
  theme_ipsum()
```

---

``` r
transcarbonjitterplot
```

```
## Warning: Removed 845 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 845 rows containing missing values or values outside the scale range
## (`geom_point()`).
```

<img src="Regression_files/figure-html/unnamed-chunk-7-1.png" width="55%" style="display: block; margin: auto;" />

---

`\(CarbonRegulations_{i} = \alpha + \beta TransBathroom_{i} + \epsilon_{i}\)`

---

``` r
transcarbonlm <- lm(carbonregulations ~ transbathroom, data=anes2024)

summary(transcarbonlm)
```

```
## 
## Call:
## lm(formula = carbonregulations ~ transbathroom, data = anes2024)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6524 -1.5278 -0.4033  0.7225  5.5967 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.02846    0.05579   18.43   <2e-16 ***
## transbathroom  0.37484    0.01074   34.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.665 on 4674 degrees of freedom
##   (845 observations deleted due to missingness)
## Multiple R-squared:  0.2067, Adjusted R-squared:  0.2065 
## F-statistic:  1218 on 1 and 4674 DF,  p-value: < 2.2e-16
```

---

``` r
transcarbonlmplot <- anes2024 %>%
  ggplot(aes(x=transbathroom,y=carbonregulations)) +
  geom_point( color="#69b3a2") +
  geom_jitter() +
  geom_smooth(method=lm , color="red", se=TRUE) +
  theme_ipsum()
```

---

``` r
transcarbonlmplot
```

<img src="Regression_files/figure-html/unnamed-chunk-10-1.png" width="55%" style="display: block; margin: auto;" />

---

What are some possible *confounding variables* for this relationship?
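---

One way to see why confounders matter is a small simulation. The sketch below uses made-up data, not ANES responses: the names `z`, `x`, and `y` and every number in it are assumptions chosen for illustration. Here `z` drives both `x` and `y`, while `x` has no effect on `y` at all.

``` r
set.seed(210)  # arbitrary seed so the sketch is reproducible
n <- 5000

z <- rnorm(n)            # hypothetical confounder
x <- 0.8 * z + rnorm(n)  # x depends on z
y <- 0.5 * z + rnorm(n)  # y depends on z only; the true effect of x is 0

naive    <- lm(y ~ x)      # leaves the confounder out
adjusted <- lm(y ~ x + z)  # controls for the confounder

coef(naive)["x"]     # clearly above 0, even though x does nothing
coef(adjusted)["x"]  # close to 0 once z is held constant
```

Leaving `z` out lets `x` take credit for variation that really comes from `z`. Adding `z` to the regression removes that borrowed credit.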
---

`\(\tiny CarbonRegulations_{i} = \alpha + \beta_{1} TransBathroom_{i} + \beta_{2} Partisanship_{i} + \epsilon_{i}\)`

---

``` r
anes2024$partyid <- ifelse(anes2024$V241227x < 0, NA, anes2024$V241227x)

table(anes2024$partyid)
```

```
## 
##    1    2    3    4    5    6    7 
## 1314  616  714  380  716  577 1166 
```

---

``` r
transcarbonlm2 <- lm(carbonregulations ~ transbathroom + partyid, data=anes2024)

summary(transcarbonlm2)
```

```
## 
## Call:
## lm(formula = carbonregulations ~ transbathroom + partyid, data = anes2024)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1742 -0.9916 -0.1910  0.8090  5.8090 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.69378    0.05454   12.72   <2e-16 ***
## transbathroom  0.20156    0.01245   16.19   <2e-16 ***
## partyid        0.29565    0.01234   23.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.571 on 4658 degrees of freedom
##   (860 observations deleted due to missingness)
## Multiple R-squared:  0.2932, Adjusted R-squared:  0.2929 
## F-statistic: 966.3 on 2 and 4658 DF,  p-value: < 2.2e-16
```

---

<img src="images/goodcontrol.jpg" width="80%" style="display: block; margin: auto;" />

---

<img src="images/irrelevantcontrol1.jpg" width="80%" style="display: block; margin: auto;" />

---

<img src="images/harmfulcontrol1.jpg" width="80%" style="display: block; margin: auto;" />

---

<img src="images/harmfulcontrol2.jpg" width="80%" style="display: block; margin: auto;" />

---

<img src="images/harmfulcontrol3.jpg" width="80%" style="display: block; margin: auto;" />
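
---

Looking back at the "data plus calculus" step: with one independent variable, `lm()` picks the `\(\alpha\)` and `\(\beta\)` that minimize the sum of squared errors, and calculus gives those two numbers in closed form. The sketch below checks this on simulated data. The 7-point scale, the sample size, and the intercept and slope used to generate `y` are all assumptions for illustration, not ANES values.

``` r
set.seed(210)  # arbitrary seed; simulated data, not ANES responses

x <- sample(1:7, 500, replace = TRUE)  # hypothetical 7-point scale
y <- 1 + 0.4 * x + rnorm(500)          # made-up intercept and slope plus noise

# Setting the derivatives of the sum of squared errors to zero gives:
beta_hat  <- cov(x, y) / var(x)            # slope
alpha_hat <- mean(y) - beta_hat * mean(x)  # intercept

c(alpha = alpha_hat, beta = beta_hat)
coef(lm(y ~ x))  # lm() should report the same two numbers
```

With more than one independent variable the same idea applies, but the formulas involve matrices, so in practice we let `lm()` do the work.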