class: center, middle, inverse, title-slide

.title[
# 2: Starting with Regression
]
.subtitle[
## Linear Models
]
.author[
### Jaye Seawright
]
.institute[
### Northwestern Political Science
]
.date[
### Jan. 7, 2026
]

---
class: center, middle

<style type="text/css">
pre {
  max-height: 400px;
  overflow-y: auto;
}
pre[class] {
  max-height: 200px;
}
</style>

Imagine someone interested in predicting U.S. presidential election results based on economic performance.

---

``` r
library(tidyverse)
```

```
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.2.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ nlme::collapse() masks dplyr::collapse()
## ✖ mice::filter()   masks dplyr::filter(), stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```

``` r
library(rosdata)
data("hibbs")
```

---

``` r
econvoteplot <- ggplot(hibbs, aes(x = growth, y = vote, label = year)) +
  geom_text(size = 3) +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Forecasting the Election from the Economy",
       x = "Average recent growth in personal income",
       y = "Incumbent party's vote share") +
  theme_minimal()
```

---

<img src="StartingWithRegression_files/figure-html/unnamed-chunk-4-1.png" width="80%" style="display: block; margin: auto;" />

---

``` r
econvoteplot.line <- ggplot(hibbs, aes(x = growth, y = vote, label = year)) +
  geom_text(size = 3) +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Forecasting the Election from the Economy",
       x = "Average recent growth in personal income",
       y = "Incumbent party's vote share") +
  geom_smooth(method = "lm") +
  theme_minimal()
```

---

```
## `geom_smooth()` using formula = 'y ~ x'
```

<img src="StartingWithRegression_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" />

---

``` r
econvote.lm <- lm(vote ~ growth, data = hibbs)
```

---

``` r
library(rstanarm)
```

```
## Loading required package: Rcpp
```

```
## This is rstanarm version 2.32.2
```

```
## - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
```

```
## - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
```

```
## - For execution on a local, multicore CPU with excess RAM we recommend calling
```

```
##   options(mc.cores = parallel::detectCores())
```

```
## 
## Attaching package: 'rstanarm'
```

```
## The following objects are masked from 'package:rosdata':
## 
##     kidiq, roaches, wells
```

``` r
econvote.lm.bayes <- stan_glm(vote ~ growth, data = hibbs)
```

```
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.003087 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 30.87 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1:
## Chain 1:  Elapsed Time: 0.038 seconds (Warm-up)
## Chain 1:                0.037 seconds (Sampling)
## Chain 1:                0.075 seconds (Total)
## Chain 1:
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 1.4e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2:
## Chain 2:  Elapsed Time: 0.041 seconds (Warm-up)
## Chain 2:                0.035 seconds (Sampling)
## Chain 2:                0.076 seconds (Total)
## Chain 2:
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 1.2e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3:
## Chain 3:  Elapsed Time: 0.04 seconds (Warm-up)
## Chain 3:                0.036 seconds (Sampling)
## Chain 3:                0.076 seconds (Total)
## Chain 3:
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 1.2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4:
## Chain 4:  Elapsed Time: 0.037 seconds (Warm-up)
## Chain 4:                0.033 seconds (Sampling)
## Chain 4:                0.07 seconds (Total)
## Chain 4:
```

---

``` r
library(modelsummary)
library(kableExtra)
```

```
## 
## Attaching package: 'kableExtra'
```

```
## The following object is masked from 'package:dplyr':
## 
##     group_rows
```

``` r
econvote.summary <- modelsummary(
  list("Economic Voting Model" = econvote.lm),
  output = "kableExtra",
  stars = TRUE,
  statistic = "std.error",
  coef_rename = TRUE,
  escape = FALSE,
  gof_map = "all") %>%
  kable_styling(
    font_size = 24,  # Even larger font
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
```

---

<table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-size: 24px; width: auto !important; margin-left: auto; margin-right: auto;" class="table table table-striped table-hover table-condensed">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Economic Voting Model </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 46.248*** </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (1.622) </td> </tr>
<tr> <td style="text-align:left;"> growth </td> <td style="text-align:center;"> 3.061*** </td> </tr>
<tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (0.696) </td> </tr>
<tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 16 </td> </tr>
<tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.580 </td> </tr>
<tr> <td style="text-align:left;"> R2 Adj. </td> <td style="text-align:center;"> 0.550 </td> </tr>
<tr> <td style="text-align:left;"> AIC </td> <td style="text-align:center;"> 91.7 </td> </tr>
<tr> <td style="text-align:left;"> BIC </td> <td style="text-align:center;"> 94.0 </td> </tr>
<tr> <td style="text-align:left;"> Log.Lik. </td> <td style="text-align:center;"> −42.839 </td> </tr>
<tr> <td style="text-align:left;"> F </td> <td style="text-align:center;"> 19.321 </td> </tr>
<tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 3.52 </td> </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr></tfoot>
</table>

---

``` r
econvote.bayes.summary <- modelsummary(
  list("Bayesian Voting Model" = econvote.lm.bayes),
  output = "kableExtra",
  stars = TRUE,
  statistic = "conf.int",
  coef_rename = TRUE,
  escape = FALSE,
  gof_map = "all") %>%
  kable_styling(
    font_size = 24,  # Even larger font
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
```

---

<table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-size: 24px; width: auto !important; margin-left: auto; margin-right: auto;" class="table table table-striped table-hover table-condensed">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Bayesian Voting Model </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 46.240 </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> [42.698, 49.570] </td> </tr>
<tr> <td style="text-align:left;"> growth </td> <td style="text-align:center;"> 3.069 </td> </tr>
<tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td style="text-align:center;box-shadow: 0px 1.5px"> [1.617, 4.539] </td> </tr>
<tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 16 </td> </tr>
<tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.551 </td> </tr>
<tr> <td style="text-align:left;"> R2 Adj. </td> <td style="text-align:center;"> 0.522 </td> </tr>
<tr> <td style="text-align:left;"> Log.Lik. </td> <td style="text-align:center;"> −43.766 </td> </tr>
<tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> −46.4 </td> </tr>
<tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 3.9 </td> </tr>
<tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> 92.8 </td> </tr>
<tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 7.7 </td> </tr>
<tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> 92.2 </td> </tr>
<tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 3.52 </td> </tr>
<tr> <td style="text-align:left;"> Sigma </td> <td style="text-align:center;"> 3.875 </td> </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr></tfoot>
</table>

---

A large literature in political science worries that Republicans and Democrats are treated differently in terms of economic voting.

---

``` r
hibbs$inc_party_candidate
```

```
##  [1] "Stevenson"  "Eisenhower" "Nixon"      "Johnson"    "Humphrey"
##  [6] "Nixon"      "Ford"       "Carter"     "Reagan"     "Bush, Sr."
## [11] "Bush, Sr."  "Clinton"    "Gore"       "Bush, Jr."  "McCain"
## [16] "Obama"
```

---

``` r
# Code the incumbent party as 1 for the Democratic candidates listed here, 0 otherwise.
# Note: Stevenson (1952) was also a Democrat but is not in this list, so he is
# coded 0; the plots and tables that follow reflect this coding.
hibbs$inc_party <- 1*(hibbs$inc_party_candidate %in%
                        c("Johnson", "Humphrey", "Carter", "Clinton", "Gore", "Obama"))
```

---

``` r
econvoteplotparty <- ggplot(hibbs, aes(x = growth, y = vote, label = year)) +
  geom_text(aes(color = as.factor(inc_party)), size = 3) +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Forecasting the Election from the Economy",
       x = "Average recent growth in personal income",
       y = "Incumbent party's vote share") +
  scale_color_manual(
    values = c("1" = "#0015BC",  # Classic Democratic blue
               "0" = "#DE0100")  # Classic Republican red
  ) +
  theme_minimal()
```

---

<img src="StartingWithRegression_files/figure-html/unnamed-chunk-17-1.png" width="70%" style="display: block; margin: auto;" />

---

``` r
econvoteplotparty.line <- ggplot(hibbs, aes(x = growth, y = vote, color = as.factor(inc_party))) +
  geom_point() +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Forecasting the Election from the Economy",
       x = "Average recent growth in personal income",
       y = "Incumbent party's vote share") +
  scale_color_manual(
    values = c("1" = "#0015BC",  # Classic Democratic blue
               "0" = "#DE0100")  # Classic Republican red
  ) +
  geom_smooth(method = "lm") +
  theme_minimal()
```

---

```
## `geom_smooth()` using formula = 'y ~ x'
```

<img src="StartingWithRegression_files/figure-html/unnamed-chunk-19-1.png" width="70%" style="display: block; margin: auto;" />

---

``` r
econvoteparty.lm <- lm(vote ~ growth + inc_party, data = hibbs)
```

---

``` r
econparty.summary <- modelsummary(
  list("Economic Voting Model" = econvoteparty.lm),
  output = "kableExtra",
  stars = TRUE,
  statistic = "std.error",
  coef_rename = TRUE,
  escape = FALSE,
  gof_map = "all") %>%
  kable_styling(
    font_size = 24,  # Even larger font
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
```

---
<table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-size: 24px; width: auto !important; margin-left: auto; margin-right: auto;" class="table table table-striped table-hover table-condensed">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Economic Voting Model </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 46.153*** </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (1.855) </td> </tr>
<tr> <td style="text-align:left;"> growth </td> <td style="text-align:center;"> 3.062*** </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.722) </td> </tr>
<tr> <td style="text-align:left;"> inc_party </td> <td style="text-align:center;"> 0.245 </td> </tr>
<tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (2.016) </td> </tr>
<tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 16 </td> </tr>
<tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.580 </td> </tr>
<tr> <td style="text-align:left;"> R2 Adj. </td> <td style="text-align:center;"> 0.516 </td> </tr>
<tr> <td style="text-align:left;"> AIC </td> <td style="text-align:center;"> 93.7 </td> </tr>
<tr> <td style="text-align:left;"> BIC </td> <td style="text-align:center;"> 96.8 </td> </tr>
<tr> <td style="text-align:left;"> Log.Lik. </td> <td style="text-align:center;"> −42.830 </td> </tr>
<tr> <td style="text-align:left;"> F </td> <td style="text-align:center;"> 8.988 </td> </tr>
<tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 3.52 </td> </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr></tfoot>
</table>

---

``` r
econvotepartyinteract.lm <- lm(vote ~ growth + inc_party + growth:inc_party, data = hibbs)
```

---

``` r
econinteract.summary <- modelsummary(
  list("Economic Voting Model" = econvotepartyinteract.lm),
  output = "kableExtra",
  stars = TRUE,
  statistic = "std.error",
  coef_rename = TRUE,
  escape = FALSE,
  gof_map = "all") %>%
  kable_styling(
    font_size = 24,  # Even larger font
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
```

---

<table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-size: 24px; width: auto !important; margin-left: auto; margin-right: auto;" class="table table table-striped table-hover table-condensed">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Economic Voting Model </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 44.990*** </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (2.284) </td> </tr>
<tr> <td style="text-align:left;"> growth </td> <td style="text-align:center;"> 3.669** </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.999) </td> </tr>
<tr> <td style="text-align:left;"> inc_party </td> <td style="text-align:center;"> 2.692 </td> </tr>
<tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (3.426) </td> </tr>
<tr> <td style="text-align:left;"> growth × inc_party </td> <td style="text-align:center;"> −1.295 </td> </tr>
<tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td
style="text-align:center;box-shadow: 0px 1.5px"> (1.459) </td> </tr>
<tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 16 </td> </tr>
<tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.606 </td> </tr>
<tr> <td style="text-align:left;"> R2 Adj. </td> <td style="text-align:center;"> 0.508 </td> </tr>
<tr> <td style="text-align:left;"> AIC </td> <td style="text-align:center;"> 94.6 </td> </tr>
<tr> <td style="text-align:left;"> BIC </td> <td style="text-align:center;"> 98.5 </td> </tr>
<tr> <td style="text-align:left;"> Log.Lik. </td> <td style="text-align:center;"> −42.322 </td> </tr>
<tr> <td style="text-align:left;"> F </td> <td style="text-align:center;"> 6.157 </td> </tr>
<tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 3.41 </td> </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr></tfoot>
</table>

---

``` r
library(interplot)
```

```
## Loading required package: abind
```

```
## Loading required package: arm
```

```
## Loading required package: MASS
```

```
## 
## Attaching package: 'MASS'
```

```
## The following object is masked from 'package:rosdata':
## 
##     newcomb
```

```
## The following object is masked from 'package:dplyr':
## 
##     select
```

```
## Loading required package: Matrix
```

```
## 
## Attaching package: 'Matrix'
```

```
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
```

```
## Loading required package: lme4
```

```
## 
## Attaching package: 'lme4'
```

```
## The following object is masked from 'package:mice':
## 
##     toenail
```

```
## The following object is masked from 'package:nlme':
## 
##     lmList
```

```
## 
## arm (Version 1.14-4, built: 2024-4-1)
```

```
## Working directory is C:/Users/jnsno/Documents/GitHub/ps405/Slides
```

```
## 
## Attaching package: 'arm'
```

```
## The following objects are masked from 'package:rstanarm':
## 
##     invlogit, logit
```

```
## The following object is masked from 'package:jtools':
## 
##     standardize
```

---

``` r
econ.interplot <- interplot(m = econvotepartyinteract.lm, var1 = "growth", var2 = "inc_party") +
  xlab('Party (1 for Democrats)') +
  ylab('Marginal Effect of Income Growth') +
  ggtitle('Interaction between Party and Economics as Predictors of Elections') +
  theme(plot.title = element_text(face='bold'))
```

---

<img src="StartingWithRegression_files/figure-html/unnamed-chunk-28-1.png" width="70%" style="display: block; margin: auto;" />

---

### Goals

People can and do pursue a lot of different goals with regression.

---

### Goals

1. Summarizing relationships
2. Prediction
3. Theory testing
4. Causal inference
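
---

One way to make the prediction goal concrete is to read the bivariate fit from earlier as a forecasting rule. The following is a minimal sketch, assuming the rounded coefficients reported in the first regression table (intercept 46.248, slope 3.061). The helper `predict_vote()` and the growth scenarios are hypothetical and serve only as an illustration.

``` r
# Rounded coefficients copied from the table for lm(vote ~ growth, data = hibbs)
intercept <- 46.248
slope_growth <- 3.061

# Hypothetical helper (not part of the original slides):
# predicted incumbent-party vote share (in %) at a given growth rate (in %)
predict_vote <- function(growth) {
  intercept + slope_growth * growth
}

# Illustrative growth scenarios (assumed values, not taken from the data)
scenarios <- c(0, 2, 4)
predicted <- predict_vote(scenarios)

print(data.frame(growth = scenarios, predicted_vote = predicted))
```

With the fitted object available, `predict(econvote.lm, newdata = data.frame(growth = 2))` should give essentially the same point prediction without the rounding.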