class: center, middle, inverse, title-slide .title[ # Regression Kickoff: Social Media & Violence on Jan 6 ] .subtitle[ ## PS 312 ] .author[ ### Jaye Seawright ] .date[ ### 2026-04-14 ] --- ## Today's Activity - **Goal:** Use regression to understand how electoral support for Trump in 2016 relates to domestic terrorism in subsequent years. - **Deliverable:** One table, one figure, and a short paragraph per group. - **Why regression?** It estimates the **strength** and **uncertainty** of relationships while adjusting for confounders. To warm up, we'll work through a regression example with the **January 6 defendants data**—the same data we used to build DAGs last time. --- ## The Data: January 6 Defendants ``` r # Read the data (adjust path if needed) df_raw <- read_csv("data/final_merged_data.csv") # Quick look at the variables we care about df_raw %>% select("Case ID", Age, contains("SocialMedia"), contains("ChargedWithViolence")) %>% head() ``` ``` ## # A tibble: 6 × 11 ## `Case ID` Age SocialMedia.jaye SocialMediaBroadcast…¹ SocialMedia.evelyn ## <chr> <chr> <dbl> <dbl> <dbl> ## 1 01172023_JA_… # NA NA NA ## 2 01282021_RNAR 40 NA NA 1 ## 3 06132023_RZA 22 NA NA NA ## 4 03112021_JHA 26 NA NA NA ## 5 04072021_DPA… 43 NA NA 1 ## 6 05122021_TBA 39 NA NA NA ## # ℹ abbreviated name: ¹SocialMediaBroadcast.jaye ## # ℹ 6 more variables: SocialMediaBroadcast.evelyn <dbl>, SocialMedia.mia <dbl>, ## # SocialMediaBroadcast.mia <dbl>, ChargedWithViolence.jaye <dbl>, ## # ChargedWithViolence.evelyn <dbl>, ChargedWithViolence.mia <dbl> ``` --- **Challenge:** Three coders (`.jaye`, `.evelyn`, `.mia`) each recorded whether a defendant used social media and whether they were charged with violence. We need a single, combined measure. And we have a messy group membership variable. --- ## Combining Multiple Coders Use `coalesce()` to take the first non‑missing value across the three coder columns. And clean up the group membership variable with 'case_when()'. ``` r df_clean <- df_raw %>% mutate( SocialMedia = coalesce(SocialMedia.jaye, SocialMedia.evelyn, SocialMedia.mia), ChargedWithViolence = coalesce(ChargedWithViolence.jaye, ChargedWithViolence.evelyn, ChargedWithViolence.mia), Age = as.numeric(if_else(Age == "#", NA_character_, Age)), # Create binary group membership GroupMember = case_when( str_detect(`Group affiliation`, regex("no\\s*known|unknown", ignore_case = TRUE)) ~ 0, is.na(`Group affiliation`) | `Group affiliation` == "" ~ 0, TRUE ~ 1 ) ) %>% filter(!is.na(SocialMedia), !is.na(ChargedWithViolence)) ``` --- ``` r # Check the combined variables df_clean %>% count(SocialMedia, ChargedWithViolence) %>% spread(ChargedWithViolence, n, fill = 0) ``` ``` ## # A tibble: 2 × 3 ## SocialMedia `0` `1` ## <dbl> <dbl> <dbl> ## 1 0 79 35 ## 2 1 122 40 ``` ``` r # Check the group count df_clean %>% count(GroupMember) ``` ``` ## # A tibble: 2 × 2 ## GroupMember n ## <dbl> <int> ## 1 0 226 ## 2 1 50 ``` Now we have a clean dataset ready for regression. --- ## Simple Regression: Social Media → Violence Our research question: **Does social media use increase the likelihood of engaging in violence during the January 6 attack?** Because `ChargedWithViolence` is binary (0/1), we use logistic regression. ``` r mod1 <- glm(ChargedWithViolence ~ SocialMedia, data = df_clean, family = binomial) tidy(mod1, conf.int = TRUE, exponentiate = TRUE) %>% filter(term != "(Intercept)") %>% mutate(across(where(is.numeric), ~ round(.x, 3))) ``` ``` ## # A tibble: 1 × 7 ## term estimate std.error statistic p.value conf.low conf.high ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 SocialMedia 0.74 0.273 -1.10 0.27 0.433 1.27 ``` - **Interpretation:** The odds ratio tells us how much the odds of a violence charge increase for social media users. - But this relationship may be **confounded** by age (younger defendants use social media more and may be more impulsive). --- ## Multiple Regression: Add a Confounder (Age) Recall our DAG from last time: **Age** affects both social media use and violence. ``` r mod2 <- glm(ChargedWithViolence ~ SocialMedia + Age, data = df_clean, family = binomial) tidy(mod2, conf.int = TRUE, exponentiate = TRUE) %>% mutate(across(where(is.numeric), ~ round(.x, 3))) ``` ``` ## # A tibble: 3 × 7 ## term estimate std.error statistic p.value conf.low conf.high ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 0.812 0.483 -0.43 0.667 0.313 2.10 ## 2 SocialMedia 0.718 0.278 -1.19 0.233 0.416 1.24 ## 3 Age 0.986 0.011 -1.38 0.169 0.965 1.01 ``` **Observations:** - The coefficient on `SocialMedia` may change after controlling for `Age`. - The adjusted odds ratio gives us an estimate **closer to the causal effect** (under the DAG's assumptions). --- ## Multiple Regression: Add Two Confounders Now we control for both **Age** and **Group Membership**, following our DAG. ``` r mod3 <- glm(ChargedWithViolence ~ SocialMedia + Age + GroupMember, data = df_clean, family = binomial) tidy(mod3, conf.int = TRUE, exponentiate = TRUE) %>% mutate(across(where(is.numeric), ~ round(.x, 3))) ``` ``` ## # A tibble: 4 × 7 ## term estimate std.error statistic p.value conf.low conf.high ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 0.608 0.499 -0.996 0.319 0.227 1.61 ## 2 SocialMedia 0.779 0.285 -0.875 0.381 0.445 1.37 ## 3 Age 0.985 0.011 -1.36 0.173 0.964 1.01 ## 4 GroupMember 3.12 0.329 3.46 0.001 1.64 5.96 ``` --- ## Visualizing Results A coefficient plot makes comparisons easy. ``` r plotme <- mod3 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% filter(term != "(Intercept)") %>% ggplot(aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) + geom_vline(xintercept = 1, linetype = "dashed", color = "gray50") + geom_pointrange(size = 1) + scale_x_log10(labels = comma) + labs( title = "Odds Ratios for Violence Charge", x = "Odds Ratio (log scale)", y = NULL, caption = "Dashed line at OR = 1 (no effect)" ) + theme_minimal(base_size = 14) ``` --- ``` r plotme ``` <!-- --> --- ## Predicted Probabilities What does the model imply for a 25‑year‑old vs. a 50‑year‑old? ``` r new_data <- crossing( SocialMedia = c(0, 1), GroupMember = c(0, 1), Age = c(25, 50) ) probsoutcome <- new_data %>% mutate( pred_prob = predict(mod3, newdata = ., type = "response") ) %>% arrange(Age, GroupMember, SocialMedia) ``` --- ``` r probsoutcome ``` ``` ## # A tibble: 8 × 4 ## SocialMedia GroupMember Age pred_prob ## <dbl> <dbl> <dbl> <dbl> ## 1 0 0 25 0.297 ## 2 1 0 25 0.247 ## 3 0 1 25 0.568 ## 4 1 1 25 0.506 ## 5 0 0 50 0.226 ## 6 1 0 50 0.185 ## 7 0 1 50 0.477 ## 8 1 1 50 0.415 ``` - **Takeaway:** Social media use might be associated with a lower predicted probability of a violence charge, and age maybe lowers the chance of violence, too. Group membership substantially increases the chance of violence. --- ## Your Turn 1. Load the Trump vote & terrorism data (resources on the class page). 2. Build a regression model that answers the question: *How does 2016 Trump support relate to domestic terrorism incidents?* 3. Consider **confounders** (e.g., population, region, economic conditions). 4. Produce: - One **regression table** (coefficients with standard errors / p‑values) - One **figure** (coefficient plot or marginal effects plot) 5. Write a short paragraph interpreting the key result. 6. Email to your TA. **Resources:** [Class regression help page](https://jnseawright.github.io/PS312/InClass/Regression.html) ``` r # Example code for your analysis model <- lm(terrorism_incidents ~ trump_vote_share + confounder1 + confounder2, data = your_data) summary(model) ``` ```