Regression Kickoff: Social Media & Violence on Jan 6

class: center, middle, inverse, title-slide

.title[
# Regression Kickoff: Social Media & Violence on Jan 6
]
.subtitle[
## PS 312
]
.author[
### Jaye Seawright
]
.date[
### 2026-04-14
]

---

## Today's Activity

- **Goal:** Use regression to understand how electoral support for Trump in 2016 relates to domestic terrorism in subsequent years.
- **Deliverable:** One table, one figure, and a short paragraph per group.
- **Why regression?** It estimates the **strength** and **uncertainty** of relationships while adjusting for confounders.

To warm up, we'll work through a regression example with the **January 6 defendants data**—the same data we used to build DAGs last time.

---

## The Data: January 6 Defendants

``` r
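# Load packages: tidyverse (read_csv, dplyr, ggplot2), broom (tidy), scales (comma)
library(tidyverse)
library(broom)
library(scales)

# Read the data (adjust path if needed)
df_raw <- read_csv("data/final_merged_data.csv")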

# Quick look at the variables we care about
df_raw %>%
  select("Case ID", Age, contains("SocialMedia"), contains("ChargedWithViolence")) %>%
  head()
```

```
## # A tibble: 6 × 11
##   `Case ID`     Age   SocialMedia.jaye SocialMediaBroadcast…¹ SocialMedia.evelyn
##   <chr>         <chr>            <dbl>                  <dbl>              <dbl>
## 1 01172023_JA_… #                   NA                     NA                 NA
## 2 01282021_RNAR 40                  NA                     NA                  1
## 3 06132023_RZA  22                  NA                     NA                 NA
## 4 03112021_JHA  26                  NA                     NA                 NA
## 5 04072021_DPA… 43                  NA                     NA                  1
## 6 05122021_TBA  39                  NA                     NA                 NA
## # ℹ abbreviated name: ¹SocialMediaBroadcast.jaye
## # ℹ 6 more variables: SocialMediaBroadcast.evelyn <dbl>, SocialMedia.mia <dbl>,
## #   SocialMediaBroadcast.mia <dbl>, ChargedWithViolence.jaye <dbl>,
## #   ChargedWithViolence.evelyn <dbl>, ChargedWithViolence.mia <dbl>
```

---

**Challenge:** Three coders (`.jaye`, `.evelyn`, `.mia`) each recorded whether a defendant used social media and whether they were charged with violence, so we need a single combined measure of each. The group membership variable is also messy and needs recoding.

---

## Combining Multiple Coders

Use `coalesce()` to take the first non‑missing value across the three coder columns, and clean up the group membership variable with `case_when()`.

``` r
df_clean <- df_raw %>%
  mutate(
    SocialMedia = coalesce(SocialMedia.jaye, SocialMedia.evelyn, SocialMedia.mia),
    ChargedWithViolence = coalesce(ChargedWithViolence.jaye, ChargedWithViolence.evelyn, ChargedWithViolence.mia),
    Age = as.numeric(if_else(Age == "#", NA_character_, Age)),
    
    # Create binary group membership
    GroupMember = case_when(
      str_detect(`Group affiliation`, regex("no\\s*known|unknown", ignore_case = TRUE)) ~ 0,
      is.na(`Group affiliation`) | `Group affiliation` == "" ~ 0,
      TRUE ~ 1
    )
  ) %>%
  filter(!is.na(SocialMedia), !is.na(ChargedWithViolence))
```
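
To see what these two steps do row by row, here is a minimal sketch on a made-up toy table. The column names (`code.a`, `code.b`, `code.c`, `affiliation`) and all values are hypothetical and are not taken from the class data. Note that `coalesce()` silently keeps the first coder's value when coders disagree, so the sketch also flags disagreements.

``` r
library(dplyr)
library(stringr)

# Hypothetical toy data: three coders plus a messy text column (not the class data)
toy <- tibble(
  code.a      = c(1, NA, 0, NA, NA),
  code.b      = c(NA, 0, 1, NA, 1),
  code.c      = c(NA, NA, NA, NA, 0),
  affiliation = c("Group A", "No known affiliation", NA, "", "unknown")
)

toy_clean <- toy %>%
  mutate(
    # First non-missing value, checking coders in the order listed
    combined = coalesce(code.a, code.b, code.c),
    # TRUE when the non-missing codes differ; NA when no coder coded the row
    disagree = pmax(code.a, code.b, code.c, na.rm = TRUE) !=
      pmin(code.a, code.b, code.c, na.rm = TRUE),
    # Same recoding logic as the slide: unknown, missing, or blank becomes 0
    GroupMember = case_when(
      str_detect(affiliation, regex("no\\s*known|unknown", ignore_case = TRUE)) ~ 0,
      is.na(affiliation) | affiliation == "" ~ 0,
      TRUE ~ 1
    )
  )

toy_clean
```

Rows flagged by `disagree` are the ones where the order of the arguments to `coalesce()` decides the final value, so they are worth inspecting by hand.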

---

``` r
# Check the combined variables
df_clean %>%
  count(SocialMedia, ChargedWithViolence) %>%
  spread(ChargedWithViolence, n, fill = 0)
```

```
## # A tibble: 2 × 3
##   SocialMedia   `0`   `1`
##         <dbl> <dbl> <dbl>
## 1           0    79    35
## 2           1   122    40
```

``` r
# Check the group count
df_clean %>% count(GroupMember)
```

```
## # A tibble: 2 × 2
##   GroupMember     n
##         <dbl> <int>
## 1           0   226
## 2           1    50
```

Now we have a clean dataset ready for regression.

---

## Simple Regression: Social Media → Violence

Our research question: **Does social media use increase the likelihood of engaging in violence during the January 6 attack?**

Because `ChargedWithViolence` is binary (0/1), we use logistic regression.
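
In symbols, a sketch of the model fitted next, where $p_i$, $\beta_0$, and $\beta_1$ are notation introduced here rather than taken from the slides:

$$
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1\,\text{SocialMedia}_i,
\qquad p_i = \Pr(\text{ChargedWithViolence}_i = 1)
$$

Exponentiating $\beta_1$ gives the odds ratio comparing social media users with non‑users, which is what `exponentiate = TRUE` reports.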

``` r
mod1 <- glm(ChargedWithViolence ~ SocialMedia,
            data = df_clean,
            family = binomial)

tidy(mod1, conf.int = TRUE, exponentiate = TRUE) %>%
  filter(term != "(Intercept)") %>%
  mutate(across(where(is.numeric), ~ round(.x, 3)))
```

```
## # A tibble: 1 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 SocialMedia     0.74     0.273     -1.10    0.27    0.433      1.27
```

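- **Interpretation:** An odds ratio below 1 means lower odds. Here, social media users have about 26% lower odds of a violence charge than non‑users (OR = 0.74), but the 95% confidence interval (0.43 to 1.27) includes 1, so the data are also consistent with no association.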
- But this relationship may be **confounded** by age (younger defendants use social media more and may be more impulsive).
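
As a check on the `mod1` estimate, the unadjusted odds ratio can be recomputed by hand from the 2 × 2 table of counts shown earlier. This is a minimal sketch with the four counts typed in directly; the variable names are illustrative.

``` r
# Counts copied from the SocialMedia x ChargedWithViolence table
no_sm  <- c(not_charged = 79,  charged = 35)   # SocialMedia == 0
yes_sm <- c(not_charged = 122, charged = 40)   # SocialMedia == 1

# Odds of a violence charge within each group
odds_no_sm  <- no_sm[["charged"]]  / no_sm[["not_charged"]]
odds_yes_sm <- yes_sm[["charged"]] / yes_sm[["not_charged"]]

# Odds ratio: social media users relative to non-users
odds_ratio <- odds_yes_sm / odds_no_sm

# Wald standard error of the log odds ratio
se_log_or <- sqrt(sum(1 / c(no_sm, yes_sm)))

round(c(odds_ratio = odds_ratio, se_log_or = se_log_or), 3)
```

Both numbers should agree with the `estimate` and `std.error` columns of the `mod1` output. The confidence interval reported by `tidy()` for a `glm` is not a simple Wald interval, so it is not reproduced here.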

---

## Multiple Regression: Add a Confounder (Age)

Recall our DAG from last time: **Age** affects both social media use and violence.

``` r
mod2 <- glm(ChargedWithViolence ~ SocialMedia + Age,
            data = df_clean,
            family = binomial)

tidy(mod2, conf.int = TRUE, exponentiate = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3)))
```

```
## # A tibble: 3 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)    0.812     0.483     -0.43   0.667    0.313      2.10
## 2 SocialMedia    0.718     0.278     -1.19   0.233    0.416      1.24
## 3 Age            0.986     0.011     -1.38   0.169    0.965      1.01
```

**Observations:**
- The odds ratio on `SocialMedia` barely moves after controlling for `Age` (0.74 → 0.718), and its confidence interval still includes 1.
- The adjusted odds ratio gives us an estimate **closer to the causal effect** (under the DAG's assumptions).

---

## Multiple Regression: Add Two Confounders

Now we control for both **Age** and **Group Membership**, following our DAG.

``` r
mod3 <- glm(ChargedWithViolence ~ SocialMedia + Age + GroupMember,
            data = df_clean,
            family = binomial)

tidy(mod3, conf.int = TRUE, exponentiate = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3)))
```

```
## # A tibble: 4 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)    0.608     0.499    -0.996   0.319    0.227      1.61
## 2 SocialMedia    0.779     0.285    -0.875   0.381    0.445      1.37
## 3 Age            0.985     0.011    -1.36    0.173    0.964      1.01
## 4 GroupMember    3.12      0.329     3.46    0.001    1.64       5.96
```

---

## Visualizing Results

A coefficient plot makes comparisons easy.

``` r
plotme <- mod3 %>%
  tidy(conf.int = TRUE, exponentiate = TRUE) %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "gray50") +
  geom_pointrange(size = 1) +
  scale_x_log10(labels = scales::comma) +
  labs(
    title = "Odds Ratios for Violence Charge",
    x = "Odds Ratio (log scale)",
    y = NULL,
    caption = "Dashed line at OR = 1 (no effect)"
  ) +
  theme_minimal(base_size = 14)
```

---

``` r
plotme
```

![](regressionkickoff_files/figure-html/coef-plot-2-1.png)

---

## Predicted Probabilities

What does the model imply for a 25‑year‑old vs. a 50‑year‑old, with and without social media use and group membership?

``` r
new_data <- crossing(
  SocialMedia = c(0, 1),
  GroupMember = c(0, 1),
  Age = c(25, 50)
)

probsoutcome <- new_data %>%
  mutate(
    pred_prob = predict(mod3, newdata = ., type = "response")
  ) %>%
  arrange(Age, GroupMember, SocialMedia)
```

---

``` r
probsoutcome
```

```
## # A tibble: 8 × 4
##   SocialMedia GroupMember   Age pred_prob
##         <dbl>       <dbl> <dbl>     <dbl>
## 1           0           0    25     0.297
## 2           1           0    25     0.247
## 3           0           1    25     0.568
## 4           1           1    25     0.506
## 5           0           0    50     0.226
## 6           1           0    50     0.185
## 7           0           1    50     0.477
## 8           1           1    50     0.415
```

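- **Takeaway:** Holding the other variables fixed, social media use and older age are each associated with a somewhat lower predicted probability of a violence charge, but both odds ratios have confidence intervals that include 1. Group membership is the only clear signal: it roughly doubles the predicted probability (for example, from 0.247 to 0.506 for a 25‑year‑old social media user).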
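
To connect these probabilities back to the odds ratios, here is a rough sketch that rebuilds them by hand from the rounded `mod3` table. The helper `approx_prob()` is hypothetical, and because the inputs are rounded to three decimals the results only approximate what `predict()` returns.

``` r
# Values copied (rounded) from the exponentiated mod3 table
base_odds <- 0.608  # exponentiated intercept: odds when all predictors are 0
or_social <- 0.779  # odds ratio for SocialMedia
or_age    <- 0.985  # odds ratio per additional year of Age
or_group  <- 3.12   # odds ratio for GroupMember

# Hypothetical helper: scale the baseline odds, then convert odds to a probability
approx_prob <- function(social_media, group_member, age) {
  odds <- base_odds * or_social^social_media * or_group^group_member * or_age^age
  odds / (1 + odds)
}

approx_prob(social_media = 0, group_member = 0, age = 25)
approx_prob(social_media = 1, group_member = 1, age = 50)
```

The results should land within about 0.01 of the corresponding rows in the table above; the small gap comes from rounding, not from a different model.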

---

## Your Turn

1. Load the Trump vote & terrorism data (resources on the class page).
2. Build a regression model that answers the question:
   *How does 2016 Trump support relate to domestic terrorism incidents?*
3. Consider **confounders** (e.g., population, region, economic conditions).
4. Produce:
   - One **regression table** (coefficients with standard errors / p‑values)
   - One **figure** (coefficient plot or marginal effects plot)
5. Write a short paragraph interpreting the key result.
6. Email your table, figure, and paragraph to your TA.

**Resources:**  
[Class regression help page](https://jnseawright.github.io/PS312/InClass/Regression.html)

``` r
# Template for your analysis: replace the placeholder variable and data names with your own
model <- lm(terrorism_incidents ~ trump_vote_share + confounder1 + confounder2,
            data = your_data)
summary(model)
```
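
If it helps to see the table and figure steps end to end, here is a sketch that runs on simulated placeholder data. Everything in `sim` is randomly generated purely to make the syntax runnable; `log_population` is a hypothetical confounder name, and none of the numbers say anything about the real Trump vote or terrorism data.

``` r
library(dplyr)
library(broom)
library(ggplot2)

set.seed(312)

# Simulated placeholder data, NOT the class dataset
n <- 200
sim <- tibble(
  trump_vote_share    = runif(n, 20, 80),
  log_population      = rnorm(n, mean = 11, sd = 1),
  terrorism_incidents = rpois(n, lambda = exp(-10 + 0.9 * log_population))
)

# Linear model with one hypothetical confounder
model_sim <- lm(terrorism_incidents ~ trump_vote_share + log_population,
                data = sim)

# Regression table: estimates, standard errors, p-values, confidence intervals
reg_table <- tidy(model_sim, conf.int = TRUE)
reg_table

# Coefficient plot; the reference line sits at 0 because these are
# linear-model coefficients rather than odds ratios
coef_plot <- reg_table %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  geom_pointrange() +
  labs(x = "Coefficient estimate", y = NULL) +
  theme_minimal(base_size = 14)

coef_plot
```

With your real data, swap `sim` for your data frame and choose confounders based on your own DAG.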