
Homework 1

There has been a great deal of discussion from Americans and others worldwide about the patterns of immigration arrests carried out by ICE during 2025. In this assignment, we will look at some data about these arrests, collected by the Deportation Data Project.

For this assignment, you are free to work individually or in groups. If you choose to work collaboratively, please include your name and the names of the people you worked with on your submitted assignment. You will not be penalized for collaborating.

Problem 1

Let’s figure out whether there have been equal numbers of arrests in states with Democratic and Republican governors. We have two datasets: one pulled from Wikipedia that lists details about governors, and the arrest data from the Deportation Data Project. Let’s start by loading both into our R workspace.

governor_data <- read.csv("https://github.com/jnseawright/ps210/raw/refs/heads/main/Data/stategovs.csv")

ice_arrests <- read.csv("https://github.com/jnseawright/ps210/raw/refs/heads/main/Data/icearrests.csv")

Next, we create a new variable in the ICE data that records the party of the governor of the state where each arrest happened.

#This command creates a new empty variable called Party.
ice_arrests$Party <- NA

#This block of commands loops through the ice_arrests data frame and
#looks up the party of the governor for each arrest.
for (i in seq_len(nrow(ice_arrests))) {
  #This command checks whether a given arrest happened in one of the 50 states.
  #Some arrests have no recorded location, some happen in international travel,
  #some happen on military bases, etc. For those, we'll record the party as
  #missing.
  if (!ice_arrests$State[i] %in% levels(as.factor(governor_data$State))) {
    ice_arrests$Party[i] <- NA
  } else {
    #When the state is found, we'll set the party from the governor data.
    ice_arrests$Party[i] <- governor_data$Party[governor_data$State == ice_arrests$State[i]]
  }
}
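The loop above looks up one arrest at a time. As a minimal sketch of the same lookup done in a single step, the example below uses base R's match() on two small made-up data frames. The names toy_governors and toy_arrests and all of the values in them are hypothetical stand-ins for illustration, not the real data.

```r
# Toy stand-ins for governor_data and ice_arrests (hypothetical values).
toy_governors <- data.frame(
  State = c("Illinois", "Texas"),
  Party = c("Democratic", "Republican"),
  stringsAsFactors = FALSE
)
toy_arrests <- data.frame(
  State = c("Texas", "Illinois", "Guam", NA, "Texas"),
  stringsAsFactors = FALSE
)

# match() gives the row of toy_governors where each arrest's state appears,
# or NA when the state is not in the governor table (or is itself missing).
row_in_governors <- match(toy_arrests$State, toy_governors$State)

# Indexing with NA returns NA, so unmatched arrests get a missing Party.
toy_arrests$Party <- toy_governors$Party[row_in_governors]

print(toy_arrests)
```

Either approach should give the same Party column. The loop is easier to read step by step, while match() tends to run much faster on a large data frame.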

We can now ask where arrests are actually happening.

table(ice_arrests$Party)

What does this tell us? Explain in your own words what we learn from this about partisanship and ICE arrest patterns. What conclusions can we draw, and what further information do you want before we start interpreting these results?
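When reading the output of table(), it may help to know that it leaves out missing values unless asked to show them. The sketch below, using a small made-up vector (toy_party and its values are hypothetical), shows how to display the missing category and how to turn counts into shares.

```r
# A made-up Party vector with one missing value (hypothetical values).
toy_party <- c("Republican", "Democratic", NA, "Republican", "Republican")

# By default table() drops NA; useNA = "ifany" adds a column for it.
counts_with_missing <- table(toy_party, useNA = "ifany")
print(counts_with_missing)

# prop.table() converts counts to shares of the non-missing arrests.
shares <- prop.table(table(toy_party))
print(shares)
```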

Problem 2

Are there any challenges involved in interpreting the relationship between partisanship and ICE arrests as a causal relationship? In answering, please work through the four questions for causal inference discussed in class.

Propose at least three possible confounding variables that you’d like to see explored, and explain why you believe each of these variables would meet the criteria for a confounding variable.

Problem 3

Let’s add one additional variable to the analysis. We’ll look at state populations, partisanship, and ICE arrests all at once. Let’s start by adding data on population to our ice_arrests dataset.

state_pops <- read.csv("https://github.com/jnseawright/ps210/raw/refs/heads/main/Data/statepops.csv")

Now we can use a version of our code from above that copies in state populations instead of partisanship.

#Run the following two commands once if you haven't before, without the hashtag:
#install.packages("tidyverse")
#install.packages("ggplot2")

#Load libraries
library(tidyverse)
library(ggplot2)

#This command creates a new empty variable called Population.
ice_arrests$Population <- NA

#This block of commands loops through the ice_arrests data frame and
#looks up the population of the state for each arrest.
for (i in seq_len(nrow(ice_arrests))) {
  #This command checks whether a given arrest happened in one of the 50 states.
  #Some arrests have no recorded location, some happen in international travel,
  #some happen on military bases, etc. For those, we'll record the population as
  #missing.
  if (!ice_arrests$State[i] %in% levels(as.factor(state_pops$State))) {
    ice_arrests$Population[i] <- NA
  } else {
    #When the state is found, we'll set the population from the state population data.
    ice_arrests$Population[i] <-
      state_pops$Population2024[state_pops$State == ice_arrests$State[i]]
  }
}

#This variable often reads in with commas and gets treated as text. If so, we'll
#convert it to an actual number. (parse_number() only accepts text, so we check first.)
if (is.character(ice_arrests$Population)) ice_arrests$Population <- parse_number(ice_arrests$Population)

#Group the states into population categories.
ice_arrests %>% mutate(Population_group = cut(Population,
                                              breaks = quantile(Population,
                                                                probs = seq(0, 1, 0.2),
                                                                na.rm = TRUE),
                                              labels = c("Very Small",
                                                         "Small", "Medium",
                                                         "Large",
                                                         "Very Large"),
                                              include.lowest = TRUE)) %>% 
#Plot the relationship between arrests, population, and party.
  ggplot(aes(x = Population_group, fill = Party)) +
  geom_bar(position = "dodge") +
  labs(x = "Population", y = "Number of Arrests", 
       title = "Number of Arrests by Population and Party") +
  scale_fill_manual(values = c("blue", "red")) +
  #theme_minimal() goes first so that it doesn't overwrite the rotated axis labels.
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

What do you see? What does adding information about population do to the relationship between party and numbers of arrests? (You might consider answering that this additional information eliminates the previous relationship, reverses it, leaves it unchanged, or complicates it in other ways you explain.) Explain your reasoning.
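To see how the five population groups in the plot are formed, here is a minimal sketch of the same cut() and quantile() step on a small made-up vector. The name toy_population and its values are hypothetical. Note one difference: in the assignment code the quantiles are computed over arrests rather than over states, so states with many arrests carry more weight when the cut points are set.

```r
# Ten made-up population values (hypothetical, in people).
toy_population <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) * 1e6

# Six cut points at the 0%, 20%, ..., 100% quantiles define five groups.
cut_points <- quantile(toy_population, probs = seq(0, 1, 0.2), na.rm = TRUE)

# include.lowest = TRUE keeps the smallest value inside the first group.
toy_group <- cut(toy_population,
                 breaks = cut_points,
                 labels = c("Very Small", "Small", "Medium",
                            "Large", "Very Large"),
                 include.lowest = TRUE)

# Each group holds roughly one fifth of the values.
print(table(toy_group))
```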

Problem 4

Is this an area of research where a social-science experiment makes sense as a research tool? If so, suggest an example of an experiment that you think would help illuminate the questions at work here. If not, explain your reasoning for why this is a poor area for using experiments.

Please submit your completed assignment at:

https://canvas.northwestern.edu/courses/235934/assignments/1648222