Process tracing

Qualitative researchers have done exceptional work assembling exercises and examples to help scholars work through ideas connected with process tracing. The exercises assembled by David Collier are an outstanding exemplar. As a check on understanding, as a skill-building exercise, or simply for the intellectual pleasure of it, we recommend working through at least several of them.

Focus group for process tracing

Join a group of about four other students. Come up with (a) a causal question; (b) a main hypothesis for which students in this class are part of the relevant population, so that a focus group could be a source of potentially useful causal-process observations; and (c) a set of plausible alternative hypotheses. Think about the observable implications of each hypothesis, and devise questions and activities that let your focus group test at least some of those implications.

Plan a focus group (if this is a new method for you, Jennifer Cyr’s book is an exceptional resource), preparing a list of questions, topics, or tasks that have a good chance of producing relevant causal-process observations. List these out explicitly, and explain your reasoning. Pair up with another group and carry out your focus group plan. Return the favor by also serving as a focus group for them.

Now, with your group, carry out process tracing. Did the focus group produce evidence that fits your theory but is hard to explain given other theories, or not? Alternatively, did you discover ideas for new theories that you did not initially expect?

How could a survey be most helpful in expanding on what you have learned?

Process-tracing elite messaging on climate change

Public opinion research suggests that American public opinion on climate change follows many of the same patterns as the other polarized, party-encrusted divides that have blocked policy movement on issues such as inequality, gun violence, immigration, public health, and many other areas of concern to Americans and the rest of the world.

How and especially why have American elites chosen specific messages about climate change over time? Surely we can all come up with our own ideas and hypotheses, and there is value to this exercise! So start by listing your own preferred hypothesis about how American political elites talk about climate change, and why.

Of course, as we know, good process tracing requires the best available understanding of the subject matter in order to provide us with the most comprehensive possible set of alternative explanations. Many of us are not subject-matter experts on the politics of climate change! How can we efficiently move in the direction of this kind of understanding?

Following the research design proposed in the book, we need a collection of texts in which American political elites speak (about climate change and, perhaps, other topics), as well as data about the extent to which Americans are thinking about climate. We can get a useful collection of elite speech by examining transcripts of Meet the Press, downloaded from NBC’s website. These transcripts are under copyright, so we can’t provide the raw data. We can, however, provide processed data showing word usage by date.
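To make “processed data showing word usage by date” concrete, here is a minimal base-R sketch of a document-term matrix built from three invented transcript snippets. It assumes, as the later code suggests, that each row is one show identified by a date-valued doc_id and that each remaining column counts one word; the real file's exact layout (including its additional ID column) may differ, and the snippets are purely hypothetical.

```r
# Hypothetical toy transcripts keyed by air date (invented text, not
# real Meet the Press content).
transcripts <- c(
  "2019-03-03" = "climate change climate policy",
  "2019-03-10" = "border policy",
  "2019-04-07" = "climate border"
)

# Split each transcript into lowercase word tokens.
tokens <- strsplit(tolower(transcripts), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary word in every show: one row per show,
# one column per word.
counts <- t(sapply(tokens, function(words) table(factor(words, levels = vocab))))
dtm <- data.frame(doc_id = names(transcripts), counts, row.names = NULL)
print(dtm)
```

Most cells in a real document-term matrix are zero, because any one show uses only a small fraction of the full vocabulary.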

The following commands load the processed data directly from GitHub:

library(RCurl)  # provides getURL() for downloading files over HTTPS
# Download the processed word-usage data for each show and read it as a data frame
meetpressurl <- getURL("https://raw.githubusercontent.com/jnseawright/practice-of-multimethod/refs/heads/main/meet-the-press/meetpressdtm.csv")
meetpressDTM <- read.csv(text = meetpressurl)

We are also interested in looking at the extent to which the general public is paying attention to climate as a topic. (In a deeper investigation, we would want to look at whether this is positive, negative, or polarized attention, but this is a good starting place.) To measure this, we have monthly data from Google Trends on the amount of U.S. search traffic related to the topic of climate:

# Download monthly Google Trends data on U.S. search interest in climate
climategoogletrendsdataurl <- getURL("https://raw.githubusercontent.com/jnseawright/practice-of-multimethod/refs/heads/main/meet-the-press/climate.csv")
climategoogletrendsdata <- read.csv(text = climategoogletrendsdataurl)

We can now merge the two datasets by month. The inner join keeps only the months that appear in both datasets, attaching each month's Google Trends value to every show broadcast that month.

#install.packages("parsedate")
#install.packages("tidyverse")
#install.packages("lubridate")
library(parsedate)
library(tidyverse)
library(lubridate)
# Call parsedate::parse_date() explicitly: readr (part of the tidyverse) also
# exports a parse_date() function, and whichever package is attached last wins.
climategoogletrendsdata$month <- parsedate::parse_date(climategoogletrendsdata$month)
# Each show's doc_id encodes its air date; round it down to the start of its month
meetpressDTM$month <- floor_date(parsedate::parse_date(meetpressDTM$doc_id), unit = "month")
# Keep only the months present in both datasets
climatecombined.data <- inner_join(climategoogletrendsdata, meetpressDTM, by = "month")
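The join logic can be checked on a small scale. The sketch below uses invented numbers and base R's merge() with all = FALSE, which performs an inner join like dplyr::inner_join(); the month-flooring line is a base-R stand-in for lubridate::floor_date(..., unit = "month"). The data frames trends and shows are hypothetical.

```r
# Hypothetical monthly search-interest values (invented numbers).
trends <- data.frame(
  month = as.Date(c("2019-02-01", "2019-03-01", "2019-04-01")),
  climate = c(40, 55, 48)
)

# Hypothetical per-show counts of the word "climate", keyed by air date.
shows <- data.frame(
  doc_id = c("2019-03-03", "2019-03-10", "2019-04-07", "2019-05-05"),
  climate = c(2, 0, 1, 3)
)

# Floor each air date to the first day of its month.
shows$month <- as.Date(format(as.Date(shows$doc_id), "%Y-%m-01"))

# Inner join on month: rows survive only if their month is in both tables.
# The shared non-key column "climate" gets the suffixes .x and .y.
combined <- merge(trends, shows, by = "month", all = FALSE,
                  suffixes = c(".x", ".y"))
print(combined)
```

February (no shows) and May (no Google Trends value) drop out, and the month with two broadcasts appears twice, each row carrying the same search-interest value. The shared column name is also why the outcome variable is called climate.x after the join.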

In the combined data, the “month” variable comes first, followed by the Google Trends “climate” variable, which the join renames “climate.x” because the word-use data also contain a column for the word “climate.” After that come two ID variables for each show of Meet the Press, followed by more than 48,000 variables recording word use on the program. The following code therefore runs a Lasso regression that excludes the month and ID variables and separates the outcome variable from the vocabulary:

#install.packages("glmnet")
library(glmnet)
# Predictors: the word-use columns (5 onward), as a numeric matrix for glmnet.
# Outcome: the Google Trends climate series (climate.x after the join).
climatelasso <- glmnet(as.matrix(climatecombined.data[, 5:ncol(climatecombined.data)]),
                       climatecombined.data$climate.x)
# Coefficients at penalty s = 0.1, as a named vector with the intercept dropped
climatecoefs <- as.matrix(coef(climatelasso, s = 0.1))[-1, 1]
# The twenty words with the largest coefficients in absolute value
names(sort(abs(climatecoefs), decreasing = TRUE))[1:20]

The code above runs the Lasso regression using the glmnet command, stores the very long vector of resulting coefficients (most of which are zero) in the object climatecoefs, and then uses the last command to find the twenty terms whose coefficients are largest in absolute value. Looking at those terms, can you form any hypotheses about social or political dynamics in elite messaging that could be driving US public opinion about climate change?
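glmnet fits the lasso by cyclical coordinate descent, and the soft-thresholding step in that algorithm is what sets most coefficients exactly to zero. The block below is a minimal base-R sketch of that idea on simulated data, not glmnet's actual implementation, which adds refinements such as a path of penalty values with warm starts. The function names soft_threshold and lasso_cd, the penalty of 0.3, and the simulated “words” are all hypothetical.

```r
# Minimal coordinate-descent lasso (teaching sketch, not glmnet).
# Minimizes (1 / (2n)) * sum((y - X %*% b)^2) + lambda * sum(abs(b))
# after centering y and standardizing the columns of X.

# Soft-thresholding: exact solution of the one-coefficient lasso,
# which sets small values exactly to zero.
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

lasso_cd <- function(X, y, lambda, max_iter = 1000, tol = 1e-8) {
  X <- scale(X)                  # columns with mean 0 and sd 1
  y <- y - mean(y)               # centering absorbs the intercept
  n <- nrow(X)
  col_ss <- colSums(X^2) / n     # per-column scale factor
  b <- rep(0, ncol(X))
  r <- y                         # residuals when all coefficients are zero
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(ncol(X))) {
      # Covariance of column j with the residual that leaves j out
      z <- sum(X[, j] * r) / n + col_ss[j] * b[j]
      b_new <- soft_threshold(z, lambda) / col_ss[j]
      r <- r - X[, j] * (b_new - b[j])   # keep residuals up to date
      max_change <- max(max_change, abs(b_new - b[j]))
      b[j] <- b_new
    }
    if (max_change < tol) break  # stop once a full sweep barely moves
  }
  names(b) <- colnames(X)
  b
}

# Simulated data: 20 candidate "words", only the first two matter.
set.seed(1)
n <- 200
p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("word", 1:p)))
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.5)

fit <- lasso_cd(X, y, lambda = 0.3)
round(fit, 2)
sum(fit != 0)   # number of words the lasso keeps
```

Because the sketch reports coefficients for standardized columns, its numbers are not directly comparable with coef() output from glmnet, which by default standardizes the predictors internally but returns coefficients on the original scale of the word counts.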

NOTE: Some computers may run into difficulty with the glmnet code above, depending on available memory and other configuration details. If that happens on your computer, try running the lasso on only some of the word columns: instead of [,5:ncol(climatecombined.data)], choose a smaller upper limit, such as [,5:20000]. This arbitrarily leaves some words out of the analysis, so it is suboptimal, but it should reduce memory demands and processing time.
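A less arbitrary way to shrink the matrix, offered here as a suggestion rather than part of the book's procedure, is to drop words that appear in very few shows, since very rare words typically make up much of a large vocabulary. The helper keep_common_words, the threshold of 20 shows in the commented usage line, and the toy counts are all hypothetical.

```r
# Hypothetical helper: keep only word columns that are nonzero in at
# least `min_docs` rows. Works column by column on a data frame, so it
# never builds a full logical copy of the word matrix.
keep_common_words <- function(word_counts, min_docs) {
  doc_freq <- vapply(word_counts, function(col) sum(col > 0), integer(1))
  word_counts[, doc_freq >= min_docs, drop = FALSE]
}

# Toy example with invented counts: three shows, four words.
toy <- data.frame(climate = c(2, 0, 1), carbon = c(0, 0, 1),
                  border = c(1, 1, 1), ozone = c(0, 0, 0))
kept <- keep_common_words(toy, min_docs = 2)
names(kept)

# With the book's data (not run here; the threshold is an assumption):
# wordcols <- keep_common_words(climatecombined.data[, 5:ncol(climatecombined.data)], min_docs = 20)
# climatelasso <- glmnet::glmnet(as.matrix(wordcols), climatecombined.data$climate.x)
```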

How would you proceed to set up a process-tracing research design to explore those hypotheses? Spell out your plan as carefully as possible, pointing to possible data sources and modes of analysis.

Discussion Questions

In what ways does it make a difference for multi-method design whether a qualitative scholar is engaged in a form of process tracing versus some other kind of qualitative method?

This book’s discussion of process tracing has drawn on Bayesian ideas, but has not gone all the way toward full formal Bayesian math. Would it be helpful for multi-method research if qualitative scholars fully formalized their research as Bayesian statistical problems, or would that get in the way of the qualitative scholarship?
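To make the question concrete, the sketch below shows what fully formalizing a single piece of process-tracing evidence might involve: stating a prior probability for a hypothesis, plus the probability of observing the evidence if the hypothesis is true and if it is false, and then applying Bayes' rule. The function name update_belief and all numbers are hypothetical illustrations.

```r
# Posterior probability of hypothesis H after observing evidence E,
# via Bayes' rule: P(H | E) = P(E | H) P(H) / P(E).
update_belief <- function(prior, p_e_given_h, p_e_given_not_h) {
  numerator <- p_e_given_h * prior
  numerator / (numerator + p_e_given_not_h * (1 - prior))
}

# Hypothetical numbers: even prior odds, and evidence four times as
# likely if H is true than if it is false.
update_belief(prior = 0.5, p_e_given_h = 0.8, p_e_given_not_h = 0.2)
# 0.4 / (0.4 + 0.1) = 0.8

# A second hypothetical piece of evidence, assumed conditionally
# independent of the first, updates the new prior of 0.8.
update_belief(prior = 0.8, p_e_given_h = 0.6, p_e_given_not_h = 0.3)
# 0.48 / (0.48 + 0.06) = 8/9, about 0.889
```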