A simple regression example

What causes inequality around the world? This question has received a great deal of research attention from social scientists of all kinds, and we will not solve it in these exercises. Nevertheless, an overly simplistic approach to the question can be useful as a stylized example. Consider the inequality dataset, available on the GitHub site:

inequality <- read.csv("https://github.com/jnseawright/practice-of-multimethod/raw/main/data/inequality.csv")

We can start by briefly examining the contents of the dataset:

summary(inequality)

Let’s check whether democracy is connected with lower levels of economic inequality, compared with dictatorships. We can do this by carrying out a bivariate regression, using the Gini coefficient (a measure of inequality) as the dependent variable and the Polity variable (a measure of democracy) as the independent variable. Recall that Gini coefficients increase as inequality gets worse, and also that the Polity measure increases as democracy goes up.

#install.packages("jtools")
library(jtools)
inequalitylm.1 <- lm(Gini ~ Polity, data=inequality)
summ(inequalitylm.1)

What do we see here? Describe and explain the results, as well as any assumptions needed in order for them to be meaningful.

One might like to incorporate control variables, obviously. The most commonly used control variable in regressions of this kind is logged GDP. We can include that in a simple multivariate regression:

inequalitylm.2 <- lm(Gini ~ Polity + log(GDP), data=inequality)
summ(inequalitylm.2)

What does this regression show us? What assumptions do we need in order for these results to be taken seriously? What, in fact, should we do next?
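One practical check before comparing the two regressions is whether they were estimated on the same cases. The sketch below is only an illustration under stated assumptions: it uses a simulated stand-in data frame (called toy here, a hypothetical name) with the same three column names, not the real inequality data, and it deliberately blanks out some GDP values. By default lm() drops rows with missing values, so adding a control can quietly change the sample.

```r
# Simulated stand-in data: NOT the real inequality dataset.
# Column names mirror the ones used above; all values are made up.
set.seed(1)
n <- 100
toy <- data.frame(
  Polity = sample(-10:10, n, replace = TRUE),
  GDP    = exp(rnorm(n, mean = 8, sd = 1))
)
toy$Gini <- 45 - 0.3 * toy$Polity - 1.5 * log(toy$GDP) + rnorm(n, sd = 5)

# Pretend that 15 countries have no GDP figure.
toy$GDP[sample(n, 15)] <- NA

toylm.1 <- lm(Gini ~ Polity, data = toy)
toylm.2 <- lm(Gini ~ Polity + log(GDP), data = toy)

# lm() uses na.omit by default, so each model keeps only complete rows
# for the variables it actually uses.
n_used <- c(bivariate = nobs(toylm.1), with_gdp = nobs(toylm.2))
print(n_used)

# Polity coefficient from each model, side by side.
polity_coefs <- c(
  bivariate = unname(coef(toylm.1)["Polity"]),
  with_gdp  = unname(coef(toylm.2)["Polity"])
)
print(polity_coefs)
```

With the real data, the same comparison would use inequalitylm.1 and inequalitylm.2 in place of the toy models. If the two counts differ, a change in the Polity coefficient could reflect the change in sample as well as the added control.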

Integrative case-study followup

Devise a research plan for case-study analysis that could test key causal inference assumptions for this regression, using resources available to you online. The two cases you should focus on are the Republic of Yemen and Zimbabwe. Describe the research you have designed to test each assumption of interest, carry out your plan, and discuss any modifications to the regression analysis that your research implies.

Be sure to look carefully at the historical causes of the cause (levels of democracy) in each country, searching for possible confounding variables as well as sources of measurement error. You may also wish to look for evidence about the causal pathway between democracy and inequality, as a way of finding such problems. Your work may lead you to discover other sorts of issues, as well; be creative in exploring the cases!

Case-selection Detective Work

Using the data, determine which case-selection rule was used to select these cases. Conduct one similar case study using surprising causes case selection. How do the insights generated by this case study compare with those produced by yesterday’s case studies?

Improving the Regression Using Qualitative Insights

On the basis of the case studies you have carried out, using the variables in the inequality data set and any others that you can add, create a refined regression analysis that does a better job of meeting key assumptions. Using your refined regression, carry out case selection using the deviant, extreme, typical, and any other rules of interest. Do you end up selecting the same cases as with the simpler regression model, or different cases? Choose one case and conduct a case study focused on it. Can you find further issues that could still be improved in the analysis?
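As a rough sketch of how regression output can feed case selection, the code below applies three rules to a fitted model. It rests on several assumptions: the data are simulated, the country column is a hypothetical identifier (the real dataset may name it differently), and the rules are read in one common way: deviant as the largest absolute residual, typical as the smallest absolute residual, and extreme as the value of the cause furthest from its mean. Other readings of these rules are possible.

```r
# Simulated stand-in data: NOT the real inequality dataset.
set.seed(2)
n <- 60
toy <- data.frame(
  country = paste0("Case", seq_len(n)),  # hypothetical identifier column
  Polity  = sample(-10:10, n, replace = TRUE),
  GDP     = exp(rnorm(n, mean = 8, sd = 1))
)
toy$Gini <- 45 - 0.3 * toy$Polity - 1.5 * log(toy$GDP) + rnorm(n, sd = 5)

# na.exclude pads residuals with NA for dropped rows, so they stay
# aligned with the rows of the data frame even when values are missing.
fit <- lm(Gini ~ Polity + log(GDP), data = toy, na.action = na.exclude)
toy$resid <- resid(fit)

# Deviant case: the model fits it worst (largest absolute residual).
deviant <- toy$country[which.max(abs(toy$resid))]

# Typical case: the model fits it best (smallest absolute residual).
typical <- toy$country[which.min(abs(toy$resid))]

# Extreme case on the cause: Polity value furthest from the sample mean.
polity_gap <- abs(toy$Polity - mean(toy$Polity, na.rm = TRUE))
extreme_x  <- toy$country[which.max(polity_gap)]

print(c(deviant = deviant, typical = typical, extreme_x = extreme_x))
```

Running the same steps on the simple and the refined regression, and comparing which cases each rule returns, is one way to approach the comparison asked for above.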

Multi-method Interview

Access Lewis-Beck and Ratto’s 2013 Electoral Studies article, “Economic voting in Latin America: A general model.” Which assumptions in this analysis could potentially be tested using in-depth interviews of voters? Carefully design an in-depth interview schedule to test as many assumptions of the model as possible. For each part of the interview, explain what multi-method task it is intended to fulfill and clarify the qualitative causal inferential strategy connected with it.

Now, find someone in the class and administer your interview. Take careful notes, analyze your results, and describe any issues for the model that arise from your interview. What next steps would you take if you wanted to pursue this line of research further?

Control Variables in a Literature

This problem works well as a group activity.

Find three to five quantitative articles on the same outcome (for example, three articles on the causes of civil wars). Make a master list of all the control variables used in those articles, as well as which articles use each variable. For each variable, determine whether it might be a good control variable that helps eliminate confounding, or a bad control variable that is either an instrument or post-treatment. How close do your articles come to including the good control variables and excluding the bad ones?
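To see why the distinction matters, a small simulation can help. The sketch below is purely illustrative and every variable in it is made up: z is a confounder, x is the cause, m is a post-treatment variable lying on the path from x to y, and the true total effect of x on y is set to 1.2 by construction. It covers only the post-treatment kind of bad control, not instruments.

```r
# Simulated data with a known causal structure (all values are made up).
set.seed(3)
n <- 5000
z <- rnorm(n)                  # confounder: affects both cause and outcome
x <- 0.8 * z + rnorm(n)        # cause of interest
m <- 1.0 * x + rnorm(n)        # post-treatment variable (caused by x)
y <- 0.5 * x + 0.7 * m + 1.0 * z + rnorm(n)

# By construction, the total effect of x on y is 0.5 + 0.7 * 1.0 = 1.2.

# No controls: the confounder z is omitted, so the estimate is too large.
b_none <- unname(coef(lm(y ~ x))["x"])

# Good control: adjusting for z should recover roughly the total effect.
b_good <- unname(coef(lm(y ~ x + z))["x"])

# Bad control: adding m blocks part of the effect, leaving roughly the
# direct effect (0.5) rather than the total effect.
b_bad <- unname(coef(lm(y ~ x + z + m))["x"])

print(round(c(no_controls = b_none, good_control = b_good, bad_control = b_bad), 2))
```

In this setup the estimate with the good control should land near 1.2, while the estimate that also includes the post-treatment variable should land near 0.5. When classifying the controls in your articles, the question is which of these two roles each variable plausibly plays.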

How much do the articles across a particular subject area agree about the control variables that should be used, and how much do they disagree? When they disagree, is it because they are exploring different causes, or is it for some other reason?

Case Selection in a Literature

For a research topic that interests you, find five published case studies or multi-method studies with a significant case-study component. How, if at all, is case selection described in these studies? What advantages and disadvantages do you see in the case selection processes used? Describe how you would select cases if you were to repeat these studies, and explain why.

Discussion Questions

Find an example of research using regression in a way that you regard as successful from your area of research. What makes this application of regression successful? What is regression used for, what assumptions are needed, and to what extent do you regard the results as credible?

What difference does it make for the qualitative part of multi-method causal inference when the cases under study are historical vs. contemporary? What about when they are individuals vs. organizations vs. countries?