We will be analyzing a data set on inequality for a collection of 118 countries during the mid-1990s.
Begin by loading the data set into R, using the following command:
inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")
Look at the names of the variables in the data set to make sure your copy downloaded correctly:
names(inequality)
## [1] "X" "Code" "Country"
## [4] "Year" "Gini" "Polity"
## [7] "GDP" "Industry" "FuelExports"
## [10] "CommunistLegacy" "Region" "Expropriation"
## [13] "SettlerMortality" "vdem_corr" "militaryexpenditure"
## [16] "educationexpenditure" "electricityaccess" "netmigration"
Now carry out a bivariate regression, using the Gini coefficient (a measure of inequality) as the dependent variable and the Polity variable (a measure of democracy) as the independent variable. Suppose that the results are causal; what do they mean? What assumption is required? Draw a DAG representing your causal assumption. Identify at least two confounders and explain why they might bias the estimate.
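As a starting point, the bivariate regression can be sketched as follows. This is a minimal example rather than a model answer: the object name fit_bivariate is an arbitrary choice, and the code assumes the Gini and Polity columns are numeric.

```r
# Load the data only if it is not already in the workspace
if (!exists("inequality")) {
  inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")
}

# Bivariate OLS: Gini (outcome) regressed on Polity (explanatory variable).
# lm() silently drops rows with missing values on either variable.
fit_bivariate <- lm(Gini ~ Polity, data = inequality)

# Coefficients, standard errors, and fit statistics
summary(fit_bivariate)

# 95% confidence intervals for the intercept and the Polity slope
confint(fit_bivariate)

# Number of countries actually used in the fit
nobs(fit_bivariate)
```

The Polity slope is the expected difference in Gini associated with a one-point difference in Polity. Reading it as a causal effect requires the assumption the question asks you to state and defend.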
You should now expand the model by incorporating logged GDP as a second independent variable.
Update your DAG to include this new variable. Does controlling for log(GDP) block any backdoor paths? Are there any potential bad controls (mediators, colliders) in this setup?
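One way to approach this step is sketched below. It fits the expanded regression and then uses the dagitty package (an assumption on my part; it is not mentioned in the assignment) to check a DAG for open backdoor paths. The DAG itself is purely hypothetical: it treats logged GDP as a common cause of Polity and Gini, which is only one of several defensible structures, so substitute your own arrows. The code also assumes GDP is strictly positive so that the log is defined.

```r
library(dagitty)

# Load the data only if it is not already in the workspace
if (!exists("inequality")) {
  inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")
}

# Expanded model: Polity plus logged GDP (assumes GDP > 0)
fit_gdp <- lm(Gini ~ Polity + log(GDP), data = inequality)
summary(fit_gdp)

# Hypothetical DAG: logGDP is assumed to cause both Polity and Gini.
# Replace these edges with the ones you are prepared to defend.
dag <- dagitty("dag {
  logGDP -> Polity
  logGDP -> Gini
  Polity -> Gini
}")

# All paths between Polity and Gini, and whether each is open
paths(dag, from = "Polity", to = "Gini")

# Minimal sets of variables that close every backdoor path in this DAG
adjustment_sets <- adjustmentSets(dag, exposure = "Polity", outcome = "Gini")
adjustment_sets
```

If you instead believe democracy affects inequality partly through economic growth, reverse the first arrow to Polity -> logGDP and rerun the last two calls. Logged GDP then becomes a mediator rather than a confounder, and controlling for it removes part of the effect you are trying to estimate.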
Add an interaction between Polity and log(GDP). Interpret the interaction coefficient. Use the interactions package to plot marginal effects. Discuss whether the interaction is substantively meaningful.
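The interaction model and its plots might look like the sketch below. Creating a separate logGDP column (a name chosen here for illustration) avoids passing a transformed term to the plotting functions. The code assumes GDP is strictly positive. interact_plot() and johnson_neyman() are functions from the interactions package.

```r
library(interactions)

# Load the data only if it is not already in the workspace
if (!exists("inequality")) {
  inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")
}

# Store logged GDP as its own column (assumes GDP > 0)
inequality$logGDP <- log(inequality$GDP)

# Polity * logGDP expands to both main effects plus their product
fit_interaction <- lm(Gini ~ Polity * logGDP, data = inequality)
summary(fit_interaction)

# Predicted Gini against Polity at several levels of logGDP,
# with confidence bands around each line
interact_plot(fit_interaction, pred = Polity, modx = logGDP, interval = TRUE)

# Conditional slope of Polity across the range of logGDP, with the
# region where that slope is distinguishable from zero
johnson_neyman(fit_interaction, pred = Polity, modx = logGDP)
```

In this specification the Polity:logGDP coefficient is the change in the Polity slope for a one-unit increase in logged GDP, and the Polity main effect is the slope when logGDP equals zero, a value that may lie outside the data. Judging whether the interaction matters substantively is easier from the plots than from the coefficient table.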
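Create a dichotomous treatment variable, democracy, equal to 1 if Polity > 0 and 0 otherwise. Use lmw to estimate the ATE with URI (uni-regression imputation, the weights implied by a single regression fit to all cases) and MRI (multi-regression imputation, the weights implied by separate regressions within each treatment group). Plot the URI weights. Are any cases highly influential? Compare the two ATE estimates. What does this tell you about the sensitivity of your regression to weighting?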
The package we used to carry out these tasks in class was lmw; consult its documentation for details.
As is often the case, a first recommended step in using this package is to make a special dataframe that includes only the variables you plan to use and then to get rid of missing data.
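A minimal sketch of that workflow follows. The choice of logged GDP as the only covariate and the object names (lmw_data, uri_fit, mri_fit) are assumptions made for illustration, so adjust the variable list to match your own model. The code assumes GDP is strictly positive.

```r
library(lmw)

# Load the data only if it is not already in the workspace
if (!exists("inequality")) {
  inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")
}

# Keep only the variables used below, then drop rows with any missing value
lmw_data <- na.omit(inequality[, c("Country", "Gini", "Polity", "GDP")])

# Dichotomous treatment and logged covariate (assumes GDP > 0)
lmw_data$democracy <- as.numeric(lmw_data$Polity > 0)
lmw_data$logGDP <- log(lmw_data$GDP)

# Implied regression weights; the formula is one-sided because the
# outcome is not needed to compute the weights
uri_fit <- lmw(~ democracy + logGDP, data = lmw_data,
               estimand = "ATE", method = "URI", treat = "democracy")
mri_fit <- lmw(~ democracy + logGDP, data = lmw_data,
               estimand = "ATE", method = "MRI", treat = "democracy")

# Balance and effective sample sizes under the URI weights
summary(uri_fit)

# Distribution of the URI weights by treatment group
plot(uri_fit, type = "weights")

# Countries with the largest absolute URI weights
head(lmw_data$Country[order(abs(uri_fit$weights), decreasing = TRUE)])

# Effect estimates; the outcome enters only at this stage
uri_est <- lmw_est(uri_fit, outcome = "Gini")
mri_est <- lmw_est(mri_fit, outcome = "Gini")
summary(uri_est)
summary(mri_est)
```

Keeping Country in the cleaned data frame makes it possible to name the cases that receive extreme weights. A large gap between the two estimates would suggest that the answer depends on how the regression implicitly weights the sample.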