Due Date: April 18, 2025

Problem 1: Bivariate Regression in R

We will be analyzing a data set on inequality for a collection of 118 countries during the mid-1990s.

Begin by loading the data set into R, using the following command:

inequality <- read.csv("https://raw.githubusercontent.com/jnseawright/PS406/main/data/Inequality.csv")

Look at the names of the variables in the data set, to make sure your copy downloaded correctly:

names(inequality)

##  [1] "X"                    "Code"                 "Country"             
##  [4] "Year"                 "Gini"                 "Polity"              
##  [7] "GDP"                  "Industry"             "FuelExports"         
## [10] "CommunistLegacy"      "Region"               "Expropriation"       
## [13] "SettlerMortality"     "vdem_corr"            "militaryexpenditure" 
## [16] "educationexpenditure" "electricityaccess"    "netmigration"

Now carry out a bivariate regression, using the Gini coefficient (a measure of inequality) as the dependent variable and the Polity variable (a measure of democracy) as the independent variable. Suppose that the results are causal; what do they mean? What assumption is needed to claim that the findings are causal? (State the assumption as formally as you can and explain what it means substantively.)

Problem 2: Multivariate Regression in R

You should now expand the model by incorporating logged GDP as a second independent variable. The R commands generalize straightforwardly:

ineqlm2 <- lm(Gini ~ Polity + log(GDP), data=inequality)

summary(ineqlm2)

(Note that we are able to incorporate functions like log directly into the regression command.)

As in the last problem, interpret the results of this regression in causal terms and explain as carefully as possible what is assumed in the process.

Problem 3: Interaction Terms in R

You should now expand the model by including an interaction between the Polity variable and logged GDP:

ineqlm3 <- lm(Gini ~ Polity + log(GDP) + Polity:log(GDP), data=inequality)

Analyze and explain the results. Discuss the assumptions that would be involved in treating these results as causal.

Problem 4: Weights

Calculate the multivariate regression from problem 2 using URI to determine the implied weights for each case. Are they relatively even, or are some cases far more important than others?

To calculate an average treatment effect that weights every case equally, calculate the effect using MRI. Are the results substantively similar, or do the weights make a difference?

The library we used to carry out these tasks in class was lmw; you can find its documentation here.

As is often the case, a first recommended step in using this package is to make a special dataframe that includes only the variables you plan to use and then to get rid of missing data. You should also calculate a dichotomous version of the Polity variable that is a 1 if and only if Polity is greater than 0. Use that dichotomous version as your treatment variable to set up an lmw command that looks something like the following:

ineqlmw.uri <- lmw(~ TREATMENT_VARIABLE + CONTROL_VARIABLE(s) + INTERACTION_TERM(s), data=DATAFRAME,
                estimand = "ATE", method = "URI", treat = "DICHOTOMOUS_POLITY")

You can then do a plot of the result to see the implied weights across cases.

To calculate a regression using even weights, switch the method from URI to MRI. Then combine the weights with the outcome variable:

ineqest.mri <- lmw_est(ineqlmw.mri, outcome = "OUTCOME_VARIABLE")

Political Science 406 Lab 2: Regression

2025-03-24

Due Date: April 18, 2025

Problem 1: Bivariate Regression in R

Problem 2: Multivariate Regression in R

Problem 3: Interaction Terms in R

Problem 4: Weights