Problem Set 2

Problem 4

Let’s consider the widely studied relationship between wealth and democracy. This problem will guide you through an analysis using the Quality of Governance dataset, helping you contrast the Conditional Expectation Function (CEF) with the Best Linear Predictor (BLP).
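
For reference, the two objects contrasted in this problem can be written as follows. The notation is an assumption made here for illustration (it is not fixed by the problem text): Y is the democracy score and X is GDP per capita.

```latex
\begin{align*}
\text{CEF:}\quad & \mu(x) = \mathbb{E}[Y \mid X = x] \\
\text{BLP:}\quad & (\alpha, \beta) = \arg\min_{a,\,b}\ \mathbb{E}\big[(Y - a - bX)^2\big] \\
& \beta = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},
  \qquad \alpha = \mathbb{E}[Y] - \beta\,\mathbb{E}[X]
\end{align*}
```

The CEF minimizes mean squared error over all functions of X, while the BLP minimizes it over linear functions only. The two coincide when the CEF happens to be linear.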

Data Loading: Run the following code to load the data. If you encounter issues with the rqog package, use the alternative CSV file provided.

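# Option 1: Using the rqog package (preferred)
# Install from GitHub only if the package is not already available
if (!requireNamespace("rqog", quietly = TRUE)) {
  devtools::install_github("ropengov/rqog")
}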
library(rqog)
qogts <- read_qog(which_data = "standard", data_type = "time-series")

# Option 2: Alternative if rqog doesn't work (uncomment and run)
# qogts <- read.csv("https://github.com/jnseawright/ps405/raw/refs/heads/main/Data/qog_sample.csv")

# Clean the data for analysis
library(dplyr)
qog_clean <- qogts %>%
  select(wdi_gdpcappppcon2017, vdem_libdem) %>%
  filter(!is.na(wdi_gdpcappppcon2017), !is.na(vdem_libdem)) %>%
  rename(gdp_pc = wdi_gdpcappppcon2017, democracy = vdem_libdem)

4a.

Create a visualization of the relationship between wealth (wdi_gdpcappppcon2017 or gdp_pc in the cleaned data) and democracy (vdem_libdem or democracy). Your plot should include:

The raw data points (use transparency if there are many observations)
A LOESS curve (to approximate the CEF)
A linear regression line (estimate of the BLP)
Clear labels, titles, and a legend

# Your code for 4a here
library(ggplot2)

# Create the plot with both LOESS (CEF approximation) and linear (BLP) fits.
# Color is mapped inside aes() only: a fixed color set outside aes() would
# override the mapping and suppress the legend.
# Note: linewidth replaces size for lines in ggplot2 3.4.0 and later.
dem_wealth_plot <- ggplot(qog_clean, aes(x = gdp_pc, y = democracy)) +
  geom_point(alpha = 0.3, color = "gray50") +  # Raw data
  geom_smooth(aes(color = "LOESS (CEF approx)"), method = "loess",
              se = TRUE, linewidth = 1.2) +
  geom_smooth(aes(color = "Linear (BLP)"), method = "lm",
              se = TRUE, linewidth = 1.2) +
  scale_color_manual(values = c("LOESS (CEF approx)" = "blue", 
                                "Linear (BLP)" = "red")) +
  labs(title = "Wealth and Democracy: CEF vs. BLP",
       x = "GDP per capita, PPP (constant 2017 international $)",
       y = "Liberal Democracy Score (V-Dem)",
       color = "Fit Type") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Display the plot
dem_wealth_plot

Questions for 4a:

1. Describe what each curve (LOESS and linear) suggests about the relationship between wealth and democracy.
2. Which fit seems more appropriate for these data, and why?
3. Based on the LOESS curve, does the relationship appear to be linear throughout the range of GDP values?

4b.

Estimate the Best Linear Predictor (BLP) of democracy given wealth by fitting an OLS regression to the sample. Report and interpret the coefficients.

# Fit the linear model (BLP)
blp_model <- lm(democracy ~ gdp_pc, data = qog_clean)

# Display model summary
summary(blp_model)

# Alternative: Using modelsummary for nicer output
library(modelsummary)
modelsummary(blp_model, stars = TRUE, output = "markdown")

Questions for 4b:

1. Interpret the intercept and slope coefficients in substantive terms.
2. What is the predicted democracy score for a country with $20,000 GDP per capita? Show your calculation.
3. Calculate and interpret R-squared. What does it tell us about this BLP?
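
The mechanics behind questions 2 and 3 can be sketched on simulated data. Everything in this block is a hypothetical stand-in: the data frame `sim`, the model `fit`, and the coefficients used to generate the data are assumptions for illustration and are not the QoG data or its results.

```r
# Simulated stand-in data (NOT the QoG data); all names here are hypothetical.
set.seed(405)
sim <- data.frame(gdp_pc = runif(500, min = 500, max = 60000))
sim$democracy <- 0.2 + 0.00001 * sim$gdp_pc + rnorm(500, sd = 0.15)

fit <- lm(democracy ~ gdp_pc, data = sim)
b <- coef(fit)

# Prediction by hand: intercept + slope * x
pred_by_hand <- unname(b["(Intercept)"] + b["gdp_pc"] * 20000)

# The same prediction via predict()
pred_fn <- unname(predict(fit, newdata = data.frame(gdp_pc = 20000)))

# The slope is per $1 of GDP per capita; rescale it to a $1,000 change
slope_per_1000 <- unname(b["gdp_pc"] * 1000)

# R-squared as 1 - SSR/SST, compared with the value stored by summary()
ssr <- sum(residuals(fit)^2)
sst <- sum((sim$democracy - mean(sim$democracy))^2)
r2_by_hand <- 1 - ssr / sst

all.equal(pred_by_hand, pred_fn)
all.equal(r2_by_hand, summary(fit)$r.squared)
```

Because the raw slope refers to a one-dollar change, it will usually look tiny. Multiplying it by 1,000 (or refitting with GDP measured in thousands) tends to give a number that is easier to interpret.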

4c.

Now let’s compare the BLP to a simple approximation of the CEF using grouped means:

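# Create wealth groups, stored as a factor so tables list Low, Medium, High
# in that order rather than alphabetically
qog_clean <- qog_clean %>%
  mutate(wealth_group = factor(
    case_when(
      gdp_pc < 10000 ~ "Low (<$10K)",
      gdp_pc >= 10000 & gdp_pc <= 30000 ~ "Medium ($10K-$30K)",
      gdp_pc > 30000 ~ "High (>$30K)"
    ),
    levels = c("Low (<$10K)", "Medium ($10K-$30K)", "High (>$30K)")
  ))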

# Calculate group means (simple CEF approximation)
group_means <- qog_clean %>%
  group_by(wealth_group) %>%
  summarize(
    mean_democracy = mean(democracy, na.rm = TRUE),
    mean_gdp = mean(gdp_pc, na.rm = TRUE),
    n = n()
  )

# Display group means
group_means

# Create comparison plot
library(ggplot2)

# Generate predictions from BLP for plotting
blp_predictions <- data.frame(
  gdp_pc = seq(min(qog_clean$gdp_pc, na.rm = TRUE), 
               max(qog_clean$gdp_pc, na.rm = TRUE), 
               length.out = 100)
)
blp_predictions$democracy_pred <- predict(blp_model, newdata = blp_predictions)

# Create the comparison visualization
# Note: linewidth replaces size for lines in ggplot2 3.4.0 and later.
comparison_plot <- ggplot(qog_clean, aes(x = gdp_pc, y = democracy)) +
  geom_point(alpha = 0.2, color = "gray50") +
  # BLP line
  geom_line(data = blp_predictions, 
            aes(x = gdp_pc, y = democracy_pred, color = "BLP"), 
            linewidth = 1.5) +
  # Group means (simple CEF approximation)
  geom_point(data = group_means, 
             aes(x = mean_gdp, y = mean_democracy, color = "Group Means (CEF approx)"), 
             size = 4, shape = 17) +
  # Vertical lines at group boundaries
  geom_vline(xintercept = c(10000, 30000), linetype = "dashed", alpha = 0.5) +
  scale_color_manual(values = c("BLP" = "red", 
                                "Group Means (CEF approx)" = "darkgreen")) +
  labs(title = "Comparing BLP to Grouped Means (Simple CEF)",
       x = "GDP per capita, PPP (constant 2017 international $)",
       y = "Liberal Democracy Score",
       color = "Estimate Type") +
  theme_minimal() +
  theme(legend.position = "bottom")

comparison_plot

Questions for 4c:

1. How well does the BLP approximate the grouped means (our simple CEF approximation)?
2. In which wealth range does the BLP fit best? Where does it fit worst?
3. Discuss: Under what conditions might the BLP be a poor approximation of the true CEF for these data?
4. Calculate the mean squared error (MSE) for both the BLP and the grouped means approach (treating group means as predictions for all observations in that group). Which has lower MSE?

# Calculate MSE for BLP
blp_mse <- mean(residuals(blp_model)^2)

# Calculate MSE for grouped means approach
qog_with_group_preds <- qog_clean %>%
  left_join(select(group_means, wealth_group, mean_democracy), by = "wealth_group") %>%
  mutate(group_residual = democracy - mean_democracy)
group_mse <- mean(qog_with_group_preds$group_residual^2, na.rm = TRUE)

cat("BLP MSE:", round(blp_mse, 4), "\n")
cat("Grouped Means MSE:", round(group_mse, 4), "\n")
cat("Difference (BLP - Grouped):", round(blp_mse - group_mse, 4))
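
The same in-sample comparison can be sketched in base R on simulated data. The variables `x`, `y`, and `bin`, the helper `mse()`, and the concave data-generating process are all hypothetical and chosen only for illustration. Note also that `cut()` with its default settings closes each interval on the right, so observations exactly at a cut point are binned slightly differently than in the `case_when()` call above.

```r
# Simulated stand-in data (NOT the QoG data); all names here are hypothetical.
set.seed(405)
n <- 1000
x <- runif(n, min = 500, max = 60000)
y <- -0.9 + 0.13 * log(x) + rnorm(n, sd = 0.1)  # concave CEF by construction

# Three bins using the $10K and $30K cut points
bin <- cut(x, breaks = c(-Inf, 10000, 30000, Inf),
           labels = c("Low", "Medium", "High"))

# Prediction 1: fitted values from the linear fit
linear_fit <- fitted(lm(y ~ x))

# Prediction 2: each observation is assigned the mean of y within its own bin
bin_mean <- ave(y, bin)

# In-sample mean squared error of a vector of predictions
mse <- function(actual, predicted) mean((actual - predicted)^2)

c(linear = mse(y, linear_fit), binned = mse(y, bin_mean))
```

Both numbers are in-sample errors, so neither says how the two predictors would perform on new data. Which one comes out lower depends on how curved the underlying relationship is and on where the cut points fall.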

4d.

Modernization theory in political science suggests that democracy increases with wealth but at a decreasing rate (diminishing returns).
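
One way to make "diminishing returns" concrete is to compare the marginal effect of wealth implied by each specification used in this part. The coefficient symbols below are assumptions introduced for illustration, with x denoting GDP per capita.

```latex
\begin{align*}
\text{Linear:}\quad & \mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x
  && \Rightarrow\quad \frac{\partial\,\mathbb{E}[Y \mid x]}{\partial x} = \beta_1 \\
\text{Quadratic:}\quad & \mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x + \beta_2 x^2
  && \Rightarrow\quad \frac{\partial\,\mathbb{E}[Y \mid x]}{\partial x} = \beta_1 + 2\beta_2 x \\
\text{Logged GDP:}\quad & \mathbb{E}[Y \mid x] = \gamma_0 + \gamma_1 \log x
  && \Rightarrow\quad \frac{\partial\,\mathbb{E}[Y \mid x]}{\partial x} = \frac{\gamma_1}{x}
\end{align*}
```

The linear model forces a constant marginal effect. The quadratic model shows diminishing returns when \beta_1 > 0 and \beta_2 < 0, but its fitted curve turns downward once x exceeds -\beta_1 / (2\beta_2). The logged model with \gamma_1 > 0 has a marginal effect that is positive and shrinks steadily as x grows.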

# Let's explore a non-linear specification
# Option 1: Polynomial (quadratic) model
poly_model <- lm(democracy ~ poly(gdp_pc, 2, raw = TRUE), data = qog_clean)
summary(poly_model)

# Option 2: Log transformation (common for diminishing returns)
log_model <- lm(democracy ~ log(gdp_pc), data = qog_clean)
summary(log_model)

# Compare models
# Note: democracy ~ log(gdp_pc) is a linear-log model (outcome in levels,
# predictor in logs), so it is labeled "Linear-Log" here.
library(modelsummary)
model_comparison <- list(
  "Linear (BLP)" = blp_model,
  "Quadratic" = poly_model,
  "Linear-Log" = log_model
)

modelsummary(model_comparison, stars = TRUE, output = "markdown")

# Create comparison plot

# Generate predictions from all models
comparison_data <- data.frame(
  gdp_pc = seq(min(qog_clean$gdp_pc), max(qog_clean$gdp_pc), length.out = 200)
)

comparison_data$linear_pred <- predict(blp_model, newdata = comparison_data)
comparison_data$quadratic_pred <- predict(poly_model, newdata = comparison_data)
comparison_data$log_pred <- predict(log_model, newdata = comparison_data)

# Reshape for plotting
library(tidyr)
comparison_long <- comparison_data %>%
  pivot_longer(cols = -gdp_pc, names_to = "model", values_to = "prediction") %>%
  mutate(model = factor(model, 
                        levels = c("linear_pred", "quadratic_pred", "log_pred"),
                        labels = c("Linear (BLP)", "Quadratic", "Linear-Log")))

# Plot all models together
# Note: linewidth replaces size for lines in ggplot2 3.4.0 and later.
model_comparison_plot <- ggplot(qog_clean, aes(x = gdp_pc, y = democracy)) +
  geom_point(alpha = 0.1, color = "gray50") +
  geom_line(data = comparison_long, 
            aes(x = gdp_pc, y = prediction, color = model), 
            linewidth = 1.2) +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Comparing Linear and Non-Linear Specifications",
       x = "GDP per capita",
       y = "Democracy Score",
       color = "Model") +
  theme_minimal() +
  theme(legend.position = "bottom")

model_comparison_plot

Questions for 4d:

1. Does the LOESS curve from 4a support the modernization theory prediction of diminishing returns?
2. Compare the linear (BLP), quadratic, and logged-GDP (democracy regressed on log GDP per capita) models. Which seems to best capture the relationship suggested by the LOESS curve?
3. What are the trade-offs between using a simple linear model (BLP) and a more flexible specification?
4. If you were writing a paper on wealth and democracy, which model would you choose and why? Consider both statistical fit and substantive interpretability.
