Due Date: June 5, 2026

Submit your lab here.

Problem 1: Double Selection with LASSO

Load King and Zeng’s dataset on possible predictors of state collapse, kingzeng.csv. The website for these data contains full files and a codebook.

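Your goal is to estimate the effect of democracy (or another binary treatment) on homicide rates. Use rlassoEffects from the hdm package with method = "double selection" (the default method is "partialling out"). Double selection proceeds in three steps: first, LASSO selects the controls that predict the outcome; second, LASSO selects the controls that predict the treatment; third, the outcome is regressed on the treatment plus the union of both selected sets. Report the final ATE estimate and its standard error, and list which variables were selected.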
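The double-selection steps above can be sketched on simulated data as follows. This is a minimal illustration, not a solution: the matrix X, the treatment d, the outcome y, and the column names x1, x2, ... are all hypothetical stand-ins for whatever you build from kingzeng.csv. It uses rlassoEffect, the single-treatment companion to rlassoEffects in hdm.

```r
# Sketch only: simulated data stand in for kingzeng.csv.
# All variable names here (X, d, y, x1..x20) are hypothetical.
library(hdm)

set.seed(1)
n <- 500
p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)

# Binary treatment depends on x1; outcome depends on treatment, x1, and x2.
d <- rbinom(n, 1, plogis(X[, 1]))
y <- 1.0 * d + 2 * X[, 1] + X[, 2] + rnorm(n)

# Double selection: LASSO of y on X, LASSO of d on X,
# then OLS of y on d plus the union of the selected controls.
fit <- rlassoEffect(x = X, y = y, d = d, method = "double selection")

# Treatment-effect estimate and standard error.
summary(fit)
effect <- fit$alpha
std_err <- fit$se

# Names of the controls kept in the final regression.
selected <- colnames(X)[which(as.logical(fit$selection.index))]
print(selected)
```

With real data, the control matrix should contain only pre-treatment covariates, and any rows with missing values need to be handled before the call.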

Problem 2: Causal Forest

Use causal_forest from the grf package to estimate conditional average treatment effects (CATEs) for the same data. Plot the distribution of CATEs. Which variables appear most important for heterogeneity (use variable_importance)? Produce a violin plot of CATEs by a key covariate (e.g., region or income level).
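A minimal sketch of the causal forest workflow on simulated data may help you get started. The objects X, W, and Y and the column names are hypothetical, and the treatment effect is constructed to vary with x1 so that there is heterogeneity to find. Nothing here reflects the actual King and Zeng variables.

```r
# Sketch only: simulated data, hypothetical names (X, W, Y, x1..x5).
library(grf)

set.seed(1)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
W <- rbinom(n, 1, 0.5)            # binary treatment
tau <- 1 + (X[, 1] > 0)           # true effect is 1 or 2, depending on x1
Y <- tau * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W, num.trees = 500, seed = 1)

# Out-of-bag CATE predictions, one per observation.
cate <- predict(cf)$predictions
hist(cate, main = "Distribution of estimated CATEs", xlab = "CATE")

# Split-based importance, one row per column of X.
vi <- variable_importance(cf)
rownames(vi) <- colnames(X)
print(vi)

# Doubly robust ATE with a standard error.
ate <- average_treatment_effect(cf, target.sample = "all")
print(ate)
```

For the violin plot, one option is to put cate and the grouping covariate in a data frame and use geom_violin from ggplot2. Note that average_treatment_effect returns a standard error, which a plain mean of the CATEs does not.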

Problem 3: Compare with Traditional Regression

Run a simple OLS regression of the homicide rate on democracy, controlling for a few key variables (e.g., GDP, population). Compare the OLS estimate to the double‑selection estimate and the causal forest ATE (you can average the CATEs to get an ATE). Discuss possible reasons for any differences.
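The OLS benchmark and a comparison table could be set up along these lines. This is a sketch on simulated data: the column names homicide, democracy, gdp, and population are assumptions and may not match the names in kingzeng.csv. The rows for the two machine learning estimates are left as NA for you to fill in.

```r
# Sketch only: simulated data with hypothetical column names.
set.seed(1)
n <- 500
dat <- data.frame(gdp = rnorm(n), population = rnorm(n))
dat$democracy <- rbinom(n, 1, plogis(dat$gdp))
dat$homicide <- 1.0 * dat$democracy - 0.5 * dat$gdp +
  0.3 * dat$population + rnorm(n)

# OLS with a small, hand-picked set of controls.
ols <- lm(homicide ~ democracy + gdp + population, data = dat)
est <- coef(summary(ols))["democracy", c("Estimate", "Std. Error")]
print(est)

# Collect the estimates side by side; fill in the NA rows
# from your double-selection and causal forest results.
comparison <- data.frame(
  method   = c("OLS", "double selection", "causal forest"),
  estimate = c(est[["Estimate"]], NA, NA),
  std_err  = c(est[["Std. Error"]], NA, NA)
)
print(comparison)
```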

Problem 4: Reflection

In a paragraph, explain how the machine learning approaches differ from the regression methods used earlier in the course. What are the trade‑offs?