class: center, middle, inverse, title-slide

.title[
# Choosing Control Variables: DAGs & the Back-Door Criterion
]
.subtitle[
## PS 312
]
.author[
### Jaye Seawright
]
.date[
### 2026-04-16
]

---

## Today's Roadmap

1. **Hook & Activation:** Quick review of back‑door paths (poll)
2. **"Bad Control" Diagnostic:** Three scenarios to sharpen your instincts
3. **Variable Dump Challenge:** Choose controls from a real dataset
4. **Case Study Deconstruction:** Analyze a published regression table
5. **Core Graded Activity:** Group DAG creation and justification

**Goal:** Move from "I can draw a DAG" to "I know exactly which variables belong in my regression, and which don't."

---
class: inverse, center, middle

# 1. Hook & Activation

### Quick Review of the Back‑Door Criterion

---

## Recall from your Preparation

A set of variables \( Z \) satisfies the **back‑door criterion** for \((X, Y)\) if:

1. No node in \( Z \) is a *descendant* of \( X \).
2. \( Z \) blocks **every** path between \( X \) and \( Y \) that starts with an arrow pointing *into* \( X \).

---

> **Simple check:** Does adjusting for \( Z \) close all sneaky back‑door paths without opening any new ones?

---

## Let's Discuss

**DAG:**

- `Parental Income` → `Education` → `Earnings`
- `Parental Income` → `Earnings`

**Question:** To estimate the *total* effect of `Education` on `Earnings`, should we control for `Parental Income`?

- A) Yes, it's a confounder.
- B) No, it's a mediator.
- C) Yes, but only if we also control for something else.
- D) It doesn't matter.

---
class: inverse, center, middle

# 2. "Bad Control" Diagnostic

### Three Scenarios to Test Your Instincts

---

## Scenario 1:

[Figure: DAG for Scenario 1]

**Question:** You want the *total* effect of Campaign Contact on Turnout. Should you control for **Political Interest**?

- [ ] Yes
- [ ] No

**Why?**

---

## Scenario 2:

[Figure: DAG for Scenario 2]

**Question:** You are studying whether *Charisma* affects *Talent*. Should you control for **Star Status**?

- [ ] Yes
- [ ] No

**Why?**

---

## Scenario 3:

[Figure: DAG for Scenario 3]

**Question:** Is **Wealth** a confounder? Should you include it?

- [ ] Yes
- [ ] No

**Why?**

---
class: inverse, center, middle

# 3. Variable Dump Challenge

---

## The Research Question

> Does attending religious services increase an individual's annual charitable giving (in dollars)?

**Data source:** General Social Survey (GSS)

**Main variables:**

- **Treatment:** `Religious Attendance` (0 = never, 8 = weekly+)
- **Outcome:** `Charitable Giving` (total dollars donated to non‑religious charities)

---

## Available Variables in the Dataset

- `age` (years)
- `income` (annual household income)
- `educ` (years of education)
- `parent_relig` (how often parents attended services when the respondent was a child)
- `altruism` (self‑reported agreement with "I often try to help others")
- `pol_ideology` (1 = extremely liberal to 7 = extremely conservative)
- `married` (1 = married, 0 = otherwise)
- `childs` (number of children)
- `south` (1 = lives in South, 0 = otherwise)

---

## Group Tasks (10 minutes)

1. **Draw a minimal DAG** on scratch paper. Include at least `Religious Attendance`, `Charitable Giving`, and two other variables from the list that you suspect are confounders.
2. **Select TWO variables** from the list that you would **definitely include as controls** in a regression.
3. **Select ONE variable** from the list that you are **actively deciding NOT to control for**, even though it's available.

**We will discuss your decisions together as a group at the end of 10 minutes.**

---

## Discussion Debrief

Let's hear from a few groups:

- Which two variables did you pick as controls? Why are they confounders?
- Which variable did you **not** control for? Why?


**Watch out for:**

- `altruism` → likely a **mediator** (attendance → altruism → giving)
- `parent_relig` → strong **confounder** (upbringing affects both attendance and giving norms)
- `income` → **confounder** (ability to give and possibly attendance patterns)

---
class: inverse, center, middle

# 4. Case Study Deconstruction

### DAG Thinking in the Wild

---

## Excerpt from a Published Study

> **Research Question:** Does receiving mail that publicizes one's voting record (treatment) increase voter turnout (outcome)?

**Study:** Gerber, Green, and Larimer (2008) – *Social Pressure and Voter Turnout*

**Design:** Large‑scale field experiment with random assignment to treatment conditions.

---

## Regression Table (Simplified)

| Variable | Model 1 (No Controls) | Model 2 (With Controls) |
| :--- | :--- | :--- |
| Treatment (Social Pressure Mail) | 0.081*** (0.008) | 0.076*** (0.008) |
| Age | | 0.004*** (0.000) |
| Party Registration (Democrat) | | 0.012 (0.010) |
| **Voted in Previous Election** | | 0.342*** (0.010) |
| Household Size | | -0.002 (0.003) |
| *Constant* | 0.302*** | 0.085*** |
| Observations | 80,000 | 80,000 |

*Standard errors in parentheses.* \*\*\* p < 0.01.

---

## Discussion Questions

1. **Coefficient change:** The treatment effect goes from 0.081 to 0.076. Is this a **large** change? What does it tell us about the relationship between treatment and the added controls?
2. **Past turnout as a control:** Is `Voted in Previous Election` a **confounder** in this experiment? Why or why not?
3. **Why include it then?** If treatment was randomly assigned, why might the authors still include `Voted in Previous Election`? What benefit does it provide?
4. **A thought experiment:** Suppose the treatment were **not** randomized but were instead a "Get Out the Vote" phone call that campaigns target at likely voters. How would your answer to Q2 change?

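
---

## Sketch: Controls in a Randomized Experiment

The questions above turn on what a control does when treatment is randomly assigned. The simulation below is a minimal sketch of that situation, not a reanalysis of Gerber, Green, and Larimer (2008). The sample size, the seed, and the effect sizes (0.10, 0.08, 0.34) are made-up values, and the `ols` helper is a hypothetical pure-Python routine that reports classical (homoskedastic) standard errors.

```python
import random


def ols(rows, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]  # classical SEs
    return beta, se


rng = random.Random(312)  # fixed seed so the run is reproducible
n = 5000                  # assumed sample size, far smaller than the real study

# Coin-flip assignment, so treatment is unrelated to past turnout by design.
treated = [float(rng.random() < 0.5) for _ in range(n)]
voted_before = [float(rng.random() < 0.5) for _ in range(n)]

# Linear probability of voting; 0.10, 0.08 and 0.34 are made-up values.
turnout = [float(rng.random() < 0.10 + 0.08 * t + 0.34 * v)
           for t, v in zip(treated, voted_before)]

# Model 1: constant + treatment. Model 2: adds past turnout as a control.
b1, se1 = ols([[1.0, t] for t in treated], turnout)
b2, se2 = ols([[1.0, t, v] for t, v in zip(treated, voted_before)], turnout)

print(f"No controls:   effect = {b1[1]:.3f}, SE = {se1[1]:.4f}")
print(f"With controls: effect = {b2[1]:.3f}, SE = {se2[1]:.4f}")
```

In a run like this, the treatment coefficient should barely move between the two models, because random assignment leaves treatment unrelated to past turnout. Its standard error should shrink, because the control absorbs outcome variance.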
---

## Key Takeaways from the Case Study

- **In randomized experiments,** covariates are *not* needed for unbiased estimation.
- **But** including strong predictors of the outcome (like past turnout) **reduces standard errors** and improves precision.
- **In observational studies,** variables like past turnout are often **confounders** that *must* be controlled to avoid bias.

> The back‑door criterion tells you **when** to control.
> Precision considerations tell you **why** you might still control even when unconfounded.

---
class: inverse, center, middle

# 5. Core Graded Activity

### Group DAG Creation & Justification

---

## Your Task (20 minutes)

In your assigned group, create a DAG for a research question of your choice (or use the one provided below).

**Requirements:**

1. **Draw the DAG** (on paper, whiteboard, or using DAGitty). Take a photo/screenshot.
2. **Identify at least three variables** that are **confounders** for the main relationship. These should be variables you would include as controls.
3. **Explain why one of these variables meets the definition of a confounder.** (1–2 sentences: "Variable Z is a confounder because it is a common cause of both X and Y...")
4. **Identify one variable that you explicitly chose NOT to include** as a control. Explain why it does *not* meet the definition (e.g., it's a mediator, collider, or post‑treatment).

---

## Submission

**Each student** should submit to the TA via email **by the end of today**:

- The DAG image (photo/screenshot)
- The written justifications (confounder explanation + omitted variable explanation)

---
class: inverse, center, middle

# Wrap‑Up

---

## What's the Single Most Important Rule?

> **"Close all back‑door paths. Do not control for mediators, colliders, or post‑treatment variables."**

If you remember only one thing from today, make it this.
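
---

## Sketch: Checking the Rule in Code

The rule above can be checked mechanically on a small DAG. The sketch below is a minimal pure-Python illustration, not DAGitty's algorithm. It lists every path between treatment and outcome, keeps those that start with an arrow into the treatment, and applies the usual blocking rules. The function names and node labels are hypothetical, and the attendance DAG is an assumption assembled from the hints in the Variable Dump debrief.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found


def simple_paths(edges, start, end):
    """Every path from start to end, ignoring arrow direction, no repeated nodes."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    return list(walk([start]))


def path_is_blocked(edges, path, z):
    """Apply the blocking rules to each interior node of one path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is in Z.
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            # A chain or fork node blocks once we condition on it.
            return True
    return False


def satisfies_back_door(edges, x, y, z):
    """True if the set z meets both conditions of the back-door criterion."""
    z = set(z)
    if z & descendants(edges, x):
        return False  # condition 1: no descendant of X may be in Z
    edge_set = set(edges)
    back_door = [p for p in simple_paths(edges, x, y) if (p[1], x) in edge_set]
    return all(path_is_blocked(edges, p, z) for p in back_door)  # condition 2


# The DAG from the opening poll.
edu_dag = [("parental_income", "education"), ("education", "earnings"),
           ("parental_income", "earnings")]

# Assumed DAG for the attendance question (illustrative, not the instructor's).
giving_dag = [("parent_relig", "attendance"), ("parent_relig", "giving"),
              ("income", "attendance"), ("income", "giving"),
              ("attendance", "altruism"), ("altruism", "giving"),
              ("attendance", "giving")]

print(satisfies_back_door(edu_dag, "education", "earnings", set()))
print(satisfies_back_door(edu_dag, "education", "earnings", {"parental_income"}))
print(satisfies_back_door(giving_dag, "attendance", "giving",
                          {"parent_relig", "income"}))
print(satisfies_back_door(giving_dag, "attendance", "giving",
                          {"parent_relig", "income", "altruism"}))
```

Under these assumed DAGs, adjusting for the confounders passes the check, while leaving a confounder out or adding the mediator `altruism` fails it. Listing all paths is only practical for classroom-sized graphs.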