class: center, middle, inverse, title-slide .title[ # 5: Approximating CEFs and Equalling Them ] .subtitle[ ## Linear Models ] .author[ ###
Jaye Seawright
] .institute[ ###
Northwestern Political Science
] .date[ ### Jan. 21, 2026 ]

---
class: center, middle

<style type="text/css">
pre {
  max-height: 400px;
  overflow-y: auto;
}
pre[class] {
  max-height: 200px;
}
</style>

We have a theorem that, if the conditional expectation function is linear, then it equals the population OLS regression.

---

What is a *theorem*?

---

Our theorem posits that the CEF is linear:

`$$E(y_{i}|x_{i}) = a + b x_{i}$$`

for some values of `\(a\)` and `\(b\)`.

---

Key property number one of the CEF:

`$$E(y_{i} - E(y_{i}|x_{i})) = 0$$`

---

###The CEF Error

* For any case `\(i\)`, the actual outcome `\(Y_{i}\)` will almost never be exactly equal to our prediction `\(E(Y_{i}|X_{i})\)`.

* The difference is the CEF error: `\(Y_{i} - E(Y_{i}|X_{i})\)`

---

* At the population level, we have an unlimited collection of these errors. Some are positive, some are negative.

* Question: What happens if we take the average of all these errors? `\(E(Y_{i} - E(Y_{i}|X_{i}))\)` = ?

---

* Why must the errors average to zero? Because if they didn't, our predictor wouldn't be the best one!

---

Thought Experiment:

* Suppose for a moment that `\(E(Y_{i} - E(Y_{i}|X_{i})) = c\)`, where `\(c > 0\)`.

* This would mean that, on average, our predictor `\(E(Y_{i}|X_{i})\)` is systematically too low by the amount `\(c\)`.

* If we knew this, we could create a better predictor! We could just add `\(c\)` to our old one: New Predictor = `\(E(Y_{i}|X_{i}) + c\)`

* This new predictor would have a smaller average error. But we defined `\(E(Y|X)\)` as the best predictor! (This is called proof by contradiction.)

---

There's a shorter proof that relies on the law of iterated expectations, `\(E(E(Y_{i}|X_{i})) = E(Y_{i})\)`:

`\(E(Y_{i} - E(Y_{i}|X_{i})) = E(Y_{i}) - E(E(Y_{i}|X_{i}))\)`

`\(E(Y_{i}) - E(E(Y_{i}|X_{i})) = E(Y_{i}) - E(Y_{i}) = 0\)`

---

Key property number two of the CEF:

`$$E((y_{i} - E(y_{i}|x_{i})) x_{i}) = 0$$`

---

What we're suggesting here is that the CEF error times `\(x_{i}\)` is equal to zero in expectation.
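
---

We can check both key properties numerically. This is a quick simulation sketch (written in Python with NumPy purely for illustration); the linear CEF `\(E(y|x) = 2 + 3x\)` is an invented example, not anything from our data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: a population where the true CEF is known,
# E(y | x) = 2 + 3x, with mean-zero noise around it.
n = 1_000_000
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

# CEF errors: actual outcomes minus the conditional-expectation predictions.
e = y - (2 + 3 * x)

print(np.mean(e))      # key property one: errors average to (about) zero
print(np.mean(e * x))  # key property two: errors times x average to (about) zero
```

With a million simulated cases, both averages land within sampling noise of zero, mirroring the population claims.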
---

Imagine a seesaw, with values of `\(x_{i}\)` being locations further left or right from the axis and values of the CEF error representing weights. What would happen if there were a large cluster of weights on one side or the other?

---

###Does the CEF Error Have Useful Information?

If the CEF error is related to `\(x_{i}\)`, that means there is information in the error that could still have been predicted by `\(x\)` but wasn't. The best available predictor wouldn't allow that.

---

Thought Experiment:

* Suppose `\(E((y_{i} - E(y_{i}|x_{i})) x_{i}) = c\)`, where `\(c > 0\)`.

* This means that when `\(x_i\)` is large, our errors tend to be positive (our predictions are too low), and when `\(x_i\)` is small, our errors tend to be negative (our predictions are too high).

* We could then build a better predictor by adding a small multiple of `\(x_i\)` to our old one:

* New Predictor = `\(E(Y_{i}|x_{i}) + \delta x_{i}\)`

---

Now, let's go back to our theorem.

`$$E(y_{i}|x_{i}) = a + b x_{i}$$`

`$$E(y_{i} - E(y_{i}|x_{i})) = 0$$`

`$$E((y_{i} - E(y_{i}|x_{i})) x_{i}) = 0$$`

---

Substituting the linear CEF into key property one:

`$$E(y_{i} - a - b x_{i}) = 0$$`

`$$a = E(y_{i}) - E(x_{i}) b$$`

---

Substituting the linear CEF into key property two, then replacing `\(a\)`:

`$$E((y_{i} - a - b x_{i}) x_{i}) = 0$$`

`$$E((y_{i} - E(y_{i}) + E(x_{i}) b - b x_{i}) x_{i}) = 0$$`

`$$E((y_{i} - E(y_{i}))x_{i} - (x_{i} b - E(x_{i}) b) x_{i}) = 0$$`

---

`$$b = \frac{E((y_{i} - E(y_{i}))x_{i})}{E((x_{i} - E(x_{i})) x_{i})} = \frac{cov(x,y)}{var(x)}$$`

---

So the CEF equals the BLP (best linear predictor) when the CEF is linear.

---

What happens when the CEF isn't linear?

---

* The relationship between campaign spending and vote share likely has diminishing returns: the first million dollars matters more than the tenth million.

* The relationship between education and social mobility plausibly follows an S-curve, with small gains at low education levels, rapid acceleration, then plateauing at higher levels.
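
---

The slope and intercept formulas can be checked by simulation even when the CEF is curved. This sketch (in Python/NumPy, purely illustrative) uses an invented diminishing-returns CEF, `\(E(y|x) = 10\sqrt{x}\)`, which no line matches exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented diminishing-returns example: the true CEF is E(y | x) = 10 * sqrt(x),
# so the CEF is not linear.
n = 1_000_000
x = rng.uniform(0, 10, size=n)           # e.g., campaign spending in millions
y = 10 * np.sqrt(x) + rng.normal(size=n)

# Population OLS formulas from the derivation:
# b = E((y - E(y)) x) / E((x - E(x)) x) = cov(x, y) / var(x)
# a = E(y) - E(x) b
b = np.mean((y - y.mean()) * x) / np.mean((x - x.mean()) * x)
a = y.mean() - b * x.mean()

# Even though the CEF is curved, the BLP errors keep both key properties.
e = y - (a + b * x)
print(np.mean(e))      # about zero
print(np.mean(e * x))  # about zero
```

The BLP's errors still average to zero and are still orthogonal to `\(x\)`; what the line loses is the curvature, not these balancing properties.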
---

<img src="ApproximatingCEFs_files/figure-html/unnamed-chunk-2-1.png" width="80%" style="display: block; margin: auto;" />

---

###How to proceed?

1. Model the complex, true relationship with flexible methods

2. Use a simple linear model that gives us one clear, interpretable number

---

The BLP preserves the same crucial properties we saw in the CEF:

* `\(E(e) = 0\)` (Errors average to zero across all observations)

* `\(E(e X) = 0\)` (Errors are uncorrelated with the independent variable)

---

Even when the true relationship is curved, the BLP finds the line that makes prediction errors balanced across all values of `\(X\)`.

---

<img src="ApproximatingCEFs_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />

---

Let `\(Y = \text{Vote share}\)`, `\(X = \text{Campaign spending (millions)}\)`

What the BLP gives us:

* A single "average effect" of campaign spending across all spending levels

* It will over-predict for extremely low-spending and extremely high-spending campaigns

* It will under-predict for medium-spending campaigns

* But overall, the errors balance out, with no linear information left in the residuals
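
---

The over/under-prediction pattern can be seen by averaging BLP errors within spending bins. Another illustrative sketch (Python/NumPy): the concave CEF `\(E(y|x) = 10\sqrt{x}\)` and the bin cutoffs are invented for demonstration, not estimates from real campaign data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented setup: concave CEF E(y | x) = 10 * sqrt(x), x = spending in millions.
n = 1_000_000
x = rng.uniform(0, 10, size=n)
y = 10 * np.sqrt(x) + rng.normal(size=n)

# Fit the BLP with the population OLS formulas: b = cov(x, y) / var(x).
b = np.mean((y - y.mean()) * x) / np.mean((x - x.mean()) * x)
a = y.mean() - b * x.mean()
e = y - (a + b * x)

# Average BLP error by spending level (cutoffs chosen arbitrarily).
low = e[x < 2].mean()               # negative: line over-predicts at low spending
mid = e[(x >= 2) & (x < 8)].mean()  # positive: line under-predicts in the middle
high = e[x >= 8].mean()             # negative: line over-predicts at high spending
print(low, mid, high)
```

Negative average errors at the extremes and positive ones in the middle, yet they cancel overall: exactly the balancing act the BLP promises.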