Since 2016, there's been increased interest in predicting elections. Who predicted the results best? Am I getting better? And so on.
One quirk of predicting a presidential election is that only a handful of races, the swing states, are actually competitive; the other 40 or so states are easy to predict. Consequently, a naive accuracy metric (the fraction of states called correctly) will always be at least 80%, even if every swing state is called wrong (40 solid states called correctly out of 50).
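A minimal Python sketch of that naive accuracy metric, assuming winners are encoded as party labels. The 50-state split below (40 solid states called correctly, all 10 swing states called wrong) is hypothetical and only illustrates the 80% floor:

```python
def accuracy(predicted, actual):
    """Fraction of races whose winner was called correctly."""
    if len(predicted) != len(actual):
        raise ValueError("need one prediction per race")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical worst case: 40 solid states called correctly,
# every one of the 10 swing states called wrong.
actual = ["D"] * 20 + ["R"] * 20 + ["D"] * 5 + ["R"] * 5
predicted = ["D"] * 20 + ["R"] * 20 + ["R"] * 5 + ["D"] * 5
print(accuracy(predicted, actual))  # 0.8
```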
Slightly cleverer fellows might propose the Brier score \[ BS = \frac{1}{N}\sum_{k=1}^{N} (p_{k} - o_{k})^{2} \] where \(p_{k}\) is the forecast probability that event \(k\) happens and \(o_{k}\) is 1 if it happened and 0 otherwise. The lower the score, the closer our predictions are to observations, hence the better the forecast. Since each term lies between 0 and 1, we also have \(0\leq BS\leq1\). Let's work through some examples:
- If we predicted that 2012's outcomes would be the same as 2008's, we'd have a Brier score of 0.05882353.
- If we predicted that the 2016 presidential election would go the same way as 2012, we'd have a Brier score of 0.1372549.
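- If we predict the solid states with certainty (and they all go as expected) and give each of the remaining 10 swing states 50%, every solid state contributes 0 and every swing state contributes \((0.5-o_{k})^{2}=0.25\), for a Brier score of \(10\times 0.25/51\approx 0.049\), quite a bit better than one would expect!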
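The first two examples can be reproduced with a short Python sketch. It assumes 51 races (50 states plus DC, which is what scores of 3/51 and 7/51 suggest) and a forecaster who calls every race with certainty for the previous cycle's winner, so each flipped race costs exactly 1. The flip counts are back-solved from the quoted scores, not taken from election data. The optional `weights` argument (e.g., electoral delegates, normalized by their total) is an assumed weighting scheme; the post does not spell out how its weighted Brier column is computed.

```python
def brier(probs, outcomes, weights=None):
    """Brier score: weighted mean of (p_k - o_k)^2.

    probs[k]:    forecast probability that event k happens
    outcomes[k]: 1 if event k happened, else 0
    weights[k]:  optional per-race weight (assumed scheme: weighted mean,
                 e.g. electoral delegates); defaults to equal weights
    """
    if weights is None:
        weights = [1] * len(probs)
    if not (len(probs) == len(outcomes) == len(weights)):
        raise ValueError("probs, outcomes and weights must have equal length")
    total = sum(w * (p - o) ** 2 for p, o, w in zip(probs, outcomes, weights))
    return total / sum(weights)


N_RACES = 51  # assumption: 50 states plus DC


def persistence_score(n_flips, n=N_RACES):
    """Call every race with certainty for the previous cycle's winner.

    Event k is "the previous winner wins race k again", so every prob is 1
    and each of the n_flips flipped races has outcome 0 (a penalty of 1).
    """
    probs = [1.0] * n
    outcomes = [0] * n_flips + [1] * (n - n_flips)
    return brier(probs, outcomes)


# Flip counts back-solved from the scores quoted above (not election data).
print(round(persistence_score(3), 8))  # 0.05882353 (2008 -> 2012)
print(round(persistence_score(7), 8))  # 0.1372549  (2012 -> 2016)
```

With certain predictions the Brier score reduces to the fraction of races called wrong, which is why these two scores are exact multiples of 1/51.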
| Forecaster | Brier score | Electoral-delegate-weighted Brier score |
|---|---|---|
| FiveThirtyEight | 0.0665612 | 0.0931435 | 
| The Economist | 0.0603016 | 0.0865332 | 
| Inside Elections | 0.0654902 | 0.1018216 | 
| Sabato’s Crystal Ball | 0.0736765 | 0.1168076 | 
| Cook Political Report | 0.0737255 | 0.0887128 | 
The unweighted scores for these forecasters all fall in the range \(0.06\lt BS\leq 0.08\), which is...good? It's hard to build intuition for scores that differ by a few thousandths, and why is so much of the scale (\(0.25\lt BS\leq 1\)) given over to forecasts that are worse than guessing, i.e., worse than assigning 50% to every race?
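Where the 0.25 boundary comes from: the coin-flip forecast \(p_{k}=\tfrac{1}{2}\) scores the same no matter what happens, because \(o_{k}\) is either 0 or 1:
\[ \left(\tfrac{1}{2}-o_{k}\right)^{2}=\tfrac{1}{4}\ \text{for } o_{k}\in\{0,1\} \quad\Longrightarrow\quad BS=\frac{1}{N}\sum_{k=1}^{N}\tfrac{1}{4}=\tfrac{1}{4}. \]
At the other extreme, a forecaster who calls every race with certainty and misses every one pays \((1-0)^{2}=1\) per race, so \(BS=1\).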
Entropy
A more sensible alternative is to score each prediction by the logarithm of the probability the forecaster assigned to the outcome that was actually observed. This goes by the unfortunate name of "entropy" (more popularly known as "log loss"): \[ H(pred) = \sum^{N}_{k=1}-\log_{2}(p_{k}) \] where \(N\) is the number of predictions and \(p_{k}\) is the probability assigned to the observed outcome of prediction \(k\). We use the base-2 logarithm so the entropy can be read in "bits": a 50% call costs exactly one bit. The worse the prediction, the higher the entropy.
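As a sketch in Python (`entropy_bits` is a hypothetical helper name, not from the post), the entropy is a sum of per-race surprisals. It also shows the property that makes log loss stricter than the Brier score: a prediction made with certainty that turns out wrong (\(p_{k}=0\)) costs infinitely many bits, whereas its Brier penalty is capped at 1.

```python
import math


def entropy_bits(probs_of_observed):
    """Sum of -log2(p_k), where p_k is the probability the forecaster
    gave to the outcome that actually happened in race k."""
    total = 0.0
    for p in probs_of_observed:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # A certain prediction that turns out wrong (p = 0) costs infinite bits.
        total += math.inf if p == 0.0 else -math.log2(p)
    return total


print(entropy_bits([1.0, 1.0, 1.0]))  # 0.0: certain and right every time
print(entropy_bits([0.5, 0.5]))       # 2.0: two coin flips, one bit each
print(entropy_bits([0.8, 0.2]))       # about 2.64: the 0.2 call costs most
```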
This treats all races equally, even though they carry very different numbers of electoral delegates. To fix that, we can multiply each term by the number of electoral delegates at stake, which gives the electoral-delegate-weighted entropy \[ H_{ev}(pred) = \sum^{N}_{k=1}-d(k)\log_{2}(p_{k}) \] where \(d(k)\) is the number of electoral delegates at stake in race \(k\) and \(p_{k}\) is the probability with which we predicted its winner.
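The weighted version only multiplies each term by its delegate count. A minimal sketch; the three races and their delegate counts below are hypothetical, and every \(p_{k}\) must be positive here (`math.log2(0)` raises `ValueError`):

```python
import math


def weighted_entropy_bits(probs_of_observed, delegates):
    """H_ev = sum over races of -d(k) * log2(p_k)."""
    if len(probs_of_observed) != len(delegates):
        raise ValueError("need one delegate count per race")
    return sum(-d * math.log2(p) for p, d in zip(probs_of_observed, delegates))


# Hypothetical three-race example: a 50% call on a large race costs far
# more than the same call on a small one; a certain, correct call costs 0.
probs = [0.5, 0.5, 1.0]
delegates = [29, 3, 55]
print(weighted_entropy_bits(probs, delegates))  # 32.0
```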
But penalizing bad predictions race by race isn't enough; we also want to penalize predicting the wrong overall outcome (i.e., the wrong person winning the presidency, e.g., predicting Clinton would win in 2016). A comparable scoring function is \[ H_{out} = -538\log_{2}(p) \] where the forecaster predicts the eventual winner with probability \(p\); multiplying by all 538 electoral delegates puts it on the same scale as \(H_{ev}\). We then rate a forecaster by the sum \(H_{out}+H_{ev}\), and, as always, the smaller score is better. The same forecasters from the Brier-score table have the following entropy scores:
| Forecaster | Entropy | EV Entropy | Outcome Entropy | Total Entropy (EV + Outcome) |
|---|---|---|---|---|
| FiveThirtyEight | 15.48132 | 210.9518 | 982.5133 | 1193.465 | 
| The Economist | 13.38435 | 193.3309 | 1452.0608 | 1645.392 | 
| Inside Elections | 51.00000 | 538.0000 | 711.9190 | 1249.919 | 
| Sabato’s Crystal Ball | 26.41833 | 353.1973 | 708.3173 | 1061.515 | 
| Cook Political Report | 51.00000 | 538.0000 | 708.3173 | 1246.317 | 
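A sketch of the total score in Python, using the post's definitions of \(H_{out}\) and \(H_{out}+H_{ev}\). Because \(H_{out}=-538\log_{2}(p)\), it can also be inverted to recover the winner probability implied by an Outcome Entropy entry, \(p=2^{-H_{out}/538}\). The helper names are hypothetical, and the inputs are copied from the FiveThirtyEight row of the table above.

```python
import math

TOTAL_DELEGATES = 538


def outcome_entropy(p_winner, total=TOTAL_DELEGATES):
    """H_out = -538 * log2(p), p = probability given to the actual winner."""
    return -total * math.log2(p_winner)


def implied_winner_probability(h_out, total=TOTAL_DELEGATES):
    """Invert H_out: p = 2 ** (-H_out / 538)."""
    return 2.0 ** (-h_out / total)


def total_entropy(h_ev, h_out):
    """Forecaster rating H_ev + H_out (lower is better)."""
    return h_ev + h_out


# Values copied from the FiveThirtyEight row of the table.
h_ev, h_out = 210.9518, 982.5133
print(round(total_entropy(h_ev, h_out), 3))         # 1193.465
print(round(implied_winner_probability(h_out), 3))  # 0.282
print(outcome_entropy(0.5))                         # 538.0
```

For that row, the Outcome Entropy corresponds to giving the eventual winner roughly a 28% chance.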
The main take-away is: if you want to look smart, use the Brier score, be confident with predicting solid states, and predict swing states with 50% probability. It's a great way to mislead people into thinking you are a fantastic forecaster.
 
 