Sunday, July 12, 2020

Prediction is Easy, Brier scores are for lying reprobates

Since 2016, there's been increased interest in predicting elections. Who predicted it best? Am I getting better? Etc.

One quirk of predicting a presidential election is that only a handful of races, the swing states, are actually in doubt. Consequently, about 40 states are easy to predict. If we used a naive accuracy metric, any reasonable forecast would score at least 80% (40 solid states predicted correctly out of 50).

Slightly cleverer fellows might propose the Brier score \[ BS = \frac{1}{N}\sum_{k=1}^{N} (p_{k} - o_{k})^{2} \] where \(p_{k}\) is the probability we assigned to race \(k\) and \(o_{k}\in\{0,1\}\) records whether the predicted outcome happened. The lower the score, the closer our predictions are to observations, hence the better the forecast. We also observe \(0\leq BS\leq1\). Let's work through some examples:

  1. If we predicted 2012's outcomes would be the same as 2008's, we'd have a Brier score of 0.05882353.
  2. If we predicted the 2016 presidential election would be the same as 2012, we'd have a Brier score of 0.1372549.
  3. If we predict the 41 solid states remain solid and give 50% to each of the 10 swing states, then every hedged race contributes \((0.5-o_{k})^{2}=0.25\) regardless of outcome, for a Brier score of \(2.5/51\approx0.049\), quite a bit better than one would expect! (The sketch after this list verifies these figures.)
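To make these concrete, here's a minimal sketch of the computation in Python, treating the election as 51 winner-take-all races (50 states plus DC; the first example's 0.05882353 is exactly 3/51, i.e., three misses):

```python
def brier(pairs):
    """pairs: list of (p_k, o_k) -- the predicted probability and the
    observed outcome (1 or 0) for each race."""
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Example 1: a deterministic forecast that misses 3 of 51 races.
repeat_2008 = [(1.0, 0.0)] * 3 + [(1.0, 1.0)] * 48
print(brier(repeat_2008))  # 0.05882352... = 3/51

# Example 3: 41 confident solid-state calls plus 10 hedged swing states.
# Each 50% hedge contributes (0.5 - o)^2 = 0.25, win or lose.
hedged = [(0.5, 1.0)] * 10 + [(1.0, 1.0)] * 41
print(brier(hedged))       # 0.04901960... = 2.5/51
```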
We can game the Brier score, especially for something like presidential election forecasting. I've computed the Brier scores for a variety of forecasts:

Forecaster              Brier score   Electoral-delegate weighted Brier
FiveThirtyEight         0.0665612     0.0931435
The Economist           0.0603016     0.0865332
Inside Elections        0.0654902     0.1018216
Sabato’s Crystal Ball   0.0736765     0.1168076
Cook Political Report   0.0737255     0.0887128

These forecasters all land somewhere in \(0.06\lt BS\leq 0.08\), which is...good? It's hard to build intuition for scores that differ by ~0.003, and why are we giving "so much real estate" \(0.25\lt BS\lt 1\) to forecasts worse than guessing?

Entropy

Another, more sensible, solution is to use the negative logarithm of the probability assigned to the observed outcome. This is given the unfortunate name of "entropy" (more popularly known as "log loss"): \[ H(pred) = \sum^{N}_{k=1}-\log_{2}(p_{k}) \] where N predictions are made and \(p_{k}\) is the probability assigned to the observed outcome of prediction \(k\). We use the base-2 logarithm so we can interpret the entropy in terms of "bits". The worse the prediction, the higher the entropy.
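Here's a minimal sketch of this in Python, with made-up forecasts: a fully confident, fully correct forecast costs zero bits, while every 50/50 hedge costs exactly one bit, win or lose.

```python
import math

def entropy(probs):
    """probs: the probability the forecast assigned to the outcome that
    actually happened, one entry per race. Returns the total in bits."""
    return sum(-math.log2(p) for p in probs)

print(entropy([1.0] * 51))               # perfect confidence: 0.0 bits
print(entropy([0.5] * 10 + [1.0] * 41))  # ten 50/50 hedges: 10.0 bits
```

Note that assigning probability zero to something that then happens costs infinitely many bits (math.log2 raises a ValueError at 0), so confidently predicting the wrong outcome is punished without bound.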

This puts all races on equal footing. To weight races by their importance, we can multiply each term by the number of electoral delegates at stake. This produces the electoral-delegate weighted entropy \[ H_{ev}(pred) = \sum^{N}_{k=1}-d(k)\log_{2}(p_{k}) \] where \(d(k)\) is the number of electoral delegates at stake in race \(k\), and we predicted the winner of race \(k\) with probability \(p_{k}\).
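Extending the sketch above, the weighting is a one-line change. The two races below are toy values (the electoral-vote counts for Florida and Texas are from the 2012–2020 apportionment):

```python
import math

# (electoral delegates, probability assigned to the observed winner)
races = [(29, 0.5),    # a hedged call on, say, Florida (29 EVs)
         (38, 0.99)]   # a confident, correct call on Texas (38 EVs)

h_ev = sum(-d * math.log2(p) for d, p in races)
print(h_ev)  # 29*1.0 + 38*0.0145... ~= 29.55 bits
```

The hedged Florida call costs its full 29 electoral votes' worth of bits; the confident correct call costs almost nothing.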

But we don't want to penalize only poorly calibrated state-level predictions; we also want to penalize predicting the wrong overall outcome (i.e., predicting the wrong person will win the presidency, e.g., predicting Clinton would win 2016). An apples-to-apples scoring function is \[ H_{out} = -538\log_{2}(p) \] where the forecaster predicted the eventual winner with probability \(p\). We then rate a forecaster by the sum \(H_{out}+H_{ev}\); as always, the smaller the score, the better. The same forecasts from the table of Brier scores have the following entropy scores:

Forecaster              Entropy    EV Entropy   Outcome Entropy   Total Entropy
FiveThirtyEight         15.48132   210.9518     982.5133          1193.465
The Economist           13.38435   193.3309     1452.0608         1645.392
Inside Elections        51.00000   538.0000     711.9190          1249.919
Sabato’s Crystal Ball   26.41833   353.1973     708.3173          1061.515
Cook Political Report   51.00000   538.0000     708.3173          1246.317
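For comparison, here's a sketch scoring the swing-state-hedging strategy from the Brier examples under these measures; the 120 electoral votes assigned to the ten hedged races, and the 50% given to the overall winner, are made-up round numbers for illustration.

```python
import math

# The hedging strategy: 41 solid races at p = 1.0 (all correct),
# 10 swing races at p = 0.5, and the overall winner hedged at p = 0.5.
SWING_EV = 120  # assumed electoral votes across the ten hedged races

h     = 41 * -math.log2(1.0) + 10 * -math.log2(0.5)  # 10 bits
h_ev  = SWING_EV * -math.log2(0.5)                   # 120 bits
h_out = 538 * -math.log2(0.5)                        # 538 bits

print(h, h_ev, h_out, h_ev + h_out)  # 10.0 120.0 538.0 658.0
```

Unlike a 0.049 Brier score, 538 bits spent hedging the overall winner is impossible to overlook.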

The main takeaway: if you want to look smart, use the Brier score, confidently predict the solid states, and give the swing states 50% probability. It's a great way to mislead people into thinking you are a fantastic forecaster.
