Tuesday, September 1, 2020

Geometric Mean for Combining Probabilities, addendum

We've noted before that the geometric mean is best when combining several different forecasts into one. Today I'd like to discuss how to do this specifically for election forecasts.

Suppose we naively combine the forecasts that, e.g., Biden will win Nevada, by taking the geometric mean of the probabilities directly. Using the forecasts as of August 30th:

| Forecaster | Pr(Biden) | Pr(Trump) |
|---|---|---|
| DecisionDeskHQ.com | 77.8% | 22.2% |
| The Economist | 89% | 11% |
| FiveThirtyEight | 77% | 23% |
| JHKForecasts.com | 86.1% | 13.9% |
| OurProgress.org | 79% | 21% |
| PluralVote.com | 76.6% | 23.4% |
| ReedForecasts.com | 62.5% | 37.5% |

We would obtain \[ \Pr(Biden) = 77.87021\%\tag{1a} \] but we would also find \[ \Pr(Trump) = 20.33664\%\tag{1b} \] These sum to 98.20685%, which is odd: the two probabilities should always sum to 100%. What gives?
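The naive calculation can be reproduced with a short Python sketch. The helper name `geometric_mean` and the variable names are my own, and each forecaster's Trump probability is assumed to be the complement of the Biden probability.

```python
import math

def geometric_mean(values):
    """Geometric mean computed via logarithms; assumes every value is positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Pr(Biden wins Nevada) from the seven forecasters, as of August 30th.
biden = [0.778, 0.89, 0.77, 0.861, 0.79, 0.766, 0.625]
# Assumption: each forecaster's Pr(Trump) is the complement of Pr(Biden).
trump = [1.0 - p for p in biden]

naive_biden = geometric_mean(biden)   # Eq (1a)
naive_trump = geometric_mean(trump)   # Eq (1b)
naive_total = naive_biden + naive_trump

print(f"Pr(Biden) = {naive_biden:.5%}")
print(f"Pr(Trump) = {naive_trump:.5%}")
print(f"Total     = {naive_total:.5%}")
```

The total falls short of 100% because the geometric mean is taken separately for each candidate, and nothing ties the two results together.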

The solution is to first transform the probabilities into odds. Then take the geometric mean of the odds, and finally transform back to probabilities.

Why does this work? Well, as odds, a forecast would look like \[ O(\text{Biden wins NV}) = \frac{\Pr(Biden)}{\Pr(Trump)} = \frac{N_{\text{Biden}}}{N_{\text{Trump}}}, \tag{2}\] that is, the ratio of the frequencies with which each candidate wins. Taking the geometric mean of the odds gives us a better approximation of this ratio of frequencies, which can then be transformed back into probabilities.

We can obtain the probability of Biden winning Nevada from the odds by \[ \Pr(Biden) = \frac{O(Biden)}{1 + O(Biden)} \tag{3}\] For Trump, we note that \[ O(Trump) = 1/O(Biden) \] and then find \[ \Pr(Trump) = \frac{O(Trump)}{1 + O(Trump)} = \frac{1}{1 + O(Biden)}. \tag{4}\] This reciprocal relation survives the averaging, because the geometric mean of the reciprocals is the reciprocal of the geometric mean. Adding Eq (3) to Eq (4), we find \[ \Pr(Biden) + \Pr(Trump) = 1, \tag{5} \] so the probabilities sum to 100%, as expected and desired.
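As a quick sanity check of Eqs (3) through (5), here is a minimal Python sketch. The function names `prob_to_odds` and `odds_to_prob` are my own, and the odds in `example_odds` are arbitrary illustrative values, not taken from any forecaster.

```python
def prob_to_odds(p):
    """Convert a probability strictly between 0 and 1 into odds, as in Eq (2)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Convert positive odds back into a probability, as in Eq (3)."""
    return odds / (1.0 + odds)

# Arbitrary illustrative odds for Biden (hypothetical values).
example_odds = [0.25, 1.0, 3.5, 8.0]

totals = []
for odds_biden in example_odds:
    pr_biden = odds_to_prob(odds_biden)        # Eq (3)
    pr_trump = odds_to_prob(1.0 / odds_biden)  # Eq (4), using O(Trump) = 1/O(Biden)
    totals.append(pr_biden + pr_trump)         # Eq (5): should be 1 up to rounding
    print(f"O(Biden) = {odds_biden:5.2f}  "
          f"Pr(Biden) = {pr_biden:.4f}  Pr(Trump) = {pr_trump:.4f}")
```

Whatever positive odds we feed in, the two probabilities come out as complements, which is exactly what the naive approach failed to guarantee.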

Applying this to our forecasts, we find the following odds:

| Forecaster | Odds(Biden) |
|---|---|
| DecisionDeskHQ.com | 3.504505 |
| The Economist | 8.090909 |
| FiveThirtyEight | 3.347826 |
| JHKForecasts.com | 6.194245 |
| OurProgress.org | 3.761905 |
| PluralVote.com | 3.273504 |
| ReedForecasts.com | 1.666667 |

The geometric mean of these odds is approximately 3.829059, which gives Biden a probability of approximately 79.29203% of winning Nevada and Trump a probability of 20.70797%. Compared with the naive estimates in Eqs (1a) and (1b), Biden's probability rises by about 1.4 percentage points, while Trump's changes by only about 0.4 points.
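For anyone who wants to reproduce these figures, the whole procedure fits in a few lines of Python. This is only a sketch: the function name `pool_by_odds` is my own, and it assumes every forecast is strictly between 0 and 1 so that the odds are finite and positive.

```python
import math

def pool_by_odds(probabilities):
    """Pool forecasts by taking the geometric mean of their odds.

    Returns (pooled_odds, pooled_probability). Assumes each probability
    is strictly between 0 and 1.
    """
    log_odds = [math.log(p / (1.0 - p)) for p in probabilities]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds, pooled_odds / (1.0 + pooled_odds)

# Pr(Biden wins Nevada) from the seven forecasters, as of August 30th.
biden = [0.778, 0.89, 0.77, 0.861, 0.79, 0.766, 0.625]

pooled_odds, pr_biden = pool_by_odds(biden)
pr_trump = 1.0 / (1.0 + pooled_odds)  # Eq (4)

print(f"Pooled odds = {pooled_odds:.6f}")
print(f"Pr(Biden)   = {pr_biden:.5%}")
print(f"Pr(Trump)   = {pr_trump:.5%}")
```

Averaging the log-odds and exponentiating is the same thing as taking the geometric mean of the odds; working in logs is simply a little more convenient numerically.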

Puzzle/Homework. Consider the case of, say, a primary with several candidates. Suppose we have multiple forecasters make predictions for each candidate to win the primary. How can we generalize our method to handle this case?
