Political Arithmetic: Geometric Mean in Probability

If we have N estimates for the probability of an event, say p₁, ..., p_N, then ~~the best~~ a good estimate for the probability is the geometric mean:

p = [p₁ × ... × p_N]^(1/N).
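A minimal sketch of this formula in Python, assuming every estimate is strictly positive. The function name `geometric_mean` and the three sample estimates are hypothetical, chosen only for illustration. Averaging the logarithms is equivalent to taking the N-th root of the product, and avoids underflow when many small probabilities are multiplied.

```python
import math


def geometric_mean(estimates):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(p <= 0 for p in estimates):
        raise ValueError("estimates must be strictly positive")
    # exp(mean(log p_j)) equals (p_1 * ... * p_N) ** (1 / N).
    return math.exp(sum(math.log(p) for p in estimates) / len(estimates))


# Hypothetical estimates from three sources for the same event.
estimates = [0.10, 0.20, 0.40]
combined = geometric_mean(estimates)
print(combined)  # approximately 0.2, the cube root of 0.1 * 0.2 * 0.4
```

Note that a single estimate of exactly zero would force the combined estimate to zero, which is why the sketch rejects it rather than guessing what to do.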

To see this, think of probability from a frequentist perspective: p_j = (estimated number of trials where the event occurred)/(estimated number of trials), i.e.,

p_j = n_{j,x} / n_j

where n_{j,x} is the estimated number of trials where the event x occurred, and n_j is the estimated number of trials.

We can best estimate the numerator as the geometric mean of the numerators of our estimates

n_x = [n_{1,x} × ... × n_{N,x}]^(1/N)

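(and similarly for the denominator), since the geometric mean is the best way to combine different estimates of possibly different orders of magnitude. Dividing the two, the ratio of the geometric means is the geometric mean of the ratios n_{j,x}/n_j = p_j, which is exactly the geometric mean of the probability estimates.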
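The frequentist argument relies on the ratio of the two geometric means (numerators over denominators) being the same as the geometric mean of the individual ratios p_j. A quick numerical check of that identity, as a sketch: the (event count, trial count) pairs below are made up for illustration, and `geometric_mean` is a hypothetical helper, not something from a particular library.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (event count, trial count) pairs behind three estimates.
counts = [(3, 30), (40, 200), (1000, 2500)]

numerators = [n_x for n_x, _ in counts]
denominators = [n for _, n in counts]
ratios = [n_x / n for n_x, n in counts]  # the individual p_j

ratio_of_means = geometric_mean(numerators) / geometric_mean(denominators)
mean_of_ratios = geometric_mean(ratios)

# The two agree up to floating-point rounding.
print(math.isclose(ratio_of_means, mean_of_ratios))
```

The same check fails for the arithmetic mean in general: the ratio of averaged counts is not the average of the ratios unless all the denominators happen to be equal.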

If the numbers are "close enough", the geometric mean will not differ greatly from the arithmetic mean ("average"). To see this, take p_j to be μ + Δp_j, where μ is the arithmetic mean and |Δp_j/μ| ≪ 1, and consider the Taylor expansion of [1 + x]^(1/N) with x = Δp_j/μ. The linear terms cancel because the Δp_j sum to zero, so expanding the geometric mean produces the arithmetic mean plus "small" corrections that are second order in Δp_j/μ.
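To make "close enough" concrete, here is a small sketch comparing the two means on tightly clustered estimates. The sample values are hypothetical. The approximation μ − σ²/(2μ), with σ² the population variance of the estimates, is not from the argument above; it is what carrying the Taylor expansion to second order gives, and is included only as a sanity check.

```python
import statistics

# Hypothetical estimates that lie close together.
estimates = [0.48, 0.50, 0.52, 0.55]

mu = statistics.fmean(estimates)           # arithmetic mean
gm = statistics.geometric_mean(estimates)  # geometric mean (Python 3.8+)
var = statistics.pvariance(estimates)      # population variance

# Second-order approximation to the geometric mean (an assumption of this sketch).
approx = mu - var / (2 * mu)

print(mu, gm, approx)
```

For these numbers the geometric mean sits slightly below the arithmetic mean, and the gap is small compared to the spread of the estimates themselves.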

For numbers which are "spread far apart", the geometric mean gives better estimates than the arithmetic mean. I suppose one way to think about this is that the logarithm tells us the "order of magnitude" of a quantity. The order of magnitude of the revised estimate should be the arithmetic mean of the orders of magnitude of our various estimates. The geometric mean, as the revised estimate, is the only quantity that can do this, since its logarithm is exactly the average of the logarithms of the individual estimates.
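A short sketch of the order-of-magnitude picture, using three made-up estimates that span four orders of magnitude. Averaging the base-10 logarithms and exponentiating is one way to compute the geometric mean; the variable names are hypothetical.

```python
import math

# Hypothetical estimates spread across several orders of magnitude.
estimates = [1e-6, 1e-4, 1e-2]

arithmetic = sum(estimates) / len(estimates)

# Average the orders of magnitude, then convert back to a probability.
orders = [math.log10(p) for p in estimates]
mean_order = sum(orders) / len(orders)
geometric = 10 ** mean_order

print(arithmetic, geometric)
```

Here the arithmetic mean is dominated by the largest estimate and lands within a factor of about three of it, while the geometric mean lands on the middle order of magnitude.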

One fun book (among many) is Order of Magnitude Physics.

Addendum (April 22, 2019). I struck out "the best" estimator, because I actually don't have a proof off the top of my head that this is optimal in any sense. It's "good", consistent with the frequentist interpretation of probability, and most importantly it works. But I do not have a well-defined notion of an "error" or "loss function" which the geometric mean of probability estimates minimizes, and thus I felt it dishonest to describe it as "the best".

That said, "absence of evidence is not evidence of absence". I may be ignorant of some folklore that the geometric mean of probabilities optimizes some desirable property, and really is (in some sense) "the best estimator". I just don't have the proof to back the claim, so I will revise the claim.

Wednesday, April 17, 2019
