If we have $N$ estimates for the probability of an event, say $p_1, \dots, p_N$, then a good estimate for the probability is the geometric mean:
$$p = (p_1 \times \cdots \times p_N)^{1/N}.$$
To see this, think of probability from a frequentist perspective: $p_j$ = (estimated number of trials where the event occurred)/(estimated number of trials), i.e.,
$$p_j = \frac{n_{j,x}}{n_j}$$
where $n_{j,x}$ is the number of trials where the event $x$ occurred, and $n_j$ is the estimated number of trials.
We can best estimate the numerator as the geometric mean of the numerators of our estimates
$$n_x = (n_{1,x} \times \cdots \times n_{N,x})^{1/N}$$
(and similarly for the denominator), since the geometric mean is the best way to combine different estimates of possibly different orders of magnitude.
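The argument above can be checked with a few lines of Python. This is only a minimal sketch: the `geometric_mean` helper and the trial counts are made up for illustration, not taken from any real data. It shows that taking the geometric mean of the numerators and of the denominators separately gives the same answer as taking the geometric mean of the probabilities directly.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all strictly positive")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

# Hypothetical estimates: (trials where the event occurred, total trials).
counts = [(3, 100), (40, 1000), (1, 50)]
probabilities = [n_x / n for n_x, n in counts]  # 0.03, 0.04, 0.02

# Geometric mean of the probability estimates themselves.
p = geometric_mean(probabilities)

# Geometric mean of the numerators divided by that of the denominators.
ratio = geometric_mean(n_x for n_x, _ in counts) / geometric_mean(n for _, n in counts)

print(p, ratio)  # the two agree up to floating-point rounding
```

Working in logarithms avoids underflow when many small probabilities are multiplied together. Note the sketch rejects an estimate of exactly zero, since a single zero would force the geometric mean to zero.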
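If the numbers are "close enough", the geometric mean will not differ greatly from the arithmetic mean ("average"). To see this, take $p_j = \mu + \Delta p_j$, where $\mu$ is the arithmetic mean and $|\Delta p_j/\mu| \ll 1$, and consider the Taylor expansion of $(1 + x)^{1/N}$ with $x = \Delta p_j/\mu$. The linear terms cancel because the $\Delta p_j$ sum to zero, so expanding the geometric mean produces the arithmetic mean plus "small" corrections: to leading order, $-\frac{\mu}{2N}\sum_j (\Delta p_j/\mu)^2$, which is second order in the relative deviations.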
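Here is a quick numerical check of that expansion, as a sketch with made-up estimates. It assumes the leading correction to the arithmetic mean $\mu$ is $-\frac{\mu}{2N}\sum_j (\Delta p_j/\mu)^2$, which is what the expansion gives once the linear terms cancel.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    return math.fsum(values) / len(values)

# Hypothetical estimates that are "close enough" to one another.
estimates = [0.19, 0.20, 0.21, 0.22]
n = len(estimates)

mu = arithmetic_mean(estimates)
gm = geometric_mean(estimates)

# Arithmetic mean plus the leading (second-order) correction.
relative_sq = math.fsum(((p - mu) / mu) ** 2 for p in estimates)
second_order = mu * (1 - relative_sq / (2 * n))

print(mu, gm, second_order)
```

For these inputs the geometric mean sits slightly below the arithmetic mean, and the second-order formula reproduces it to within about one part in a million.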
For numbers which are "spread far apart", the geometric mean gives better estimates than the arithmetic mean. I suppose one way to think about this is that the logarithm tells us the "order of magnitude" of a quantity. The order of magnitude of the revised estimate should be the arithmetic mean of the orders of magnitude of our various estimates. The geometric mean, as the revised estimate, is the only quantity that can do this.
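To make the "order of magnitude" picture concrete, here is a small sketch with hypothetical estimates spread over four orders of magnitude. It averages the base-10 logarithms and exponentiates, which is just the geometric mean written another way, and compares the result with the arithmetic mean.

```python
import math

# Hypothetical estimates spanning four orders of magnitude.
estimates = [1e-2, 1e-4, 1e-6]

# Orders of magnitude of the individual estimates: -2, -4, -6.
orders = [math.log10(p) for p in estimates]
mean_order = sum(orders) / len(orders)

# Exponentiating the mean order of magnitude gives the geometric mean.
gm = 10 ** mean_order

# The arithmetic mean is dominated by the largest estimate.
am = sum(estimates) / len(estimates)

print(mean_order, gm, am)
```

The geometric mean lands at about 1e-4, the middle order of magnitude, whereas the arithmetic mean is roughly 3.4e-3 and essentially ignores the two smaller estimates.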
One fun book (among many) is Order of Magnitude Physics.
Addendum. I struck out "the best" estimator, because I actually don't have a proof off the top of my head that this is optimal in any sense. It's "good", consistent with the frequentist interpretation of probability, and most importantly it works. But I do not have a well-defined notion of an "error" or "loss function" which the geometric mean of probability estimates minimizes, and thus I felt it dishonest to describe it as "the best".
That said, "absence of evidence is not evidence of absence". I may be ignorant of some folklore that the geometric mean of probabilities optimizes some desirable property, and really is (in some sense) "the best estimator". I just don't have the proof to back the claim, so I will revise the claim.