Tuesday, May 12, 2020

Race Ratings as Probability Estimates

What's the difference between "Lean Democratic" and "Likely Democratic"? Presumably the latter gives the Democratic candidate a higher probability of winning than the former, but how much higher?

One way to approximate the probabilities associated with these terms is to take a frequentist approach, i.e., look at the relative frequency with which races given each rating actually went to the favored party.

Cook Political Report's House Ratings

From the accuracy report from the Cook Political Report:

From the Editor: In a new academic paper, Dr. James E. Campbell, Chairman of the Political Science Department at the State University of New York-Buffalo has analyzed The Cook Political Report's pre-Labor Day House ratings going back to the Report's founding in 1984. In 11 of the 13 elections in which the Cook Political Report published new ratings between July 1 and the end of August (all except 1986 and 1990), 99.8 percent of the 3,387 races rated by the Cook Political Report as Solid Republican or Solid Democratic in July or August of an election year went by way of that party, 94.9 percent of the 641 races rated as Likely Democratic or Likely Republican fell the way the Cook Political Report predicted, and 85.7 percent of the 441 races rated Lean Democratic or Lean Republican broke in favor of the leaning party. Of the 130 Democratic-held seats rated as Toss Up, 49.2 percent went for Democrats, and 55.0 percent of the 160 Republican-held seats rated as Toss Up were won by the GOP.

It seems safe to assert that "toss up" races are approximately even-odds.
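As a rough check on the even-odds claim, here is a minimal sketch of an exact two-sided binomial test against a 50% win probability. The seat counts (64 of 130 and 88 of 160) are my own reconstruction from the quoted percentages, not figures stated in the report, and the helper names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so floating-point ties are counted.
    return sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)
    )


# Assumption: counts reconstructed from the quoted percentages.
# 49.2% of 130 is about 64 seats; 55.0% of 160 is 88 seats.
dem_held = two_sided_p_value(64, 130)
gop_held = two_sided_p_value(88, 160)

print(f"Democratic-held toss ups: p = {dem_held:.3f}")
print(f"Republican-held toss ups: p = {gop_held:.3f}")
```

Under these assumed counts, both p-values come out well above 0.05, so neither group of toss-up races gives grounds to reject even odds.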

"Solid" races: the associated party (for "Solid R" the Republican candidate, for "Solid D" the Democratic candidate) is expected to win with approximately 99.8% probability. An upset is about as likely as getting 9 heads out of 9 coin flips (with a fair coin).

"Likely" races: the favored party's candidate is expected to win with about 95% probability. An upset is about as likely as getting 8 or more heads out of 10 coin flips (with a fair coin).

"Lean" races: the favored party's candidate is expected to win with about 85% probability. An upset is about as likely as getting 6 or more heads out of 8 coin flips (with a fair coin).

Toss up races: the probability of either candidate winning is 50%.
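The coin-flip analogies above can be checked by counting outcomes directly. This is a small sketch; the function name `fair_coin_tail` is just an illustrative choice.

```python
from math import comb


def fair_coin_tail(min_heads, flips):
    """Probability of at least `min_heads` heads in `flips` fair coin flips."""
    favorable = sum(comb(flips, k) for k in range(min_heads, flips + 1))
    return favorable / 2**flips


# Each entry pairs the upset probability implied by the rating
# with the probability of the coin-flip event used as an analogy.
analogies = {
    "Solid": (1 - 0.998, fair_coin_tail(9, 9)),
    "Likely": (1 - 0.95, fair_coin_tail(8, 10)),
    "Lean": (1 - 0.85, fair_coin_tail(6, 8)),
}

for rating, (upset, coin) in analogies.items():
    print(f"{rating}: upset probability {upset:.4f}, coin analogy {coin:.4f}")
```

The coin-flip events work out to 1/512, 56/1024 and 37/256 respectively, each close to (but not exactly) the corresponding upset probability.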

But these are ratings for House races. Presumably Senate races are similar, but what about Presidential races?

Case Study: Inside Elections

We can examine Inside Elections presidential predictions from October 2012 and October 2016.

Null Hypothesis: The race ratings correspond to the parameter values listed above.

Alternative Hypothesis: The race ratings DO NOT correspond to the parameter values listed above.

Of the 8 states rated "Lean", 7 went to the associated party. Under a binomial model with parameter theta = 0.85, the probability of exactly this outcome is Pr(X=7 | n=8, theta=0.85) = 0.3846925. Strictly speaking, a significance test uses the probability of a result at least this extreme, Pr(X<=7) = 1 - 0.85^8, which is about 0.73. Either way, we would fail to reject the null hypothesis at significance level 0.05.

Of the 7 states rated "Likely", 6 went to the associated party. With theta = 0.95, the probability of exactly this outcome is Pr(X=6 | n=7, theta=0.95) = 0.2572822, and the tail probability is Pr(X<=6) = 1 - 0.95^7, which is about 0.30. Again we would fail to reject the null hypothesis at significance level 0.05.
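For anyone who wants to reproduce these numbers, here is a minimal sketch that computes the binomial point probabilities for the "Lean" and "Likely" states, along with the one-sided tail probability (the chance of seeing that many or fewer correct calls). The function names are my own.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Pr(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def lower_tail(k, n, theta):
    """Pr(X <= k): the chance of k or fewer correct calls out of n."""
    return sum(binom_pmf(i, n, theta) for i in range(k + 1))


# "Lean": 7 of 8 states went to the favored party, theta = 0.85.
lean_point = binom_pmf(7, 8, 0.85)
lean_tail = lower_tail(7, 8, 0.85)

# "Likely": 6 of 7 states went to the favored party, theta = 0.95.
likely_point = binom_pmf(6, 7, 0.95)
likely_tail = lower_tail(6, 7, 0.95)

print(f"Lean:   point = {lean_point:.7f}, tail = {lean_tail:.4f}")
print(f"Likely: point = {likely_point:.7f}, tail = {likely_tail:.4f}")
```

With so few states in each category, this test has little power, so failing to reject should be read as "not inconsistent with" rather than as confirmation.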

Conclusion

So the race ratings appear to correspond approximately to the following probabilities that the favored party wins: toss-up ~ 50%, lean ~ 85%, likely ~ 95%, solid ~ 99.8%.

These estimates seem to be consistent across the two forecasters examined here. I should probably run the same test on Sabato's Crystal Ball, though I am too lazy to do so right now.

Another "sanity test": if these ratings are consistent, then we should expect to see results matching these parameters in Senate and gubernatorial races as well. This could be future work.

Also perhaps there are Bayesian methods worth applying here, though none immediately spring to mind.

Addendum [Tuesday May 12, 2020 at 8:23PM (PDT)]: it seems like one useful application of these ratings is as elicited priors for Bayesian analysis, since a nuanced analysis of voter turn-out would change the probabilities slightly from these approximations.
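To make the "elicited prior" idea concrete, here is one possible sketch, not a worked-out method. It encodes a rating as a Beta prior on the favored party's win probability and applies a conjugate Beta-Binomial update. The prior strength of 20 pseudo-races is a purely hypothetical choice, and using the 7-of-8 "Lean" result as the new data is only for illustration.

```python
def beta_prior_from_rating(win_probability, strength):
    """Turn a rating's win probability into Beta(alpha, beta) parameters.

    `strength` is a hypothetical pseudo-count: how many imaginary
    past races the prior is treated as being worth.
    """
    alpha = win_probability * strength
    beta = (1 - win_probability) * strength
    return alpha, beta


def update(alpha, beta, wins, losses):
    """Conjugate Beta-Binomial update with newly observed outcomes."""
    return alpha + wins, beta + losses


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Approximate probabilities from the conclusion of this post.
RATING_PROBABILITIES = {"toss-up": 0.50, "lean": 0.85, "likely": 0.95, "solid": 0.998}

# Assumption: the prior is worth 20 races; this number is made up.
PRIOR_STRENGTH = 20

alpha, beta = beta_prior_from_rating(RATING_PROBABILITIES["lean"], PRIOR_STRENGTH)
alpha, beta = update(alpha, beta, wins=7, losses=1)
posterior_mean = beta_mean(alpha, beta)

print(f"Posterior mean win probability for 'lean': {posterior_mean:.3f}")
```

A larger `PRIOR_STRENGTH` makes the posterior stick closer to the rating's probability; a smaller one lets new evidence move it more.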

2 comments:

  1. Curiously, this seems to suggest there is a consistent 2.25% probability a presidential election would end with neither party claiming the necessary 270 electoral votes. That seems surprisingly high to me, but what do I know?

  2. Another thing worth noting is that these ratings seem to sit roughly on the logit scale: toss up at 0, lean at 2, likely at 3, solid at 5. A tilt seems to be about 1/2, which also explains why it's controversial as a rating (it's like a fraction of a decibel...the "white/gold vs blue/black dress" fight redux).
