Political Arithmetic: 2016 Election


Saturday, July 6, 2019

Post-Mortem of 2016 (Fragment)

How did Trump win 2016? It appeared to turn the world upside down, seemingly defying polls and reason. What happened? Was it really surprising or were we misled?

The game plan will be first to examine a few popular myths, eliminate these explanations, then examine the results of 2016. By articulating why 2016 was surprising, we will find the factors responsible for the outcome.

Executive Summary: It's all Gary Johnson's fault.

Remark (On Obama-Trump Voters). One plausible explanation for Trump's victory is the shift of Obama voters of 2012 who came to vote for Trump in 2016, the so-called "Obama-Trump voter". This is worth mentioning, only in passing, because the media has an irrational fascination with such voters.

Myth #1: Shy Trump Supporter

A popular conjecture is that a subpopulation of Trump supporters believed supporting Trump was "socially undesirable", hence would not publicly acknowledge backing Trump. Alexander Coppock conclusively tested this theory, and found no evidence of a shy Trump supporter existing. The predictive models with shy Trump supporters make statistically indistinguishable predictions from those without shy Trump supporters.

Andrew Gelman independently reckoned the same conclusion along different lines. Also, Gelman reasons, Republican candidates outperformed expectations in the Senate races, which casts doubt on the model in which respondents would not admit they supported Trump; rather, the Senate results are consistent with differential nonresponse or unexpected turnout or opposition to Hillary Clinton. It is possible that the anti-media, anti-elite, and even anti-pollster sentiment stoked by the Trump campaign has been a part of the reason for the low response of Trump supporters in states with large rural populations. Emphasis added.

It is worth remembering that Politico/Morning Consult released a poll, gathered both online and via live phone calls, indicating that, despite the different methodologies, the results show only a slight, not-statistically-significant difference in their effect on voters’ preferences for president. In other words, it didn't matter whether a respondent talked to a pollster on the phone (where shyness would prevent the respondent from announcing support for Trump) or communicated online: the results were statistically indistinguishable.

Three different tests of this hypothesis all reach the same conclusion, which seriously undermines the claimed existence of "shy Trump supporters". We can discard this explanation as lacking empirical support.

Myth #2: Comey Ruined the Election

Hillary Clinton has personally blamed the election outcome on Comey's public announcement that he was re-opening the FBI investigation into Clinton's email servers the Friday prior to the vote. This explanation has become popular, presumably for the proximity of Comey's announcement to the perceived surprising loss. Neither evidence nor reason supports this misconception.

This claim is a contentious matter. But the Comey letter is the sort of mirage which our cognitive biases are susceptible to mistake for real. We must heed Thucydides's words (1.21), stressing the search for truth strains the patience of most people, who would rather believe the first things that come to hand.

Using rather rudimentary post-stratified modeling techniques, Chad P. Kiewiet de Jonge and Gary Langer have shown that Clinton in fact began losing the projected electoral college the Tuesday prior to Comey's announcement. Why didn't anyone else have this insight? The other prognosticating psephologists used some combination of poll-weighting and a likely voter model, which would have missed it. [Footnote 1: Nate Silver noted, regarding Comey's letter, "As of Oct. 28, the polls-plus version of FiveThirtyEight’s forecast, which accounts for these factors, expected Clinton to lose a point or so off her lead before Election Day." Silver's model did not detect the deviations which MRP modeling found.]

The New York Times released a poll days prior to the letter showing Trump ahead of Clinton by 4 points. Bloomberg/Selzer released a poll showing Trump ahead of Clinton by 2 points. Could we trust these polls? Well, FiveThirtyEight has awarded Selzer & Co. an "A+" pollster grade (and similarly high marks for both Siena College and the New York Times). [Footnote 2: As of both July 3, 2019 and November 11, 2016, Selzer & Co. received an "A+". The New York Times, in collaboration with CBS, received the more modest "A-" grade, but Siena College won a solid "A".] Although this is weak evidence supporting the claim that Clinton began losing before Comey's announcement, the point we stress is that this is a second way to think about the matter.

Could it be that Comey's letter accelerated Clinton's decline? From post-stratified modeling based on polling released afterwards, there is insufficient evidence that Comey's letter affected Clinton's standing. The MRP model using polling data from surveys held November 4–6 shows Clinton recovering, albeit not enough to win the election or recover significant ground. Such results challenge and undermine the claim that Comey impacted Clinton at all.

The raw data, and refined statistics, both reach the same conclusion: Clinton began losing the election before Comey even spoke.

What Happened?

To better answer this question, we should first note what people expected. If, for each state, we take a rolling mean of the proportion of the vote for each party (bundling all "third parties" together into a single "third party"), then normalize the result per state (so we have proportions again), the result is precisely the proportion of the vote one might expect to have found on election day of 2016. The results may be described by the following map:

The electoral college count would have been 332 for the Democratic candidate, 206 for the Republican candidate. We can actually quantify how surprising the actual results were using the Kullback–Leibler divergence, but to make sense of this we should compare it to previous elections:

Election Year	Surprise (in bits)
1988	0.8504352
1992	33.3329646
1996	2.8056336
2000	2.8825127
2004	1.3989161
2008	0.7251642
2012	0.3414356
2016	3.5202876

For 1992, remember Ross Perot captured about 19% of the popular vote, the most a third party had received since Teddy Roosevelt ran for a third term on the Progressive Party ticket in 1912, which is why it is the most surprising row in our table. [Footnote 3: Perot's 1992 performance stands third in the rankings of "percent of popular vote a third party candidate received in the presidential election". Teddy in 1912 stands at first with 27.39% of the vote, Millard Fillmore's 1856 bid on the American ("Know Nothing") Party ticket ranks second at 21.54% of the vote, and Perot's 1992 bid comes in third at 18.91% of the popular vote. In contrast, Gary Johnson's 2016 bid received 3.28% of the popular vote.]
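To make the "surprise" computation concrete, here is a minimal sketch for a single state. The vote shares are hypothetical, the window of past elections is an assumption, and so is the direction of the divergence (actual relative to expected); how the per-state values are combined into the table above is not stated, so this only illustrates the per-state step.

```python
import math

def normalize(shares):
    """Rescale shares so they sum to 1."""
    total = sum(shares)
    return [s / total for s in shares]

def expected_shares(history):
    """Average each party's share over past elections, then renormalize."""
    n_parties = len(history[0])
    means = [sum(row[k] for row in history) / len(history) for k in range(n_parties)]
    return normalize(means)

def surprise_bits(actual, expected):
    """Kullback-Leibler divergence D(actual || expected), measured in bits."""
    return sum(a * math.log2(a / e) for a, e in zip(actual, expected) if a > 0)

# Hypothetical (Democratic, Republican, third party) shares for one state.
history = [(0.51, 0.47, 0.02), (0.52, 0.47, 0.01), (0.50, 0.48, 0.02)]
expected = expected_shares(history)
actual = normalize([0.47, 0.48, 0.05])
print(surprise_bits(actual, expected))
```

The divergence is zero exactly when the actual shares match the expected ones, and it grows quickly when a category with a small expected share (such as the bundled third party) does much better than expected.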

So what happened? Why was this so surprising? There are a variety of ways to approach answering this. We may take the expected vote proportions and compare them to the actual vote proportions, and extrapolate out the difference in votes (assuming voter turnout remained the same in this hypothetical 2016 election as in the actual 2016 election). We find the difference in votes:

Party	Difference in votes	As Fraction of Total Vote
Third Parties	4,397,608	0.0321493
Democratic	-2,605,576	-0.0190484
Republican	-1,792,032	-0.0131009

Although we find Trump underperformed by 1.31%, we find that Clinton underperformed by 1.9% of the vote. Third parties overperformed by roughly 3.21% of the popular vote. Note: Johnson received 3.28% of the popular vote in 2016 — does this account for the overperformance of third party candidates in 2016?
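As a quick consistency check on the table above, the following sketch uses only the numbers reported there. It confirms the three differences cancel out (turnout is held fixed) and backs out the total turnout each row implies; the variable names are ours.

```python
# Values copied from the table above.
diff_votes = {"Third Parties": 4_397_608, "Democratic": -2_605_576, "Republican": -1_792_032}
diff_share = {"Third Parties": 0.0321493, "Democratic": -0.0190484, "Republican": -0.0131009}

# With turnout held fixed, votes gained by one bloc must be lost by the others.
net_votes = sum(diff_votes.values())

# Each row implies a total turnout: difference in votes / difference in share.
implied_turnout = {party: diff_votes[party] / diff_share[party] for party in diff_votes}

print(net_votes)
for party, turnout in implied_turnout.items():
    print(party, round(turnout))
```

All three rows should imply nearly the same turnout, up to rounding in the reported shares.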

Two questions immediately emerge: (a) Which third party candidate overperformed? (b) If we supposed the third parties received fewer votes (i.e., they received the expected votes), how would the difference be re-allocated between Trump and Clinton?

Measuring Third Party Overperformance

We can actually measure the Kullback–Leibler divergence for how Johnson performed in 2016 compared to 2012 (lumping all non-Johnson votes in one category). This measures the surprise in Johnson's 2016 performance relative to 2012 expectations, an adequate way to gauge improvement. We may do similarly for Stein, since both ran in 2012 and 2016. The result may be summed up thus:

The ratio of "Johnson's improvement" to "Stein's improvement" averaged 22.88781 — that's over an order of magnitude improvement!

In the states which went from Obama in 2012 to Trump in 2016 (specifically: Florida, Michigan, Pennsylvania, Wisconsin, Iowa, Indiana, North Carolina, Ohio, Nebraska), Johnson doubled his votes in 2016 compared to 2012...or better (in Florida, Johnson saw his votes grow from 44,726 to 207,043). Johnson's average improvement among these swing states was 336% more votes in 2016 compared to 2012. We should observe that Johnson didn't run in either Michigan or Wisconsin in 2012, though.

These numbers tell us, quite simply, that Johnson improved considerably between 2012 and 2016. This alone is quite surprising, since third party candidates seldom improve so drastically. Jill Stein, on the other hand, saw very little improvement in votes. We may safely conclude that Johnson is the dominant (sole?) dynamo for the third parties' surprise improvement, which answers the first question the previous section posed: Which third party candidate overperformed? We may safely answer, it was Johnson.

A Wonderful Life

The Economist's Lexington, stating the obvious, informs us, "Most of those who voted for Mr Johnson in 2016 were protesting against the alternatives." But if there were a timeline where there was no Libertarian ticket in 2016, or any other third party for that matter, how would the election have changed?

Looking at the numbers, third party voters changed the outcome in Florida, Michigan, Pennsylvania, and Wisconsin. (They did so in North Carolina, but they would have to break 95.7% for Clinton, which is implausible.) If 68.997%+1 of third party voters broke for Clinton in this hypothetical, then Clinton would have won an additional 75 delegates in the electoral college. This would have changed the outcome of the election. Is this feasible?

FiveThirtyEight's Harry Enten asked this very question in his article, "Election Update: Is Gary Johnson Taking More Support From Clinton Or Trump?" If we take his observations as a jumping-off point, then third party voters are divided up thus: 1.19864% (of the total vote) is taken from the third party candidates, then the remaining third party voters are divvied up evenly between Trump and Clinton. This effectively erases the margin of victory for Trump. (With the exception of Florida, only a small fraction of third party voters need to be shaved off to change the outcome.)

State	Delegates	Trump margin	Shift to Clinton
Florida	29	0.0119863	0.0207737
Michigan	16	0.0022303	0.0311395
Pennsylvania	20	0.0072427	0.0228425
Wisconsin	10	0.0076434	0.0366399

This alternate timeline, where third parties vanished and their constituents had to pick between Trump and Clinton, would have produced a drastically different result.

Even if we are taking Harry Enten's findings too generously, and Clinton's edge was not 1% but a more conservative 0.7643%+1 vote (the margin needed to win Wisconsin), Clinton would still lose Florida but win Michigan, Pennsylvania, and Wisconsin. This would have given Clinton 46 more electoral delegates, enough to make her delegate count 278 to Trump's 260. Again we find Johnson acted as spoiler, prevented Clinton's victory, and delivered to us a Trump presidency.
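Both scenarios can be checked mechanically against the table above. The sketch below assumes that a state flips whenever the net shift toward Clinton (as a fraction of the state's vote) exceeds Trump's margin; the helper names are ours, the small `1e-7` stands in for the "+1 vote", and 232 is the number of pledged electors Clinton actually won in 2016.

```python
# state: (delegates, Trump's margin, shift to Clinton), copied from the table above.
table = {
    "Florida": (29, 0.0119863, 0.0207737),
    "Michigan": (16, 0.0022303, 0.0311395),
    "Pennsylvania": (20, 0.0072427, 0.0228425),
    "Wisconsin": (10, 0.0076434, 0.0366399),
}

def flipped(swing_for):
    """States where the net swing to Clinton exceeds Trump's margin."""
    return [state for state, (_, margin, _) in table.items() if swing_for(state) > margin]

def delegates(states):
    """Total electoral delegates carried by the given states."""
    return sum(table[state][0] for state in states)

CLINTON_2016 = 232  # pledged electors Clinton won in the actual election

# Scenario 1: the per-state shift listed in the table.
enten = flipped(lambda state: table[state][2])
# Scenario 2: a uniform edge just large enough to carry Wisconsin.
conservative = flipped(lambda state: 0.0076434 + 1e-7)

print(enten, CLINTON_2016 + delegates(enten))
print(conservative, CLINTON_2016 + delegates(conservative))
```

Under the first scenario all four states flip (75 delegates); under the second, Florida stays with Trump and the count is 278 to 260.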

Remark. Let us suppose Enten's findings could be used to construct a random variable describing how third party voters will likely vote. Given that polls have a margin of error of 0.04 at a 95% confidence level, we can construct a normally distributed random variable X centered at 0.01 [which Enten determined is the edge Clinton has] with a 0.02 sigma [from the noise for polling] for the third party supporter who just "randomly" picks who to vote for as follows: generate a random real number following this distribution and, if it is positive, vote for Clinton, otherwise vote for Trump. In this scheme, Trump receives 30.85375% of the third party voters, Clinton receives 69.14625%, enough for Clinton to win Florida, Michigan, Wisconsin, and Pennsylvania.
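The split quoted in this remark is just the normal distribution's mass on either side of zero, which can be verified with a few lines using Python's standard library (the variable names are ours):

```python
from statistics import NormalDist

# X ~ Normal(mean = 0.01, sigma = 0.02), as described in the remark.
edge = NormalDist(mu=0.01, sigma=0.02)

p_trump = edge.cdf(0.0)      # X <= 0: the voter picks Trump
p_clinton = 1.0 - p_trump    # X > 0: the voter picks Clinton

print(f"Trump: {p_trump:.5%}, Clinton: {p_clinton:.5%}")
```

Since zero sits half a standard deviation below the mean, the Trump share is \(\Phi(-0.5)\) and the Clinton share is \(\Phi(0.5)\).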

Exercise 1. The New York Times's Libertarian Gary Johnson Polls at 10 Percent. Who Are His Supporters? surveys the demographics of Johnson supporters. Consider using a MRP model (like Gelman et al.'s in arXiv:1802.00842) to estimate the preference of third party voters.

Exercise 2. The argument produced above, and the results of exercise 1, give two different ways to show Johnson was a spoiler candidate and Clinton would have won the election had Johnson not run. But it is not wise to go to sea with two chronometers (take either one or three). Think of another test for showing Johnson was a spoiler candidate.

What Remains to be Investigated?

Aside from the exercises for myself, points worth pursuing include what could Clinton have done differently? It's one thing for us to sit back and say, "Well, well, third parties ruined everything." But it's more useful to consider how third parties attracted voters, and what Clinton could have done to counter this effect.

Also, comparing the demographics of Obama-Trump supporters to Johnson supporters may be insightful. If it turns out these two share a suitably similar political culture, then we may have found one stratum of swing voters. It remains to be seen if they are so fed up with Trump that they abstain from even voting in 2020.

But also worth considering is the newly energized Democratic base which didn't materialize for Clinton in 2016 but sure as Hell materialized to protest Trump and vote out Republicans in 2018. If the newly energized base is larger than the Obama-Trump and Johnson voters, especially in the swing states, then it may be worth considering alternative 2020 strategies.

All the scratchwork for this post may be found on Github.

Thursday, June 6, 2019

Exit Polls Margin of Error Estimates

Exit polls do not have margins of error, but we can estimate the margin of error using confidence intervals for a two-party system.

Applied to the 2016 election, when we are told N respondents answered with a proportion p supporting a candidate, we can construct the Wilson confidence interval with, say, 95% confidence (i.e., α = 0.05 and \(z_{1-\alpha/2}\approx 1.96\)). Then we get an estimate \(\hat{p}\pm\Delta p\), and we may treat \(\Delta p\) as the margin of error.

If we are trying to estimate uncertainty propagated from the exit polls used in, say, computing coefficients for a logistic regression, then we could use z = 2 exactly, and then set the Wilson confidence interval computed to \(\hat{p}_{A}\pm2\sigma_{A}\) for supporters of candidate A.

The only caveat is that exit polls are not adequately random samples. Exit polls are cluster samples, since only a fraction of precincts are polled (although they are picked at random and are intended to reflect the state as a whole). There are techniques for computing the margin of error for such samples, but I don't believe it to be tractable given the limited data from exit polls.

We can compute the margin of error for one-stage cluster sampling (which would be the upper bound in the margin of error, i.e., the stratified cluster sampling would have a smaller margin of error). How does it compare to binomial confidence intervals? Let's review binomial confidence intervals, then cluster sampling error, and see when/if the binomial confidence interval is a superior choice.

Brief Review of Confidence Intervals for Binomial Distribution

Remember the de Moivre-Laplace theorem, which states that if X is a binomially distributed random variable with probability p of success in n trials, then as \(n\to\infty\) we find \[\frac{X-np}{\sqrt{np(1-p)}}=Z\to\mathcal{N}(0,1), \tag{1} \] i.e., the left hand side becomes approximately normally distributed with mean 0 and standard deviation 1. Dividing top and bottom by n, then rearranging terms, we find \[\frac{X}{n} = p + Z\sqrt{p(1-p)/n}\tag{2}\] which gives us an estimate for p.

If we denote by \(\hat{p}=y/n\) the empirically observed ratio of successes (y) to trials (n), and pick the critical value z for some confidence level, then the naive interval estimate for the probability of success is given by the normal approximation \[\boxed{\widehat{p}\pm z\sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}}\tag{3}\] which, for large enough n and for \(\hat{p}\) "not too extreme", gives some estimate for where the "true value" of p lies.

Puzzle: What values of n and p lead to good approximations by Eq (3)?

This puzzle is typically "solved" in textbooks by insisting \(n\cdot\mathrm{min}(\hat{p},1-\hat{p})>5\) (or 10), but this doesn't always lead to good estimates. Brown, Cai, and DasGupta investigated this question "empirically".

In fact, a better estimate of the interval starts by considering p at the boundaries of Eq (3) with a slight modification of the square root: \[p = \widehat{p}\pm z\sqrt{\frac{p(1-p)}{n}}\tag{4}\] then we get the quadratic equation \[(p - \widehat{p})^{2} = z^{2}\frac{p(1-p)}{n}.\tag{5}\] Solving the quadratic for p gives us the interval \[\boxed{\frac{\hat p+\frac{z^2}{2n}}{1+\frac{z^2}{n}} \pm \frac{z}{1+\frac{z^2}{n}}\sqrt{\frac{\hat p(1-\hat p)}{n}+\frac{z^2}{4n^2}}.}\tag{6}\] This is the Wilson Confidence Interval. Heuristically, for \(z\approx 2\) (the 95% confidence interval), this interval is centered nearly at the shifted estimate \((y+2)/(n+4)\) of the true probability of success (p). Observe for "large n", Eq (6) becomes Eq (3).
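Eqs (3) and (6) translate directly into code. The following is a minimal sketch (the function names are ours) that makes it easy to compare the two intervals for a given poll:

```python
import math

def normal_interval(y, n, z=1.96):
    """Naive normal-approximation interval, Eq (3)."""
    p_hat = y / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(y, n, z=1.96):
    """Wilson confidence interval, Eq (6)."""
    p_hat = y / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical poll: 12 of 20 respondents support the candidate.
print(normal_interval(12, 20))
print(wilson_interval(12, 20))
```

Unlike Eq (3), the Wilson interval never collapses to a single point when y = 0 or y = n, and its endpoints always stay inside [0, 1].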

The Wilson confidence interval gives better estimates than the normal approximation, even for a small number of trials n and/or extreme probabilities. For larger n (i.e., \(n\gt 40\)), the Agresti-Coull interval should be used: compute the center of the Wilson confidence interval, then use this value as \(\hat{p}\) in the normal approximation Eq (3).

In short: If \(n\lt 40\), use the Wilson confidence interval. Otherwise, use either the Agresti-Coull interval, the Wilson interval, or the usual interval Eq (3).
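For completeness, here is a sketch of the Agresti-Coull interval. Note one assumption beyond the description above: the standard definition also replaces n by \(n + z^{2}\) inside the square root of Eq (3), and that is what this sketch does. The function name is ours.

```python
import math

def agresti_coull_interval(y, n, z=1.96):
    """Agresti-Coull interval: Eq (3) evaluated at the Wilson center."""
    n_tilde = n + z**2                   # adjusted number of trials
    p_tilde = (y + z**2 / 2) / n_tilde   # equals the center of the Wilson interval
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

# Hypothetical poll: 230 of 500 respondents support the candidate.
print(agresti_coull_interval(230, 500))
```

The center \((y + z^{2}/2)/(n + z^{2})\) is algebraically the same as the center of Eq (6), which is where the \((y+2)/(n+4)\) heuristic for \(z\approx 2\) comes from.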

Remark. For small n < 40, we could use a Bayesian approximation with the uninformative Jeffreys prior. This estimates the interval, for a significance level α, to be the quantiles of the Beta distribution \(\mathrm{Beta}(y + 1/2, n-y+1/2)\) at the probabilities \(\alpha/2\) and \(1-\alpha/2\). This has to be computed numerically. I wonder if there are decent approximations to the quantile function for small α?

Rare Events

What if the probability of success is really low? That is to say, what if we're dealing with "rare events"? We have a special case for this. Going back to the binomial distribution, the most extreme case, where no successes are observed (y = 0), is described by \[\Pr(X=0) = (1 - p)^{n} = \alpha\tag{7}\] for a given significance level α (usually 0.05). Then taking the (natural) logarithm of both sides yields \[n\ln(1-p)=\ln(\alpha).\tag{8}\] Since by assumption the chance for success is small (\(p\ll 1\)), we can approximate the logarithm by the first term of the Taylor expansion (since the first term is an upper bound of the logarithm in this domain) \[-np=\ln(\alpha)\tag{9}\] hence the confidence interval is \[\boxed{0\leq p\leq\frac{-\ln(\alpha)}{n}.}\tag{10}\] For α = 0.05, the upper bound is 3/n (hence the so-called "Rule of three").

(Dually, for events which almost always happen, we could simply take 1 minus this interval. For α = 0.05, this is nearly [1-3/n,1].)
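To see how tight the approximation is, we can compare the exact solution of Eq (7) with the bound in Eq (10) and the rule-of-three value 3/n. A small sketch (function names are ours):

```python
import math

def exact_upper_bound(n, alpha=0.05):
    """Solve (1 - p)^n = alpha for p exactly, Eq (7)."""
    return 1 - alpha ** (1 / n)

def approx_upper_bound(n, alpha=0.05):
    """First-order approximation -ln(alpha)/n, Eq (10)."""
    return -math.log(alpha) / n

for n in (20, 100, 1000):
    print(n, exact_upper_bound(n), approx_upper_bound(n), 3 / n)
```

The three values are ordered exact ≤ approximate ≤ 3/n, and they agree more closely as n grows.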

Cluster Sampling Margin of Error

The cluster sampling margin of error is the product of the standard error with the critical value z. We are given the sample proportions p. The cluster, in the case of exit polling, is a precinct; in 2004, there were 174,252 precincts in the United States (arXiv:1410.8868) with an average of 800 voters in a precinct. There is something on the order of 300 precincts (in 28 states, roughly 11 per state) sampled in the 2016 exit poll.

Let N be the number of voters in the country, N_j the number of voters in precinct j, m the number of precincts in the exit poll, M the number of precincts in the country, and y_j the number of voters in precinct j who voted for a fixed party for president. Then the number of votes for that party may be estimated using the unbiased estimator \[\hat{\tau} = M\cdot\bar{y} = \frac{M\cdot\sum^{m}_{j=1}y_{j}}{m}.\tag{11}\] Its variance is given by \[\mathrm{Var}(\hat{\tau}) = M(M-m)\frac{s_{u}^{2}}{m}\tag{12a}\] where \[s_{u}^{2} = \frac{1}{m-1}\sum^{m}_{j=1}(y_{j}-\bar{y})^{2} \tag{12b}\] is the sample variance.

We can estimate the proportion of voters supporting our given party. We take \[\hat{\tau}_{r} = N\cdot r = N\cdot\frac{\sum^{m}_{j=1}y_{j}}{\sum^{m}_{j=1}N_{j}}\tag{13a}\] and \[\hat{\mu}_{r} = \hat{\tau}_{r}/N = r.\tag{13b}\] We find the variance \[\mathrm{Var}(\hat{\tau}_{r}) = \frac{M(M-m)}{m(m-1)}\sum^{m}_{j=1}(y_{j} - rN_{j})^{2}\tag{14a}\] which is biased, but the bias is small when the sample size is large; the variance for the ratio estimator is \[\mathrm{Var}(\hat{\mu}_{r}) = \frac{M(M-m)}{m(m-1)}\frac{1}{N^{2}}\sum^{m}_{j=1}(y_{j} - rN_{j})^{2}.\tag{14b}\] It is not hard to find \[\mathrm{Var}(\hat{\mu}_{r}) = \frac{(1-m/M)}{m(m-1)}\sum^{m}_{j=1}\left(\frac{y_{j}}{N_{j}} - r\right)^{2}\frac{N_{j}^{2}M^{2}}{N^{2}},\tag{14c}\] which is the variance we are looking for.

The margin of error for the exit polls would be approximately, for a given critical value z, \[ME = z\sqrt{\mathrm{Var}(\hat{\mu}_{r})}.\tag{15}\] Unfortunately, we are not given the data sufficient to compute the variance described in Eq (14c).
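If precinct-level counts were available, Eqs (13) through (15) could be computed as in the sketch below. The precinct data here are entirely hypothetical (as are the function names and the values of M and N), and the code evaluates both Eq (14b) and Eq (14c) to show they agree.

```python
import math

def ratio_estimate(y, sizes):
    """r in Eq (13a): sampled votes for the party over sampled voters."""
    return sum(y) / sum(sizes)

def var_mu_14b(y, sizes, M, N):
    """Variance of the ratio estimator, Eq (14b)."""
    m = len(y)
    r = ratio_estimate(y, sizes)
    ss = sum((yj - r * Nj) ** 2 for yj, Nj in zip(y, sizes))
    return M * (M - m) / (m * (m - 1)) * ss / N**2

def var_mu_14c(y, sizes, M, N):
    """The same variance rewritten in terms of precinct proportions, Eq (14c)."""
    m = len(y)
    r = ratio_estimate(y, sizes)
    ss = sum((yj / Nj - r) ** 2 * Nj**2 * M**2 / N**2 for yj, Nj in zip(y, sizes))
    return (1 - m / M) / (m * (m - 1)) * ss

def margin_of_error(y, sizes, M, N, z=1.96):
    """Eq (15): critical value times the standard error."""
    return z * math.sqrt(var_mu_14b(y, sizes, M, N))

# Hypothetical sample of m = 5 precincts out of M = 1000, about 800 voters each.
y = [410, 380, 455, 290, 502]      # votes for the party in each sampled precinct
sizes = [800, 790, 850, 640, 910]  # voters in each sampled precinct
M = 1000
N = 800 * M

print(ratio_estimate(y, sizes), margin_of_error(y, sizes, M, N))
```

With so few precincts, the margin of error is driven almost entirely by how much the party's share varies from precinct to precinct, not by the number of individual respondents.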

Another difficulty: Eq (14c) describes the sample variance. Exit polls are far less than ideal (in the US, at least), and this increases the actual variance. There has been some debate surrounding how much worse the error for exit polls is, when compared to naive binomial confidence interval estimates, but the Mystery Pollster informs us, "Panagakis had checked with Warren Mitofsky, director of the NEP exit poll, and learned that the updated design effect used in 2004 assumed a 50% to 80% increase in error over simple random sampling (with the range depending on the number of precincts sampled in a given state)." (Emphasis his.) If the reader takes one thing away from this post, it should be that exit polls are noisy and computing their margin of error is complicated.

Conclusion: multiplying the width of the confidence interval by a factor of 1.8, or even 2, would give us a reasonable margin of error for the exit polls.

References

Stat506 from Pennsylvania State University is where I learned about cluster sampling.

Saturday, June 1, 2019

2016 Exit Polls

Exit polls, by their nature, are extremely noisy and do not provide margins of error. Until 2016, news organizations banded together to provide a single, coherent exit poll for presidential elections. This coalition started collapsing after the 2016 presidential election. Both CNN and Fox News have slightly different exit poll data for the 2016 election. How do we make sense of these polls? When can we say they tell us "the same story"?

Toy Problem: Coin Tossing

Let's consider a simpler toy problem: I flip a coin N₁ times and obtain y₁ heads ("successes") and you flip a coin N₂ times and obtain y₂ heads ("successes"). How do we know whether our coins are equally biased? Assume each of us has done a "large number of trials" (a couple hundred each).

The null hypothesis: these coins follow the same distribution with probability p of success.

The alternative hypothesis: these coins do not follow the same distribution.

Let \(p_{1} = y_{1}/N_{1}\) (and similarly \(p_{2} = y_{2}/N_{2}\)). Under the null hypothesis, a better approximation to the probability of heads ("success") is \[p=\frac{y_{1}+y_{2}}{N_{1}+N_{2}}.\tag{1}\] We can standardize the data and approximate it as a normal distribution with variance \(\sigma_{1}^{2} = p(1-p)/N_{1}\) and similarly \(\sigma_{2}^{2} = p(1-p)/N_{2}\). Then the "true variance" is \[\sigma^{2} = \sigma_{1}^{2} + \sigma_{2}^{2} = p(1-p)\left(\frac{1}{N_{1}}+\frac{1}{N_{2}}\right)\tag{2}\] which we should use for performing the Z-transform: \[Z_{1} = \frac{p_{1} - p}{\sigma}\tag{3}\] and a similar definition of Z₂. The test statistic we want to measure is \[z = Z_{1}-Z_{2} = \frac{p_{1}-p_{2}}{\sigma}\tag{4}\] which should follow a normal distribution with mean 0 and standard deviation 1.

We can then pick some confidence level, usually 1 - α = 95%, which leads to the critical value \[z_{1 - \alpha/2} = \Phi^{-1}(1 - \alpha/2)\tag{5}\] using the quantile function \(\Phi^{-1}\) of the standard normal distribution, which gives approximately 1.96 for the 95% confidence level.

If the quantity computed in Eq (4) is "more extreme" than the quantity computed in Eq (5), i.e., if \(z_{1-\alpha/2} < |z|\), then we reject the null hypothesis and conclude these coins follow different distributions. Otherwise, we fail to reject the null hypothesis. Note: we never prove they follow the same distribution, we just fail to prove they follow different distributions.
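The whole test fits in a few lines. Here is a minimal sketch using only Python's standard library; the function names and the coin-flip counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(y1, n1, y2, n2):
    """Test statistic of Eq (4), using the pooled estimate of Eq (1)."""
    p1, p2 = y1 / n1, y2 / n2
    p = (y1 + y2) / (n1 + n2)                           # Eq (1)
    sigma = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # Eq (2)
    return (p1 - p2) / sigma

def reject_null(z, alpha=0.05):
    """Compare |z| against the critical value of Eq (5)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

# Hypothetical experiment: 100 heads in 200 flips versus 110 heads in 200 flips.
z = two_proportion_z(100, 200, 110, 200)
print(z, reject_null(z))
```

Note that whenever the two observed proportions are identical, the numerator of Eq (4) vanishes and z is exactly 0, regardless of the sample sizes.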

Exit Polls

We can apply this same setup to exit polls. Unfortunately, Fox News, contrary to appearances, does not provide state-level exit poll data for the first 12 questions. Perhaps this is a technical error due to some software bug on their server side, but I can't do anything to remedy the problem.

Further, the results are identical percentages (on the first dozen questions) for Fox and CNN, despite apparently different sample sizes. It is not hard to prove (if these results are accurately reported) that the result of computing Eq (4) for each category is identically 0, and thus we fail to reject the null hypothesis for each question.

It's rather anti-climactic despite this long buildup, but it's good to know these exit polls are coherent, in some appropriate sense.

The exit polls are in CSV form on github.