Exit polls, by their nature, are extremely noisy and do not provide margins of error. Until 2016, news organizations banded together to provide a single, coherent exit poll for presidential elections. This coalition started collapsing after the 2016 presidential election. CNN and Fox News have slightly different exit poll data for the 2016 election. How do we make sense of these polls? When can we say they tell us "the same story"?
Toy Problem: Coin Tossing
Let's consider a simpler toy problem: I flip a coin N1 times and obtain y1 heads ("successes"), and you flip a coin N2 times and obtain y2 heads ("successes"). How do we know whether our coins are equally biased? Assume each of us has done a "large number of trials" (a couple hundred each).
The null hypothesis: these coins follow the same distribution with probability p of success.
The alternative hypothesis: these coins do not follow the same distribution.
Let \(p_{1} = y_{1}/N_{1}\) and similarly \(p_{2} = y_{2}/N_{2}\). Under the null hypothesis, a better approximation to the probability of heads ("success") pools both samples: \[p=\frac{y_{1}+y_{2}}{N_{1}+N_{2}}.\tag{1}\] For a large number of trials, each sample proportion is approximately normally distributed with variance \(\sigma_{1}^{2} = p(1-p)/N_{1}\) and similarly \(\sigma_{2}^{2} = p(1-p)/N_{2}\). Since the two samples are independent, the variance of the difference \(p_{1} - p_{2}\) is the sum \[\sigma^{2} = \sigma_{1}^{2} + \sigma_{2}^{2} = p(1-p)\left(\frac{1}{N_{1}}+\frac{1}{N_{2}}\right)\tag{2}\] which we should use for performing the Z-transform: \[Z_{1} = \frac{p_{1} - p}{\sigma}\tag{3}\] with a similar definition of \(Z_{2}\). The test statistic we want to measure is \[z = Z_{1}-Z_{2} = \frac{p_{1}-p_{2}}{\sigma}\tag{4}\] which, under the null hypothesis, should approximately follow a normal distribution with mean 0 and standard deviation 1.
We can then pick some confidence level, usually 1 - α = 95%, which leads to the critical value \[z_{1 - \alpha/2} = \Phi^{-1}(1 - \alpha/2)\tag{5}\] where \(\Phi^{-1}\) is the quantile function of the standard normal distribution. For the 95% confidence level, the critical value is approximately 1.96.
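As a quick check of Eq (5), here is a minimal Python sketch that computes the critical value with the standard library. The function name `critical_value` is my own choice for illustration, not something from the original analysis.

```python
from statistics import NormalDist


def critical_value(alpha: float = 0.05) -> float:
    """Two-sided critical value z_{1 - alpha/2} from Eq (5).

    NormalDist() with no arguments is the standard normal (mean 0, sd 1),
    and inv_cdf is its quantile function, i.e. Phi^{-1}.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


# 95% confidence level corresponds to alpha = 0.05
print(round(critical_value(0.05), 2))  # 1.96
```

A stricter confidence level (smaller α) pushes the critical value further out, so larger differences are needed before we reject the null hypothesis.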
If the quantity computed in Eq (4) is "more extreme" than the quantity computed in Eq (5), i.e., if \(z_{1-\alpha/2} < |z|\), then we reject the null hypothesis and conclude these coins follow different distributions. Otherwise, we fail to reject the null hypothesis. Note: we never prove they follow the same distribution; we just fail to prove they follow different distributions.
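The whole procedure from Eq (1) through Eq (5) fits in a few lines of Python. This is only a sketch under the assumptions stated above (large samples, independent flips); the function names and the example counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(y1: int, n1: int, y2: int, n2: int) -> float:
    """Test statistic z from Eq (4) for y1 heads in n1 flips vs y2 in n2."""
    p1 = y1 / n1
    p2 = y2 / n2
    p = (y1 + y2) / (n1 + n2)                      # pooled estimate, Eq (1)
    sigma = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # square root of Eq (2)
    if sigma == 0:
        # Happens only if every flip (or no flip) was a head in both samples.
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    return (p1 - p2) / sigma


def reject_null(y1: int, n1: int, y2: int, n2: int, alpha: float = 0.05) -> bool:
    """True if |z| exceeds the critical value from Eq (5)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(two_proportion_z(y1, n1, y2, n2)) > z_crit


# Hypothetical counts: 110 heads out of 200 versus 95 heads out of 200.
z = two_proportion_z(110, 200, 95, 200)
print(round(z, 2))                     # 1.5
print(reject_null(110, 200, 95, 200))  # False
```

With these made-up counts, z is about 1.50, which is less extreme than 1.96, so we fail to reject the null hypothesis at the 95% confidence level.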
Exit Polls
We can apply this same setup to exit polls. Unfortunately, contrary to appearances, Fox News does not provide state-level exit poll data for the first 12 questions. Perhaps this is a technical error due to some software bug on their server-side, but I can't do anything to remedy the problem.
Further, the results are identical percentages (on the first dozen questions) for FOX and CNN, despite having apparently different sample sizes. It is not hard to prove (if these results are accurately reported) that the result of computing Eq (4) for each category is identically 0: when \(p_{1} = p_{2}\), the numerator of Eq (4) vanishes regardless of \(N_{1}\) and \(N_{2}\). Thus we fail to reject the null hypothesis for each question.
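To see concretely why identical percentages force z = 0 whatever the sample sizes are, here is a small Python sketch that works directly from reported shares rather than raw counts. The share and the two sample sizes below are hypothetical placeholders, not values from the CNN or Fox News data.

```python
from math import sqrt


def z_from_shares(p1: float, n1: int, p2: float, n2: int) -> float:
    """Eq (4) computed from reported proportions p1, p2 and sample sizes."""
    # Pooled proportion, Eq (1), rewritten using y_i = p_i * N_i.
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    sigma = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # square root of Eq (2)
    return (p1 - p2) / sigma


# Hypothetical: both polls report the same 52% share for some category,
# but with different (made-up) numbers of respondents.
print(z_from_shares(0.52, 24000, 0.52, 18000) == 0)  # True
```

The sample sizes only enter through σ, which rescales the difference \(p_{1} - p_{2}\); if that difference is zero, no rescaling changes the outcome.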
It's rather anti-climactic despite this long buildup, but it's good to know these exit polls are coherent, in some appropriate sense.
The exit polls are in CSV form on GitHub.