Previously, we discussed Kalman filtering the polls. We will examine how well such a filter performed in the 2020 US Senate races.
Implementation Details
The basic details of the Kalman filter may be found in the previous blog post; here I just review some of the decisions I made when implementing it in practice:
Polling Date. We treat the date for a poll as the midpoint between its start and end dates. To be clear, we are truncating timestamps to dates.
Polls are 4-vectors. We treat a poll as giving 4 data points: the percentage support for the Democratic candidate, the Republican candidate, all third-party candidates, and the undecided voters.
Third Parties. We treat all third parties as a single candidate. For polls that do not ask about third-party voters, we treat them as the margin of error.
Pooling the Polls. Following Jackman's "Pooling the Polls", we take a precision-weighted average of polls concluding on the same day. For a single poll released on a given end date, this amounts to renaming the variables and computing the covariance and precision matrices. For multiple polls, we take each poll's covariance matrix, invert it to obtain the precision matrix, multiply the precision matrix by the poll's result (a 4-vector), weight the product by the poll's sample size, sum over the polls, and then multiply that sum by the matrix inverse of the sum of the precision matrices. This gives us the "effective responses" for each candidate.
Undecided voters. If a poll ignores undecided voters, we similarly treat them as the margin of error.
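As a rough illustration of the date and pooling conventions above, here is a minimal Python sketch. It is not the code behind this post: the `Poll` class, `poll_date`, and `pool_polls` are hypothetical names, and for simplicity it assumes a diagonal covariance (each of the four shares gets an independent binomial variance p(1 - p)/n), so the precision matrices reduce to per-category weights rather than full 4×4 matrices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    """Hypothetical container for one poll (not from the original post)."""
    start: date
    end: date
    sample_size: int
    shares: tuple  # (Democrat, Republican, third party, undecided) as fractions


def poll_date(poll):
    """Midpoint of the field period, truncated to a calendar date."""
    return poll.start + (poll.end - poll.start) // 2


def pool_polls(polls, eps=1e-6):
    """Precision-weighted average of same-day polls.

    Assumption: diagonal covariance, i.e. each share has an independent
    binomial sampling variance p * (1 - p) / n.
    Returns (pooled shares, pooled variances).
    """
    k = len(polls[0].shares)
    precision_sum = [0.0] * k
    weighted_sum = [0.0] * k
    for poll in polls:
        for j, share in enumerate(poll.shares):
            p = min(max(share, eps), 1.0 - eps)  # keep the variance positive
            precision = poll.sample_size / (p * (1.0 - p))
            precision_sum[j] += precision
            weighted_sum[j] += precision * share
    pooled = [w / s for w, s in zip(weighted_sum, precision_sum)]
    pooled_var = [1.0 / s for s in precision_sum]
    return pooled, pooled_var
```

Under these assumptions, two equally sized polls with identical results pool to the same shares with half the variance of either poll, which is what one would expect from a precision-weighted average.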
Criteria for Assessing Estimates
We will look at the polls at two points in time: at the beginning of October, and a week before the election (i.e., October 27, 2020). We chose these dates because we're interested in whether the Democratic candidate could have acted to "course correct" before the election.
The assessment compares the Democratic candidate's polling average with the votes cast on election day. We focus on the Democratic candidate because many Republican supporters are nervous about publicly backing the Republican candidate and tend to be swept into the "undecided" bucket.
We note that the estimate the Kalman filter produces is a multivariate normal distribution. Since we're interested in the Democratic candidate's performance, we can reduce it to a univariate normal distribution by taking the marginal for the Democratic share. The 95% margin is given in parentheses after each estimate.
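To make the reduction concrete: for a multivariate normal, the marginal distribution of one component is a univariate normal whose mean and variance are the corresponding entries of the mean vector and the covariance diagonal. The Python sketch below assumes the Democratic share is component 0 and that the 95% margin is the usual two-sided normal interval (about 1.96 standard deviations); `democratic_margin` is a hypothetical helper, not the code used for this post.

```python
from math import sqrt
from statistics import NormalDist


def democratic_margin(mean, cov, index=0, level=0.95):
    """Return (estimate, margin) for one component of a multivariate normal.

    mean: list of component means; cov: covariance matrix as nested lists.
    Assumption: the margin is the half-width of a two-sided normal interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return mean[index], z * sqrt(cov[index][index])
```

For instance, a variance of 0.0004 on the Democratic share (a standard deviation of 2 points) gives a margin of roughly ±3.9 points.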
Further, we restrict focus to competitive Senate seats. Inside Elections considered the following races "competitive" (i.e., tilt or toss-up): Arizona, Georgia, Iowa, Kansas, Maine, Montana, North Carolina, South Carolina.
Conclusion: The only state where Kalman filtering the polls produces an estimate significantly different from the observed vote percentage is Maine, where Susan Collins overperformed (or Sara Gideon drastically underperformed). All other Senate race results coincide, within the 95% margin of error, with the Kalman-filtered polling.
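This check is simple enough to reproduce from the tables in this post. The Python sketch below hard-codes each Democratic candidate's late-October estimate, its 95% margin, and the observed vote share, then lists the states where the vote falls outside the margin. The variable names are my own choices for illustration.

```python
# (late-October estimate %, 95% margin %, observed vote %) for the
# Democratic candidate, copied from the tables in this post.
results = {
    "Arizona": (48.52, 4.22, 51.16),
    "Georgia": (44.87, 3.40, 47.9),
    "Iowa": (45.63, 3.82, 45.15),
    "Maine": (50.89, 5.27, 42.39),
    "Montana": (46.38, 4.21, 45.0),
    "North Carolina": (47.0, 2.8, 46.9),
    "South Carolina": (41.62, 3.6, 44.17),
}

# A result is "outside" when the vote differs from the estimate by more
# than the 95% margin.
outside = [
    state
    for state, (estimate, margin, vote) in results.items()
    if abs(vote - estimate) > margin
]
print(outside)  # prints ['Maine']
```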
Arizona
We plot the polling average as of October 1st:
The polling averages as of October 27th:
Candidate | Sept. 28 | Oct. 27 | Vote Percent |
---|---|---|---|
Mark Kelly (D) | 46.8% (±4.32%) | 48.52% (±4.22%) | 51.16% |
Martha McSally (R) | 38.4% | 43.37% | 48.81% |
Third Party | 4.71% | 3.42% | 0.03% |
Undecided | 10.0% | 4.6% | -- |
Georgia Regular
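Jon Ossoff faced an uphill battle, but managed to pull off an unexpected victory in the January 2021 runoff. The vote percentages below are from the November general election.
Candidate | Oct. 1 | Oct. 27 | General Election Vote Percent |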
---|---|---|---|
Jon Ossoff (D) | 41.8% (±3.49%) | 44.87% (±3.40%) | 47.9% |
David Perdue (R) | 46.8% | 42.57% | 49.7% |
Third Party | 2.95% | 4.08% | 2.4% |
Undecided | 8.45% | 8.47% | -- |
Iowa
The polls were fairly accurate for Ms. Greenfield, at least with Kalman filtering. We should note that the large number of undecideds in the polls reflects the surprisingly large noise.
Candidate | Sept. 26 | Oct. 25 | Vote Percent |
---|---|---|---|
Theresa Greenfield (D) | 47.1% (±4.51%) | 45.63% (±3.82%) | 45.15% |
Joni Ernst (R) | 44.0% | 46.10% | 51.74% |
Third Party | 1.39% | 2.89% | 3.11% |
Undecided | 7.44% | 5.39% | -- |
Maine
Senator Susan Collins remained stable in the polls around 41% until the very end of October, whereas her Democratic challenger fluctuated between 40% and 50%.
Candidate | Sept. 29 | Oct. 25 | Vote Percent |
---|---|---|---|
Sara Gideon (D) | 45.69% (±6.05%) | 50.89% (±5.27%) | 42.39% |
Susan Collins (R) | 42.17% | 49.0% | 50.98% |
Third Party | 0.72% | 0.05% | 6.63% |
Undecided | 11.42% | 0.06% | -- |
Montana
The polls in Montana were far more stable for Steve Bullock.
Candidate | Oct. 2 | Oct. 26 | Vote Percent |
---|---|---|---|
Steve Bullock (D) | 46.01% (±4.3%) | 46.38% (±4.21%) | 45.0% |
Steve Daines (R) | 45.40% | 45.72% | 55.0% |
Third Party | 3.66% | 3.31% | 0% |
Undecided | 4.93% | 4.60% | -- |
North Carolina
We plot the polling average as of October 1st:
The polling average fluctuates wildly in October, so wildly that it is hard to see in the plots I could produce. Remember, Cal Cunningham landed in hot water over an extramarital affair. The plot produced on October 30, 2020, shows Cunningham fluctuating between 45% and 48%:
Candidate | Sept. 26 | Oct. 27 | Vote Percent |
---|---|---|---|
Cal Cunningham (D) | 50.36% (±4%) | 47.0% (±2.8%) | 46.9% |
Thom Tillis (R) | 39.30% | 46.5% | 48.7% |
Third Party | 4.19% | 1.42% | 4.4% |
Undecided | 6.15% | 5.12% | -- |
South Carolina
It's not clear what caused Sen. Graham's support to increase over time, but I suspect it's that undecideds "came back home" to Sen. Graham (for whatever reason). Judging from the Kalman-filtered results alone, support for Jaime Harrison fluctuated around 44% (with a standard deviation of about 2%) throughout October.
Candidate | Oct. 2 | Oct. 26 | Vote Percent |
---|---|---|---|
Jaime Harrison (D) | 44.85% (±3.9%) | 41.62% (±3.6%) | 44.17% |
Lindsey Graham (R) | 44.82% | 49.89% | 54.44% |
Third Party | 3.08% | 3.38% | 1.39% |
Undecided | 7.25% | 5.10% | -- |