Sunday, May 16, 2021

Review: Kalman Filtered Senate Polls

Previously, we discussed Kalman filtering the polls. Here we examine how well such a filter performed in the 2020 US Senate races.

Implementation Details

The basic details of the Kalman filter may be found in the previous blog post. Here I just review some of the decisions I made when implementing it in practice:

Polling Date. We treat the date for a poll as the midpoint between its start and end dates. To be clear, we are truncating timestamps to dates.

Polls are 4-vectors. We treat a poll as giving 4 data points: the percentage support for the Democratic candidate, the Republican candidate, all third-party candidates, and the undecided voters.

Third Parties. We treat all third parties as a single candidate. For polls which do not ask about third-party candidates, we set third-party support equal to the poll's margin of error.

Pooling the Polls. Following Jackman's "Pooling the Polls", we take a precision-weighted average of polls concluding on the same day. For a single poll on a given end date, this amounts to renaming the variables and computing the covariance and precision matrices. For multiple polls, we take each poll's covariance matrix, invert it to obtain the precision matrix, multiply the poll's result (as a 4-vector) by that precision matrix, and weight the result by the poll's sample size. We then sum these products over the polls and multiply the sum by the matrix inverse of the sum of the precision matrices. This gives us the "effective responses" for each candidate.

Undecided voters. If a poll does not report undecided voters, we similarly set the undecided share equal to the poll's margin of error.
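
The polling-date and pooling conventions above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the code used for this post: the helper names `poll_date`, `poll_covariance`, and `pool_polls` are hypothetical, the two polls are made up, and the covariance is approximated as diagonal (independent binomial variances p(1-p)/n) so that it can be inverted. The full multinomial covariance of four shares that sum to one is singular, so a real implementation would need to handle that differently. The sample-size weighting mentioned above enters here only through the 1/n factor in the variances.

```python
from datetime import date, timedelta

import numpy as np


def poll_date(start: date, end: date) -> date:
    """Midpoint of the field period, truncated to a whole date."""
    return start + timedelta(days=(end - start).days // 2)


def poll_covariance(shares: np.ndarray, n: int) -> np.ndarray:
    """ASSUMPTION: diagonal covariance with binomial variances p(1-p)/n.

    Every share must lie strictly between 0 and 1, or the matrix is singular.
    """
    return np.diag(shares * (1.0 - shares) / n)


def pool_polls(polls):
    """Precision-weighted average of (shares, sample_size) pairs for one date."""
    precisions = [np.linalg.inv(poll_covariance(p, n)) for p, n in polls]
    weighted_sum = sum(P @ p for P, (p, _) in zip(precisions, polls))
    pooled_cov = np.linalg.inv(sum(precisions))
    return pooled_cov @ weighted_sum, pooled_cov


# Made-up polls: (D, R, third party, undecided) shares and sample size.
polls = [
    (np.array([0.48, 0.44, 0.03, 0.05]), 800),
    (np.array([0.46, 0.45, 0.04, 0.05]), 500),
]
pooled, pooled_cov = pool_polls(polls)
midpoint = poll_date(date(2020, 10, 1), date(2020, 10, 4))
```

The pooled estimate for each component lands between the individual polls' values, closer to the poll with the smaller variance, and the pooled variance is smaller than either poll's own.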

Criteria for Assessing Estimates

We will look at the polls at two points in time: at the beginning of October, and a week before the election (i.e., October 27, 2020). We chose these dates because we're interested in whether the Democratic candidate could have acted to "course correct" before the election.

The assessment compares the Democratic candidate's polling average with the share of votes cast for that candidate on election day. We focus on the Democratic candidate because a lot of Republican supporters are nervous about publicly backing the Republican candidate, and tend to be swept into the "undecided" bucket.

We note that the estimate the Kalman filter produces is a multivariate normal distribution. Since we're interested in the Democratic candidate's performance, we can reduce it to the univariate normal distribution for that candidate's share. The 95% margin is given in parentheses after each estimate.
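
Because any marginal of a multivariate normal is itself normal, this reduction amounts to reading off one entry of the mean vector and the matching diagonal entry of the covariance matrix. A minimal NumPy sketch follows. The state estimate is made up, the helper `dem_interval` is hypothetical, component 0 is assumed to be the Democratic share, and the 95% margin is assumed to be 1.96 standard deviations:

```python
import numpy as np

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def dem_interval(mean, cov, index=0):
    """Return (estimate, 95% margin) for one component of a multivariate normal."""
    estimate = float(mean[index])
    margin = Z_95 * float(np.sqrt(cov[index, index]))
    return estimate, margin


# Made-up filter output in percentage points, ordered (D, R, third party, undecided).
mean = np.array([48.5, 43.4, 3.4, 4.7])
cov = np.diag([4.0, 4.0, 1.0, 2.25])
estimate, margin = dem_interval(mean, cov)
```

Off-diagonal covariance entries do not affect the marginal, so the same function applies unchanged to a full covariance matrix.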

Further, we restrict focus to competitive senate seats. Inside Elections considered the following races as "competitive" [i.e., tilt or toss-up]: Arizona, Georgia, Iowa, Kansas, Maine, Montana, North Carolina, South Carolina.

Conclusion: The only state where Kalman filtering the polls produces results significantly different from the observed vote percentage is Maine, where Susan Collins overperformed (or Sara Gideon drastically underperformed). All other senate race results coincide, within the stated margin of error, with the Kalman filtered polling.

Arizona

We plot the polling average as of October 1st:

The polling averages as of October 27th:

Candidate | Sept. 28 | Oct. 27 | Vote Percent
Mark Kelly (D) | 46.8% (±4.32%) | 48.52% (±4.22%) | 51.16%
Martha McSally (R) | 38.4% | 43.37% | 48.81%
Third Party | 4.71% | 3.42% | 0.03%
Undecided | 10.0% | 4.6% | --

Georgia Regular

Jon Ossoff faced an uphill battle, but managed to pull off an unexpected victory.

Candidate | Oct. 1 | Oct. 27 | Runoff Vote Percent
Jon Ossoff (D) | 41.8% (±3.49%) | 44.87% (±3.40%) | 47.9%
David Perdue (R) | 46.8% | 42.57% | 49.7%
Third Party | 2.95% | 4.08% | 2.4%
Undecided | 8.45% | 8.47% | --

Iowa

With Kalman filtering, at least, the polls were fairly accurate for Ms Greenfield. We should note that the large number of undecideds in the polls reflects the surprisingly large noise.

Candidate | Sept. 26 | Oct. 25 | Vote Percent
Theresa Greenfield (D) | 47.1% (±4.51%) | 45.63% (±3.82%) | 45.15%
Joni Ernst (R) | 44.0% | 46.10% | 51.74%
Third Party | 1.39% | 2.89% | 3.11%
Undecided | 7.44% | 5.39% | --

Maine

Senator Susan Collins remained stable in the polls around 41% until the very end of October, whereas her Democratic challenger fluctuated between 40% and 50%.

Candidate | Sept. 29 | Oct. 25 | Vote Percent
Sara Gideon (D) | 45.69% (±6.05%) | 50.89% (±5.27%) | 42.39%
Susan Collins (R) | 42.17% | 49.0% | 50.98%
Third Party | 0.72% | 0.05% | 6.63%
Undecided | 11.42% | 0.06% | --

Montana

The polls in Montana were far more stable for Steve Bullock.

Candidate | Oct. 2 | Oct. 26 | Vote Percent
Steve Bullock (D) | 46.01% (±4.3%) | 46.38% (±4.21%) | 45.0%
Steve Daines (R) | 45.40% | 45.72% | 55.0%
Third Party | 3.66% | 3.31% | 0%
Undecided | 4.93% | 4.60% | --

North Carolina

We plot the polling average as of October 1st:

The polling average fluctuates wildly in October, so wildly that it is hard to see in the plots I could produce. Remember, Cal Cunningham landed in hot water over an extramarital affair. The plot produced on October 30, 2020 shows Cunningham's support fluctuating between 45% and 48%:

Candidate | Sept. 26 | Oct. 27 | Vote Percent
Cal Cunningham (D) | 50.36% (±4%) | 47.0% (±2.8%) | 46.9%
Thom Tillis (R) | 39.30% | 46.5% | 48.7%
Third Party | 4.19% | 1.42% | 4.4%
Undecided | 6.15% | 5.12% | --

South Carolina

It's not clear what caused Sen Graham's support to increase over time, but I suspect it's that undecideds "came back home" to Sen Graham (for whatever reason). Judging from the Kalman filtered results alone, support for Jaime Harrison fluctuated around 44% (with a standard deviation of about 2%) throughout October.

Candidate | Oct. 2 | Oct. 26 | Vote Percent
Jaime Harrison (D) | 44.85% (±3.9%) | 41.62% (±3.6%) | 44.17%
Lindsey Graham (R) | 44.82% | 49.89% | 54.44%
Third Party | 3.08% | 3.38% | 1.39%
Undecided | 7.25% | 5.10% | --
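
As a quick check of the conclusion, the late-October Democratic estimates, their 95% margins, and the vote percentages from the tables above can be compared directly. The sketch below is plain Python with the figures copied from those tables. The helper name `outside_margin` is hypothetical, and the criterion (the vote share lying farther from the estimate than the stated margin) is my reading of "within the margin of error":

```python
# (state, late-October Democratic estimate, 95% margin, vote percent),
# all in percentage points, copied from the tables in this post.
RACES = [
    ("Arizona", 48.52, 4.22, 51.16),
    ("Georgia", 44.87, 3.40, 47.9),
    ("Iowa", 45.63, 3.82, 45.15),
    ("Maine", 50.89, 5.27, 42.39),
    ("Montana", 46.38, 4.21, 45.0),
    ("North Carolina", 47.0, 2.8, 46.9),
    ("South Carolina", 41.62, 3.6, 44.17),
]


def outside_margin(races):
    """States whose vote percent falls outside estimate +/- margin."""
    return [state for state, est, margin, vote in races if abs(vote - est) > margin]


misses = outside_margin(RACES)
print(misses)  # ['Maine']
```

Maine misses by about 8.5 points against a margin of 5.27; the next-largest gap is Georgia, at about 3.0 points against a margin of 3.40.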
