Tuesday, July 14, 2020

What are Swing States?

Puzzle: What are the criteria for a swing state?

The notion of a swing state was introduced by journalists, seemingly first used in Brody's news article "Carter, Reagan camps focusing on suburbs in the swing states" in the Washington Post (dated Sep 28, 1980). Its meaning changes with every news article.[1]

The Economist, "Road to 270": "It is still, technically, a swing state (meaning that the average polling margin is less than 5%)..."

Daily Kos's Swing State Project defined it as "...any state where the margin of victory [of combined past two elections] was 6 points or less." This was probably the least rigorously defined source.

The Hill, "When Swing States Stop Swinging": "By “swing states,” I am referring to the nine or ten perennial campaign battleground states that have, since 1988, been decided by less than five percentage points in the majority of elections, are usually bellwethers mirroring the national vote in those elections, and which have flipped between Democrat and Republican victories."

The Cook Political Report has treated any state with a PVI score between D+5 and R+5 as a "swing state", at least according to a few articles that I read but can no longer recall.

"Swing Voters and Elastic States" [FiveThirtyEight.com] notes: "The states that are traditionally thought of as swing states — meaning that they are close to the national average in their partisan orientation — ..."

Stacey Hunter Hecht and David Schultz note in their introduction to Presidential Swing States: Why Only Ten Matter (2015) that political scientists haven't studied the notion of a "swing state", hence it is left to journalists to define the term. Instead, political scientists use the terms "competitive elections" or "competitive states".[2] Schultz and Hecht give, in the preface (to the 2015 edition), four criteria which they use to designate a swing state:

First, it is a competitive state in the political science meaning of the term. Other scholars have suggested a 10 percentage point margin or less in the last election as competitive (Glaeser and Ward). Instead, this book employs a five percentage point standard as the first decisional criteria. [...]

Second, in a nod to the bellwether designation in the media, this book assesses whether a state has sided with the winner in presidential elections through the period [of the past 7 elections]. By that, over the last seven elections, how many times has a state's popular vote...matched that of the final result in the presidential election [i.e., the state was won by the president-elect]. [...]

The third criteria for being a swing state is the incidence of flipping during the past seven presidential contests. This really speaks to the swing in the label swing state—that is how often does the state change with regard to its vote for president?

Finally, using data from Koza et al., this book examines the number of post-convention events there were in 2012 [the election year in question]. (xxx)

If we treat "which party system we're in" as a bus waiting problem with N = 9 elections between party realignments, it turns out we expect to go back $N(1 - e^{-1}) + e^{-1} \approx 6.05696$ previous elections before we reach a realigning election, which is a theoretical justification for the second and third criteria's usage of 7 previous elections. This is actually foreshadowing another topic I'm driving at in a series of posts ("party systems" as a periodization of US political history: sensible or meaningless), so stick a pin in this aside.
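As a quick arithmetic check, here is a minimal Python sketch that only evaluates the expression quoted above for N = 9. It does not derive the formula, and the function name `expected_lookback` is made up for illustration.

```python
import math


def expected_lookback(n_elections: int) -> float:
    """Evaluate N(1 - e^-1) + e^-1, the expression quoted in the text."""
    inv_e = math.exp(-1)
    return n_elections * (1 - inv_e) + inv_e


# With N = 9 elections between realignments, as assumed in the text:
print(round(expected_lookback(9), 5))  # 6.05696
```

Rounding up gives the window of 7 previous elections used in the second and third criteria.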

The hope is that swing states coincide with bellwether states, i.e., states which correlate with the outcome of the election as a whole. A bellwether state is one which usually reflects the election's outcome: whoever wins the state usually wins the presidential election. Ohio is currently the textbook example of a bellwether.

Predicted Swing States in 2020

Or, "Shut up and give me predictions!" Based on the campaign events in 2016 and the margins of victory, our predictions are enumerated in the table below. The first column uses the classic logistic regression with the unweighted alternations between parties, the second uses a Bayesian logistic regression, the third uses a classic logistic regression with a weighted mean (rewarding more recent swings more than swings a while ago), and the fourth uses a Bayesian logistic regression with the weighted swings. The only real difference is in the edge cases, which are nearly swing states (like Georgia, Colorado, and Iowa). I highlight instances where the forecasted probability is at least 50%.

State           Logistic   Bayes      Weighted Logistic  Weighted Bayes
Florida         0.9993956  0.9964374  0.9999655          0.9990531
Pennsylvania    0.9652903  0.9537852  0.9606524          0.9444615
North Carolina  0.9555401  0.9294878  0.9241776          0.8904109
New Hampshire   0.9434254  0.8744334  0.9738157          0.9209954
Michigan        0.8596214  0.8681566  0.8005915          0.8393148
Ohio            0.8336188  0.7632081  0.7310860          0.6594543
Nevada          0.8301466  0.7199140  0.8601073          0.7600979
Wisconsin       0.7514187  0.7907851  0.6308183          0.7402993
Arizona         0.6563595  0.6751866  0.9660922          0.9102213
Colorado        0.6162296  0.5055186  0.9027512          0.7521946
Iowa            0.3770038  0.3817862  0.1045918          0.1867988
Virginia        0.3452857  0.3377292  0.2091488          0.2761821
Minnesota       0.2586809  0.3289624  0.2130948          0.3467333
Georgia         0.1617802  0.2894286  0.5362766          0.5598381
Maine           0.1596644  0.2286243  0.1134700          0.2283813
New Mexico      0.1158958  0.1205112  0.0389905          0.0753137
Texas           0.0082930  0.0376663  0.0027493          0.0239994

Statistical Model

Let's try to consider Schultz and Hecht's criteria as inputs into a logistic regression:

  1. m = margin of victory for the [previous] President [in the given state] in the previous presidential election;
  2. b = bellwether input = number of times the state went to the winner of the election (over the past 7 elections); this turns out not to be significant in the classic logistic regression;
  3. s = state swing-iness = min(number of times the state went D in the past 7 elections, number of times the state went R in the past 7 elections);
  4. v = number of post-convention events in the current presidential cycle.
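The inputs b and s can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the code behind the tables; the function names are invented, and the example uses Indiana's presidential results for 1992 through 2016 (Republican every time except 2008).

```python
def bellwether_count(state_winners, national_winners):
    """b: number of elections in which the state sided with the national winner."""
    return sum(s == n for s, n in zip(state_winners, national_winners))


def swinginess(state_winners):
    """s: min(number of times the state went D, number of times it went R)."""
    return min(state_winners.count("D"), state_winners.count("R"))


# Seven elections, 1992 through 2016, oldest first.
national = ["D", "D", "R", "R", "D", "D", "R"]
indiana = ["R", "R", "R", "R", "D", "R", "R"]

print(bellwether_count(indiana, national))  # 4
print(swinginess(indiana))                  # 1
```

Indiana's single Democratic win in 2008 yields s = 1, which is the kind of case discussed next.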

There is some degree of freedom here in the choice of input signals. For the bellwether input, we could have chosen the fraction of times the state went to the election's winner, as opposed to the count of times this occurred. We could also use a running mean, to penalize (for example) Indiana in 2020, which went to the Republican candidate in 6 of the past 7 presidential elections (it went to Obama in 2008, which seems like a fluke rather than swinginess).
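One way to make "recent elections count more" concrete is a recency-weighted mean of a 0/1 indicator. The sketch below assumes geometric weights with a hypothetical `decay` parameter; the post does not say which weighting the tables actually use, so treat this purely as an illustration.

```python
def recency_weighted_mean(values, decay=0.5):
    """Weighted mean of `values` (oldest first); the newest entry gets weight 1,
    the one before it `decay`, then `decay**2`, and so on.

    `decay` is an illustrative assumption, not a value taken from the post.
    """
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# 1 where Indiana went Democratic, 1992 through 2016 (only 2008).
indiana_went_d = [0, 0, 0, 0, 1, 0, 0]

plain = recency_weighted_mean(indiana_went_d, decay=1.0)     # ordinary mean, 1/7
weighted = recency_weighted_mean(indiana_went_d, decay=0.5)  # 2008 is discounted
```

With `decay=1.0` this reduces to the plain fraction; with a smaller decay, a lone flip several cycles back contributes less than a recent one would.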

Why Logistic Regression? We are trying to classify states as either "swing" (= 1) or "stable" (= 0). The logistic regression will accomplish this and give us information in the coefficients for the inputs as log-odds.
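To make the log-odds reading concrete, here is a small sketch of how a fitted logistic model would turn the four inputs into a probability. The coefficient values are made up purely for illustration (they are not the fitted values behind the tables), and the helper names are hypothetical.

```python
import math


def swing_probability(m, b, s, v, coef):
    """P(swing = 1) under a logistic model with inputs m, b, s, v."""
    log_odds = (coef["intercept"] + coef["m"] * m + coef["b"] * b
                + coef["s"] * s + coef["v"] * v)
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in an input."""
    return math.exp(beta)


# Hypothetical coefficients, chosen only to show the mechanics.
toy_coef = {"intercept": -1.0, "m": -0.3, "b": 0.0, "s": 0.4, "v": 0.05}

close_state = swing_probability(m=1.0, b=5, s=3, v=40, coef=toy_coef)
safe_state = swing_probability(m=20.0, b=5, s=0, v=0, coef=toy_coef)
```

Each coefficient is a change in log-odds per unit of its input, so `odds_ratio(toy_coef["s"])` is the factor by which the odds of being a swing state change for one additional swing, holding the other inputs fixed.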

The alternative that immediately springs to mind would be the probit model, which would work if we standardized the inputs via the z-transform. There is no compelling a priori reason to do so, but it is a perfectly valid thing to do. It just makes interpreting coefficients harder. (It may be worth revisiting this later, to see if a probit model is worthwhile.)

Multicollinearity. It may be tempting to throw everything into a logistic regression and see what happens, but we must first test for multicollinearity among the input signals. That is to say, we need the input variables to be (close to) uncorrelated with one another. How do we measure this? If the variance inflation factor (VIF) is greater than 2.50 or 3 for any input variable, then we should grow concerned.[3]

[3] There are three cases when multicollinearity is unworrying: (1) when the high-VIF variables are control variables and the low-VIF variables are input variables; (2) when the high-VIF variables are products of input variables; (3) when the high-VIF variables are dummy variables for a categorical variable with three or more categories.

Multicollinearity tends to cause overfitting, which is bad for prediction...which is precisely what we're interested in doing! Fortunately, there is no worrying multicollinearity: the VIF scores are routinely around 1.1, though the Bayesian logistic regression on unweighted swings has a score of 1.608148 for the number of post-convention campaign events and 1.571879 for the number of swings. These aren't terrible, though they are worth keeping an eye on.
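For readers who want to see what a VIF is, here is a minimal pure-Python sketch of the textbook definition: the VIF of each input is the corresponding diagonal entry of the inverse of the inputs' correlation matrix, which equals 1/(1 − R²) from regressing that input on the others. This is an assumption about method, since the scores quoted above may have been computed from the fitted models instead, and all names and numbers here are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("inputs are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Map each input name to its VIF, given {name: list of observations}."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Toy numbers only: these two columns have correlation 0.8.
scores = vif({"s": [1, 2, 3, 4], "v": [1, 3, 2, 4]})
```

In the toy example both inputs get a VIF of 1/(1 − 0.8²) ≈ 2.78, which would sit right around the threshold of concern mentioned above.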

Predictive or Postdictive? We should be careful with the inputs to make sure we can have predictions about which states are swing states. To be clear, when predicting if a state is a swing state in a given election, we use the margin-of-victory, a bellwether dummy variable (1 if it went to the winner and 0 otherwise), and number of post-convention events all from the previous election. This lets us construct a logistic regression trained on past data, then predict which states are "swing states" in 2020 from the 2016 data.
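The lagging described here can be sketched as follows: each training row pairs a state's swing/stable label in one election with inputs measured in the previous election. The data layout (nested dicts keyed by year and state), the function name, and the toy values are all assumptions made for illustration, not the actual data set.

```python
def build_training_rows(features_by_year, labels_by_year, years):
    """Pair each election's label with the same state's inputs from the prior election.

    `years` must be sorted election years; both dicts map year -> {state: ...}.
    """
    rows = []
    for prev_year, year in zip(years, years[1:]):
        prev_features = features_by_year.get(prev_year, {})
        for state, label in sorted(labels_by_year.get(year, {}).items()):
            if state in prev_features:
                rows.append({"state": state, "year": year,
                             "inputs": prev_features[state], "swing": label})
    return rows


# Placeholder states and numbers, not real election data.
features_by_year = {
    2004: {"State A": {"m": 2.0, "v": 30}, "State B": {"m": 15.0, "v": 0}},
    2008: {"State A": {"m": 4.0, "v": 25}, "State B": {"m": 12.0, "v": 1}},
}
labels_by_year = {
    2008: {"State A": 1, "State B": 0},
    2012: {"State A": 1, "State B": 0},
}
rows = build_training_rows(features_by_year, labels_by_year, [2004, 2008, 2012])
```

Because every row's inputs predate its label, a model fitted on such rows can be applied to 2016 inputs to produce genuine forecasts for 2020.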

Training Data. We use results reported in JS Hill, E Rodriquez, AE Wooden, "Stump speeches and road trips: The impact of state campaign appearances in presidential elections" for determining swing states in 2000, 2004, and 2008. For campaign events, we consulted a variety of sources. There's some freedom in whether we count the appearances of just the presidential candidate, or if we include the vice presidential candidate; and independently whether we consider all campaign appearances or post-convention appearances.

Presumably, if we disagreed with their assessments (and what to include or exclude is fairly subjective), we would get slightly different estimates.

Testing 2012 Elections

When applied to 2012, we have the following table of swing states:

State           Logistic   Bayes      Weighted Logistic  Weighted Bayes
Florida         0.9620333  0.9283174  0.9689775          0.9295346
Missouri        0.9342639  0.8611949  0.9945827          0.9668182
Ohio            0.9330508  0.8862916  0.9888160          0.9526306
North Carolina  0.9162987  0.8348147  0.8095200          0.7615408
Indiana         0.8679019  0.7702609  0.6951285          0.6684250
Montana         0.6741770  0.5723402  0.9696912          0.8823920
Colorado        0.3780801  0.3857023  0.7783322          0.6267094
Georgia         0.3267437  0.2887727  0.8431760          0.6549777
Virginia        0.2263317  0.3512313  0.0971042          0.2399934
New Hampshire   0.2212433  0.2562481  0.4795003          0.4063095
Arizona         0.0964162  0.1039871  0.0715052          0.1014729
Iowa            0.0826769  0.1450948  0.0155189          0.0617803
Nevada          0.0643182  0.0968483  0.1186013          0.1386136
Pennsylvania    0.0592556  0.1298407  0.2179990          0.2636838

Testing 2016

Remember, we're retrodicting the probability that a state will be a swing state in 2016 based on the number of post-convention campaign events held in 2012, the number of swings, whether it was a bellwether, and the previous (2012) margin of victory. Since there were some surprises (e.g., the Rust Belt), we should remember that in 2012 there were few post-convention events in, say, Michigan. I have put in bold states which now seem like they should've been treated as swing states, but at the time were not. I have also italicized the states which at the time were treated as swing states, but don't seem to qualify looking back.

State           Logistic   Bayes      Weighted Logistic  Weighted Bayes
Florida         0.9971685  0.9885786  0.9997810          0.9968004
Ohio            0.9960023  0.9866616  0.9968622          0.9847068
Iowa            0.8315618  0.7730072  0.6044177          0.6155665
North Carolina  0.7312868  0.7398799  0.5124623          0.6338570
Virginia        0.6873719  0.6128545  0.5994374          0.5711363
Colorado        0.6148295  0.5011042  0.9014906          0.7451717
New Hampshire   0.4536818  0.3772813  0.4851559          0.4198070
Nevada          0.3254291  0.2793178  0.2640912          0.2670547
Wisconsin       0.1599034  0.2773650  0.0535469          0.1673131
Pennsylvania    0.1590733  0.2847362  0.0543191          0.1812992
Arizona         0.1124913  0.1824953  0.5033183          0.4367127
Georgia         0.0418023  0.1079614  0.1573142          0.2413978
New Mexico      0.0411491  0.0523271  0.0102234          0.0281328
Indiana         0.0402132  0.0848593  0.0061927          0.0328337
Missouri        0.0198727  0.0605405  0.0120657          0.0521976
Michigan        0.0198235  0.0602409  0.0036850          0.0267344

Code. All the code for these calculations is available.

References

  • Stacey Hunter Hecht and David Schultz (eds.), Presidential Swing States: Why Only Ten Matter. Lexington Books, 2015.
  • David Schultz and Rafael Jacob, Presidential Swing States. Second ed., Lexington Books, 2018.

Data Sources

  • Thomas M. Holbrook, "Did the Whistle-Stop Campaign Matter?". PS: Political Science and Politics 35, No. 1 (2002) pp. 59–66 Eprint. (For Truman vs Dewey)
  • Scott L. Althaus, Peter F. Nardulli, Daron R. Shaw, "Candidate Appearances in Presidential Elections, 1972-2000". Political Communication 19, no.1 (2002) pp.49–72 Eprint.
  • JS Hill, E Rodriquez, AE Wooden, "Stump speeches and road trips: The impact of state campaign appearances in presidential elections". PS: Political Science & Politics 43, no.2 (2010) pp. 243–254. Eprint. (For data by state for the 2000, 2004, 2008 elections.)
  • Democracy in Action gives data "by state" for 2000, 2004, 2008, 2012, 2016.
  • MIT Election Lab for state-level Presidential election results, 1976–2016.
