
Friday, April 26, 2019

Estimating Legislator Ideal Points

We briefly introduced the idea of issue spaces as a formalization of the political spectrum. Now we want to figure out where legislators are on that political spectrum.

Game theory models political actors as an Ideal Point in an issue space equipped with a utility function on that issue space. But how do we estimate (unobservable) ideal points? One strategy is to try to use votes, which is the basis for the (i) Item-Response and (ii) NOMINATE families of algorithms.

Basic Idea

The policy space consists of s dimensions. Legislator i's utility for voting yea (y subscript) on measure j is a function of the "distance" from the legislator's ideal point to the proposed legislation's point. We use a slightly generalized version of the Pythagorean theorem, where the "distance" first dilates the coordinates by the subjective weight $w_k$ the legislator places on each dimension k of the policy space:

$$ l_{ijy}^2 = \sum_{k=1}^{s} w_k^2\, d_{ijyk}^2 $$

A legislator's utility function is then some "suitably nice" function u(l) of this distance. This isn't quite the end of the story: since we're doing statistical estimation, we've only described the "deterministic part" of the utility function. We also have a "random noise" term ε:

$$ U(l_{ijy}) = u(l_{ijy}) + \varepsilon $$

We fix u to be either a Gaussian function or a quadratic polynomial. The only real condition is that it "looks like a frown": it has a global maximum at the legislator's ideal point and is strictly decreasing as we move away from it.
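To make this concrete, here is a minimal Python sketch (my own illustration, not any package's API) of the weighted squared distance and the two standard utility shapes:

```python
import math

def weighted_sq_distance(ideal, bill, weights):
    """Squared 'distance' from a legislator's ideal point to a bill's point,
    with each dimension k dilated by the legislator's subjective weight w_k."""
    return sum((w * w) * (x - b) ** 2 for x, b, w in zip(ideal, bill, weights))

def gaussian_utility(l_sq, beta=1.0):
    """Gaussian utility: peaks (at beta) when the bill sits at the ideal point."""
    return beta * math.exp(-l_sq / 2)

def quadratic_utility(l_sq):
    """Quadratic utility: also maximal (at 0) at the ideal point, decreasing outward."""
    return -l_sq

# A legislator at (0.2, -0.5) who weighs dimension 1 twice as heavily as dimension 2:
l_sq = weighted_sq_distance(ideal=(0.2, -0.5), bill=(0.4, 0.1), weights=(2.0, 1.0))
print(gaussian_utility(l_sq), quadratic_utility(l_sq))
```

Both shapes satisfy the "frown" condition; the Gaussian just flattens out far from the ideal point while the quadratic keeps falling.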

How do we progress? We can represent the probability of voting "yea" in terms of the utility function (how this is done varies model by model), then estimate the parameters (the $w_k$ for each legislator, each legislator's ideal point, and each motion's location in the policy space) using something like maximum likelihood or expectation maximization.
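As a toy illustration of that estimation step (emphatically not the actual NOMINATE or IRT machinery), here is a grid-search maximum-likelihood sketch for a single legislator's 1-dimensional ideal point, assuming, unrealistically, that the yea and nay outcome locations are already known and the noise is logistic:

```python
import math

def vote_prob(x, yea_loc, nay_loc):
    """Pr(yea) = 1 / (1 + exp(u_nay - u_yea)) with quadratic utilities
    u = -(x - loc)^2 and logistic noise."""
    u_yea = -(x - yea_loc) ** 2
    u_nay = -(x - nay_loc) ** 2
    return 1.0 / (1.0 + math.exp(u_nay - u_yea))

def estimate_ideal_point(votes, yea_locs, nay_locs):
    """Grid-search maximum likelihood over candidate ideal points in [-2, 2].
    votes: 1 for yea, 0 for nay; yea_locs/nay_locs: known outcome positions."""
    grid = [i / 100 for i in range(-200, 201)]
    def log_lik(x):
        total = 0.0
        for v, y, n in zip(votes, yea_locs, nay_locs):
            p = vote_prob(x, y, n)
            total += math.log(p if v else 1 - p)
        return total
    return max(grid, key=log_lik)

# A legislator who sides with proposals near +1 but balks at one far-right bill:
votes    = [1, 1, 0, 0, 0]
yea_locs = [0.9, 1.1, -1.0, -0.8, 2.0]
nay_locs = [0.0, 0.0, 0.0, 0.0, 0.0]
print(estimate_ideal_point(votes, yea_locs, nay_locs))
```

The real algorithms estimate the outcome locations and legislator weights jointly, which is what makes them hard; the toy above only shows the likelihood-maximization idea.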

NOMINATE models

The utility functions are Gaussian functions. If there are s dimensions to the policy space, legislator i has the following utility for voting yea (y subscript) on measure j, where $w_k d_k$ measures the "cost" of deviating from the legislator's ideal point in the kth dimension of the issue space:

$$ u_{ijy} = \beta \exp\!\left[ -\sum_{k=1}^{s} w_k^2\, d_{ijyk}^2 / 2 \right] $$

Observe the exponent is just the squared "cost" $l_{ijy}^2$ from before, scaled by $-1/2$: the cost for the legislator to support the measure. As before, this is only the "deterministic part" of the utility function; adding the "random noise" ε gives the utility function U as

$$ U_{ijy} = u_{ijy} + \varepsilon_{ijy} $$

Note: we can similarly define the utility for voting "nay" by considering the location of the status quo in policy space and computing the legislator's distance to that point. This gives $u_{ijn}$, the utility for voting "nay". The stochastic term ε is assumed to follow an "extreme value distribution", which lets us write the probability that legislator i votes Yea on roll call j as:

$$ \Pr(\text{Yea}) = P_{ijy} = \frac{\exp(u_{ijy})}{\exp(u_{ijy}) + \exp(u_{ijn})} $$
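This choice probability is easy to transcribe directly, using the Gaussian utilities above (the function name is mine, and the stochastic terms don't appear because the extreme-value assumption has already folded them into the logit form):

```python
import math

def nominate_prob_yea(beta, w, ideal, yea_pt, nay_pt):
    """Pr(Yea) = exp(u_yea) / (exp(u_yea) + exp(u_nay)),
    with Gaussian utilities u = beta * exp(-sum_k w_k^2 d_k^2 / 2)."""
    def u(pt):
        l_sq = sum((wk * wk) * (xi - pi) ** 2 for wk, xi, pi in zip(w, ideal, pt))
        return beta * math.exp(-l_sq / 2)
    u_yea, u_nay = u(yea_pt), u(nay_pt)
    return math.exp(u_yea) / (math.exp(u_yea) + math.exp(u_nay))

# A legislator equidistant from the yea and nay outcomes is a coin flip;
# one sitting nearly on top of the yea outcome favors it:
print(nominate_prob_yea(1.0, (1.0,), (0.0,), (1.0,), (-1.0,)))
print(nominate_prob_yea(1.0, (1.0,), (0.0,), (0.1,), (2.0,)))
```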

The exact details of this variant of the NOMINATE algorithm may be found in "Scaling Roll Call Votes with wnominate in R", and it works for a single session of Congress.

The models above estimate ideal points for a finite set of legislators within a single session of Congress. But how do we handle change over time, i.e., how ideal points evolve across multiple sessions of Congress? The legislator's ideal point becomes a polynomial in t (sessions since joining), supposing the legislator has served T terms thus far:

$$ x_{it} = x_{i0} + x_{i1} P_1\!\left(\frac{t-1}{T-1}\right) + \cdots + x_{i\nu} P_\nu\!\left(\frac{t-1}{T-1}\right) $$

where $P_k$ is the kth Legendre polynomial, and the $x_{ik}$ are more parameters to be determined. Why use Legendre polynomials? It's unclear to me; presumably for their completeness relation (any reasonable function on a bounded interval can be approximated arbitrarily well by "enough" Legendre polynomials). This is the DW-NOMINATE variant.
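The time trend itself is easy to evaluate with numpy's Legendre tools: legval takes the coefficient list $[x_{i0}, \ldots, x_{i\nu}]$ and returns $\sum_k x_{ik} P_k(s)$ (the function name is my own sketch):

```python
from numpy.polynomial import legendre

def ideal_point_at_term(coeffs, t, T):
    """DW-NOMINATE-style time trend: x_it = sum_k x_ik * P_k((t-1)/(T-1)),
    where P_k is the k-th Legendre polynomial and coeffs = [x_i0, ..., x_inu].
    Since P_0 = 1, the first coefficient is the 'baseline' ideal point."""
    s = (t - 1) / (T - 1)  # rescaled "career time" in [0, 1]
    return float(legendre.legval(s, coeffs))

# A legislator with a constant ideal point, and one drifting linearly (P_1(s) = s):
print(ideal_point_at_term([0.5], t=3, T=10))
print(ideal_point_at_term([0.0, 1.0], t=5, T=5))
```

Note the argument $(t-1)/(T-1)$ lands in [0, 1], not the interval [-1, 1] on which the Legendre polynomials are orthogonal; the expansion is still well-defined, it just isn't orthogonal there.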

Problems

Although ubiquitous in the literature, there are some problems with the NOMINATE scores.

First, the dimensions are not as easy to interpret as proponents claim. The first dimension is always read as the "partisanship" dimension, but there's no principled way to establish that beyond guessing.

Second, it poorly describes how someone's views evolve over time. This is important if we wanted to discuss, e.g., "party realignments" (Are the Republicans from the 1990s "the same as" the Republicans in 2019?).

Third, NOMINATE requires a lot of data before it produces decent results. This has probably improved since the original algorithm; there are so many variants now that it's hard to keep track.

Fourth, it's not Bayesian. This is unfortunate from a performance perspective: if I have just computed NOMINATE scores for legislators based on the first session of Congress, and 6 months into the next session I want to update those scores, I have to recompute everything from scratch. This isn't as serious as the previous problems, but it is irritating.

Item-Response Models

The basic idea is to treat votes as if they were responses to a survey, then use the already-developed Item-Response Theory. Applied to legislator ideal points, this amounts to roughly a probit model for the probability that a legislator votes "yea":

$$ \Pr(y_{ij} = 1) = \Phi(\beta_j x_i - \alpha_j) $$

where Φ is the CDF of the standard normal distribution.
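In code, the probit link needs nothing beyond the error function (the function name is mine):

```python
import math

def probit_prob_yea(beta_j, alpha_j, x_i):
    """Two-parameter IRT / probit link: Pr(y_ij = 1) = Phi(beta_j * x_i - alpha_j),
    with the standard normal CDF Phi written via erf."""
    z = beta_j * x_i - alpha_j
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# With discrimination beta_j = 1 and difficulty alpha_j = 0, a legislator at the
# origin is a coin flip, while one far to the right almost surely votes yea:
print(probit_prob_yea(1.0, 0.0, 0.0))
print(probit_prob_yea(1.0, 0.0, 2.0))
```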

This can be reinterpreted as an Item-Response model used (apparently) in educational testing, where βj is the "item discrimination parameter" and αj is the item difficulty parameter. Clinton, Jackman, and Rivers' "The Statistical Analysis of Roll Call Data" (2004) was the first to approach ideal point identification using Item-Response theory, at least so far as I can tell from the literature.

This led to a multitude of variants: emIRT improved performance, for example, while Martin and Quinn's work on Supreme Court justices' ideal points produced innovative algorithms which are Bayesian and dynamic (ideal points take a "random walk" in the issue space, as it were).

Item-Response models turn out to be superior for analyzing the dynamics of ideal points. Specifically for questions of party realignment, Caughey and Schickler (2014) caution us to use a dynamic IRT model. Although computationally intensive, progress has been made (implemented, e.g., in the idealstan R package).

Problems with Item-Response Models

First, Item-Response models are scale-invariant: we can rescale the coordinates of the policy space however we like. So the numeric values of the ideal points may not matter so much as their relationships to one another.

Second, for policy spaces that are not 1-dimensional, item-response estimates are rotation-invariant. For 1-dimensional policy spaces, item-response doesn't know whether to order values from most liberal to most conservative or vice versa.
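Both identification problems are easy to verify numerically for the probit model above: rescaling every ideal point (with the matching change to the $\beta_j$), or reflecting the whole axis, leaves every predicted vote probability untouched.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta, alpha, x, c = 0.8, 0.3, 1.2, 5.0

# Scale invariance: multiply every x_i by c and divide every beta_j by c.
p_original = phi(beta * x - alpha)
p_rescaled = phi((beta / c) * (c * x) - alpha)
assert abs(p_original - p_rescaled) < 1e-12

# Reflection invariance: flip the sign of every x_i and every beta_j
# (i.e., swap "liberal" and "conservative").
p_reflected = phi((-beta) * (-x) - alpha)
assert abs(p_original - p_reflected) < 1e-12
```

The data alone cannot distinguish these parameterizations, which is exactly why the model needs outside information to pin them down.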

But both these problems can be solved using semi-informative priors in the Bayesian approaches.

The third problem, perhaps graver, is that we are restricted to low-dimensional policy spaces by computational constraints. The NOMINATE algorithm can handle 8 dimensions, no problem; but item-response algorithms struggle to determine ideal points in more than 2 dimensions within a sensible period of time.

Conclusion

If you are interested in an overview — a "big picture" of congress — without concern about nuance, the NOMINATE scores may be good enough.

Although NOMINATE produces a decent approximate ideal point for each legislator, it fails to adequately capture how a legislator evolves over multiple sessions. This makes it less than ideal for making any claims about party realignments.

Further, it fails to capture issue-specific nuance for each legislator. Presumably higher dimensionality fixes this, but assigning, say, 16 numbers to a single legislator worsens the intuitive picture. It is unclear whether the Item-Response families suffer the same problem. (See arXiv:1209.6004 for details.)

References

  • Nolan McCarty, Measuring Legislative Preferences. This review fleshes out more sordid details underpinning the general notion of "ideal points" than I have written about.

Wednesday, April 24, 2019

Issue Space: A Primer on Spatial Voting

The first step towards applying rational behavior to Congressional politics is to consider a body of voters deliberating on a proposed bill. The bill is up for a passage vote (i.e., a vote considering whether to enact it or not), so a given voter has two choices: yea [enact] or nay [do not enact].

We model each voter as an independent rational agent, possibly interacting with the others. But the real question I'd like to address in this post is: How do we model the bill, the question?

Example 1. Consider a ballot initiative for giving school teachers a raise. The initiative will pay school teachers $x per year. Ostensibly x could be any real number (strictly speaking, a subset of the real numbers, since we'd truncate to 2 digits after the decimal point). Each voter has a subjective belief about what the pay should be. Some may believe school teachers should be volunteers or charity-funded, and thus would prefer x to be 0. Others may believe teachers deserve a living wage and thus prefer x closer to, say, $45,000. This "preferred wage" each voter holds, we call the voter's Ideal Point.

The choice the voter faces is between $x and the current wage $w_current. We need to give each voter a utility function U mapping any given proposed wage to that voter's "utility". More precisely, it measures "how far off" a proposed wage is from that voter's "ideal wage". The exact interpretation and mathematical properties of the utility function are the topic for a future post; today we're interested only in the issues.

The one-dimensional real line containing the proposed wage $x and the current wage $w_current is the domain of the voters' utility functions. This "space of possible school teacher wages" is the Issue Space of the proposed measure. (End of Example 1)
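A minimal sketch of Example 1's voter, assuming for concreteness a quadratic utility (the function names and dollar figures are my own illustration):

```python
def quadratic_utility(wage, ideal_wage):
    """Utility falls off with squared distance from the voter's ideal wage."""
    return -(wage - ideal_wage) ** 2

def vote(proposed, current, ideal_wage):
    """A spatial voter votes yea iff the proposal yields higher utility
    than the status quo; ties go to the status quo."""
    if quadratic_utility(proposed, ideal_wage) > quadratic_utility(current, ideal_wage):
        return "yea"
    return "nay"

# A voter whose ideal teacher wage is $45,000, facing a raise from $40,000 to $48,000:
print(vote(proposed=48_000, current=40_000, ideal_wage=45_000))
```

Note the vote depends only on which of the two wages sits closer to the voter's ideal point, which is the essence of spatial voting.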

Dimensional Reduction. We could divide up any piece of legislation into policies. Our previous example could simultaneously have included a change in taxes to fund the increase in teacher wages, giving us 2 ostensible dimensions to consider: the tax rate and the school teacher wage.

For a real piece of legislation, such a naive translation of a bill into policies may result in a combinatorial explosion of dimensions in the issue space.

What (apparently) happens is, we bundle policy dimensions into (hopefully coherent) world views which we classify as the Political Spectrum. In some sense, we implicitly perform a kind of Principal Component Analysis, reducing the proposed policies implemented in a given bill down to a lower-dimensional "Policy Space". This is done informally, and we do it all the time: when we say, "Oh, this bill is a liberal bill", we have boiled all the policies down to one dimension (the left/right spectrum).
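The PCA analogy can be made literal on toy data: score a handful of hypothetical bills on several policy dimensions, and the first principal component recovers the informal left/right axis. (The matrix below is entirely made up for illustration.)

```python
import numpy as np

# Toy matrix: 6 bills scored on 4 policy dimensions (say tax, spending,
# regulation, wages). The rows were constructed so the dimensions move
# together, the way bundled world views do.
bills = np.array([
    [ 1.0,  0.9,  1.1,  0.8],
    [-1.0, -1.1, -0.9, -1.0],
    [ 0.5,  0.4,  0.6,  0.5],
    [-0.5, -0.6, -0.4, -0.5],
    [ 1.2,  1.0,  0.9,  1.1],
    [-1.1, -0.9, -1.2, -1.0],
])

centered = bills - bills.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Share of variance along the first principal component: the informal
# "left/right" axis summarizing all four policy dimensions at once.
explained = S[0] ** 2 / (S ** 2).sum()
print(round(explained, 3))
```

When one component explains nearly all the variance, the "this bill is a liberal bill" shorthand loses very little information.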

There is no exotic geometry to the policy space; it's usually N-dimensional real space for N around 2.

Definition 1. A bill's Issue Space is the space of all possible implementations of the proposed policies contained in the legislation's text.

The Policy Space is a "coarse-grained" N-dimensional real space, in the sense that any legislation or proposed policy can be located as a point in that N-dimensional space.

Warning: This distinction between "policy space" and "issue space" is one I am making at present. In the literature, the terms are used interchangeably to refer to the "coarse-grained" lower-dimensional space. Following suit, I will respect tradition and, in future posts, use the terms interchangeably unless otherwise explicitly stated.

Model Refinement. If we take this seriously, then we just need to model actors (rational voters) using (i) their ideal point and (ii) their utility function (preferences). Well, we also need to model:

  1. the institutional factors ["rules to the voting game"],
  2. if voters interact with each other and how it'd affect their behavior, and
  3. how voters get and process information.

Empirical Concerns. We also need to determine how many dimensions there are to the policy space. We could, ostensibly, have a large number of dimensions (say, N = 26 or something), but that's just a wild guess. As far as I am aware, there is no rigorous way to measure the dimensionality of the policy space.

I also wonder about the geometry of the issue space (is there curvature? What about symmetries?) as well as its topology (is it connected? Compact? Does it have nontrivial homotopy groups or homology ring?). This wouldn't really impact much, except that the geometry may have surprising effects on voter behavior.

Further, we have to come up with some model of voter utility functions. There are two popular choices, namely a Gaussian and a quadratic polynomial, both functions of the "distance" between the voter's ideal point and the proposed legislation's location in policy space. The "distance" is measured using a voter-dependent metric (how "painful" it is to stretch that distance away from the voter's ideal). I'll discuss this more in a future post on ideal points.

References

I don't really have any, since this is glossed over in the literature to get to voter preferences in spatial voting models.