Monday, April 29, 2019

TeX on blogger

Note to self: if I screw things up customizing the style, the javascript snippet to enable TeX on blogger is:

<script type='text/x-mathjax-config'>
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
</script>
<script async='async' 
  src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
  type='text/javascript'>
</script>

Friday, April 26, 2019

Estimating Legislator Ideal Points

We briefly introduced the idea of issue spaces as a formalization of the political spectrum. Now we want to figure out where legislators are on that political spectrum.

Game theory models each political actor as an Ideal Point in an issue space, together with a utility function on that issue space. But how do we estimate (unobservable) ideal points? One strategy is to use votes, which is the basis for the (i) Item-Response and (ii) NOMINATE families of algorithms.

Basic Idea

The policy space consists of s dimensions. Legislator i's utility for voting yes (the y subscript) on measure j is a function of the "distance" from the legislator's ideal point to the proposed legislation's point. We use a slightly generalized version of the Pythagorean theorem, where the "distance" first dilates the coordinates by the subjective weights \(w_k\) the legislator places on each dimension k of the policy space:

$$l_{ijy}^{2} = \sum_{k=1}^{s} w_{k}^{2} d_{ijyk}^{2}$$

A legislator's utility function is then some "suitably nice" function of these distances, u(l). Well, this isn't quite the end of the story: because we're dealing with statistical regression, we've only described the "deterministic part" of the utility function. We also have the "random noise" ε:

$$U(l_{ijy}) = u(l_{ijy}) + \varepsilon$$

We fix u to be either a Gaussian function or a quadratic polynomial. The only condition is that "it looks like a frown": it has a global maximum at the legislator's ideal point and decreases strictly as we move away from it.

How do we proceed? Well, we can represent the probability of voting "yea" in terms of the utility function (how this is done varies model-by-model), then estimate the parameters (the \(w_k\) for each legislator, each legislator's ideal point, and each motion's location in the policy space) using something like maximum likelihood or expectation maximization.

NOMINATE models

The utility functions are Gaussian functions. If there are s dimensions to the policy space, legislator i's utility for voting yes (the y subscript) on measure j is given below, where \(w_k d_{ijyk}\) measures the "cost" of deviating from the legislator's ideal point in the kth dimension of the issue space:

$$u_{ijy} = \beta \exp\left[-\tfrac{1}{2}\sum_{k=1}^{s} w_{k}^{2} d_{ijyk}^{2}\right]$$

Observe the exponent is just (minus half) the squared "cost" \(l_{ijy}^{2}\) for the legislator to support the measure. Well, this isn't quite the end of the story: because we're dealing with statistical regression, we've only described the "deterministic part" of the utility function. We also have the "random noise" ε, giving the utility function U as

$$U_{ijy} = u_{ijy} + \varepsilon_{ijy}$$

Note: we can similarly define the utility for voting "nay" by considering instead the location of the status quo in policy space and computing the legislator's distance to that point. This gives \(u_{ijn}\), the utility function for voting "nay". The stochastic term ε of the utility function is assumed to follow an "extreme value distribution", which lets us write the probability that legislator i votes for outcome y on roll call j as:

$$\Pr(\text{Yea}) = P_{ijy} = \frac{\exp(u_{ijy})}{\exp(u_{ijy}) + \exp(u_{ijn})}$$

The exact details of this variant of the NOMINATE algorithm may be found in "Scaling Roll Call Votes with wnominate in R"; it works for a single session of Congress.
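To make the moving parts concrete, here is a minimal sketch in Python with made-up ideal points, bill locations, and salience weights. It just evaluates the formulas above; it is not the wnominate implementation, and β is treated as an arbitrary free parameter.

import numpy as np

def nominate_utility(ideal, outcome, weights, beta=15.0):
    """Deterministic Gaussian utility for one legislator and one outcome."""
    d2 = (weights * (ideal - outcome))**2       # w_k^2 d_k^2 for each dimension
    return beta * np.exp(-0.5 * d2.sum())

# Hypothetical two-dimensional policy space.
ideal_point = np.array([0.3, -0.1])   # legislator's ideal point
yea_point   = np.array([0.5,  0.2])   # location of the proposal
nay_point   = np.array([-0.4, 0.0])   # location of the status quo
weights     = np.array([1.0,  0.5])   # salience weights w_k

u_yea = nominate_utility(ideal_point, yea_point, weights)
u_nay = nominate_utility(ideal_point, nay_point, weights)

# Logit form implied by the extreme-value error assumption.
pr_yea = np.exp(u_yea) / (np.exp(u_yea) + np.exp(u_nay))
print(pr_yea)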

The models above estimate ideal points for a finite set of legislators within the same session of Congress. But how do we handle change over time, i.e., how ideal points evolve across multiple sessions of Congress? The legislator's ideal point is then taken to be a polynomial in t (sessions since joining), supposing the legislator has served T terms so far:

$$x_{it} = x_{i0} + x_{i1} P_{1}\!\left(\frac{t-1}{T-1}\right) + \cdots + x_{i\nu} P_{\nu}\!\left(\frac{t-1}{T-1}\right)$$

where \(P_k\) is a Legendre polynomial, and the coefficients \(x_{i0},\dots,x_{i\nu}\) are more parameters to be determined. Why use Legendre polynomials? It's unclear to me; presumably for their completeness relation (any reasonable function on the domain 0 < x < 1 can be adequately approximated by "enough" Legendre polynomials). This is the DW-NOMINATE variant.
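As a toy illustration of the time trend (in Python; the coefficients below are invented, not output from any DW-NOMINATE run):

import numpy as np
from numpy.polynomial import legendre

# Hypothetical coefficients x_{i0}, x_{i1}, x_{i2} for one legislator, one dimension.
coeffs = np.array([0.25, 0.10, -0.05])

T = 10                                   # terms served so far
t = np.arange(1, T + 1)                  # sessions 1, ..., T
s = (t - 1) / (T - 1)                    # rescaled time in [0, 1]

# legval evaluates sum_k coeffs[k] * P_k(s); P_0 = 1, so coeffs[0] is the constant term.
ideal_over_time = legendre.legval(s, coeffs)
print(ideal_over_time)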

Problems

Although ubiquitous in the literature, there are some problems with the NOMINATE scores.

First, the dimensions are not as easy to interpret as its proponents claim. The first dimension is always interpreted as the "partisanship" dimension, but there's no clear way to glean that other than guessing.

Second, it poorly describes how someone's views evolve over time. This is important if we wanted to discuss, e.g., "party realignments" (Are the Republicans from the 1990s "the same as" the Republicans in 2019?).

Third, NOMINATE requires a lot of data before it can produce decent results. This has probably improved since the original algorithm; there are so many variants now that it's hard to keep track.

Fourth, it's not Bayesian. This is unfortunate from a performance perspective. If I have just computed the NOMINATE scores for legislators based on the first session of Congress, and then 6 months into the next session I want to update those scores... I have to recompute everything from scratch. This isn't as terrible as the previous problems, but it is irritating.

Item-Response Models

The basic idea is to treat votes as if they were responses to a survey, then use the already developed Item-Response Theory. As applied to ideal points of legislators, this amounts to (roughly) a probit model for the probability that a legislator votes "yea":

$$\Pr(y_{ij} = 1) = \Phi(\beta_{j} x_{i} - \alpha_{j})$$

where Φ is the CDF of the standard Normal distribution.

This can be reinterpreted as an Item-Response model used (apparently) in educational testing, where \(\beta_j\) is the "item discrimination parameter" and \(\alpha_j\) is the "item difficulty parameter". Clinton, Jackman, and Rivers' "The Statistical Analysis of Roll Call Data" (2004) was the first to approach ideal point identification using Item-Response theory, at least so far as I can tell from the literature.
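A minimal sketch of this probit model in Python; the ideal points and item parameters below are invented for illustration, not estimates from Clinton, Jackman, and Rivers:

import numpy as np
from scipy.stats import norm

def pr_yea(x, alpha, beta):
    """Probability a legislator with ideal point x votes 'yea' on an item (alpha, beta)."""
    return norm.cdf(beta * x - alpha)

# Hypothetical one-dimensional ideal points for three legislators.
x = np.array([-1.2, 0.1, 0.9])

# Hypothetical item parameters for one roll call.
alpha, beta = 0.3, 1.5   # difficulty, discrimination

print(pr_yea(x, alpha, beta))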

Clinton, Jackman, and Rivers' approach led to a multitude of variants: emIRT improved performance, for example, while Martin and Quinn's work on Supreme Court justices' ideal points produced innovative algorithms which are Bayesian and dynamic (the ideal points take a "random walk" in the issue space, as it were).

This turns out to be superior for analyzing the dynamics of ideal points. Specifically for questions of party realignment, Caughey and Schickler (2014) caution us to use a dynamic IRT model. Although computationally intensive, progress has been made (conveniently packaged, e.g., in the idealstan R package).

Problems with Item-Response Models

First, Item-Response models are scale-invariant: we can rescale the coordinates of the policy space however much we want. So the numeric values of the ideal points may not matter so much as their relationships to each other.

Second, for policy spaces which are not 1-dimensional, item-response values are rotation invariant. For 1-dimensional policy spaces, item-response doesn't know whether to order values from most liberal to most conservative or vice-versa.

But both these problems can be solved using semi-informative priors in the Bayesian approaches.

The third problem, perhaps more grave, is that we are restricted to low-dimensional policy spaces due to computational constraints. The NOMINATE algorithm can handle 8 dimensions, no problem; but item-response algorithms struggle to determine ideal points in more than 2 dimensions within a sensible period of time.

Conclusion

If you are interested in an overview, a "big picture" of Congress, without concern about nuance, the NOMINATE scores may be good enough.

Although it produces a decent approximate ideal point for legislators, it fails to adequately capture how a legislator evolves over multiple sessions. This makes it less than ideal for making any claims about party realignments.

Further, it fails to capture issue-specific nuances for each legislator. Presumably higher dimensionality fixes the problem, but summarizing a single legislator with, say, 16 numbers worsens the intuitive picture. It is unclear if the Item-Response families suffer the same problem. (See arXiv:1209.6004 for details.)

References

  • Nolan McCarty, Measuring Legislative Preferences. This review fleshes out more sordid details underpinning the general notion of "ideal points" than I have written about.

Wednesday, April 24, 2019

Issue Space: A Primer on Spatial Voting

The first step towards applying rational behavior to Congressional politics is to consider a body of voters deliberating on a proposed bill. The bill is up for a passage vote (i.e., a vote considering whether to enact it or not), so a given voter has two choices: yea [enact] or nay [do not enact].

We model the voters as independent rational agents who may interact. But the real question I'd like to address in this post is: how do we model the bill, the question being voted on?

Example 1. Consider a ballot initiative for giving a raise to school teachers. The initiative will pay school teachers $x per year. Ostensibly x could be any real number (strictly speaking, it would range over a subset of the real numbers, since we'd have to truncate to 2 digits after the decimal point). Each voter has a belief about what the pay should be, and this could be determined subjectively. Some may believe school teachers should be volunteers or charity funded, and thus would prefer x to be 0. Others may believe teachers deserve a living wage and thus prefer x to be closer to, say, $45000. This "preferred wage" each voter has is what we call the voter's Ideal Point.

The choice the voter faces is between $x and whatever the current wage \(w_{\text{current}}\) is. We need to give each voter a utility function U mapping any given proposed wage to that voter's "utility". More precisely, it measures "how far off" a proposed wage is from that voter's "ideal wage". The exact interpretation and mathematical properties of the utility function are the topic of a future post; today we're interested only in the issues.

The one-dimensional real line containing the proposed wage $x and the current wage \(w_{\text{current}}\) is the domain of the voters' utility functions. This "space of possible school teacher wages" is the Issue Space of the proposed measure. (End of Example 1)

Dimensional Reduction. We could divide up any piece of legislation into policies. Our previous example could have simultaneously included a change in taxes to fund the increase in school teacher wage, and we'd have 2 ostensible dimensions to consider: the tax rate, and the school teacher wage.

For a real piece of legislation, such a naive translation of a bill into policies may result in a combinatorial explosion of dimensions in the issue space.

What (apparently) happens is, we bundle policy dimensions into (hopefully coherent) world views, which we classify as the Political Spectrum. In some sense, we implicitly perform a kind of Principal Component Analysis to reduce the proposed policies implemented in a given bill down to a lower-dimensional "Policy Space". This is done informally, and we do it all the time: when we say, "Oh, this bill is a liberal bill", we've just boiled down all the policies into one dimension (the left/right spectrum).
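To illustrate the analogy (in Python, on a made-up matrix of policy positions; this is a toy sketch, not how the political-science literature actually estimates the spectrum), a principal component analysis compresses many policy dimensions into a few:

import numpy as np

# Hypothetical data: 6 actors' positions on 5 separate policy dimensions.
rng = np.random.default_rng(0)
left_right = rng.normal(size=6)                       # a latent one-dimensional "spectrum"
positions = np.outer(left_right, rng.normal(size=5))  # policies driven by that spectrum
positions += 0.1 * rng.normal(size=positions.shape)   # plus idiosyncratic noise

# PCA via the singular value decomposition of the centered data.
centered = positions - positions.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

explained = S**2 / np.sum(S**2)
print(explained)            # most of the variance sits in the first component
print(centered @ Vt[0])     # each actor's score on that "left/right" component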

There is no exotic geometry to the policy space; it's usually N-dimensional real space with N around 2.

Definition 1. A bill's Issue Space is the space of all possible implementations of the proposed policies contained in the legislation's text.

The Policy Space is a "coarse-grained" N-dimensional real space, in the sense that any legislation or proposed policy can be located as a point in that N-dimensional space.

Warning: This distinction between "policy space" and "issue space" is one I am making at present. In the literature, the terms are used interchangeably to refer to the "coarse-grained" lower-dimensional space. Following suit, I will have to respect tradition, and in future posts I will use the terms interchangeably unless otherwise explicitly stated.

Model Refinement. If we take this seriously, then we just need to model actors (rational voters) using (i) their ideal point and (ii) their utility function (preferences). Well, we also need to model:

  1. the institutional factors ["rules to the voting game"],
  2. if voters interact with each other and how it'd affect their behavior, and
  3. how voters get and process information.

Empirical Concerns. We also need to determine how many dimensions there are to the policy space. We could, ostensibly, have a large number of dimensions (say, N = 26 dimensions or something), but that's just a wild guess. As far as I am aware, there is no rigorous way to measure the dimensionality of the policy space.

I also wonder about the geometry of the issue space (is there curvature? What about symmetries?) as well as its topology (is it connected? Compact? Does it have nontrivial homotopy groups or homology ring?). This wouldn't really impact much, except the geometry may have surprising consequences for voter behavior.

Further, we have to come up with some model of voter utility functions. There are two popular choices, namely a Gaussian and a quadratic polynomial, both functions of the "distance" between the voter's ideal point and the proposed legislation's location in policy space. The "distance" is measured using a voter-dependent metric (how "painful" it is to stretch that distance away from the voter's ideal). I'll discuss this more in a future post on ideal points.
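For concreteness, here is a small sketch in Python of the two popular shapes, with invented numbers; both peak at the voter's ideal point and fall off with the weighted distance:

import numpy as np

def quadratic_utility(x, ideal, weight=1.0):
    """Quadratic loss: maximal at the ideal point, falling off with squared distance."""
    return -(weight * (x - ideal))**2

def gaussian_utility(x, ideal, weight=1.0):
    """Gaussian utility: also maximal at the ideal point, but bounded below."""
    return np.exp(-0.5 * (weight * (x - ideal))**2)

# A hypothetical voter whose ideal teacher wage is $45,000, comparing two proposals.
ideal = 45_000.0
proposals = np.array([40_000.0, 52_000.0])

print(quadratic_utility(proposals, ideal, weight=1e-4))
print(gaussian_utility(proposals, ideal, weight=1e-4))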

References

I don't really have any, since this is glossed over in the literature to get to voter preferences in spatial voting models.

Monday, April 22, 2019

Criticisms of Instrumental Rationality

As a follow-up to the post Agents are Instrumentally Rational, I thought it would be good to discuss a number of criticisms of instrumental rationality. It's healthy to do so, since political actors are not cold, calculating automatons. In elections, voters do not vote rationally. Politicians may or may not behave rationally. If political actors are (gasp) human, perhaps examining the flaws of rationality will illuminate aspects we can use to build better models. And unlike economists, we are trying to fit theory to reality.

There is a growing literature testing predictions "instrumental rationality" makes. Alarmingly, the vast majority of this literature finds actors are not instrumentally rational.

The basic problem seems to be that expected utility models actors as computers trying to optimize some quantity. But the human brain is more like a sophisticated "pattern recognition machine", with built-in "short circuits" that avoid heavy computation but tend to produce false positives. (This is useful for doing things which do not need heavy computation; it is a feature, sometimes a bug.) This is the pioneering work of Daniel Kahneman and Amos Tversky. There is a wonderful book, Thinking, Fast and Slow, by Kahneman himself, summarizing the research.

There are two points of particular concern I'll mention here. In another post, I will go on a philosophical spelunking expedition into where this particular notion of "rationality" comes from (Hume) and the criticisms it has faced in the past.

Allais Paradox

Let's try testing the framework of preferences over prospects, specifically the independence axiom. You have to make one choice between two alternative lotteries:

Lottery 1:
Win $2500 with probability 33%
Win $2400 with probability 66%
Win nothing with 1% probability

Lottery 2: win $2400 with certainty.

Once you made that choice, you need to choose between two more lotteries:

Lottery 3:
Win $2500 with probability 33%
Win nothing with probability 67%

Lottery 4:
Win $2400 with probability 34%
Win nothing with probability 66%

Which would you prefer? What does expected utility suggest?

The expected winnings from Lottery 1 are $2409, whereas the expected winnings from Lottery 2 are $2400. An expected-value maximizer would therefore pick Lottery 1, yet most people empirically choose the certain $2400 of Lottery 2.

Similarly, the expected winnings from Lottery 3 are $825, whereas the expected winnings from Lottery 4 are $816, and here most people do choose Lottery 3 over Lottery 4. The trouble is the combination: preferring the certainty of Lottery 2 in the first problem while preferring the riskier Lottery 3 in the second cannot be reconciled with any expected utility function.
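A quick check of the arithmetic (a small Python sketch):

# Expected monetary value of each lottery, as (probability, prize) pairs.
lotteries = {
    "Lottery 1": [(0.33, 2500), (0.66, 2400), (0.01, 0)],
    "Lottery 2": [(1.00, 2400)],
    "Lottery 3": [(0.33, 2500), (0.67, 0)],
    "Lottery 4": [(0.34, 2400), (0.66, 0)],
}

for name, outcomes in lotteries.items():
    ev = sum(p * prize for p, prize in outcomes)
    print(name, ev)   # 2409, 2400, 825, 816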

The Allais Paradox is the observation that, in experiments, volunteers exhibit exactly this pattern of choices, directly contradicting expected utility theory. Sugden's Rational Choice: A Survey of Contributions from Economics and Philosophy reviews the literature on this topic.

The conflict is with the axiom of independence. If we use an alternative axiomatization of rational behavior ("Savage's axioms"), the Allais paradox violates the "sure-thing principle".

Source of Beliefs

So, how does a rational actor "acquire" beliefs? (An instructive exercise for the reader, harking back to Socrates, is to consider how the reader "acquires" beliefs.)

For some political actors, it doesn't really matter. I'm pretty certain Senator Ted Cruz has beliefs on almost everything, and as far as how he acquired them, well, it doesn't matter.

For other types of political actors, like your "everyday voters", it does matter. Voter belief is actually a hotly debated topic: to what degree voters are "rational actors", where they acquire their party identification or how they develop affinity for a candidate, these are all hot topics.

There is universal agreement in the literature that rational agents update their beliefs using Bayesian inference. We should recall from probability that Bayes' theorem is a generalization of the contrapositive in logic. Heuristically, what happens is we have some mathematical model of the world using random variables and parameters (denoted θ in the literature, possibly a vector of parameters). Some event E occurs, and we adjust our model's parameters based on the event having occurred.
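Concretely, the updating rule is Bayes' theorem,

$$p(\theta \mid E) = \frac{p(E \mid \theta)\, p(\theta)}{p(E)},$$

where the prior \(p(\theta)\) encodes our beliefs before observing E and the posterior \(p(\theta \mid E)\) encodes them afterward.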

[I don't have the space to describe the details (though I should in some future blog post); the interested reader is encouraged to read John Kruschke's Doing Bayesian Data Analysis for details.]
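As a minimal illustration of the updating step (in Python, using a conjugate beta-binomial model chosen purely for convenience; nothing in this post commits to that particular model):

from scipy.stats import beta

# Prior belief about an unknown probability theta, encoded as a Beta(2, 2) distribution.
a, b = 2.0, 2.0

# Event E: we observe 7 "successes" and 3 "failures".
successes, failures = 7, 3

# Bayesian update: the posterior is Beta(a + successes, b + failures).
a_post, b_post = a + successes, b + failures

print("prior mean:    ", a / (a + b))
print("posterior mean:", a_post / (a_post + b_post))
print("95% credible interval:", beta.interval(0.95, a_post, b_post))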

But where do the initial estimates for prior distributions come from? Where do the "initial beliefs" emerge? There are two answers to this query.

First answer: from the search for information itself. People do not sit around waiting for "information" to fall into their laps. No! Rational actors must actively pursue information. The original "prejudices" (initial beliefs) are adjusted as more information is actively obtained.

When does a rational actor cease seeking information? Economists answer, with either a sigh or a smile, that the rational actor will stop when the utility of the information gained equals the cost of the search for that information (with the cost evaluated in utility terms). As long as there is more utility gleaned from seeking than it costs to seek, a rational actor will keep searching. This is a rather cute, self-consistent solution.

But it begs the question: how does an actor know how to evaluate the utility of the new information prior to obtaining it? Perhaps our actor has formulated expectations about the value of additional information. How did our actor acquire that expectation of the value of information?

A waggish defender might say, "By acquiring information about the value of information up to the point where the marginal benefits of this (second-order) information were equal to the costs." This solution degenerates into an infinite regress, since we can ask the same question of how the actor knows the value of this second-order information.

There are two ways to stop the infinite regress:

  1. Something additional is needed. This concedes the instrumental rationality paradigm is incomplete.
  2. The only alternative is to assume the individual knows the benefits to expect from a little more searching because the actor already knows the full information set. But then there is no problem: the actor knows everything already!

I wonder if we might not be more generous with the waggish defender, and try to bootstrap some interpolated polynomial from first-order, second-order, ..., N-order costs of information? My intuition suggests the answer to be "In general, no; only for a few special edge cases can this bootstrap occur coherently."

A last remark: this discussion reminds me of Meno's paradox (not to be confused with Zeno's paradox). In Plato's dialogue Meno, Meno asks Socrates, "And how will you inquire into a thing when you are wholly ignorant of what it is? Even if you happen to bump right into it, how will you know it is the thing you didn't know?" [80d1-4] Socrates reformulates it thus: "[A] man cannot search either for what he knows or for what he does not know[.] He cannot search for what he knows—since he knows it, there is no need to search—nor for what he does not know, for he does not know what to look for." [80e] Or, phrased more relevantly for our discussion, how can a rational actor actively pursue information about a matter of which the actor is completely ignorant?

I'm sure the rejoinder would be, "Rational actors seek information precisely how Socrates sought answers to questions on matters he professed ignorance about." But it still dodges the question.

Second answer: Beliefs as purely subjective assessments. This is following Savage's The foundations of statistics (1954), where beliefs are purely subjective assessments. They are what they are, and only revealed ex post by the choices people make.

This avoids a lot of the problems plaguing the first answer. Unfortunately, we have some experimental evidence casting doubt on the consistency of such subjective assessments and more generally on the probabilistic representations of uncertainty; most famous of which is the Ellsberg paradox.

Game theory has been pursuing the line of reasoning Savage provides. But then this may license any kind of action, rendering instrumental rationality nearly vacuous. Game theorists have sought to prevent these "purely subjective assessments" from turning against the theory [i.e., letting "anything" count as a solution describing rational behavior] by supplementing instrumental rationality with the assumption of common knowledge of rationality. This leads to weaker solution concepts for game-theoretic problems, apparently called Rationalizability, not to be confused with the psychological mechanism of "rationalizing" (i.e., lying to one's self to feel better).

References

  • Shaun Hargreaves Heap and Yanis Varoufakis, Game Theory: A Critical Introduction. Second ed., Routledge.
  • John Searle, Rationality in Action. MIT Press, 2001.

Wednesday, April 17, 2019

Geometric Mean in Probability

If we have N estimates for the probability of an event, say \(p_1, \dots, p_N\), then a good estimate for the probability is the geometric mean:

$$p = \left[p_1 \times \cdots \times p_N\right]^{1/N}.$$

To see this, think of probability from a frequentist perspective, \(p_j\) = (estimated number of trials where the event occurred)/(estimated number of trials), i.e.,

$$p_j = n_{j,x} / n_j$$

where \(n_{j,x}\) is the number of trials where the event x occurred, and \(n_j\) is the estimated number of trials.

We can best estimate the numerator as the geometric mean of the numerators of our estimates

$$n_x = \left[n_{1,x} \times \cdots \times n_{N,x}\right]^{1/N}$$

(and similarly for the denominator), since the geometric mean is the best way to combine different estimates of possibly different orders of magnitude.

If the numbers are "close enough", the geometric mean will not differ greatly from the arithmetic mean ("average"). To see this, consider the Taylor expansion of \((1 + x)^{1/N}\) to linear order, and take \(p_j = \mu + \Delta p_j\), where μ is the arithmetic mean and \(|\Delta p_j / \mu| < 1\). Expanding the geometric mean will produce the arithmetic mean plus "small" corrections of order \(N^{-1}\).
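Sketching the expansion to linear order (under the assumption above that \(|\Delta p_j/\mu| < 1\)):

$$\left[\prod_{j=1}^{N}\left(\mu + \Delta p_j\right)\right]^{1/N} = \mu\left[\prod_{j=1}^{N}\left(1 + \frac{\Delta p_j}{\mu}\right)\right]^{1/N} \approx \mu\left(1 + \frac{1}{N}\sum_{j=1}^{N}\frac{\Delta p_j}{\mu}\right) = \mu,$$

since the deviations \(\Delta p_j\) sum to zero by definition of the arithmetic mean; the first surviving corrections are second order in the \(\Delta p_j/\mu\).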

For numbers which are "spread far apart", the geometric mean gives better estimates than the arithmetic mean. I suppose one way to think about this is that the logarithm tells us the "order of magnitude" of a quantity. The order of magnitude for the revised estimate should be the arithmetic mean of the orders of magnitude of our various estimates, and the geometric mean is precisely the estimate whose logarithm is the arithmetic mean of the logarithms of the individual estimates.
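A quick numerical sketch in Python (with made-up estimates spanning a couple of orders of magnitude):

import numpy as np

# Hypothetical probability estimates for the same event, spread over orders of magnitude.
estimates = np.array([0.002, 0.01, 0.08])

arithmetic = estimates.mean()
geometric = np.exp(np.log(estimates).mean())   # same as (p1*...*pN)**(1/N)

print(arithmetic)   # ~0.031, dominated by the largest estimate
print(geometric)    # ~0.012, the log-average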

One fun book (among many) is Order of Magnitude Physics.

Addendum. I struck out "the best" estimator above, because I actually don't have a proof off the top of my head that it is optimal in any sense. It's "good", consistent with the frequentist interpretation of probability, and most importantly it works. But I do not have a well-defined notion of an "error" or "loss function" which the geometric mean of probability estimates minimizes, and thus I felt it dishonest to describe it as "the best".

That said, "absence of evidence is not evidence of absence". I may be ignorant of some folklore that the geometric mean of probabilities optimizes some desirable property, and really is (in some sense) "the best estimator". I just don't have the proof to back the claim, so I will revise the claim.

Monday, April 15, 2019

How many days in the year?

Here's a brain teaser: your helpful lab assistant has rounded up a sample of N individuals. One by one, they tell you their birthday. What value of N is needed to determine there are 366 days in the (leap) year? (Or, if you hate leap years, that there are 365 days in the year.)

Variant A: You only know the existence of a day when someone tells you their birthday. So, if the first person says they were born January 2nd, you cannot infer January 1st must exist because "1 < 2".

Variant B: You can infer from January 1st the existence of January 2nd.

Possible acceptable answers include but are not limited to: a probability distribution for getting the correct number (as a function of N), the N which maximizes the likelihood of getting the correct number of days in a year, or the expected value for N.

Variant C: What other ways are there to determine N?
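For Variant A, one relevant quantity is the chance that N uniformly random birthdays happen to name every one of the 366 days at least once. A minimal Monte Carlo sketch in Python (assuming uniformly distributed birthdays, which is itself an assumption, and not a full solution to the puzzle):

import numpy as np

def coverage_probability(n_people, n_days=366, trials=2000, seed=0):
    """Monte Carlo estimate of Pr(all n_days appear among n_people uniform birthdays)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        birthdays = rng.integers(0, n_days, size=n_people)
        if len(np.unique(birthdays)) == n_days:
            hits += 1
    return hits / trials

for n in (2000, 2500, 3000):
    print(n, coverage_probability(n))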

I may post a solution next week to this (or the week after).

Sunday, April 14, 2019

Next Candidate to Announce will be Monday or Tuesday

So, candidates are forming exploratory committees; few are formally announcing. I am keeping track of the date an exploratory committee was announced or (if the candidate just filed without forming a committee) the date of the FEC filing. The data accumulated so far:

Elizabeth Warren: December 31, 2018 [cnn]
Tulsi Gabbard: January 11, 2019 [fec]
Julian Castro: January 12, 2019 [bloomberg.com]
Kirsten Gillibrand: January 15, 2019 [nytimes]
Kamala Harris: January 21, 2019 [fec]
Pete Buttigieg: January 23, 2019 [politico]
Cory Booker: February 1, 2019 [fec]
Amy Klobuchar: February 10, 2019
Bernie Sanders: February 19, 2019 [fec]
Jay Inslee: March 1, 2019 [fec]
John Hickenlooper: March 4, 2019 [fec]
Beto O'Rourke: March 14, 2019 [fec]
Mike Gravel: March 19, 2019 [nbc]
Tim Ryan: April 4, 2019 [nbc]
Eric Swalwell: April 8, 2019 [fec]

I goofed in thinking Ojeda was the start of the primary process; Senator Warren seems to be the opening candidate.

The questions that spring to mind include:

  1. How many people will qualify for the June debates?
  2. Are the Democratic candidates similar in announcement behaviour to past Republican candidates?
  3. When will the next announcement be?

I will dig through the first two questions in a future blog post, but the third question is time sensitive. The short answer is we can expect the next candidate to emerge Monday night or Tuesday morning; the exact probability distribution is plotted below:

This is based on the Jeffreys prior for predicting the exponential distribution, using the prior candidates in this cycle as the data points (cf. [wikipedia]). The expected number of days after Swalwell's announcement is 98/13 ≈ 7.53846.
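Here is one way to reproduce the 98/13 figure, as a sketch in Python; the count of 14 gaps is my reading of the table above, and the tail probability below uses a simple plug-in rate rather than the full posterior predictive:

import math

total_days = 98     # days between Warren's (Dec 31) and Swalwell's (Apr 8) announcements
n_gaps = 14         # gaps between the 15 announcements listed above

# Under the Jeffreys prior 1/lambda for an exponential rate, the posterior-predictive
# mean waiting time for the next announcement works out to total_days / (n_gaps - 1).
expected_wait = total_days / (n_gaps - 1)
print(expected_wait)            # 98/13 ≈ 7.54 days

# Tail probability of waiting at least t more days, using the plug-in rate lambda = 13/98.
lam = (n_gaps - 1) / total_days
for t in (7, 14):
    print(t, math.exp(-lam * t))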

I'll "show my work" in a future blog post, and include a "DIY" equation to predict the next announcement based on N candidates already announced and d (the number of days between the latest candidate and the first, Elizabeth Warren's announcement date).

Addendum. Representative Seth Moulton (D-MA, 6) was the next candidate to announce he was running for president, throwing his hat in the ring on April 22, 2019: a full week later than expected. The probability of this happening, with the Bayesian posterior used here, is approximately 1.927626%, whereas the maximum likelihood estimate would give 1.93336%; however we slice it, it's around a 1.93% probability. (I forgot to publish this addendum on the date I wrote it; it was saved as a draft for about a month.)

(After further thought, Moulton declared roughly \(2/\lambda\) days after the previous candidate, with \(\lambda=13/98\approx 1/7.5\). The probability of a candidate declaring after at least twice the expected waiting time in an exponentially distributed model is \(\Pr(x\geq 2/\lambda) = e^{-2} \approx 0.13533\), which isn't unreasonable. An event occurring with 13.5% probability is roughly the same odds as getting 3 heads in a row with a fair coin.)