Game Plan: We'll introduce the notion of "instrumental rationality" as an ordering of alternatives with some technical condition. Then we'll discuss measures of "preference" via utility functions. Then we conclude by discussing maximizing utility under uncertainty.
Loosely put, individuals who are instrumentally rational have preferences over various "things" (e.g., baby-back ribs are preferred to chicken, and chicken is preferred to bread). Such individuals are deemed "rational" for picking actions which satisfy those preferences. The only constraint is that preferences are ordered in some suitably "weakly coherent" way (e.g., ribs are still preferred to bread if there is no chicken).
The convention is to call these preferred "things" Alternatives.
Definition 1. Let an actor be choosing between countably many possible different alternatives x1, x2, x3, …. An actor is called Instrumentally Rational if the actor has preferences satisfying the following conditions:
- Reflexivity: No alternative xi is less preferred than itself.
- Completeness: For any two alternatives xi and xj, either (1) xi is strictly preferred over xj, (2) xj is strictly preferred over xi, or (3) the actor is indifferent between the two alternatives.
- Transitivity: For any alternatives xi, xj, xk, if xi is no less desired than xj, and if xj is no less desired than xk, then xi is no less desired than xk.
- Continuity: For any alternatives xi, xj, xk, if xi is (strictly) preferred to xj, and if xj is (strictly) preferred to xk, then there exists some "composite" of xi and xk (call it y) which is equally as desired as xj.
Remark 1 (On Continuity). There are two ways to interpret the continuity axiom. The first perspective is to think of y as a "basket" containing "bits" of xi and "bits" of xk. For example, if xi is "18 ribs", xj is "half a roasted chicken", and xk is "10 rolls", then there is some composite ("9 ribs and 5 rolls") which is equally as desirable as half a chicken.
The other perspective is to think of y as a lottery, where the actor obtains xi with probability p (0 < p < 1) and xk with probability 1 − p. The continuity axiom then says there is some p for which the actor is indifferent between the lottery y and the alternative xj.
Remark 2 (Ordering, Utility Functions). The first three axioms taken together imply the actor has a well-defined preference ordering (in the mathematical sense). When the continuity axiom is added, the preference ordering may be "represented" by a utility function (i.e., a function assigning to each alternative xi some real number U(xi) reflecting the "utility" or "desire" for that alternative). An actor making choices to satisfy his or her preference ordering can be viewed "as if" maximizing his or her utility function.
Now, discussions of the "utility" of an alternative should not be confused with the philosophy of Utilitarianism. A utility function just assigns numbers such that the ordering induced by it is the same as the actor's preference relation. That is to say, U(xi) > U(xj) if and only if xi is strictly preferred to xj. The numbers U(xi) are measured in utils, which are agent-dependent and measure that agent's preference for the given alternative.
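To make Remark 2 concrete, here is a minimal Python sketch for a finite set of alternatives. The data (`alternatives`, `weakly_prefers`) and the function names are my own illustrative assumptions, not anything from the references. It checks the first three axioms of Definition 1 and then builds one possible utility function by counting how many alternatives each one is no less preferred than. With finitely many alternatives this counting trick is enough, so the continuity axiom plays no role here.

```python
from itertools import product

# Hypothetical data: the pair (a, b) means "a is no less preferred than b".
alternatives = ["ribs", "chicken", "bread"]
weakly_prefers = {
    ("ribs", "ribs"), ("chicken", "chicken"), ("bread", "bread"),
    ("ribs", "chicken"), ("chicken", "bread"), ("ribs", "bread"),
}


def is_reflexive(alts, rel):
    # Every alternative is no less preferred than itself.
    return all((a, a) in rel for a in alts)


def is_complete(alts, rel):
    # Any two alternatives can be compared in at least one direction.
    return all((a, b) in rel or (b, a) in rel
               for a, b in product(alts, repeat=2))


def is_transitive(alts, rel):
    # a >= b and b >= c must imply a >= c.
    return all((a, c) in rel
               for a, b, c in product(alts, repeat=3)
               if (a, b) in rel and (b, c) in rel)


def ordinal_utility(alts, rel):
    """Score each alternative by how many alternatives it is no less preferred than."""
    return {a: sum((a, b) in rel for b in alts) for a in alts}


is_ordering = (is_reflexive(alternatives, weakly_prefers)
               and is_complete(alternatives, weakly_prefers)
               and is_transitive(alternatives, weakly_prefers))
U = ordinal_utility(alternatives, weakly_prefers)
print(is_ordering, U)  # True {'ribs': 3, 'chicken': 2, 'bread': 1}
```

The particular numbers 3, 2, 1 carry no meaning beyond their order; any other assignment with U(ribs) > U(chicken) > U(bread) would represent the same preferences.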
Ordinal Utilities, Cardinal Utilities, Maximizing Expected Utility
Definition 2. If we assign utility "arbitrarily" but in a manner consistent with the preference ordering (e.g., for any alternatives X and Y such that X is preferred to Y, we assign the utilities such that U(X) > U(Y) but otherwise the quantities remain arbitrary), then we call such utility the Ordinal Utility.
Here we must stress two important consequences of assigning utility in a manner which captures only the preference ordering (and nothing else).
First, ordinal utility does not describe the agent's "intensity of desire" for an alternative. The "strength of preference" is not captured by this notion. So how much more I want ribs than chicken is not described by this notion, only the fact that I prefer ribs to chicken (and chicken to bread).
Second, ordinal utility cannot be compared "across agents". The ordinal utility I assign to a full rack of baby-back ribs cannot be compared to anyone else's ordinal utility for, say, Lasagna. We can only compare my ordinal utility for baby-back ribs against my ordinal utility for Lasagna.
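A small Python sketch may help show why ordinal utility says nothing about intensity. The utilities `u` and the transformation used to build `v` are arbitrary choices of mine for illustration. Any strictly increasing relabeling of the numbers gives the same ranking, while the "gaps" between the numbers change completely.

```python
import math

# Hypothetical ordinal utilities for one agent; only the ordering matters.
u = {"ribs": 3, "chicken": 2, "bread": 1}


def ranking(utility):
    """Alternatives sorted from most to least preferred."""
    return sorted(utility, key=utility.get, reverse=True)


# An arbitrary strictly increasing transformation of u.
v = {name: math.exp(10 * value) - 7 for name, value in u.items()}

same_ranking = ranking(u) == ranking(v)

# The gaps between utilities are not preserved, so they carry no information.
gaps_u = (u["ribs"] - u["chicken"], u["chicken"] - u["bread"])
gaps_v = (v["ribs"] - v["chicken"], v["chicken"] - v["bread"])

print(same_ranking)      # True
print(gaps_u == gaps_v)  # False
```

Both `u` and `v` are equally good ordinal utilities for the same agent, which is why the differences between the numbers cannot be read as strength of preference.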
Dealing with Uncertainty
My local BBQ joint smokes 1 pig per day, and when it's all sold, there's no more. If I am hungry, should I go before the lunch rush or afterwards?
Here we must talk of Prospects: outcomes paired with their associated probabilities.
For our particular situation, there is a decision I must make (go before the lunch rush or after) and two outcomes (there is food left, or they ran out of food). One prospect is given by the possible outcomes of choosing to go before the lunch rush: (go before lunch AND they have food, go before lunch AND no more food; p, 1 − p). The other prospect is given by the decision to go after the lunch rush: (go after lunch AND they have food, go after lunch AND no more food; q, 1 − q).
Observe, each decision has different possible outcomes, but the probabilities for the outcomes on a given decision must sum to 100%: something must happen when I take a decision.
We now need to consider preference ordering over prospects.
Definition 3. Suppose a person must choose between actions with uncertain outcomes, in the sense that: each action has various possible outcomes associated with it, each with some probability. We call this action a Prospect and represent it by a pairing of the possible outcomes with their respective probabilities (y1, y2, ...; p1, p2, ...) where the outcome yi occurs with probability pi, and the probabilities sum to 1 = p1 + p2 + ... (since there must be an outcome to the action).
Remark. There is a "nested structure" to prospects, in the sense that yi might be an "atomic outcome" (e.g., "there will be food", "there will be no food", "it will rain", "the world will end", etc.) or another prospect (imagine "I flip a coin; if it is heads, then I do this action, but if it is tails then I do some other action").
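Here is one way the definition and the nested structure might be sketched in Python. The representation (a list of `(outcome, probability)` pairs), the helper names `make_prospect` and `flatten`, and the probabilities are all assumptions of mine for illustration. I use exact fractions so the "probabilities sum to 1" check is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def make_prospect(outcomes, probabilities):
    """Pair outcomes with probabilities, checking that the probabilities sum to 1."""
    if len(outcomes) != len(probabilities):
        raise ValueError("need exactly one probability per outcome")
    if any(p < 0 for p in probabilities) or sum(probabilities) != 1:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return list(zip(outcomes, probabilities))


def flatten(prospect):
    """Reduce a (possibly nested) prospect to {atomic outcome: total probability}."""
    totals = {}
    for outcome, p in prospect:
        if isinstance(outcome, list):
            # The outcome is itself a prospect: multiply probabilities through.
            for atom, q in flatten(outcome).items():
                totals[atom] = totals.get(atom, 0) + p * q
        else:
            totals[outcome] = totals.get(outcome, 0) + p
    return totals


# Illustrative numbers only.
before = make_prospect(["food", "no food"], [Fraction(19, 20), Fraction(1, 20)])
after = make_prospect(["food", "no food"], [Fraction(1, 10), Fraction(9, 10)])

# "I flip a fair coin; heads I go before the rush, tails I go after."
coin = make_prospect([before, after], [Fraction(1, 2), Fraction(1, 2)])

print(flatten(coin))  # {'food': Fraction(21, 40), 'no food': Fraction(19, 40)}
```

Flattening shows that a prospect over prospects is still just a prospect over atomic outcomes, with probabilities that again sum to 1.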
An actor's Preferences over Prospects are called Consistent if the preference satisfies the reflexivity, completeness, and transitivity axioms of Definition 1, and:
- Continuity: Consider three prospects yi, yj, and yk, and suppose the first is preferred to the second and the second is preferred to the third. Then there exists some probability p such that the prospect (yi, yk; p, 1 − p) is exactly as preferable as yj (compare to the second interpretation put forward in Remark 1).
- Preference increasing with probability: If yi is preferred to yj, letting ym = (yi, yj; p1, 1 − p1) and yn = (yi, yj; p2, 1 − p2), then ym is preferred to yn only if p1 > p2.
- Independence: For any three prospects yi, yj, and yk, if yi is preferred to yj, then for any probability p with 0 < p ≤ 1 the prospect (yi, yk; p, 1 − p) is preferred to (yj, yk; p, 1 − p). In other words, mixing both prospects with the same third prospect does not reverse the preference.
Given this notion of "consistent preferences over uncertain prospects", how can we develop a notion of instrumental rationality?
Maximizing Expected Utility
The first step is to introduce the notion of Cardinal Utility, which assigns to a given outcome yi a number u(yi) measuring the intensity of the agent's preference for that outcome.
The second step is to consider the Expected Utility of a Prospect y = (y1, y2, ...; p1, p2, ...) as the sum Eu[y] = u(y1)p1 + u(y2)p2 + ..., which is the expected value of the random variable taking the value u(yi) with probability pi.
Now, an agent with cardinal utility u(-) is considered instrumentally rational if it picks the action whose prospect has the maximum expected utility.
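With cardinal utilities in hand, the probability promised by the continuity axiom can be computed explicitly. Setting u(yj) = p·u(yi) + (1 − p)·u(yk) and solving gives p = (u(yj) − u(yk)) / (u(yi) − u(yk)). The sketch below just evaluates that formula; the function name and the utilities 10, 4, 0 are hypothetical values of mine.

```python
from fractions import Fraction


def indifference_probability(u_best, u_middle, u_worst):
    """Probability p making the lottery (best, worst; p, 1 - p) as good as the middle option."""
    if not (u_best > u_middle > u_worst):
        raise ValueError("expected u_best > u_middle > u_worst")
    return Fraction(u_middle - u_worst) / Fraction(u_best - u_worst)


# Hypothetical cardinal utilities: ribs = 10, chicken = 4, bread = 0.
p = indifference_probability(10, 4, 0)
lottery_value = p * 10 + (1 - p) * 0

print(p)              # 2/5
print(lottery_value)  # 4
```

So with these made-up numbers, a 40% chance of ribs (and otherwise bread) is exactly as good as a sure plate of chicken.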
Example 1. If I go to my favorite BBQ restaurant before the lunch rush, the prospect looks like ("get food", "no food"; 0.95, 0.05). If I go after the lunch rush, the prospect looks like ("get food", "no food"; 0.1, 0.9).
My cardinal utility function looks like u(get food) = 10, u(no food) = −30.
The expected utility for going before the lunch rush is then
E[before] = u(get food)×0.95 + u(no food)×0.05 = 10×0.95 − 30×0.05 = 9.5 − 1.5 = 8
The expected utility for going after the lunch rush is then
E[after] = u(get food)×0.1 + u(no food)×0.9 = 10×0.1 − 30×0.9 = 1 − 27 = −26
Since 8 > −26, it is rational to go before the lunch rush to try to get food.
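The arithmetic of Example 1 can be reproduced with a few lines of Python. This is only a sketch: the representation of a prospect as a list of `(outcome, probability)` pairs and the names `expected_utility`, `prospects`, and `best` are my own choices. Probabilities are stored as exact fractions so the results come out as whole numbers.

```python
from fractions import Fraction


def expected_utility(prospect, u):
    """Sum of u(outcome) * probability over the prospect's outcomes."""
    return sum(u[outcome] * p for outcome, p in prospect)


# Cardinal utilities and prospects from Example 1.
u = {"get food": 10, "no food": -30}
prospects = {
    "before": [("get food", Fraction("0.95")), ("no food", Fraction("0.05"))],
    "after": [("get food", Fraction("0.1")), ("no food", Fraction("0.9"))],
}

scores = {action: expected_utility(prospect, u)
          for action, prospect in prospects.items()}

# The instrumentally rational choice maximizes expected utility.
best = max(scores, key=scores.get)

print(scores["before"], scores["after"], best)  # 8 -26 before
```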
Next time, we'll discuss flaws with this notion of instrumental rationality, both logical and empirical.
References
- Shaun Hargreaves Heap and Yanis Varoufakis, Game Theory: A Critical Introduction. Second ed., Routledge. (This is the axiomatization scheme I am following.)
- John Searle, Rationality in Action. MIT Press, 2001. (This provides a different set of axioms for rational behaviour, equivalent to the axioms of game theory, and discusses implicit assumptions & its flaws.)