HET: Von Neumann-Morgenstern Axiomatization


The von Neumann-Morgenstern
Expected Utility Theory

Contents

(i) Lotteries
(ii) Axioms of Preference
(iii) The von Neumann-Morgenstern Utility Function
(iv) Expected Utility Representation


The expected utility hypothesis of John von Neumann and Oskar Morgenstern (1944), while formally identical, has nonetheless a somewhat different interpretation from Bernoulli's. However, the major impact of their effort was that they attempted to axiomatize this hypothesis in terms of agents' preferences over different ventures with random prospects, i.e. preferences over what can be called lotteries.

(i) Lotteries

Let x be an "outcome" and let X be a set of outcomes. Let p be a simple probability measure on X, thus p = (p(x₁), p(x₂), ..., p(x_n)) where p(x_i) is the probability of outcome x_i ∈ X occurring, i.e. p(x_i) ≥ 0 for all i = 1, ..., n and Σ_{i=1}^n p(x_i) = 1. Note that for simple probability measures, there are only finitely many elements x ∈ X for which p(x) > 0, i.e. p has "finite support". Define Δ(X) as the set of simple probability measures on X. A particular lottery p is a point in Δ(X).

One of the first questions to be faced is how does an agent evaluate a compound lottery, i.e. a lottery which gives out tickets for another lottery as prizes rather than a certain reward? We can reduce compound lotteries into simple lotteries by combining the probabilities of the lotteries so that all we obtain is a single distribution over outcomes.

Suppose we have a lottery r with two possible outcomes: with 50% probability, it yields a ticket for another lottery p, while with the remaining 50% probability, it yields a ticket for a different lottery q (this is shown heuristically on the left side in Figure 1b). Thus, r = 0.5p + 0.5q, where p and q are the lotteries which serve as outcomes of the lottery we are actually playing, i.e. lottery r. The reduction of the compound lottery r into a simple lottery is illustrated in Figure 1b.

As shown in Figure 1a, simple lottery p has payoffs (x₁, x₂, x₃) = (0, 2, 1) with respective probabilities (p₁, p₂, p₃) = (0.5, 0.2, 0.3). Simple lottery q has payoffs (y₁, y₂) = (2, 3) with probabilities (q₁, q₂) = (0.6, 0.4). Thus, combining the sets of outcomes (on the right side of Figure 1b), the compound lottery r will have payoffs (z₁, z₂, z₃, z₄) = (0, 1, 2, 3). The probabilities of each of these outcomes of r are obtained by taking the linear combination of the probabilities in the original lotteries: so if outcome 2 had 0.2 probability in lottery p and 0.6 probability in lottery q, then it will have 0.5(0.2) + 0.5(0.6) = 0.4 probability in the compound lottery r. Similarly, outcome 1 has 0.3 probability in p and 0 probability in q, thus that outcome will have 0.5(0.3) + 0.5(0) = 0.15 probability in lottery r, etc. In short, the compound lottery r faces outcomes (z₁, z₂, z₃, z₄) = (0, 1, 2, 3) with respective probabilities (r₁, r₂, r₃, r₄) = (0.25, 0.15, 0.4, 0.2).
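The reduction just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `reduce_compound` and the dictionary representation of a lottery (outcome mapped to probability) are assumptions made here, not part of the original exposition. Exact fractions are used so the arithmetic matches the text.

```python
from fractions import Fraction as F

def reduce_compound(weights, lotteries):
    """Reduce a compound lottery to a simple one.

    weights[k] is the probability of receiving a ticket for lotteries[k];
    each lottery is a dict mapping outcome -> probability.
    """
    reduced = {}
    for alpha, lottery in zip(weights, lotteries):
        for outcome, prob in lottery.items():
            # Each outcome's probability is the weighted sum over sub-lotteries.
            reduced[outcome] = reduced.get(outcome, 0) + alpha * prob
    return dict(sorted(reduced.items()))

# Lotteries p and q from Figure 1a.
p = {0: F(5, 10), 2: F(2, 10), 1: F(3, 10)}
q = {2: F(6, 10), 3: F(4, 10)}

# r = 0.5p + 0.5q
r = reduce_compound([F(1, 2), F(1, 2)], [p, q])
for outcome, prob in r.items():
    print(outcome, float(prob))
```

The resulting probabilities over outcomes (0, 1, 2, 3) are the ones given in the text, and they sum to one.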


Fig. 1a - Two simple lotteries


Fig. 1b - Compound Lottery

In general, a compound lottery is a set of K simple lotteries {p_k}_{k=1}^K that are connected by probabilities {α_k}_{k=1}^K where Σ_{k=1}^K α_k = 1, so that we have lottery p_k with probability α_k. Thus, a compound lottery q is of the form q = α₁p₁ + α₂p₂ + ... + α_K p_K. The compound lottery q can be reduced to a "simple" lottery, as q(x_i) = α₁p₁(x_i) + α₂p₂(x_i) + ... + α_K p_K(x_i) can be interpreted as the probability of x_i ∈ X occurring. This is obtained from recognizing that Σ_{k=1}^K α_k = 1 and Σ_{i=1}^n p_k(x_i) = 1. Thus, defining q(x_i) = Σ_k α_k p_k(x_i), then Σ_{i=1}^n q(x_i) = Σ_k α_k (Σ_i p_k(x_i)) = Σ_k α_k = 1. Thus, q = α₁p₁ + ... + α_K p_K is itself a simple lottery. Note that, as a result, Δ(X), the set of simple lotteries on X, is a convex set (i.e. for any p, q ∈ Δ(X), αp + (1-α)q ∈ Δ(X) for all α ∈ (0, 1)).

In the von Neumann-Morgenstern hypothesis, probabilities are assumed to be "objective" or exogenously given by "Nature" and thus cannot be influenced by the agent. However, the problem of an agent under uncertainty is to choose among lotteries, and thus find the "best" lottery in Δ(X). One of von Neumann and Morgenstern's major contributions to economics more generally was to show that if an agent has preferences defined over lotteries, then there is a utility function U: Δ(X) → R that assigns a utility to every lottery p ∈ Δ(X) and that represents these preferences.

Of course, if lotteries are merely distributions, it might not seem to make sense that a person would "prefer" a particular distribution to another on its own. If we follow Bernoulli's construction, we get a sense that what people really get utility from is the outcome or consequence, x ∈ X. We do not eat "probabilities", after all, we eat apples! Yet what von Neumann and Morgenstern suggest is precisely the opposite: people derive utility from lotteries and not apples! In other words, people's preferences are formed over lotteries and from these preferences over lotteries, combined with objective probabilities, we can deduce what the underlying preferences on outcomes might be. Thus, in von Neumann-Morgenstern's theory, unlike Bernoulli's, preferences over lotteries logically precede preferences over outcomes.

How can this bizarre argument be justified? It turns out to be rather simple actually, if we think about it carefully. Consider a situation with two outcomes, either $10 or $0. Obviously, people prefer $10 to $0. Now, consider two lotteries: in lottery A, you receive $10 with 90% probability and $0 with 10% probability; in lottery B, you receive $10 with 40% probability and $0 with 60% probability. Obviously, the first lottery A is better than lottery B, thus we say that over the set of outcomes X = ($10, 0), the distribution p = (90%, 10%) is preferred to distribution q = (40%, 60%). What if the two lotteries are not over exactly the same outcomes? Well, we make them so by assigning probability 0 to those outcomes which are not listed in that lottery. For instance, in Figure 1, lotteries p and q have different outcomes. However, letting the full set of outcomes be (0, 1, 2, 3), then the distribution implied by lottery p is (0.5, 0.3, 0.2, 0) whereas the distribution implied by lottery q is (0, 0, 0.6, 0.4). Thus our preference between lotteries with different outcomes can be restated in terms of preferences between probability distributions over the same set of outcomes by adjusting the set of outcomes accordingly.
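The padding step, assigning probability 0 to outcomes a lottery does not list, can be sketched as follows. The helper name `on_common_outcomes` and the dictionary representation are hypothetical choices made for this illustration.

```python
def on_common_outcomes(*lotteries):
    """Express lotteries (dicts outcome -> probability) as probability
    vectors over the union of their outcomes, padding with zeros."""
    outcomes = sorted(set().union(*lotteries))
    vectors = [tuple(lot.get(x, 0.0) for x in outcomes) for lot in lotteries]
    return outcomes, vectors

# Lotteries p and q from Figure 1a, which list different outcomes.
p = {0: 0.5, 2: 0.2, 1: 0.3}
q = {2: 0.6, 3: 0.4}

outcomes, (p_vec, q_vec) = on_common_outcomes(p, q)
print(outcomes)  # common outcome set
print(p_vec)     # distribution implied by p
print(q_vec)     # distribution implied by q
```

Once both lotteries are vectors over the same outcome set, comparing them is a comparison between two points of the same simplex.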

But is this not arguing precisely what Bernoulli was saying, namely, that the "real" preferences are over outcomes and not lotteries? Yes and no. Yes, in the sense that the only reason we prefer a lottery over another is due to the implied underlying outcomes. No, in the sense that preferences are not defined over these outcomes but only defined over lotteries. In other words, von Neumann and Morgenstern's great insight was to avoid defining preferences over outcomes and to capture everything in terms of preferences over lotteries. The essence of von Neumann and Morgenstern's expected utility hypothesis, then, was to confine themselves to preferences over distributions and then, from that, deduce the implied preferences over the underlying outcomes.

We shall proceed through von Neumann-Morgenstern's (1944) axiomatization in the following manner: (i) we first define and axiomatize a preference relation ≥_h over simple lotteries, Δ(X); (ii) we then use this preference relation to construct a utility function on simple lotteries, U: Δ(X) → R; (iii) we then prove that this utility function U has an "expected utility" structure, i.e. there is an underlying utility on outcomes u: X → R that yields U(p) = Σ_x p(x)u(x). Later on we shall extend this theorem to more general lotteries.

We should note here that up to step (iii), we can treat Δ(X) as merely a convex subset of a linear space, and thus omit all discussion of "lotteries" or underlying outcome spaces X, etc. The first two parts of the von Neumann-Morgenstern theorem, thus, apply quite generally. Nonetheless, for the sake of intuition, we shall not change the notation and leave Δ(X) as is.

(ii) Axioms of Preference

Let ≥_h be a binary relation over Δ(X), i.e. ≥_h ⊂ Δ(X) × Δ(X). Hence, we can write (p, q) ∈ ≥_h, or p ≥_h q, to indicate that lottery p is "preferred to or equivalent to" lottery q. Naturally, ¬(p ≥_h q) is written p <_h q, i.e. if p is not preferred to or equivalent to q, then we say q is strictly preferred to p. Of course, p ≥_h q and q ≥_h p implies p ~_h q, i.e. p is equivalent to q. We now state the four axioms for these preferences:

(A.1) ≥_h is complete, i.e. either p ≥_h q or q ≥_h p for all p, q ∈ Δ(X).

(A.2) ≥_h is transitive, i.e. if p ≥_h q and q ≥_h r then p ≥_h r for all p, q, r ∈ Δ(X).

(A.3) Archimedean Axiom: if p, q, r ∈ Δ(X) are such that p >_h q >_h r, then there are α, β ∈ (0, 1) such that αp + (1-α)r >_h q and q >_h βp + (1-β)r.

(A.4) Independence Axiom: for all p, q, r ∈ Δ(X) and any α ∈ (0, 1], p ≥_h q if and only if αp + (1-α)r ≥_h αq + (1-α)r.

The first two axioms (A.1) and (A.2) should be familiar from conventional theory. Together, (A.1) and (A.2) are sometimes referred to as the "weak order" axioms. The Archimedean Axiom (A.3) works like a continuity axiom on preferences. It effectively states that given any three strictly ranked lotteries, p >_h q >_h r, we can combine the most and least preferred lottery (p and r) via an α ∈ (0, 1) such that the compound of p and r is strictly preferred to the middling lottery q, and we can combine p and r via a β ∈ (0, 1) so that the middling lottery q is strictly preferred to the compound of p and r. Notice that one really needs Δ(X) to be a linear, convex structure to have (A.3).

The Independence Axiom (A.4), as we shall see later, is a little bit more troublesome. It effectively claims that the preference between p and q is unaffected if they are both combined in the same way with a third lottery r. One can envisage this as a choice between a pair of two-stage lotteries. In this case, αp + (1-α)r is a two-stage lottery which yields either lottery p with probability α or lottery r with probability (1-α) in the first stage. Using the same interpretation for αq + (1-α)r, then since both mixtures lead to r with the same probability (1-α) in the first stage and since one is equally well-off if this case occurs, preferences between the two-stage lotteries ought to depend entirely on one's preferences between the alternative lotteries in the second stage, p and q.
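To make the independence property concrete, here is a minimal numerical sketch. It assumes an agent who ranks lotteries by expected utility with a made-up utility table `u` on outcomes 0 to 3; the table, the lotteries and the helper names are all assumptions introduced for illustration. The sketch only checks that such an agent's ranking of p and q survives mixing with a third lottery r; it proves nothing about preferences in general.

```python
from fractions import Fraction as F

# Hypothetical utilities on outcomes 0..3 (an assumption for illustration only).
u = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}

def mix(alpha, p, q):
    """Mixture alpha*p + (1-alpha)*q of two lotteries (dicts outcome -> probability)."""
    return {x: alpha * p.get(x, 0) + (1 - alpha) * q.get(x, 0) for x in set(p) | set(q)}

def U(p):
    """Expected utility of lottery p under the assumed u."""
    return sum(prob * u[x] for x, prob in p.items())

def weakly_prefers(p, q):
    return U(p) >= U(q)

p = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}
q = {2: F(3, 5), 3: F(2, 5)}
r = {0: F(1, 4), 3: F(3, 4)}  # an arbitrary third lottery

alphas = [F(k, 10) for k in range(1, 11)]  # weights alpha in (0, 1]
independence_holds = all(
    weakly_prefers(p, q) == weakly_prefers(mix(a, p, r), mix(a, q, r))
    and weakly_prefers(q, p) == weakly_prefers(mix(a, q, r), mix(a, p, r))
    for a in alphas
)
print(independence_holds)
```

The common component (1-α)r contributes the same amount to both sides, which is exactly the two-stage intuition given above.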

We should perhaps note, at this point, that these axioms, as stated, are derived from N.E. Jensen (1967) and are not exactly the original von Neumann-Morgenstern (1944) axioms (in particular, they did not have an explicit independence axiom). There are, of course, alternative sets of axioms which we can use for the main theorem. One famous axiomatization was provided by I.N. Herstein and J. Milnor (1953) which is a bit more general. See Fishburn (1970, 1982) for more details.

(iii) The von Neumann-Morgenstern Utility Function

We now want to proceed to the next step and derive the von Neumann-Morgenstern utility function, U: Δ(X) → R, to represent preferences over lotteries, where by representation we mean that for any p, q ∈ Δ(X), p ≥_h q if and only if U(p) ≥ U(q). Thus if lottery p is preferred or equivalent to q, then the utility from lottery p is at least as great as the utility from lottery q, and vice-versa. Let us then turn to the main existence theorem:

Theorem: (von Neumann and Morgenstern) Let Δ(X) be a convex subset of a linear space. Let ≥_h be a binary relation on Δ(X). Then ≥_h satisfies (A.1), (A.2), (A.3) and (A.4) if and only if there is a real-valued function U: Δ(X) → R such that:

(a) U represents ≥_h (i.e. ∀ p, q ∈ Δ(X), p ≥_h q ⇔ U(p) ≥ U(q))

(b) U is affine (i.e. ∀ p, q ∈ Δ(X), U(αp + (1-α)q) = αU(p) + (1-α)U(q) for any α ∈ (0, 1))

Moreover, if V: Δ(X) → R also represents preferences and is affine, then there are b, c ∈ R (where b > 0) such that V = bU + c, i.e. U is unique up to a positive linear transformation.

Proof: This is quite a long proof. We have an "iff" statement, so we must prove it both ways (i.e. that the axioms on preferences imply utility representation and affinity and that representation and affinity of utility implies the axioms). Let us start with the former.

Part I: (Axioms ⇒ Representation and Affinity)

We proceed in three steps: firstly, we prove two lemmas on preferences; secondly, we prove that the theorem holds on a closed preference interval; finally, we extend this result to the entire Δ(X). So let us begin with the two lemmas:

Lemma: (L.1 - Mixture Monotonicity): For any p, q ∈ Δ(X) where p >_h q, and any α, β with 0 ≤ α < β ≤ 1, βp + (1-β)q >_h αp + (1-α)q.

Proof: (i) Suppose α = 0. Note that p = βp + (1-β)p and q = βq + (1-β)q obviously. Now, by (A.4), p >_h q ⇒ βp + (1-β)p >_h βp + (1-β)q as we have βp on both sides. But, by (A.4) again, βp + (1-β)q >_h βq + (1-β)q as we now have (1-β)q in common on both sides. But note that this implies βp + (1-β)q >_h q = αp + (1-α)q when α = 0, and we are done. (ii) Suppose α > 0. Now, recall from (i) that βp + (1-β)q >_h q. Thus, defining r = βp + (1-β)q, then r >_h q. Now, define γ = α/β, so that γ ∈ (0, 1). Then γr + (1-γ)r >_h q. But, as r >_h q, then by (A.4), γr + (1-γ)r >_h γr + (1-γ)q where γr is in common on both sides. Or, by definition of r, γr + (1-γ)r >_h γ(βp + (1-β)q) + (1-γ)q. Then, rearranging, γr + (1-γ)r >_h γβp + (1-βγ)q. But, by definition of γ, γβ = α, thus γr + (1-γ)r >_h αp + (1-α)q. But as r = γr + (1-γ)r = βp + (1-β)q by definition, then βp + (1-β)q >_h αp + (1-α)q. Q.E.D.

This makes intuitive sense: if lottery p is preferred to lottery q, then if we construct two compound lotteries with different weights, then we prefer the compound lottery in which lottery p is given the relatively greater weight.
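As a quick numerical illustration of mixture monotonicity, the sketch below again assumes an expected-utility agent with a made-up utility table `u` (an assumption for illustration, as are the two lotteries `better` and `worse`). It checks that the value of the mixture rises strictly with the weight placed on the preferred lottery.

```python
from fractions import Fraction as F

# Hypothetical utilities on outcomes 0..3 (assumed for illustration only).
u = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}

def U(p):
    """Expected utility of a lottery given as a dict outcome -> probability."""
    return sum(prob * u[x] for x, prob in p.items())

def mix(alpha, p, q):
    """Mixture alpha*p + (1-alpha)*q."""
    return {x: alpha * p.get(x, 0) + (1 - alpha) * q.get(x, 0) for x in set(p) | set(q)}

better = {2: F(3, 5), 3: F(2, 5)}              # plays the role of p in (L.1)
worse = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}  # plays the role of q in (L.1)

weights = [F(k, 10) for k in range(11)]        # 0, 0.1, ..., 1
values = [U(mix(a, better, worse)) for a in weights]

# Strictly increasing in the weight on the preferred lottery.
strictly_increasing = all(values[i] < values[i + 1] for i in range(len(values) - 1))
print(strictly_increasing)
```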

Lemma: (L.2 - Unique Solvability): If p, q, r ∈ Δ(X) and p ≥_h q ≥_h r and p >_h r, then there is a unique α* ∈ [0, 1] such that q ~_h α*p + (1-α*)r.

Proof: (i) If p ~_h q, then α* = 1 and we are done. (ii) If r ~_h q, then α* = 0, and we are done. (iii) If p >_h q >_h r, then define the set Θ_≥ = {α ∈ [0, 1] | q ≥_h αp + (1-α)r}. This set is non-empty because α = 0 is an element of it and it is bounded above by 1. Thus, there is a supremum (least upper bound) of Θ_≥. Let α* = sup Θ_≥. Then we can consider two violating cases. Case 1: p >_h q >_h α*p + (1-α*)r (so that α* < 1). Then, by (A.3), there is a β ∈ (0, 1) such that q >_h β(α*p + (1-α*)r) + (1-β)p. Or, rearranging, q >_h [1 - β(1-α*)]p + β(1-α*)r. But, as β(1-α*) < (1-α*), then 1 - β(1-α*) > α*, so Θ_≥ contains an element larger than α*. But then α* is not a supremum of Θ_≥. A contradiction. Case 2: α*p + (1-α*)r >_h q >_h r (so that α* > 0). We can proceed in a similar way, i.e. by (A.3) we can find some γ ∈ (0, 1) such that γ(α*p + (1-α*)r) + (1-γ)r >_h q, or, rearranging, γα*p + (1-γα*)r >_h q. By mixture monotonicity (L.1), every α ≥ γα* then gives αp + (1-α)r >_h q, so every element of Θ_≥ lies below γα* < α*, which implies that α* is not the least upper bound - thus a contradiction. Consequently, it must be that neither Case 1 nor Case 2 can apply, thus α*p + (1-α*)r ~_h q. Finally, by mixture monotonicity (L.1), α* is unique. Q.E.D.

This also makes intuitive sense. Given a lottery q, we can construct a compound lottery which yields the same utility as q by appropriately combining any lottery p which is preferred to q with any lottery r to which q is preferred.
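For an agent who happens to rank lotteries by expected utility, the solving weight α* can be written down directly as (U(q) - U(r))/(U(p) - U(r)). The sketch below checks this on one example; the utility table `u`, the choice of degenerate lotteries `best` and `worst`, and the helper names are assumptions made purely for illustration.

```python
from fractions import Fraction as F

# Hypothetical utilities on outcomes 0..3 (assumed for illustration only).
u = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}

def U(p):
    """Expected utility of a lottery given as a dict outcome -> probability."""
    return sum(prob * u[x] for x, prob in p.items())

def mix(alpha, p, q):
    """Mixture alpha*p + (1-alpha)*q."""
    return {x: alpha * p.get(x, 0) + (1 - alpha) * q.get(x, 0) for x in set(p) | set(q)}

best = {3: F(1)}              # plays the role of p: outcome 3 for sure
worst = {0: F(1)}             # plays the role of r: outcome 0 for sure
q = {2: F(3, 5), 3: F(2, 5)}  # the middling lottery

# Weight on `best` that makes the mixture exactly as good as q.
alpha_star = (U(q) - U(worst)) / (U(best) - U(worst))
print(alpha_star)
print(U(mix(alpha_star, best, worst)) == U(q))
```

Because the mixture's value is strictly increasing in the weight on the better lottery, no other weight gives indifference, which mirrors the uniqueness part of (L.2).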

Now, let us return to the main proof. Consider first the following case: suppose that, for any p, q ∈ Δ(X), we have p ~_h q (all lotteries are equivalent). In this case, U is constant, i.e. U(p) = c for all p ∈ Δ(X), which is of course real-valued and affine. Thus, this trivial case is easily disposed of. But consider now the following. Suppose s, r ∈ Δ(X) where s >_h r. Define RS = {p ∈ Δ(X) | s ≥_h p ≥_h r}, which is a closed and convex subset of Δ(X) (by (A.4)). For each p ∈ RS, define f(p) as a number such that p ~_h f(p)s + (1-f(p))r. By unique solvability (L.2), such an f(p) exists and is unique. We now make two claims:

Proposition (Representation): f(.) represents preferences on RS, i.e. for all p, q ∈ RS, f(p) ≥ f(q) if and only if f(p)s + (1-f(p))r ≥_h f(q)s + (1-f(q))r.

To prove this, consider that by mixture monotonicity (L.1), s >_h r and f(p) > f(q) imply that f(p)s + (1-f(p))r >_h f(q)s + (1-f(q))r, while if f(p) = f(q) the two mixtures coincide. But, by the definition of f(p) and f(q) (i.e. p ~_h f(p)s + (1-f(p))r and q ~_h f(q)s + (1-f(q))r), we can note immediately by transitivity (A.2) that f(p) > f(q) implies p >_h q and f(p) = f(q) implies p ~_h q. The same argument works in reverse. Thus, f(p) ≥ f(q) ⇔ p ≥_h q, i.e. f(.) represents preferences ≥_h on RS, and we are done. Q.E.D.

Proposition (Affinity): f(.) is affine for all p, q ∈ RS, i.e. f(αp + (1-α)q) = αf(p) + (1-α)f(q).

To prove this, consider any p, q ∈ RS and define p′ = αp + (1-α)q. As RS is convex, then p′ ∈ RS for any α ∈ (0, 1). Thus, by unique solvability (L.2) there is a real number f(p′) such that p′ ~_h f(p′)s + (1-f(p′))r. But as p′ = αp + (1-α)q and p ~_h f(p)s + (1-f(p))r by (L.2), then p′ ~_h α[f(p)s + (1-f(p))r] + (1-α)q by the independence axiom (A.4). Doing the same for q ~_h f(q)s + (1-f(q))r, we obtain p′ ~_h α[f(p)s + (1-f(p))r] + (1-α)[f(q)s + (1-f(q))r]. Rearranging a bit, we obtain that p′ ~_h [αf(p) + (1-α)f(q)]s + [α(1-f(p)) + (1-α)(1-f(q))]r, thus p′ is equivalent to another convex combination of s and r. But, by unique solvability (L.2), there is only one α* such that p′ ~_h α*s + (1-α*)r. Thus, it must be that α* = f(p′) = αf(p) + (1-α)f(q), or, by the definition of p′, f(αp + (1-α)q) = αf(p) + (1-α)f(q). This is the definition of affinity. Q.E.D.

Let us now enter on our third stage and extend the representation and affinity results from RS to the entire set. To do so, we first need to prove the following claim:

Proposition: (Order-Preservation): If f represents ≥_h and is affine, then g = a + bf where b > 0 also (i) represents ≥_h and (ii) is affine.

The proof is simple. (i) For any p, q ∈ Δ(X), p ≥_h q ⇔ f(p) ≥ f(q) by representation of f. As b > 0, f(p) ≥ f(q) ⇔ a + bf(p) ≥ a + bf(q), thus p ≥_h q ⇔ g(p) ≥ g(q) by definition. (ii) As f is affine, then f(αp + (1-α)q) = αf(p) + (1-α)f(q). Now by definition, g(αp + (1-α)q) = a + bf(αp + (1-α)q) = a + b[αf(p) + (1-α)f(q)] = αa + (1-α)a + bαf(p) + b(1-α)f(q) = α[a + bf(p)] + (1-α)[a + bf(q)] = αg(p) + (1-α)g(q). Q.E.D.
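The two properties of g = a + bf can be spot-checked numerically. In the sketch below, f is taken to be an expected-utility functional built from a made-up table `u`, and the constants a = 2, b = 3 are arbitrary; all of these are assumptions chosen only to illustrate the proposition, not part of the original argument.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical utilities on outcomes 0..3 (assumed for illustration only).
u = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}

def f(p):
    """An affine representation: expected utility under the assumed u."""
    return sum(prob * u[x] for x, prob in p.items())

a, b = F(2), F(3)  # arbitrary constants with b > 0

def g(p):
    """Positive affine transformation of f."""
    return a + b * f(p)

def mix(alpha, p, q):
    """Mixture alpha*p + (1-alpha)*q."""
    return {x: alpha * p.get(x, 0) + (1 - alpha) * q.get(x, 0) for x in set(p) | set(q)}

lotteries = [
    {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)},
    {2: F(3, 5), 3: F(2, 5)},
    {1: F(1)},
]

# (i) g orders every pair exactly as f does.
same_order = all((f(p) >= f(q)) == (g(p) >= g(q)) for p, q in product(lotteries, repeat=2))

# (ii) g is affine on mixtures.
alpha = F(1, 3)
g_affine = all(
    g(mix(alpha, p, q)) == alpha * g(p) + (1 - alpha) * g(q)
    for p, q in product(lotteries, repeat=2)
)
print(same_order, g_affine)
```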

Let us return to the extension of RS. By the definition of f, s ~_h f(s)s + (1-f(s))r and r ~_h f(r)s + (1-f(r))r, thus f(s) = 1 and f(r) = 0. Now, define RS₁ = {p ∈ Δ(X) | s₁ ≥_h p ≥_h r₁} where s₁ >_h s and r >_h r₁, so obviously RS ⊂ RS₁. Now, let us define f₁ over RS₁ in a manner analogous to before, so that for any p ∈ RS₁, p ~_h f₁(p)s₁ + (1-f₁(p))r₁ and f₁ is affine. Let us now find a₁ and b₁ > 0 and thus a function g₁ = a₁ + b₁f₁ such that g₁(s) = a₁ + b₁f₁(s) = 1 and g₁(r) = a₁ + b₁f₁(r) = 0. If we think of Δ(X) as the real line and preferences increasing along it, then f₁ and the adjustment to g₁ can be represented as in Figure 2.

Now, define RS₂ = {p ∈ Δ(X) | s₂ ≥_h p ≥_h r₂} where s₂ >_h s and r >_h r₂, so we again obtain RS ⊂ RS₂. Defining f₂ the same way as before on RS₂, we can now find a₂ and b₂ > 0 such that g₂(s) = a₂ + b₂f₂(s) = 1 and g₂(r) = a₂ + b₂f₂(r) = 0. Thus, g₁(r) = g₂(r) = 0 and g₁(s) = g₂(s) = 1. This is illustrated heuristically in Figure 2.


Figure 2 - Illustration of von Neumann-Morgenstern Proof

As Figure 2 suggests, we now show that p ∈ RS₁ ∩ RS₂ ⇒ g₁(p) = g₂(p). As p is in the intersection, then either p is inside, above or below RS. In other words, one of the following three cases will be true:

(i) s ≥_h p ≥_h r: ⇒ by unique solvability (L.2), ∃ α such that p ~_h αs + (1-α)r

(ii) p >_h s >_h r: ⇒ by unique solvability (L.2), ∃ α such that s ~_h αp + (1-α)r

(iii) s >_h r >_h p: ⇒ by unique solvability (L.2), ∃ α such that r ~_h αs + (1-α)p

Consider now the consequences of the different cases: Case (i) implies that g₁(p) = αg₁(s) + (1-α)g₁(r) = α by construction of g₁. But it is also true that g₂(p) = αg₂(s) + (1-α)g₂(r) = α again by construction, thus g₁(p) = g₂(p) = α. Case (ii) implies that 1 = g₁(s) = αg₁(p) + (1-α)g₁(r) = αg₁(p), so g₁(p) = 1/α. But similarly, 1 = g₂(s) = αg₂(p) + (1-α)g₂(r) = αg₂(p), so g₂(p) = 1/α. Thus, once again g₁(p) = g₂(p) = 1/α. Finally, Case (iii) implies that 0 = g₁(r) = αg₁(s) + (1-α)g₁(p) = α + (1-α)g₁(p), so g₁(p) = α/(α-1). Similarly, 0 = g₂(r) = αg₂(s) + (1-α)g₂(p) = α + (1-α)g₂(p), so g₂(p) = α/(α-1). Thus, again g₁(p) = g₂(p) = α/(α-1).

Thus, for every p ∈ RS₁ ∩ RS₂, g₁(p) = g₂(p). Consider now an increasing sequence RS ⊂ RS₁ ⊂ RS₂ ⊂ RS₃ ⊂ ... ⊂ Δ(X). At each step, we can define g_i that represents preferences over RS_i, but g_i(p) = g_{i-1}(p) = g_{i-2}(p) = ... for all p ∈ RS_{i-1}. Thus, let us define this common value g_i(p) = g_{i-1}(p) = U(p). We can thereby construct a U that represents preferences over the entire set Δ(X). Thus the first important part of the proof, the derivation of a utility function U: Δ(X) → R from axioms (A.1)-(A.4), is finished.

Q.E.D. for Part I.

Part II: (Representation and Affinity ⇒ Axioms).

We now turn to the converse. This is rather simpler. If U: Δ(X) → R is affine and represents preferences, then we want to show that the axioms (A.1)-(A.4) hold. Completeness is clear enough: as U is defined over Δ(X), then for any pair p, q ∈ Δ(X), either U(p) ≥ U(q) or U(p) ≤ U(q) or both. By representation, this implies (A.1). Similarly, for any triple p, q, r ∈ Δ(X), by representation, p ≥_h q and q ≥_h r imply U(p) ≥ U(q) and U(q) ≥ U(r). It then follows from the properties of the real number line that U(p) ≥ U(r), thus p ≥_h r, so transitivity (A.2) is done. The Archimedean axiom (A.3) is just as simple. By representation, p >_h q >_h r implies U(p) > U(q) > U(r). We know by the properties of the real line (called the Archimedean axiom in fact) that there is an α ∈ (0, 1) such that αU(p) + (1-α)U(r) > U(q). As, by affinity, αU(p) + (1-α)U(r) = U(αp + (1-α)r), then U(αp + (1-α)r) > U(q) so, by representation, αp + (1-α)r >_h q. The same reasoning applies when choosing a β ∈ (0, 1) so that βU(p) + (1-β)U(r) < U(q), etc., thus we are done. Finally, for the independence axiom (A.4), note that by representation p ≥_h q if and only if U(p) ≥ U(q). Notice also that, for α ∈ (0, 1), U(p) ≥ U(q) implies αU(p) ≥ αU(q). Thus, adding (1-α)U(r) to both sides, αU(p) + (1-α)U(r) ≥ αU(q) + (1-α)U(r). By affinity, U(αp + (1-α)r) = αU(p) + (1-α)U(r) and U(αq + (1-α)r) = αU(q) + (1-α)U(r), thus U(αp + (1-α)r) ≥ U(αq + (1-α)r) so, by representation, αp + (1-α)r ≥_h αq + (1-α)r. The reverse also applies by the same reasoning. Thus, the independence axiom is finished.

Q.E.D. for Part II

Part III: (Uniqueness)

We now wish to turn to the "moreover" remark and prove that if both U: Δ(X) → R and V: Δ(X) → R represent preferences and are affine, then there are c and b > 0 such that V = bU + c. Let us get rid of the trivial case first: if for all p, q ∈ Δ(X), p ~_h q, then U(p) = k and V(p) = k′, thus V(p) = U(p) - (k-k′), so c = k′ - k and b = 1. Now, suppose there are s, p, r ∈ Δ(X) such that s >_h p >_h r. Then define the following: H^U(p) = [U(p) - U(r)]/[U(s) - U(r)] and H^V(p) = [V(p) - V(r)]/[V(s) - V(r)]. By unique solvability (L.2), there is an α ∈ (0, 1) such that p ~_h αs + (1-α)r. Thus H^U(p) = [U(αs + (1-α)r) - U(r)]/[U(s) - U(r)] or, by affinity:

H^U(p) = [αU(s) + (1-α)U(r) - U(r)]/[U(s) - U(r)] = α

Similarly, as H^V(p) = [V(αs + (1-α)r) - V(r)]/[V(s) - V(r)], then H^V(p) = α. This implies, then, that H^U(p) = H^V(p). Thus,

[U(p) - U(r)]/[U(s) - U(r)] = [V(p) - V(r)]/[V(s) - V(r)]

cross-multiplying:

[U(p) - U(r)][V(s) - V(r)] = [U(s) - U(r)][V(p) - V(r)]

or:

U(p)[V(s) - V(r)] - U(r)[V(s) - V(r)] = [U(s) - U(r)]V(p) - [U(s) - U(r)]V(r)

or simply:

V(p) = U(p)[V(s) - V(r)]/[U(s) - U(r)] - U(r)[V(s) - V(r)]/ [U(s) - U(r)] + V(r)

thus letting b = [V(s) - V(r)]/[U(s) - U(r)] and c = - U(r)[V(s) - V(r)]/ [U(s) - U(r)] + V(r), then:

V(p) = bU(p) + c

which is the form we wanted.
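The formulas for b and c can be checked on a small example. In the sketch below, U and V are two expected-utility functionals built from made-up outcome utilities `u` and `v`; these tables, the reference lotteries `s` and `r`, and the test lottery `p` are all assumptions introduced for illustration. The code recovers b and c from s and r alone and then confirms V(p) = bU(p) + c.

```python
from fractions import Fraction as F

# Two hypothetical outcome utilities that induce the same ranking of lotteries.
u = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}
v = {0: F(2), 1: F(17), 2: F(26), 3: F(29)}

def expected(p, util):
    """Expected value of util under lottery p (dict outcome -> probability)."""
    return sum(prob * util[x] for x, prob in p.items())

def U(p):
    return expected(p, u)

def V(p):
    return expected(p, v)

s = {3: F(1)}  # a strictly preferred reference lottery
r = {0: F(1)}  # a strictly worse reference lottery

# b and c exactly as defined in the text.
b = (V(s) - V(r)) / (U(s) - U(r))
c = -U(r) * (V(s) - V(r)) / (U(s) - U(r)) + V(r)

p = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}  # any lottery ranked between r and s
print(b, c, V(p) == b * U(p) + c)
```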

Q.E.D. for Part III

And finally, having proved (I) axioms ⇒ utility representation and affinity; (II) utility representation and affinity ⇒ axioms; and (III) uniqueness of the utility function up to a positive linear transformation, we have now at long last finished the proof of the von Neumann-Morgenstern theorem.

Grand Q.E.D. for von Neumann-Morgenstern Theorem.

(iv) The Expected Utility Representation

We have now obtained the utility function U: Δ(X) → R on the basis of the four axioms set forth earlier. However, we have not yet finished proving the expected utility hypothesis, namely, that a utility function U: Δ(X) → R has a representation:

U(p) = Σ_{x ∈ Supp(p)} p(x)u(x)

where u: X → R is an elementary utility function on the underlying outcomes X. Note that as Δ(X) is the set of simple probability distributions on X, then if p ∈ Δ(X), p has a finite support denoted Supp(p) ⊂ X. Δ(X), of course, is a convex set. Finally, we should note that by convexity, for any p, q ∈ Δ(X), αp + (1-α)q ∈ Δ(X) for any α ∈ (0, 1) and that, if p and q are simple probability distributions, then (αp + (1-α)q)(x) = αp(x) + (1-α)q(x) for any x ∈ X.

We now state the expected utility representation as a corollary to the earlier von Neumann-Morgenstern theorem:

Corollary: (Expected Utility Representation) Let Δ(X) be the set of all simple probability distributions on X. Let ≥_h be a binary relation on Δ(X). Then ≥_h satisfies (A.1)-(A.4) if and only if there is a function u: X → R such that for every p, q ∈ Δ(X):

p ≥_h q if and only if Σ_{x ∈ Supp(p)} p(x)u(x) ≥ Σ_{x ∈ Supp(q)} q(x)u(x).

Moreover, v: X → R represents ≥_h in the above sense if and only if there exist c and b > 0 such that v = bu + c.

Proof: From the von Neumann-Morgenstern theorem, there is a U: Δ(X) → R which represents preferences ≥_h on Δ(X) and is affine. Now, define the function δ_x: X → {0, 1} as δ_x(y) = 1 if y = x and δ_x(y) = 0 otherwise. This implies that for every x ∈ X, δ_x is a degenerate distribution, thus δ_x ∈ Δ(X). Let U(δ_x) = u(x). Now consider a distribution p = [p(x), p(y)]. Obviously, we can write this out as a convex combination of degenerate distributions δ_x and δ_y, i.e. p = p(x)δ_x + p(y)δ_y. Thus, U(p) = U(p(x)δ_x + p(y)δ_y) = p(x)U(δ_x) + p(y)U(δ_y) = p(x)u(x) + p(y)u(y) by affinity and our definition of u(x) and u(y). Thus, more generally, any distribution p with finite support can be written out as a convex combination of degenerate distributions, p = Σ_{x ∈ Supp(p)} p(x)δ_x, and thus we obtain U(p) = U(Σ_{x ∈ Supp(p)} p(x)δ_x) = Σ_{x ∈ Supp(p)} p(x)u(x), which is the expected utility representation of U(p). Thus as p ≥_h q iff U(p) ≥ U(q) by the von Neumann-Morgenstern theorem, then equivalently, p ≥_h q iff Σ_{x ∈ Supp(p)} p(x)u(x) ≥ Σ_{x ∈ Supp(q)} q(x)u(x). The moreover remark is simpler and thus we leave it as an exercise.
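The construction in the proof, reading u(x) off the degenerate lottery δ_x and then rebuilding U(p) as a probability-weighted sum, can be mimicked in code. The sketch below treats U as a black box that happens to be affine; its internal weights `_hidden`, the outcome set and the helper `delta` are assumptions made purely for illustration.

```python
from fractions import Fraction as F

outcomes = [0, 1, 2, 3]

# Internals of an affine functional U on lotteries; treated as unknown by
# the code below and assumed here only so the example is runnable.
_hidden = {0: F(0), 1: F(5), 2: F(8), 3: F(9)}

def U(p):
    """An affine utility on lotteries (dict outcome -> probability)."""
    return sum(p.get(x, 0) * _hidden[x] for x in outcomes)

def delta(x):
    """Degenerate lottery that yields outcome x with probability 1."""
    return {x: F(1)}

# Step 1: define the elementary utility by u(x) = U(delta_x).
u = {x: U(delta(x)) for x in outcomes}

# Step 2: any simple lottery is a convex combination of degenerate ones,
# so affinity gives U(p) = sum over the support of p(x) * u(x).
p = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}
expected_utility = sum(prob * u[x] for x, prob in p.items() if prob > 0)

print(expected_utility == U(p))
```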

So far, we developed the von Neumann-Morgenstern expected utility hypothesis within the context of simple probabilities, i.e. probability distributions which take positive values only for a finite number of outcomes. However, we would like to extend the hypothesis to continuous spaces (i.e. infinite support) and more complicated measures. Specifically, we would like it that for any probability measure p over X:

U(p) = ∫_X u(x) dp(x)

as the general analogue of the expected utility decomposition for non-simple probability measures. However, things are not that simple: specifically, the Archimedean axiom is a source of failure in obtaining such a representation. As a result, it is necessary to strengthen and/or supplement it. For details, consult Fishburn (1970: Ch. 10).
