<h1 id="utility-and-utility-functions">6.1 Utility and Utility
Functions</h1>
<h2 id="fundamentals">6.1.1 Fundamentals</h2>
<p><strong>A utility function is a mathematical representation of
preferences.</strong> A utility function, <span
class="math inline">\(u\)</span>, takes inputs like goods or situations
and outputs a value called <em>utility</em>. Utility is a measure of how
much an agent prefers goods and situations relative to other goods and
situations.<br />
Suppose we offer Alice some apples, bananas, and cherries. She might
have the following utility function for fruits:<br />
<span class="math display">\[u(\text{fruits}) = 12a+10b+2c,\]</span>
where <span class="math inline">\(a\)</span> is the number of apples,
<span class="math inline">\(b\)</span> is the number of bananas, and
<span class="math inline">\(c\)</span> is the number of cherries that
she consumes. Suppose Alice consumes no apples, one banana, and five
cherries. The amount of utility she gains from her consumption is
calculated as <span class="math display">\[u(0 \: \text{apples}, 1 \:
\text{banana}, 5 \: \text{cherries}) =(12 \cdot 0)+(10 \cdot 1)+(2 \cdot
5) = 20.\]</span> The output of this function is read as “20 units of
utility” for short. These units are arbitrary and reflect the level of
Alice’s utility. We can use utility functions to quantitatively
represent preferences over different combinations of goods and
situations. For example, we can rank Alice’s preferences over fruits as
<span class="math display">\[\text{apple}\succ \text{banana}\succ
\text{cherry},\]</span> where <span class="math inline">\(\succ\)</span>
represents <em>preference</em>, such that what comes before the symbol
is preferred to what comes after it. This follows from the fact that
Alice gains 12 units from an apple, 10 units from a banana, and 2 units
from a cherry. The advantage of having a utility function as opposed to
just an explicit ranking of goods is that we can directly infer
information about more complex goods. For example, we know <span
class="math display">\[u(1 \text{ banana}, 5 \text{ cherries}) =
20>u(1 \text{ apple}) = 12>u(1 \text{ banana}) = 10.\]</span>
<strong>Utility functions, if accurate, reveal what options agents would
prefer and choose.</strong> If told to choose only one of the three
fruits, Alice would pick the apple, since it gives her the most utility.
Her preference follows from <em>rational choice theory</em>, which
proposes that individuals make decisions that maximize their own
self-interest. This view is only an
approximation to human behavior. In this chapter we will discuss how
rational choice theory is an imperfect but useful way to model choices.
We will also refer to individuals who behave in coherent ways that help
maximize utility as <em>agents</em>.</p>
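<p>The arithmetic above is easy to verify directly. The following sketch
(illustrative only; the function name and bundle representation are our
own) encodes Alice’s utility function and compares a few bundles.</p>
<pre><code>def alice_utility(apples, bananas, cherries):
    """Alice's Bernoulli utility over fruit bundles: u = 12a + 10b + 2c."""
    return 12 * apples + 10 * bananas + 2 * cherries

# 0 apples, 1 banana, 5 cherries -> 20 units of utility
print(alice_utility(0, 1, 5))   # 20

# Ranking single fruits recovers apple > banana > cherry
print(alice_utility(1, 0, 0), alice_utility(0, 1, 0), alice_utility(0, 0, 1))   # 12 10 2

# More complex bundles can be compared directly
print(alice_utility(0, 1, 5) > alice_utility(1, 0, 0))   # True: 20 > 12
</code></pre>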
<p><strong>In this chapter, we explore various concepts about utility
functions that are useful for thinking about AIs, humans, and
organizations like companies and states.</strong> First, we introduce
<em>Bernoulli utility functions</em>, which are conventional utility
functions that define preferences over certain outcomes like the example
above. We later discuss <em>von Neumann-Morgenstern utility
functions</em>, which extend preferences to probabilistic situations, in
which we cannot be sure which outcome will occur. <em>Expected utility
theory</em> suggests that rationality is the ability to maximize
preferences. We consider the relevance of utility functions to <em>AI
corrigibility</em>—the property of being receptive to corrections—and
see how this might be a source of tail risk. Much of this chapter
focuses on how utility functions help understand and model agents’
<em>attitudes toward risk</em>. Finally, we examine <em>non-expected
utility theories</em>, which seek to rectify some shortcomings of
conventional expected utility theory when modeling real-life
behavior.</p>
<h2 id="motivations-for-learning-about-utility-functions">6.1.2
Motivations for Learning About Utility Functions</h2>
<p><strong>Utility functions are a central concept in economics and
decision theory.</strong> Utility functions can be applied to a wide
range of problems and agents, from rats finding cheese in a maze to
humans making investment decisions to countries stockpiling nuclear
weapons. Conventional economic theory assumes that people are rational
and well-informed, and make decisions that maximize their self-interest,
as represented by their utility function. The view that individuals will
choose options that are likely to maximize their utility functions,
referred to as <em>expected utility theory</em>, has been the major
paradigm in real-world decision making since the Second World War <span
class="citation" data-cites="schoemaker1982expected"></span>. It is
useful for modeling, predicting, and encouraging desired behavior in a
wide range of situations. However, as we will discuss, this view does
not perfectly capture reality, because individuals can often be
irrational, lack relevant knowledge, and frequently make mistakes.</p>
<p><strong>The objective of maximizing a utility function can cause
intelligence.</strong> The <em>reward hypothesis</em> suggests that the
objective of maximizing some reward is sufficient to drive behavior that
exhibits intelligent traits like learning, knowledge, perception, social
awareness, language, generalization, and more <span class="citation"
data-cites="silver2021reward"></span>. The reward hypothesis implies
that artificial agents in rich environments with simple rewards could
develop sophisticated general intelligence. For example, an artificial
agent deployed with the goal of maximizing the number of successful food
deliveries may develop relevant geographical knowledge, an understanding
of how to move between destinations efficiently, and the ability to
perceive potential dangers. Therefore, the construction and properties
of the utility function that agents maximize are central to guiding
intelligent behavior.</p>
<p><strong>Certain artificial agents may be approximated as expected
utility maximizers.</strong> Some artificial intelligences are
agent-like. They are programmed to consider the potential outcomes of
different actions and to choose the option that is most likely to lead
to the optimal result. It is a reasonable approximation to say that many
artificial agents make choices that they predict will give them the
highest utility. For instance, in reinforcement learning (introduced in
the previous chapter), artificial agents explore their environment and
are rewarded for desirable behavior. These agents are explicitly
constructed to maximize reward functions, which strongly shape an
agent’s internal utility function. This view of AI has implications for
how we design and evaluate these systems—we need to ensure that their
value functions promote human values. Utility functions can help us
reason about the behavior of AIs, as well as the behavior of powerful
actors that direct AIs, such as corporations or governments.</p>
<p><strong>Utility functions are a key concept in AI safety.</strong>
Utility functions come up explicitly and implicitly at various times
throughout this book, and are useful for understanding the behavior of
reward-maximizing agents, as well as humans and organizations involved
in the AI ecosystem. They will also come up in a later chapter, when we
consider that some advanced AIs may have social welfare functions as
their utility function. In another chapter, we will continue our
discussion of rational agents that seek to maximize their own utility.</p>
<h1 id="properties-of-utility-functions">6.2 Properties of Utility
Functions</h1>
<p><strong>Overview.</strong> In this section, we will formalize our
understanding of utility functions. First, we will introduce
<em>Bernoulli utility functions</em>, which are simple utility functions
that allow an agent to select between different choices with known
outcomes. Then we will discuss <em>von Neumann-Morgenstern utility
functions</em>, which model how rational agents select between choices
with probabilistic outcomes based on the concept of <em>expected
utility</em>, to make these tools more generally applicable to
choices under uncertainty. Finally, we will describe a solution to a
famous puzzle applying expected utility—the <em>St. Petersburg
Paradox</em>—to see why expected utility is a useful tool for decision
making.</p>
<p>Establishing these mathematical foundations will help us understand
how to apply utility functions to various actors and situations.</p>
<h2 id="bernoulli-utility-functions">6.2.1 Bernoulli Utility
Functions</h2>
<p><strong>Bernoulli utility functions represent an individual’s
preferences over potential outcomes.</strong> Suppose we give people the
choice between an apple, a banana, and a cherry. If we already know each
person’s utility function, we can deduce, predict, and compare their
preferences. In the introduction, we met Alice, whose
preferences are represented by the utility function over fruits:<br />
<span class="math display">\[u(f) = 12a+10b+2c.\]</span> This is a
Bernoulli utility function.</p>
<p><strong>Bernoulli utility functions can be used to convey the
strength of preferences across opportunities.</strong> In their most
basic form, Bernoulli utility functions express ordinal preferences by
ranking options in order of desirability. For more information, we can
consider cardinal representations of preferences. With cardinal utility
functions, numbers matter: while the units are still arbitrary, the
relative differences are informative.<br />
To illustrate the difference between ordinal and cardinal comparisons,
consider how we talk about temperature. When we want to precisely convey
information about temperature, we use a cardinal measure like Celsius or
Fahrenheit: “Today is five degrees warmer than yesterday.” We could have
also accurately, but less descriptively, used an ordinal descriptor:
“Today is warmer than yesterday.” Similarly, if we interpret Alice’s
utility function as cardinal, we can conclude that she feels more
strongly about the difference between a banana and a cherry (8 units of
utility) than she does about the difference between an apple and a
banana (2 units). We can gauge the relative strength of Alice’s
preferences from a utility function.</p>
<h2 id="von-neumann-morgenstern-utility-functions">6.2.2 Von
Neumann-Morgenstern Utility Functions</h2>
<p><strong>Von Neumann-Morgenstern utility functions help us understand
what people prefer when outcomes are uncertain.</strong> We do not yet
know how Alice values an uncertain situation, such as a coin flip. If
the coin lands on heads, Alice gets both a banana and an apple. But if
it lands on tails, she gets nothing. Now let’s say we give Alice a
choice between getting an apple, getting a banana, or flipping the coin.
Since we know her fruit Bernoulli utility function, we know her
preferences between apples and bananas, but we do not know how she
compares each fruit to the coin flip. We’d like to convert the possible
outcomes of the coin flip into a number that represents the utility of
each outcome, which can then be compared directly against the utility of
receiving the fruits with certainty. The von Neumann-Morgenstern (vNM)
utility functions help us do this <span class="citation"
data-cites="vonneumann1947theory"></span>. They are extensions of
Bernoulli utility functions, and work specifically for situations with
uncertainty, represented as <em>lotteries</em> (denoted <strong><span
class="math inline">\(L\)</span></strong>), like this coin flip. First,
we work through some definitions and assumptions that allow us to
construct utility functions over potential outcomes, and then we explore
the relation between von Neumann-Morgenstern utility functions and
expected utility.</p>
<p><strong>A lottery assigns a probability to each possible
outcome.</strong> Formally, a lottery <span
class="math inline">\(L\)</span> is any set of possible outcomes,
denoted <span class="math inline">\(o_{i}\)</span>, and their associated
probabilities, denoted <span class="math inline">\(p_{i}\)</span>.
Consider a simple lottery: a coin flip where Alice receives an apple on
heads, and a banana on tails. This lottery has possible outcomes <span
class="math inline">\(apple\)</span> and <span
class="math inline">\(banana\)</span>, each with probability <span
class="math inline">\(0.5\)</span>. If a different lottery offers a
cherry with certainty, it would have only the possible outcome <span
class="math inline">\(cherry\)</span> with probability <span
class="math inline">\(1\)</span>. Objective probabilities are used when
the probabilities are known, such as when calculating the probability of
winning in casino games like roulette. In other cases where objective
probabilities are not known, like predicting the outcome of an election,
an individual’s subjective best-guess could be used instead. So, both
uncertain and certain outcomes can be represented by lotteries.<br />
</p>
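<p>In code, a lottery can be represented as a list of (outcome,
probability) pairs. The sketch below (an illustrative encoding of our
own) captures the two lotteries just described and checks that the
probabilities form a valid distribution.</p>
<pre><code># A lottery as a list of (outcome, probability) pairs
coin_flip = [("apple", 0.5), ("banana", 0.5)]    # heads: apple; tails: banana
certain_cherry = [("cherry", 1.0)]               # a certain outcome is a one-element lottery

def is_valid_lottery(lottery):
    """Probabilities must be non-negative and sum to one."""
    probs = [p for _, p in lottery]
    return min(probs) >= 0 and abs(sum(probs) - 1.0) < 1e-9

print(is_valid_lottery(coin_flip), is_valid_lottery(certain_cherry))   # True True
</code></pre>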
<div class="storybox">
<p><span>A Note on Expected Value vs. Expected Utility</span></p>
<p>An essential distinction in this chapter is that between expected
value and expected utility.</p>
<p><strong>Expected value is the average outcome of a random
event.</strong> While most lottery tickets have negative expected value,
in rare circumstances they have positive expected value. Suppose a
lottery has a jackpot of 1 billion dollars. Let the probability of
winning the jackpot be 1 in 300 million, and let the price of a lottery
ticket be $2. Then the expected value is calculated by weighting
each possible outcome by its probability of occurrence and summing. The two outcomes
are (1) that we win a billion dollars, minus the cost of $2 to play the
lottery, which happens with probability one in 300 million, and (2) that
we are $2 in debt. We can calculate the expected value with the formula:
<span class="math display">\[\frac{1}{300 \text{ million}} \cdot
\left(\$ 1 \text{ billion}-\$ 2\right)+\left(1-\frac{1}{300 \text{
million}}\right) \cdot \left(-\$ 2\right)\approx \$ 1.33.\]</span> The
expected value of the lottery ticket is positive, meaning that, on
average, buying the lottery ticket would result in us receiving <span
class="math inline">\(\$\)</span>1.33.<br />
Generally, we can calculate expected value by multiplying each outcome
value, <span class="math inline">\(o_{i}\)</span>, with its probability
<span class="math inline">\(p_{i},\)</span> and sum everything up over all
<span class="math inline">\(n\)</span> possibilities: <span
class="math display">\[E\left[L\right] = o_{1} \cdot p_{1}+o_{2} \cdot
p_{2}+\cdots +o_{n} \cdot p_{n}.\]</span> <strong>Expected utility is
the average utility of a random event.</strong> Although the lottery has
positive expected value, buying a lottery ticket may still not increase
the buyer’s expected utility. Expected utility is distinct from expected value:
instead of summing over the monetary outcomes (weighing each outcome by
its probability), we sum over the utility the agent receives from each
outcome (weighing each outcome by its probability).<br />
If the agent’s utility function indicates that one “util” is just as
valuable as one dollar, that is <span class="math inline">\(u\left(\$
x\right) = x\)</span>, then expected utility and expected value would be
the same. But suppose the agent’s utility function were a different
function, such as <span class="math inline">\(u\left(\$ x\right) =
x^{1/3}\)</span>. This utility function means that the agent values each
additional dollar less and less as they have more and more money.<br />
For example, if an agent with this utility function already has <span
class="math inline">\(\$\)</span>500, an extra dollar would increase
their utility by about 0.005, but if they already have <span
class="math inline">\(\$\)</span>200,000, an extra dollar would increase
their utility by only 0.0001. With this utility function, the expected
utility of this lottery example is negative: <span
class="math display">\[\frac{1}{300 \text{ million}} \cdot \left(1
\text{ billion}-2\right)^{1/3}+\left(1-\frac{1}{300 \text{
million}}\right) \cdot \left(-2\right)^{1/3}\approx -1.26.\]</span>
Consequently, expected value can be positive while expected utility can
be negative, so the two concepts are distinct.<br />
Generally, expected utility is calculated as: <span
class="math display">\[E[u(L)] = u(o_{1}) \cdot p_{1}+u(o_{2}) \cdot
p_{2}+\cdots +u(o_{n}) \cdot p_{n}.\]</span></p>
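<p>To make the distinction concrete, the sketch below (our own; helper
names are illustrative) recomputes the jackpot example under a linear
and a cube-root utility function, reproducing the positive expected
value and the negative expected utility.</p>
<pre><code>def expected_value(lottery):
    """Sum of outcome * probability over (outcome, probability) pairs."""
    return sum(o * p for o, p in lottery)

def expected_utility(lottery, u):
    """Sum of u(outcome) * probability over (outcome, probability) pairs."""
    return sum(u(o) * p for o, p in lottery)

def cube_root(x):
    """Real cube root, defined for negative numbers as well."""
    return x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))

p_win = 1 / 300_000_000
ticket = [(1_000_000_000 - 2, p_win), (-2, 1 - p_win)]   # net dollar outcomes

print(expected_value(ticket))                # ~1.33: positive expected value
print(expected_utility(ticket, cube_root))   # ~-1.26: negative expected utility
</code></pre>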
</div>
<p><strong>According to expected utility theory, rational agents make
decisions that maximize expected utility.</strong> Von Neumann and
Morgenstern proposed a set of basic propositions called <em>axioms</em>
that define an agent with rational preferences. When an agent satisfies
these axioms, their preferences can be represented by a von
Neumann-Morgenstern utility function, which is equivalent to using
expected utility to make decisions. While expected utility theory is
often used to model human behavior, it is important to note that it is
an imperfect approximation. In the final section of this chapter, we
present some criticisms of expected utility theory and the vNM
rationality axioms as they apply to humans. However, artificial agents
might be designed along these lines, resulting in an explicit expected
utility maximizer, or something approximating an expected utility
maximizer. The von Neumann-Morgenstern rationality axioms are listed
below with mathematically precise notation for sake of completeness, but
a technical understanding of them is not necessary to proceed with the
chapter.</p>
<p><strong>Von Neumann-Morgenstern Rationality Axioms.</strong> When the
following axioms are satisfied, we can assume a utility function of an
expected utility form, where agents prefer lotteries that have higher
expected utility <span class="citation"
data-cites="vonneumann1947theory"></span>. <span
class="math inline">\(L\)</span> is a lottery. <span
class="math inline">\(L_{A}\succcurlyeq L_{B}\)</span> means that the
agent prefers lottery A to lottery B, whereas <span
class="math inline">\(L_{A}\sim L_{B}\)</span> means that the agent is
indifferent between lottery A and lottery B. These axioms and
conclusions that can be derived from them are contentious, as we will
see later on in this chapter. There are six such axioms, which we can
split into two groups.<br />
The first two axioms may seem obvious, but are nonetheless
essential:</p>
<ol>
<li><p><span>Monotonicity</span>: Agents prefer higher probabilities of
preferred outcomes.</p></li>
<li><p><span>Decomposability</span>: The agent is indifferent between
two lotteries that share the same probabilities for all the same
outcomes, even if they are described differently.</p></li>
</ol>
<p>The remaining four axioms are:</p>
<ol>
<li><p>Completeness: The agent can rank their preferences over all
lotteries. For any two lotteries, it must be that <span
class="math inline">\(L_{A}\succcurlyeq L_{B}\)</span> or <span
class="math inline">\(L_{B}\succcurlyeq L_{A}\)</span>.</p></li>
<li><p>Transitivity: If <span class="math inline">\(L_{A}\succcurlyeq
L_{B}\)</span> and <span class="math inline">\(L_{B}\succcurlyeq
L_{C}\)</span>, then <span class="math inline">\(L_{A}\succcurlyeq
L_{C}\)</span>.</p></li>
<li><p>Continuity: For any three lotteries, <span
class="math inline">\(L_{A}\succcurlyeq L_{B}\succcurlyeq
L_{C}\)</span>, there exists a probability <span
class="math inline">\(p\in\left[0,1\right]\)</span> such that <span
class="math inline">\(pL_{A}+\left(1-p\right)L_{C}\sim L_{B}\)</span>.
This means that the agent is indifferent between <span
class="math inline">\(L_{B}\)</span> and some combination of the worse
lottery <span class="math inline">\(L_{C}\)</span> and the better
lottery <span class="math inline">\(L_{A}\)</span>. In practice, this
means that agents’ preferences change smoothly and predictably with
changes in options.</p></li>
<li><p>Independence: The preference between two lotteries is not
impacted by the addition of equal probabilities of a third, independent
lottery to each lottery. That is, <span
class="math inline">\(L_{A}\succcurlyeq L_{B}\)</span> is equivalent to
<span class="math inline">\(pL_{A}+\left(1-p\right)L_{C}\succcurlyeq
pL_{B}+\left(1-p\right)L_{C}\)</span> for any <span
class="math inline">\(L_{C}\)</span>. ></p></li>
</ol>
<p><strong>Form of von Neumann-Morgenstern utility functions.</strong>
If an agent’s preferences are consistent with the above axioms, their
preferences can be represented by a vNM utility function. This utility
function, denoted by a capital <span class="math inline">\(U\)</span>,
is simply the expected Bernoulli utility of a lottery. That is, a vNM
utility function takes the Bernoulli utility of each outcome, multiplies
each with its corresponding probability of occurrence, and then adds
everything up. Formally, an agent’s expected utility for a lottery <span
class="math inline">\(L\)</span> is calculated as: <span
class="math display">\[U\left(L\right) = u\left(o_{1}\right) \cdot
p_{1}+u\left(o_{2}\right) \cdot p_{2}+\cdots +u\left(o_{n}\right) \cdot
p_{n},\]</span> so expected utility can be thought of as a weighted
average of the utilities of different outcomes.<br />
This is identical to the expected utility formula we discussed above—we
sum over the utilities of all the possible outcomes, each multiplied by
its probability of occurrence. With Bernoulli utility functions, an
agent prefers <span class="math inline">\(a\)</span> to <span
class="math inline">\(b\)</span> if and only if their utility from
receiving <span class="math inline">\(a\)</span> is greater than their
utility from receiving <span class="math inline">\(b\)</span>. With
expected utility, an agent prefers lottery <span
class="math inline">\(L_{A}\)</span> to lottery <span
class="math inline">\(L_{B}\)</span> if and only if their expected
utility from lottery <span class="math inline">\(L_{A}\)</span> is
greater than from lottery <span class="math inline">\(L_{B}\)</span>.
That is: <span class="math display">\[L_{A}\succ L_{B}\Leftrightarrow
U\left(L_{A}\right)>U\left(L_{B}\right).\]</span> where the symbol
<span class="math inline">\(\succ\)</span> indicates preference. The von
Neumann-Morgenstern utility function models the decision making of an
agent considering two lotteries as just calculating the expected
utilities and choosing the larger resulting one.<br />
</p>
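<p>Returning to Alice’s coin flip (both fruits on heads, nothing on
tails), a short calculation shows how the vNM utility of the lottery
slots in between her certain options. The sketch below is our own
illustration using her stated Bernoulli utilities.</p>
<pre><code>def vnm_utility(lottery):
    """U(L) = sum of u(o_i) * p_i, given (utility, probability) pairs."""
    return sum(u * p for u, p in lottery)

# Bernoulli utilities from u = 12a + 10b + 2c
u_apple, u_banana = 12, 10
u_both, u_nothing = 12 + 10, 0

coin_flip = [(u_both, 0.5), (u_nothing, 0.5)]
print(vnm_utility(coin_flip))   # 11.0

# u(apple) = 12 > U(coin flip) = 11 > u(banana) = 10, so Alice takes the apple
# over the flip, and the flip over the banana.
</code></pre>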
<div class="storybox">
<p><span>A Note on Logarithms</span></p>
<p><strong>Logarithmic functions are commonly used as utility
functions.</strong> A logarithm is a mathematical function that
expresses the power to which a given number (referred to as the base)
must be raised in order to produce a value. The logarithm of a number
<span class="math inline">\(x\)</span> with respect to base <span
class="math inline">\(b\)</span> is denoted as <span
class="math inline">\(\log_{b}x\)</span>, and is the exponent to which
<span class="math inline">\(b\)</span> must be raised to produce the
value <span class="math inline">\(x\)</span>. For example, <span
class="math inline">\(\log_{2}8 = 3\)</span>, because <span
class="math inline">\(2^{3} = 8\)</span>.<br />
One special case of the logarithmic function, the natural logarithm, has
a base of <span class="math inline">\(e\)</span> (which is Euler’s
constant, roughly 2.718); in this chapter, it is referred to simply as
<span class="math inline">\(\log\)</span>. Logarithms have the following
properties, independent of base: <span
class="math inline">\(\log0\rightarrow -\infty\)</span>, <span
class="math inline">\(\log1 = 0,\)</span> <span
class="math inline">\(\log_{b}b = 1,\)</span> and <span
class="math inline">\(\log_{b}b^{a} = a\)</span>.<br />
Logarithms have a downward, concave shape, meaning the output increases
slower than the input. This shape resembles how humans value resources:
we generally value a good less if we already have more of it.
Logarithmic utility values each additional unit of a good in inverse
proportion to how much of the resource we already have.<br />
</p>
<figure>
</figure>
</div>
<h2 id="st.-petersburg-paradox">6.2.3 St. Petersburg Paradox</h2>
<p>An old man on the streets of St. Petersburg offers gamblers the
following game: he will flip a fair coin repeatedly until it lands on
tails. If the first flip lands tails, the game ends and the gambler
gets $2. If the coin first lands on heads and then lands on tails, the
game ends and the gambler gets $4. The amount of money (the “return”)
will double for each consecutive flip landing heads before the coin
ultimately lands tails. The game concludes when the coin first lands
tails, and the gambler receives the appropriate returns. Now, the
question is, how much should a gambler be willing to pay to play this
game <span class="citation"
data-cites="peterson2019paradox"></span>?<br />
With probability <span class="math inline">\(\frac{1}{2}\)</span>, the
first toss will land on tails, in which case the gambler wins two
dollars. With probability <span
class="math inline">\(\frac{1}{4}\)</span>, the first toss lands heads
and the second lands tails, and the gambler wins four dollars.
Extrapolating, this game offers a payout of: <span
class="math display">\[\$ 2^{n} = \$ \overbrace{2 \cdot 2 \cdot 2\cdots
2 \cdot 2 \cdot 2}^{n \text{ times}},\]</span> where <span
class="math inline">\(n\)</span> is the number of flips until and
including when the coin lands on tails. As offered, though, there is no
limit to the size of <span class="math inline">\(n\)</span>, since the
old man promises to keep flipping the coin until it lands on tails. The
expected payout of this game is therefore: <span
class="math display">\[E\left[L\right] =\frac{1}{2} \cdot \$
2+\frac{1}{4} \cdot \$ 4+\frac{1}{8} \cdot \$ 8+\cdots = \$ 1+\$ 1+\$
1+\cdots = \$ \infty.\]</span> Bernoulli described this situation as a
paradox because he believed that, despite it having infinite expected
value, anyone would take a large but finite amount of money over the
chance to play the game. While paying <span
class="math inline">\(\$\)</span>10,000,000 to play this game would not
be inconsistent with its expected value, we would think it highly
irresponsible! The paradox reveals a disparity between expected value
calculations and reasonable human behavior.<br />
</p>
<figure>
</figure>
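<p>A quick simulation (a sketch of our own, not part of the original
text) makes the paradox vivid: the sample average payout stays modest
over many plays, because nearly all of the infinite expected value sits
in vanishingly rare long runs of heads.</p>
<pre><code>import random

def st_petersburg_payout():
    """Flip a fair coin until tails; the payout is $2^n, where n counts all flips."""
    n = 1
    while random.random() < 0.5:   # heads with probability 1/2
        n += 1
    return 2 ** n

random.seed(0)
plays = [st_petersburg_payout() for _ in range(100_000)]
print(sum(plays) / len(plays))   # sample mean is usually a small double-digit number
</code></pre>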
<p><strong>Logarithmic utility functions can represent decreasing
marginal utility.</strong> A number of ways have been proposed to
resolve the St. Petersburg paradox. We will focus on the most popular:
representing the player with a utility function instead of merely
calculating expected value. As we discussed in the previous section, a
logarithmic utility function seems to resemble how humans think about
wealth. As a person becomes richer, each additional dollar gives them
less satisfaction than before. This concept, called decreasing marginal
utility, makes sense intuitively: a billionaire would not be as
satisfied winning $1000 as someone with significantly less money.
Wealth, and many other resources like food, have such diminishing
returns. While a first slice of pizza is incredibly satisfying, a second
one is slightly less so, and few people would continue eating to enjoy a
tenth slice of pizza.<br />
Assuming an agent with a utility function <span
class="math inline">\(u(\$ x) = \log_{2}\left(x\right)\)</span> over
<span class="math inline">\(x\)</span> dollars, we can calculate the
expected utility of playing the St. Petersburg game as: <span
class="math display">\[E\left[U\left(L\right)\right] =\frac{1}{2} \cdot
\log_{2}(2)+\frac{1}{4} \cdot \log_{2}(4)+\frac{1}{8} \cdot
\log_{2}(8)+\cdots = 2.\]</span> That is, the expected utility of the
game is 2. From the logarithmic utility function over wealth, we know
that: <span class="math display">\[2 = \log_{2}x\Rightarrow x =
4,\]</span> which implies that the player is indifferent between playing
this game and having $4: the level of wealth that gives them the same
utility as what they expect playing the lottery.</p>
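<p>The infinite sum above converges quickly, so we can check it
numerically. The sketch below (our own, truncating the sum at 60 terms)
also recovers the $4 certainty equivalent.</p>
<pre><code>from math import log2

# Expected log2 utility of the game: sum over n of (1/2^n) * log2(2^n) = sum of n/2^n
expected_utility = sum((0.5 ** n) * log2(2 ** n) for n in range(1, 61))
print(expected_utility)          # ~2.0

# Certainty equivalent: the sure wealth x with log2(x) equal to the game's expected utility
print(2 ** expected_utility)     # ~4.0 dollars
</code></pre>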
<p><strong>Expected utility is more reasonable than expected
value.</strong> The previous calculation explains why an agent with
<span class="math inline">\(u\left(\$ x\right) = \log_{2}x\)</span>
should not pay large amounts of money to play the St. Petersburg game.
The log utility function implies that the player receives diminishing
returns to wealth, and cares less about situations with small chances of
winning huge sums of money. Figure 5 shows how the large payoffs with
small probability, despite having the same expected value, contribute
little to expected utility. This feature captures the human tendency
towards risk aversion, explored in the next section. Note that while
logarithmic utility functions are a useful model (especially in
resolving such paradoxes), they do not perfectly describe human behavior
across choices, such as the tendency to buy lottery tickets, which we
will explore in the next chapter.<br />
</p>
<figure>
</figure>
<p><strong>Summary.</strong> In this section, we examined the properties
of Bernoulli utility functions, which allow us to compare an agent’s
preferences across different outcomes. We then introduced von
Neumann-Morgenstern utility functions, which calculate the average, or
expected, utility over different possible outcomes. From there, we
derived the idea that rational agents are able to make decisions that
maximize expected utility. Through the St. Petersburg Paradox, we showed
that taking the expected utility of a logarithmic function leads to more
reasonable behavior. Having understood the properties of utility
functions, we can now examine the problem of incorrigibility, where AI
systems do not accept corrective interventions because of rigid
preferences.</p>
<h1 id="tail-risk-corrigibility">6.3 Tail Risk: Corrigibility</h1>
<p><strong>Overview.</strong> In this section, we will explore how
utility functions provide insight into whether an AI system is open to
corrective interventions and discuss related implications for AI risks.
The von Neumann-Morgenstern (vNM) axioms of completeness and
transitivity can lead to strict preferences over shutting down or being
shut down, which affects how easily an agent can be corrected. We will
emphasize the importance of developing corrigible AI systems that are
responsive to human feedback and that can be safely controlled to
prevent unwanted AI behavior.</p>
<p><strong>Corrigibility measures our ability to correct an AI if and
when things go wrong.</strong> An AI system is <em>corrigible</em> if it
accepts and cooperates with corrective interventions like being shut
down or having its utility function changed <span class="citation"
data-cites="pace"></span>. Without many assumptions, we can argue that
typical rational agents will resist corrective measures: changing an
agent’s utility function necessarily means that the agent will pursue
goals that result in less utility relative to their current
preferences.</p>
<p><strong>Suppose we own an AI that fetches coffee for us every
morning.</strong> Its utility function assigns “10 utils” to getting us
coffee quickly, “5 utils” to getting us coffee slowly, and “0 utils” to
not getting us coffee at all. Now, let’s say we want to change the AI’s
objective to instead make us breakfast. A regular agent would resist
this change, reasoning that making breakfast would mean it is less able
to efficiently make coffee, resulting in lower utility. However, a
corrigible AI would recognize that making breakfast could be just as
valuable to humans as fetching coffee and would be open to the change in
objective. The AI would move on to maximizing its new utility function.
In general, corrigible AIs are more amenable to feedback and
corrections, rather than stubbornly adhering to their initial goals or
directives. When AIs are corrigible, humans can more easily correct
rogue actions and prevent any harmful or unwanted behavior.</p>
<p><strong>Completeness and transitivity imply that an AI has strict
preferences over shutting down.</strong> Assume that an agent’s
preferences satisfy the vNM axioms of completeness, such that it can
rank all options, as well as transitivity, such that its preferences are
consistent. For instance, the AI can see that preferring an apple to a
banana and a banana to a cherry implies preferring an apple to a
cherry. Then, we know that the agent’s utility function ranks every
option.<br />
Consider again the coffee-fetching AI. Suppose that in addition to
getting us coffee quickly (10 utils), getting us coffee slowly (5
utils), and not getting us coffee (0 utils), there is a fourth option,
where the agent gets shut down immediately. The AI expects that
immediate shutdown will result in its owner getting coffee slowly
without AI assistance, which appears to be valued at 5 units of utility
(the same as it getting us coffee slowly). The agent thus strictly
prefers getting us coffee quickly to shutting down, and strictly prefers
shutting down to us not having coffee at all.<br />
Generally, unless indifferent between everything, completeness and
transitivity imply that the AI has strict preferences about
potentially shutting down <span class="citation"
data-cites="thornley2023shutdown"></span>. Without completeness, the
agent could have no preference between shutting down immediately and all
other actions. Without transitivity, the agent could be indifferent
between shutting down immediately and all other possible actions without
that implying that the agent is indifferent between all possible
actions.</p>
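<p>As a toy illustration of why complete, transitive preferences place
shutdown inside a strict ranking, the sketch below (entirely
hypothetical; the utility numbers follow the coffee example) has a
utility maximizer rank shutdown like any other option.</p>
<pre><code># Hypothetical utilities for the coffee-fetching agent, following the example in the text
options = {
    "fetch coffee quickly": 10,
    "fetch coffee slowly": 5,
    "no coffee": 0,
    "shut down immediately": 5,   # the owner would get coffee slowly without the AI
}

# With complete and transitive preferences, the agent ranks every option, shutdown included
print(sorted(options, key=options.get, reverse=True))

# A utility maximizer never picks shutdown while a strictly better option is available
print(max(options, key=options.get))   # "fetch coffee quickly"
</code></pre>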
<p><strong>It is bad if an AI either increases or reduces the
probability of immediate shutdown.</strong> Suppose that in trying to
get us coffee quickly, the AI drives at unsafe speeds. We’d like to shut
down the AI until we can reprogram it safely. A corrigible AI would
recognize our intention to shut down as a signal that it is misaligned.
However, an incorrigible AI would instead stay the course with what it
wanted to do initially—get us coffee—since that results in the most
utility. If possible, the AI would decrease the probability of immediate
shutdown, say by disabling its off-switch or locking the entrance to its
server rooms. Clearly, this would be bad.<br />
Consider a different situation where the AI realizes that making coffee
is actually quite difficult and that we would make coffee faster
manually, but fails to realize that we don’t want to exert the effort to
do so. The AI may then try to shut down, so that we’d have to make the
coffee ourselves. Suppose we tell the AI to continue making coffee at
its slow pace, rather than shut down. A corrigible AI would recognize
our instruction as a signal that it is misaligned and would continue to
make coffee. However, an incorrigible AI would instead stick with its
decision to shut down without our permission, since shutting down
provides it more utility. Clearly, this is also bad. We’d like to be
able to alter AIs without facing resistance.</p>
<p><strong>Summary.</strong> In this section, we introduced the concept
of corrigibility in AI systems. We discussed the relevance of utility
functions in determining corrigibility, particularly challenges that
arise if an AI’s preferences are complete and transitive, which can lead
to strict preferences over shutting down. We explored the potential
problems of an AI system reducing or increasing the probability of
immediate shutdown. The takeaway is that developing corrigible AI
systems—systems that are responsive and adaptable to human feedback and
changes—is essential in ensuring safe and effective control over AIs’
behavior. Examining the properties of utility functions illuminates
potential problems in implementing corrigibility.<br />
</p>
<div class="storybox">
<p><span>A Note on Utility Functions vs. Reward Functions</span> Utility
functions and reward functions are two interrelated yet distinct
concepts in understanding agent behavior. Utility functions represent an
agent’s preferences about states or the choice-worthiness of a state,
while reward functions represent externally imposed reinforcement. The
fact that an outcome is rewarded externally does not guarantee that it
will become part of an agent’s internal utility function.<br />
An example where utility and reinforcement come apart can be seen with
Galileo Galilei. Despite the safety and societal acceptance he could
gain by conforming to the widely accepted geocentric model, Galileo
maintained his heliocentric view. His environment provided ample
reinforcement to conform, yet he deemed the pursuit of scientific truth
more choiceworthy, highlighting a clear difference between environmental
reinforcement and the concepts of choice-worthiness or utility.<br />
As another example, think of evolutionary processes as selecting or
reinforcing some traits over others. If we considered taste buds as
components that help maximize fitness, we would expect more people to
want the taste of salads over cheeseburgers. However, it is more
accurate to view taste buds as “adaptation executors” rather than
“fitness maximizers,” as taste buds evolved in our ancestral environment
where calories were scarce. This illustrates the concept that agents act
on adaptations without necessarily adopting behavior that reliably helps
maximize reward.<br />
The same could be true for reinforcement learning agents. RL agents
might execute learned behaviors without necessarily maximizing reward;
they may form <em>decision procedures</em> that are not fully aligned
with their reinforcement. The fact that what is rewarded is not
necessarily what an agent thinks is choiceworthy could lead to AIs that
are not fully aligned with externally designed rewards. The AI might not
inherently consider reinforced behaviors as choiceworthy or of high
utility, so its utility function may differ from the one we want it to
have.<br />
</p>
</div>
<h1 id="attitudes-to-risk">6.4 Attitudes to Risk</h1>
<p><strong>Overview.</strong> The concept of risk is central to the
discussion of utility functions. Knowing an agent’s attitude towards
risk—whether they like, dislike, or are indifferent to risk—gives us a
good idea of what their utility function looks like. Conversely, if we
know an agent’s utility function, we can also understand their attitude
towards risk. We will first outline the three attitudes towards risk:
risk aversion, risk neutrality, and risk seeking. Then, we will consider
some arguments for why we might adopt each attitude, and provide
examples of situations where each attitude may be suitable to
favor.<br />
It is crucial to understand what risk attitudes are appropriate in which
contexts. To make AIs safe, we will need to give them safe risk
attitudes, such as by favoring risk-aversion over risk-neutrality. Risk
attitudes will help explain how people do and should act in different
situations. National governments, for example, will differ in risk
outlook from rogue states, and big tech companies will differ from
startups. Moreover, we should know how risk averse we should be with AI
development, as it has both large upsides and downsides.</p>
<h2 id="what-are-the-different-attitudes-to-risk">6.4.1 What Are the
Different Attitudes to Risk?</h2>
<p><strong>There are three broad types of risk preferences.</strong>
Agents can be risk averse, risk neutral, or risk seeking. In this
section, we first explore what these terms mean. We consider a few
equivalent definitions by examining different concepts associated with
risk <span class="citation" data-cites="dixitslides"></span>. Then, we
analyze what the advantages of adopting each attitude toward
risk might be.</p>
<p><strong>Let’s consider these in the context of a bet on a coin
toss.</strong> Suppose agents are given the opportunity to bet <span
class="math inline">\(\$\)</span>1000 on a fair coin toss—upon guessing
correctly, they would receive <span
class="math inline">\(\$\)</span>2000 for a net gain of <span
class="math inline">\(\$\)</span>1000. However, if they guess
incorrectly, they would receive nothing and lose their initial bet of
<span class="math inline">\(\$\)</span>1000. The expected value of this
bet is <span class="math inline">\(\$\)</span>0, irrespective of who is
playing: the player gains or loses <span
class="math inline">\(\$\)</span>1000 with equal probabilities. However,
a particular player’s willingness to take this bet, reflecting their
risk attitude, depends on how they calculate expected utility.</p>
<ol>
<li><p><em>Risk aversion</em> is the tendency to prefer a certain
outcome over a risky option with the same expected value. A risk-averse
agent would not want to participate in the coin toss. The individual is
unwilling to take the risk of a potential loss in order to potentially
earn a higher reward. Most humans are instinctively risk averse. A
common example of a risk-averse utility function is <span
class="math inline">\(u\left(x\right) = \log x\)</span> (red line in
Figure <a href="#fig:attitudes-to-risk" data-reference-type="ref"
data-reference="fig:attitudes-to-risk">1</a>).</p></li>
<li><p><em>Risk neutrality</em> is the tendency to be indifferent
between a certain outcome and a risky option with the same expected
value. For such players, expected utility is proportional to expected
value. A risk-neutral agent would not care whether they were offered
this coin toss, as its expected value is zero. If the expected value were
negative, they would prefer not to participate; if it were positive,
they would prefer to participate. The simplest risk-neutral utility function is
<span class="math inline">\(u(x) = x\)</span> (blue line in Figure <a
href="#fig:attitudes-to-risk" data-reference-type="ref"
data-reference="fig:attitudes-to-risk">1</a>).</p></li>
<li><p><em>Risk seeking</em> is the tendency to prefer a risky option
over a sure thing with the same expected value. A risk-seeking agent
would be happy to participate in this lottery. The individual is willing
to risk a negative expected value to potentially earn a higher reward.
We tend to associate risk seeking with irrationality, as it leads to
lower wealth through repeated choices made over time. However, this is
not necessarily the case. An example of a risk-seeking utility function
is <span class="math inline">\(u(x) = x^{2}\)</span> (green line in
Figure <a href="#fig:attitudes-to-risk" data-reference-type="ref"
data-reference="fig:attitudes-to-risk">1</a>).</p></li>
</ol>
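<p>The sketch below (our own; it assumes a baseline wealth of $3,000 so
that the logarithm stays finite after a $1,000 loss) evaluates the
coin-toss bet under the three utility functions named in the list above,
recovering the three attitudes.</p>
<pre><code>from math import log

def expected_utility(lottery, u):
    """Sum of u(outcome) * probability over (outcome, probability) pairs."""
    return sum(u(o) * p for o, p in lottery)

wealth = 3000                                         # assumed baseline wealth
bet = [(wealth + 1000, 0.5), (wealth - 1000, 0.5)]    # take the $1000 coin-toss bet
keep = [(wealth, 1.0)]                                # decline the bet

for name, u in [("log x (risk averse)", log),
                ("x (risk neutral)", lambda x: x),
                ("x^2 (risk seeking)", lambda x: x ** 2)]:
    print(name, expected_utility(bet, u), expected_utility(keep, u))

# log x:  EU(bet) < EU(keep) -> declines the bet
# x:      EU(bet) = EU(keep) -> indifferent
# x^2:    EU(bet) > EU(keep) -> takes the bet
</code></pre>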
<p>We can define each risk attitude in three equivalent ways. Each draws
on a different aspect of how we represent an agent’s preferences.</p>
<figure id="fig:attitudes-to-risk">
<figcaption>Utility functions capturing different attitudes to
risk</figcaption>
</figure>
<p><strong>Risk attitudes are fully explained by how an agent values
outcomes.</strong> According to expected utility theory, an
agent’s risk preferences can be understood from the shape of their
utility function, and vice-versa. We will illustrate this point by
showing that concave utility functions necessarily imply risk aversion.
An agent with a concave utility function faces decreasing marginal
utility. That is, the jump from <span
class="math inline">\(\$\)</span>1000 to <span
class="math inline">\(\$\)</span>2000 is less satisfying than the jump
from wealth <span class="math inline">\(\$\)</span>0 to wealth <span
class="math inline">\(\$\)</span>1000. Conversely, the agent dislikes
dropping from wealth <span class="math inline">\(\$\)</span>1000 to
wealth <span class="math inline">\(\$\)</span>0 more than they like
jumping from wealth <span class="math inline">\(\$\)</span>1000 to
wealth <span class="math inline">\(\$\)</span>2000. Thus, the agent will
not enter the aforementioned double-or-nothing coin toss, displaying
risk aversion.</p>
<p><strong>Preferences over outcomes may not fully explain risk
attitudes.</strong> It may seem unintuitive that risk attitudes are
entirely explained by how humans calculate utility of outcomes. As we
just saw, in expected utility theory, it is assumed that agents are risk
averse only because they have diminishing returns to larger outcomes.
Many economists and philosophers have countered that people also have an
inherent aversion to risk that is separate from preferences over
outcomes. At the end of this chapter, we will explore how non-expected
utility theories have attempted to more closely capture human behavior
in risky situations.</p>
<h2 id="risk-and-decision-making">6.4.2 Risk and Decision Making</h2>
<p><strong>Overview.</strong> Having defined risk attitudes, we will now
consider situations where it is appropriate to act in a risk-averse,
risk-neutral, or risk-seeking manner. Often, our risk approach in a
situation aligns with our overall risk preference—if we are risk averse
in day-to-day life, then we will also likely be risk averse when
investing our money. However, sometimes we might want to make decisions
as if we have a different attitude towards risk than we truly do.</p>
<p><strong>Criterion of rightness vs. decision procedure.</strong>
Philosophers distinguish between a <em>criterion of rightness</em>, the
way of judging whether an outcome is good, and a <em>decision
procedure</em>, the method of making decisions that lead to the good
outcomes. A good criterion of rightness may not be a good decision
procedure. This is related to the gap between theory and practice, as
explicitly pursuing an ideal outcome may not be the best way to achieve
it. For example, a criterion of rightness for meditation might be to
have a mind clear of thoughts. However, as a decision procedure,
thinking about not having thoughts may not help the meditator achieve a
clear mind—a better decision procedure would be to focus on the
breath.<br />
As another example, the <em>hedonistic paradox</em> reminds us that
people who directly aim at pleasure rarely secure it <span
class="citation" data-cites="sidgwick2019methods"></span>. While a
person’s pleasure level could be a criterion of rightness, it is not
necessarily a good guide to increasing pleasure—that is, not necessarily
a good decision procedure. Whatever one’s vision of pleasure looks
like—lying on a beach, buying a boat, consuming drugs—people who
directly aim at pleasure often find these things are not as pleasing as
hoped. People who aim at meaningful experiences, helping others and
engaging in activities that are intrinsically worthwhile, are more
likely to be happy. People tend to get more happiness out of life when
not aiming explicitly for happiness but for some other goal. Using the
criterion of rightness of happiness as a decision procedure can
predictably lead to unhappiness.<br />
Maximizing expected value can be a criterion of rightness, but it is not
always a good decision procedure. In the context of utility, we observe
a similar discrepancy where explicitly pursuing the criterion of
rightness (maximizing the utility function) may not lead to the best
outcome. Suppose an agent is risk neutral, such that their criterion of
rightness is maximizing a linear utility function. In the first
subsection, we will explore how they might be best served by making
decisions as if they are risk averse, such that their decision procedure
is maximizing a concave utility function.</p>
<h3 id="why-be-risk-averse">Why Be Risk Averse?</h3>
<p><strong>Risk-averse behavior is ubiquitous.</strong> In this section,
we will explore the advantages of risk aversion and how it can be a good
way to advance goals across different domains, from evolutionary fitness
to wealth accumulation. It might seem that by behaving in a risk-averse
way, thereby refusing to participate in some positive expected value
situations, agents leave a lot of value on the table. Indeed, extreme
risk aversion may be counterproductive—people who keep all their money
as cash under their bed will lose value to inflation over time. However,
as we will see, there is a sweet spot that balances the safety of
certainty and value maximization: risk-averse agents with logarithmic
utility almost surely outperform other agents over time, under certain
assumptions.</p>
<p><strong>Response to computational limits.</strong> In complex
situations, decision makers may not have the time or resources to
thoroughly analyze all options to determine the one with the highest
expected value. This problem is further complicated when the outcomes of
some risks we take have effects on other decisions down the line, like
how risky investments may affect retirement plans. To minimize these
complexities, it may be rational to be risk averse. This helps us avoid
the worst effects of our incomplete estimates when our uncertain
calculations are seriously wrong.<br />
Suppose Martin is deciding between purchasing a direct flight or two
connecting flights with a tight layover. The direct flight is more
expensive, but Martin is having trouble estimating the likelihood and
consequences of missing his connecting flight. He may prefer to play the
situation safe and pay for the more expensive direct flight, even though
the true value-for-money of the connected route may have been higher.
Now Martin can confidently make future decisions like booking a bus from
the airport to his hotel. Risk-averse decision making not only reduces
computational burden, but can also increase decision-making speed.
Instead of constantly making difficult calculations, an agent may prefer
to have a bias against risk.</p>
<p><strong>Behavioral advantage.</strong> Risk aversion is not only a
choice but a fundamental psychological phenomenon, and is influenced by
factors such as past experiences, emotions, and cognitive biases. Since
taking risks could lead to serious injury or death, agents undergoing
natural selection usually develop strategies to avoid such risks
whenever possible. Humans often shy away from risk, prioritizing safety
and security over more risky ventures, even if the potential rewards are
higher.<br />
Studies have shown that animals across diverse species exhibit
risk-averse behaviors. In a study conducted on bananaquits, a
nectar-drinking bird, researchers presented the birds with a garden
containing two types of flowers: one with consistent amounts of nectar
and one with variable amounts. They found that the birds never preferred
the latter, and that their preference for the consistent variety was
intensified when the birds were provided fewer resources in total <span
class="citation" data-cites="wunderle1987risk"></span>. This risk
aversion helps the birds survive and procreate, as risk-neutral or
risk-seeking species are more likely to die out over time: it is much
worse to have no nectar than it is better to have double the nectar.
Risk aversion is often seen as a survival mechanism.</p>
<p><strong>Natural selection favors risk aversion.</strong> Just as
individual organisms demonstrate risk aversion, entire populations are
pushed by natural selection to act risk averse in a manner that
maximizes the expected logarithm of their growth, rather than the
expected value. Consider the following, highly simplified example.
Suppose there are three types of animals—antelope, bear, crocodile—in an
area where each year is either scorching or freezing with probability
0.5. Every year, the populations grow or shrink depending on the
weather—some animals are better suited to the hot weather, and some to
the cold. The populations’ per-capita offspring, or equivalently the
populations’ growth multipliers, are shown in the table below.<br />
</p>
<p>Antelope have the same growth in each state, bears grow faster in the
warmth but slower in the cold when they hibernate, and crocodiles grow
rapidly when it is scorching and animals gather near water sources but
die out when their habitats freeze over. However, notice that the three
populations have the same average growth ratio of 1.1.<br />
Yet “average growth” is misleading. Suppose we observe this
population over two periods, one hot followed by one cold. The average
growth multiplier over these two periods would be 1.1 for every animal.
However, this does not mean that they all grow the same amount. In the
table below, we can see the animals’ growth over time.<br />
</p>
<p>Adding the logarithm of each species’ hot and cold growth rates
indicates its long term growth trajectory. The antelope population will
continue growing no matter what, compounding over time. However, the
crocodile population will not—as soon as it enters a cold year, the
crocodiles will become permanently extinct. The bear population is not
exposed to immediate extinction risk, but over time it will likely
shrink towards extinction. Notice that maximizing long-run growth in
this case is equivalent to maximizing the sum of the logarithms of the
growth rates—this is risk aversion. The stable growth population, or
equivalently the risk-averse population, is favored by natural selection
<span class="citation" data-cites="okasha2007rational"></span>.</p>
<p><strong>Avoid risk of ruin.</strong> Risk aversion’s key benefit is
that it avoids risk of ruin. Consider a repeated game of equal
probability “triple-or-nothing” bets. That is, players are offered a
<span class="math inline">\(\frac{1}{2}\)</span> probability of tripling
their initial wealth <span class="math inline">\(w\)</span>, and a <span
class="math inline">\(\frac{1}{2}\)</span> probability of losing it all.
A risk-neutral player can calculate the expected value of a single round
as:<br />
<span class="math display">\[\frac{1}{2} \cdot 0+\frac{1}{2} \cdot 3w =
1.5w.\]</span> Since the expected value is greater than the player’s
initial wealth, a risk-neutral player would bet their entire wealth on
the game. Additionally, if offered this bet repeatedly, they would
reinvest everything they had in it each time. The expected value of
taking this bet <span class="math inline">\(n\)</span> times in a row,
reinvesting all winnings, would be:<br />
<span class="math display">\[\frac{1}{2} \cdot 0+\frac{1}{4} \cdot
0+\cdots +\frac{1}{2^{n}} \cdot 0+\frac{1}{2^{n}} \cdot 3^{n} \cdot w =
(1.5)^{n}w.\]</span> If the agent were genuinely offered this bet as many
times as they wanted, they would continue to reinvest everything
indefinitely, which gives them an expected value of:<br />
<span class="math display">\[\lim_{n\rightarrow \infty }1.5^{n}w =
\infty.\]</span> This is another infinite expected value game—just like
in the St. Petersburg Paradox! However, notice that this calculation is
again heavily skewed by a single, low-probability branch in which an
extremely lucky individual continues to win, exponentially increasing
their wealth. In figure <a href="#fig:tripleornothing"
data-reference-type="ref"
data-reference="fig:tripleornothing">[fig:tripleornothing]</a>, we show
the first four bets in this strategy with a starting wealth of 16. Only
along the cyan branch does the player win any money, and this branch
quickly becomes astronomically improbable. We would rarely choose
to repeatedly play triple-or-nothing games with everything we owned in
real life. We are risk averse when dealing with high probabilities of
losing all our money. Acting risk neutral and relying on expected value
would be a poor decision-making strategy.<br />
</p>
<figure>
<figcaption>The first four rounds of repeated all-in triple-or-nothing bets, starting from a wealth of 16. Only a single branch ends with any money.</figcaption>
</figure>
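<p>The gap between the expected value and the typical outcome can also be seen in a quick simulation. The following is a minimal sketch (the function name and parameters are ours, not from the text): it plays the all-in strategy for ten rounds across many simulated players and compares the theoretical expected value with how players actually fare.</p>
<pre><code>import random

random.seed(0)

def all_in_triple_or_nothing(wealth, n_rounds):
    """Stake the entire wealth each round: triple it with probability 1/2,
    lose everything otherwise."""
    for _ in range(n_rounds):
        if wealth == 0:
            break
        wealth = 3 * wealth if random.random() > 0.5 else 0
    return wealth

start, n_rounds, n_players = 16, 10, 100_000
outcomes = [all_in_triple_or_nothing(start, n_rounds) for _ in range(n_players)]

expected = start * 1.5 ** n_rounds                 # theoretical expected value
average  = sum(outcomes) / n_players               # empirical average
ruined   = sum(o == 0 for o in outcomes) / n_players

print(f"theoretical expected value: {expected:.0f}")   # 923
print(f"empirical average wealth:   {average:.0f}")    # close to 923
print(f"fraction of players ruined: {ruined:.3f}")     # about 0.999</code></pre>
<p>Almost every simulated player ends with nothing; the large expected value is carried entirely by the roughly one-in-a-thousand players who win all ten rounds.</p>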
<p><strong>Maximizing logarithmic utility is a better decision
procedure.</strong> Agents might want to act as if they are maximizing the
logarithm of their wealth instead of its expected value. A
logarithmic function avoids risk of ruin because it assigns a utility
value of negative infinity to the outcome of zero wealth, since <span
class="math inline">\(\log w \rightarrow -\infty\)</span> as <span
class="math inline">\(w \rightarrow 0\)</span>. Therefore, an
agent with a logarithmic utility function in wealth will never
participate in a lottery that could, however unlikely the case, land
them at zero wealth. The logarithmic function also grows slowly, placing
less weight on very unlikely, high-payout branches, a property that we
used to resolve the St. Petersburg Paradox. While we might have
preferences that are linear in wealth (which is our criterion of
rightness), we might be better served by a different decision procedure:
maximizing the logarithm of wealth rather than maximizing wealth
directly.</p>
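<p>As a minimal sketch of this decision procedure (the wealth value here is illustrative), an agent who evaluates the triple-or-nothing bet with a logarithmic utility function assigns the all-in gamble an expected utility of negative infinity and therefore declines it:</p>
<pre><code>import math

w = 16.0                               # current wealth (illustrative)

# Expected log utility of staking everything on the triple-or-nothing bet:
# with probability 1/2 wealth becomes 3w, with probability 1/2 it becomes 0,
# and the log utility of zero wealth is negative infinity.
utility_win  = math.log(3 * w)
utility_lose = float("-inf")           # log(0) -> -infinity
expected_utility_bet = 0.5 * utility_win + 0.5 * utility_lose

expected_utility_decline = math.log(w) # keep current wealth instead

print(expected_utility_bet)            # -inf
print(expected_utility_decline)        # about 2.77, so declining is preferred</code></pre>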
<p><strong>Maximizing the logarithm of wealth maximizes every percentile
of wealth.</strong> Maximizing logarithmic utility avoids
risk of ruin since investors never bet their entire wealth on one
opportunity, much as they avoid over-investing in any single
asset by diversifying across multiple assets. Instead of
maximizing average wealth (as expected value does), maximizing the
logarithmic utility of wealth maximizes other measures associated with
the distribution of wealth. In fact, doing so maximizes the median,
which is the 50th percentile of wealth, and it also delivers the highest
value at any arbitrary percentile of wealth. It even maximizes the
mode—the most likely outcome. Mathematically, maximizing a logarithmic
utility function in wealth outperforms any other investment strategy in
the long run, with probability one (certainty) <span class="citation"
data-cites="kelly1956new"></span>. Thus, variations on maximizing the
logarithm of wealth are widely used in the financial sector.<br />
</p>
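<p>One way to see how this plays out is to let the agent stake only a fraction of its wealth each round rather than going all-in. The sketch below is our own illustration, in the spirit of the Kelly criterion cited above: it grid-searches for the fraction of wealth that maximizes the expected log growth rate in the triple-or-nothing bet, assuming that staking a fraction f and winning multiplies wealth by 1 + 2f, while losing multiplies it by 1 - f.</p>
<pre><code>import math

def expected_log_growth(f):
    """Expected log growth per round when staking a fraction f of wealth on
    the triple-or-nothing bet (win: wealth * (1 + 2f); lose: wealth * (1 - f))."""
    if f >= 1.0:
        return float("-inf")           # staking everything risks log(0)
    return 0.5 * math.log(1 + 2 * f) + 0.5 * math.log(1 - f)

# Grid search over fractions between 0 and 1.
fractions = [i / 1000 for i in range(1000)]
best = max(fractions, key=expected_log_growth)

print(f"growth-optimal fraction: {best:.3f}")                                 # 0.250
print(f"per-round growth factor: {math.exp(expected_log_growth(best)):.4f}")  # ~1.0607</code></pre>
<p>In this sketch, the all-in stake has an expected log growth of negative infinity, while staking a quarter of one’s wealth grows it by roughly 6 percent per round in the long run, which is why fractional, log-optimal betting rules of this kind appear in finance.</p>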
<h3 id="why-be-risk-neutral">Why Be Risk Neutral?</h3>
<p><strong>Risk neutrality is equivalent to acting on the expected
value.</strong> Since averages are straightforward and widely taught,
expected value is the mostly widely known explicit decision-making
procedure. However, despite expected value calculations being a common
concept in popular discourse, situations where agents do and should act
risk neutral are limited. In this section, we will first look at the
conditions under which risk neutrality might be a good decision
procedure—in such cases, maximizing expected value can be a significant
improvement over being too cautious. However, being mistaken about
whether the conditions hold is entirely possible. We will examine two
scenarios: one in which these conditions hold, and one in which
incorrectly assuming that they held led to ruin.</p>
<p><strong>Risk neutrality is undermined by the possibility of
ruin.</strong> In the previous section, we examined the
triple-or-nothing game, where a risk-neutral approach can lead to zero
wealth in the long term. The risk of ruin, or the loss of everything, is
a major concern when acting risk neutral. In order for a situation to be
free of risk of ruin, several conditions must be met. First, risks must
be <em>local</em>, meaning they affect only a part of a system, unlike
<em>global</em> <em>risks</em>, which affect an entire system. Second,
risks must be <em>uncorrelated</em>, which means that the outcomes do
not increase or decrease together, so that local risks do not combine to
cause a global risk. Third, risks must be <em>tractable</em>, which
means the consequences and probabilities can be estimated reasonably
easily. Finally, there should be no <em>black swans</em>, unlikely and
unforeseen events that have a significant impact. As we will see, these
conditions are rarely all met in a high-stakes environment, and there
can be dire consequences to underestimating the severity of risks.</p>
<p><strong>Risk neutrality is useful when the downside is
small.</strong> It can be appropriate to act in a risk-neutral manner
with regards to relatively inconsequential decisions. Suppose we’re
considering buying tickets to a movie that might not be any good. The
upside is an enjoyable viewing experience, and the downsides are all
local: <span class="math inline">\(\$\)</span>20 and a few wasted hours.
Since the stakes of this decision are minimal, it is reasonable not to
overthink our risk attitude and just attend the movie if we think that,
on average, we won’t regret this decision. However, if the decision at
hand were that of purchasing a car on credit, we likely would not act
hastily. The risk might not be localized but instead affect one’s entire
life; if we can’t afford to make payments, we could go bankrupt.
However, when potential losses are small, extreme risk aversion may be
too safe a strategy. We would prefer not to leave expected value on the
table.</p>
<p><strong>Dangers of risk neutrality.</strong> Often, agents
incorrectly assume that there is no risk of ruin. The failure of
financial institutions during the 2008 financial crisis, which sparked
the Great Recession, is a famous example of poor risk assessment. Take
the American International Group (AIG), a multinational insurance
company worth hundreds of billions of dollars <span class="citation"
data-cites="mcdonald2015went"></span>. By 2008, they had accumulated
billions of dollars’ worth of financial products related to the real
estate sector. AIG believed that their investments were sufficiently
uncorrelated, and therefore ruled out risk of ruin. However, AIG had not
considered a black swan: in 2008, many financial products related to the
housing market crashed. AIG’s investments were highly correlated with
the housing market, and the firm needed to be bailed out by the Federal
Reserve for <span class="math inline">\(\$\)</span>180 billion.
Even institutions with sophisticated mathematical analysis can fail to
identify the risk of ruin—playing it safe might, unsurprisingly, be safer.
Artificial agents may operate in environments where risk of ruin is a
real and not a far-fetched possibility. We would not want a risk-neutral
artificial agent endangering human lives because of a naive expected
value calculation.<br />
</p>
<h3 id="why-be-risk-seeking">Why Be Risk Seeking?</h3>
<p><strong>Risk-seeking behavior is not always unreasonable.</strong> As
we previously defined, risk-seeking agents prefer to gamble for the
chance of a larger outcome rather than settle for the certainty of a
smaller one. In some cases, a risk-seeking agent’s behavior may be
regarded as unreasonable. For example, gambling addicts take frequent
risks that lower their utility and wellbeing in the long run. On the
other hand, many individuals and organizations may be motivated to seek
risks for a number of strategic reasons, which is the focus of this
section. We will consider four example situations in which agents might
want to be risk seeking.</p>
<p><strong>In games with many players and few winners, risk-seeking
behavior can be justified.</strong> Consider a multi-player game where a
thousand participants compete for a single grand prize, which is given
to the player who accumulates the most points. An average player expects
to win only <span class="math inline">\(\frac{1}{1000}\)</span> of
the time. Even skilled players would reason that, due to random chance,
they are unlikely to be the winner. Therefore, participants may seek