<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Jonathan Weisberg</title>
<link>http://jonathanweisberg.org/index.xml</link>
<description>Recent content on Jonathan Weisberg</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Tue, 10 Nov 2020 00:00:00 -0500</lastBuildDate>
<atom:link href="http://jonathanweisberg.org/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>How Scientific is Scientific Polarization?</title>
<link>http://jonathanweisberg.org/post/ow-commutativity/</link>
<pubDate>Tue, 10 Nov 2020 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/ow-commutativity/</guid>
<description>
<p>As Joe Biden cleared 270 last week, some people remarked on how different the narrative would&rsquo;ve been had the votes been counted in a different order:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">It&#39;s staggering to think about how differently PA would be viewed/covered right now if the EDay/mail ballots were being counted in the opposite order.</p>&mdash; Dave Wasserman (@Redistrict) <a href="https://twitter.com/Redistrict/status/1324456769817640961?ref_src=twsrc%5Etfw">November 5, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>The idea that order shouldn&rsquo;t affect your final take is a classic criterion of rationality. Whatever order the evidence comes in, your final opinion should be the same if it&rsquo;s the same total evidence in the end.</p>
<p>This post is about how <a href="https://doi.org/10.1007/s13194-018-0213-9" target="_blank">O&rsquo;Connor &amp; Weatherall&rsquo;s model</a> of &ldquo;scientific polarization&rdquo; runs afoul of this constraint. In their model, divergent opinions arise from a shared body of evidence, despite everyone involved being rational. Just how rational is what we&rsquo;re considering here.</p>
<h1 id="the-model">The Model</h1>
<p>Here&rsquo;s the quick version of O&rsquo;Connor &amp; Weatherall&rsquo;s model. You can find the gory details in my <a href="http://jonathanweisberg.org/post/ow">previous post</a>, but we won&rsquo;t need them here.</p>
<p>A community of medical doctors is faced with a novel treatment for some condition. Currently, patients with this condition have a .5 chance of recovering. The new treatment either increases that chance or decreases it. In actual fact it increases the chance of recovery, but our doctors don&rsquo;t know that yet.</p>
<p>Some doctors start out more skeptical of the new treatment, others more optimistic. Those with credence &gt; .5 try the new treatment on their patients, and share the results with the others. Everybody then updates their credence in the new treatment. The cycle of experimentation, sharing, and updating then repeats.</p>
<p>Crucially though, our doctors don&rsquo;t fully trust one another&rsquo;s results. If a doctor has a very different opinion about the new treatment than her colleague, she won&rsquo;t fully trust that colleague&rsquo;s data. She may even discount them entirely, if their credences differ enough.</p>
<p>As things develop, this medical community is apt to split. Some doctors learn the truth about the new treatment&rsquo;s superiority, while others remain skeptical and even come to completely disregard the results reported by their colleagues. This won&rsquo;t always happen, but it&rsquo;s the likely outcome given certain assumptions. Crucial for us here: doctors must discount one another&rsquo;s data entirely when their credences differ significantly&mdash;by .5 let&rsquo;s say, just for concreteness.</p>
<h1 id="the-problem">The Problem</h1>
<p>This way of evaluating evidence depends on the order. Here&rsquo;s an extreme example to make the point vivid.</p>
<p>Suppose Dr. Hibbert has credence .501 in the new treatment&rsquo;s benefits, and his colleagues Nick and Zoidberg are both at 1.0. Nick and Zoidberg each have a report to share with Hibbert, containing bad news about the new treatment. Nick found that it failed in all but 1 of his 10 patients, while Zoidberg found that it failed in all 10 of his. Whose report should Dr. Hibbert update on first?</p>
<p>If he listens to Nick first, he&rsquo;ll fall below .5 and ignore Zoidberg&rsquo;s report as a result. His difference of opinion with Zoidberg will be so large that Hibbert will come to discount him entirely. But if he listens to Zoidberg first, he&rsquo;ll ignore Nick then, for the same reason.</p>
<p>So Hibbert can only really listen to one of them. And since their reports are different, he&rsquo;ll end up with different credences depending on who he listens to. Zoidberg&rsquo;s report is slightly more discouraging. So Hibbert will end up more skeptical of the new treatment if he listens to Zoidberg first, than if he listens to Nick first.</p>
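<p>To see the order dependence concretely, here&rsquo;s a minimal Python sketch (my own, not the code from the companion posts). It assumes a binomial likelihood with success chances .55 vs .45 (exaggerated so the asymmetry is easy to see), and the Jeffrey-style mistrust rule from the previous post with multiplier $m = 2$, so that reports are ignored entirely once credences differ by .5. The starting credences are likewise illustrative.</p>
<pre><code class="language-python">import math

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def mistrust_update(credence, reporter_credence, k, n, m=2.0, eps=0.05):
    """One Jeffrey update on a reported k successes in n trials, discounted by
    the difference of opinion d and the mistrust multiplier m."""
    like_good = binom_pmf(k, n, 0.5 + eps)   # P(E | new treatment better)
    like_bad  = binom_pmf(k, n, 0.5 - eps)   # P(E | new treatment worse)
    p_E = like_good * credence + like_bad * (1 - credence)
    p_H_given_E    = like_good * credence / p_E
    p_H_given_notE = (1 - like_good) * credence / (1 - p_E)
    d = abs(credence - reporter_credence)
    p_new_E = 1 - min(1.0, d * m) * (1 - p_E)    # the discounted posterior for E
    return p_H_given_E * p_new_E + p_H_given_notE * (1 - p_new_E)

# Hibbert hears a discouraging report from Nick (1 success in 10) and a worse
# one from Zoidberg (0 in 10), in both possible orders.
hibbert, nick, zoidberg = 0.55, 0.95, 0.95
nick_first     = mistrust_update(mistrust_update(hibbert, nick, 1, 10), zoidberg, 0, 10)
zoidberg_first = mistrust_update(mistrust_update(hibbert, zoidberg, 0, 10), nick, 1, 10)
print(nick_first, zoidberg_first)   # the two orders end in different credences
</code></pre>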
<h1 id="can-it-be-fixed">Can It Be Fixed?</h1>
<p>This problem isn&rsquo;t an artifact of the particulars of O&rsquo;Connor &amp; Weatherall&rsquo;s model. It&rsquo;s in the nature of the project. Any polarization model of the same broad kind must have the same bug.</p>
<p>Polarization happens because skeptical agents come to ignore their optimistic colleagues at some point. Otherwise, skeptics would eventually be drawn to the truth. As long as they&rsquo;re still willing to give some credence to the experimental results, they&rsquo;ll eventually see that those results favour optimism about the new treatment.</p>
<p>But even if our agents never ignored one another completely, we&rsquo;d still have this problem. Suppose all three of our characters have the same credence. And Nick has one success to report while Zoidberg has one failure. Intuitively, once Hibbert hears them both out, he should end up right back where he started, no matter who he listens to first.</p>
<p>But if he listens to Nick first, his credence will move away from Zoidberg&rsquo;s. So when he gets to Zoidberg&rsquo;s report it&rsquo;ll carry less weight than Nick&rsquo;s did. He&rsquo;ll end up more confident than he started. Whereas he&rsquo;ll end up less confident if he proceeds in reverse order.</p>
<h1 id="does-it-matter">Does It Matter?</h1>
<p>It seems like polarization can&rsquo;t be fully scientific if it&rsquo;s driven by mistrust based on difference of opinion. But that doesn&rsquo;t make the model worthless, or even uninteresting. O&rsquo;Connor &amp; Weatherall are already clear that their agents aren&rsquo;t meant to be &ldquo;rational with a capital &lsquo;R&rsquo;&rdquo; anyway.</p>
<p>Quite plausibly, real people behave something like the agents in this model a lot of the time. The model might be capturing a very real phenomenon, even if it&rsquo;s an irrational one. We just have to take the &ldquo;scientific&rdquo; in &ldquo;scientific polarization&rdquo; with the right amount of salt.</p>
</description>
</item>
<item>
<title>Mistrust & Polarization</title>
<link>http://jonathanweisberg.org/post/ow/</link>
<pubDate>Mon, 09 Nov 2020 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/ow/</guid>
<description>
<p>This is post 3 of 3 on simulated epistemic networks (code <a href="https://github.com/jweisber/sep-sen" target="_blank">here</a>):</p>
<ol>
<li><a href="http://jonathanweisberg.org/post/zollman/">The Zollman Effect</a></li>
<li><a href="http://jonathanweisberg.org/post/rbo">How Robust is the Zollman Effect?</a></li>
<li><a href="http://jonathanweisberg.org/post/ow">Mistrust &amp; Polarization</a></li>
</ol>
<p>The first post introduced a simple model of collective inquiry. Agents experiment with a new treatment and share their data, then update on all data as if it were their own. But what if they mistrust one another?</p>
<p>It&rsquo;s natural to have less than full faith in those whose opinions differ from your own. They seem to have gone astray somewhere, after all. And even if not, their views may have illicitly influenced their research.</p>
<p>So maybe our agents won&rsquo;t take the data shared by others at face value. Maybe they&rsquo;ll discount it, especially when the source&rsquo;s viewpoint differs greatly from their own. <a href="https://doi.org/10.1007/s13194-018-0213-9" target="_blank">O&rsquo;Connor &amp; Weatherall</a> (O&amp;W) explore this possibility, and find that it can lead to polarization.</p>
<h1 id="polarization">Polarization</h1>
<p>Until now, our communities always reached a consensus. Now though, some agents in the community may conclude the novel treatment is superior, while others abandon it, and even ignore the results of their peers using the new treatment.</p>
<p>In the example animated below, agents in blue have credence &gt;.5 so they experiment with the new treatment, sharing the results with everyone. Agents in green have credence ≤.5 but are still persuadable. They still trust the blue agents enough to update on their results&mdash;though they discount these results more the greater their difference of opinion with the agent who generated them. Finally, red agents ignore results entirely. They&rsquo;re so far from all the blue agents that they don&rsquo;t trust them at all.</p>
<div style="text-align: center;">
<video width="500" height="300" controls>
<source src="http://jonathanweisberg.org/img/sep-sen/ow-animate.mp4" type="video/mp4">
</video>
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 1. Example of polarization in the O'Connor–Weatherall model
</figcaption>
</div>
<p>In this simulation, we reach a point where there are no more green agents, only unpersuadable skeptics in red and highly confident believers in blue. And the blues have become so confident, they&rsquo;re unlikely to ever move close enough to any of the reds to get their ear. So we&rsquo;ve reached a stable state of polarization.</p>
<p>How often does such polarization occur? It depends on the size of the community, and on the &ldquo;rate of mistrust,&rdquo; $m$. Details on this parameter are below, but it&rsquo;s basically the rate at which difference of opinion increases discounting. The larger $m$ is, the more a given difference in our opinions will cause you to discount data I share with you.</p>
<p>Here&rsquo;s how these two factors affect the probability of polarization. (Note: we&rsquo;re considering only complete networks here.)</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/ow-2.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 2. Probability of polarization depends on community size and rate of mistrust.
</figcaption>
</figure>
<p>So the more agents are inclined to mistrust one another, the more likely they are to end up polarized. No surprise there. But larger communities are also more disposed to polarize. Why?</p>
<p>As O&amp;W explain, the more agents there are, the more likely it is that strong skeptics will be present at the start of inquiry: agents with credence well below .5. These agents will tend to ignore the reports of the optimists experimenting with the new treatment. So they anchor a skeptical segment of the population.</p>
<p>The mistrust multiplier $m$ is essential for polarization to happen in this model. There&rsquo;s no polarization unless $m &gt; 1$. So let&rsquo;s see the details of how $m$ works.</p>
<h1 id="jeffrey-updating">Jeffrey Updating</h1>
<p>The more our agents differ in their beliefs, the less they&rsquo;ll trust each other. When Dr. Nick reports evidence $E$ to Dr. Hibbert, Hibbert won&rsquo;t simply <a href="https://plato.stanford.edu/entries/epistemology-bayesian/#SimPriCon" target="_blank">conditionalize</a> on $E$ to get his new credence $P&rsquo;(H) = P(H \mathbin{\mid} E)$. Instead he&rsquo;ll take a weighted average of $P(H \mathbin{\mid} E)$ and $P(H \mathbin{\mid} \neg E)$. In other words, he&rsquo;ll use <a href="https://plato.stanford.edu/entries/epistemology-bayesian/#ObjSimPriConRulInfOthObjBayConThe" target="_blank">Jeffrey conditionalization</a>:
$$ P&rsquo;(H) = P(H \mathbin{\mid} E) P&rsquo;(E) + P(H \mathbin{\mid} \neg E) P&rsquo;(\neg E). $$
But to apply this formula we need to know the value for $P&rsquo;(E)$. We need to know how believable Hibbert finds $E$ when Nick reports it.</p>
<p>O&amp;W note two factors that should affect $P&rsquo;(E)$.</p>
<ol>
<li><p>The more Nick&rsquo;s opinion differs from Hibbert&rsquo;s, the less Hibbert will trust him. So we want $P&rsquo;(E)$ to decrease with the absolute difference between Hibbert&rsquo;s credence in $H$ and Nick&rsquo;s. Call this absolute difference $d$.</p></li>
<li><p>We also want $P&rsquo;(E)$ to decrease with $P(\neg E)$. Nick&rsquo;s report of $E$ has to work against Hibbert&rsquo;s skepticism about $E$ to make $P&rsquo;(E)$ high.</p></li>
</ol>
<p>A natural proposal then is that $P&rsquo;(E)$ should decrease with the product $d \cdot P(\neg E)$, which suggests $1 - d \cdot P(\neg E)$ as our formula. When $d = 1$ this would mean Hibbert ignores Nick&rsquo;s report: $P&rsquo;(E) = 1 - P(\neg E) = P(E)$. And when they are simpatico, $d = 0$, Hibbert will trust Nick fully and just conditionalize on his report, since then $P&rsquo;(E) = 1$.</p>
<p>This is fine from a formal point of view, but it means that Hibbert will basically never ignore Nick&rsquo;s testimony completely. There is zero chance of $d = 1$ ever happening in our models.</p>
<p>So, to explore models where agents fully discount one another&rsquo;s testimony, we introduce the mistrust multiplier, $m \geq 0$. This makes our final formula:
$$P&rsquo;(E) = 1 - \min(1, d \cdot m) \cdot P(\neg E).$$
The $\min$ is there to prevent negative values. When $d \cdot m &gt; 1$, we just replace it with $1$ so that $P&rsquo;(E) = P(E)$. Here&rsquo;s what this function looks like for one example, where $m = 1.5$ and $P(E) = .6$:</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/ow-1.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 3. Posterior of the evidence $P'(E)$ when $m = 1.5$ and $P(E) = .6$
</figcaption>
</figure>
<p>Note the kink, the point after which agents just ignore one another&rsquo;s data.</p>
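<p>In code, the discounting rule is a one-liner. Here&rsquo;s a minimal sketch that reproduces the numbers behind the example plotted above ($m = 1.5$ and $P(E) = .6$):</p>
<pre><code class="language-python">def discounted_posterior(p_E, d, m):
    """The posterior for the reported evidence: P'(E) = 1 - min(1, d*m) * P(not-E),
    where d is the absolute difference between the two agents' credences."""
    return 1 - min(1.0, d * m) * (1 - p_E)

for d in [0.0, 0.2, 0.4, 2/3, 0.8, 1.0]:
    print(f"d = {d:.2f}:  P'(E) = {discounted_posterior(0.6, d, 1.5):.2f}")
# P'(E) falls linearly from 1 (full trust) until d*m hits 1, at d = 2/3;
# after that it flattens out at P(E) = .6 and the report is ignored entirely.
</code></pre>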
<p>O&amp;W also consider models where the line doesn&rsquo;t flatten, but keeps going down. In that case agents don&rsquo;t ignore one another, but rather &ldquo;anti-update.&rdquo; They take a report of $E$ as a reason to <em>decrease</em> their credence in $E$. This too results in polarization, more frequently and with greater severity, in fact.</p>
<h1 id="discussion">Discussion</h1>
<p>Polarization only happens when $m &gt; 1$. Only then do some agents mistrust their colleagues enough to fully discount their reports. If this never happened, they would eventually be drawn to the truth (however slowly) by the data coming from their more optimistic colleagues.</p>
<p>So is $m &gt; 1$ a plausible assumption? I think it can be. People can be so unreliable that their reports aren&rsquo;t believable at all. In some cases a report can even decrease the believability of the proposition reported. Some sources are known for their fabrications.</p>
<p>Ultimately it comes down to whether $P(E \,\vert\, R_E) &gt; P(E)$, i.e. whether someone reporting $E$ increases the probability of $E$. Nothing in principle stops this association from being present, absent, or reversed. It&rsquo;s an empirical matter of what one knows about the source of the report.</p>
</description>
</item>
<item>
<title>How Robust is the Zollman Effect?</title>
<link>http://jonathanweisberg.org/post/rbo/</link>
<pubDate>Mon, 02 Nov 2020 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/rbo/</guid>
<description>
<p>This is the second in a trio of posts on simulated epistemic networks:</p>
<ol>
<li><a href="http://jonathanweisberg.org/post/zollman/">The Zollman Effect</a></li>
<li><a href="http://jonathanweisberg.org/post/rbo">How Robust is the Zollman Effect?</a></li>
<li><a href="http://jonathanweisberg.org/post/ow">Mistrust &amp; Polarization</a></li>
</ol>
<p>This post summarizes some key ideas from <a href="http://doi.org/10.1086/690717" target="_blank">Rosenstock, Bruner, and O&rsquo;Connor&rsquo;s paper</a> on the Zollman effect, and reproduces some of their results in Python. As always you can grab <a href="https://github.com/jweisber/sep-sen" target="_blank">the code</a> from GitHub.</p>
<p><a href="http://jonathanweisberg.org/post/zollman/">Last time</a> we met the Zollman effect: sharing experimental results in a scientific community can actually hurt its chances of arriving at the truth. Bad luck can generate misleading results, discouraging inquiry into superior options. By limiting the sharing of results, we can increase the chance that alternatives will be explored long enough for their superiority to emerge.</p>
<p>But is this effect likely to have a big impact on actual research communities? Or is it rare enough, or small enough, that we shouldn&rsquo;t really worry about it?</p>
<h1 id="easy-like-sunday-morning">Easy Like Sunday Morning</h1>
<p>Last time we saw the Zollman effect can be substantial. The chance of success increased from .89 to .97 when 10 researchers went from full sharing to sharing with just two neighbours (from complete to cycle).</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/zollman.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 1. The Zollman effect: less connected networks can have a better chance of discovering the truth
</figcaption>
</figure>
<p>But that was assuming the novel alternative is only slightly better: .501 chance of success instead of .5, a difference of .001. We&rsquo;d be less likely to get misleading results if the difference were .01, or .1. It should be easier to see the new treatment&rsquo;s superiority in the data then.</p>
<p>So RBO (Rosenstock, Bruner, and O&rsquo;Connor) rerun the simulations with different values for ϵ, the increase in probability of success afforded by the new treatment. Last time we held ϵ fixed at .001, now we&rsquo;ll let it vary up to .1. We&rsquo;ll only consider a complete network vs. a wheel this time, and we&rsquo;ll hold the number of agents fixed at 10. The number of trials each round continues to be 1,000.</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/rbo-2.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 2. The Zollman effect vanishes as the difference in efficacy between the two treatments increases
</figcaption>
</figure>
<p>Here the Zollman effect shrinks as ϵ grows. In fact it&rsquo;s only visible up to about .025 in our simulations.</p>
<h1 id="more-trials-fewer-tribulations">More Trials, Fewer Tribulations</h1>
<p>Something similar can happen as we increase <em>n</em>, the number of trials each researcher performs. Last time we held <em>n</em> fixed at 1,000, now let&rsquo;s have it vary from 10 up to 10,000. We&rsquo;ll stick to 10 agents again, although this time we&rsquo;ll set ϵ to .01 instead of .001.</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/rbo-3.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 3. The Zollman effect vanishes as the number of trials per iteration increases
</figcaption>
</figure>
<p>Again the Zollman effect fades, this time as the parameter <em>n</em> increases.</p>
<p>The emerging theme is that the easier the epistemic problem is, the smaller the Zollman effect. Before, we made the problem easier by making the novel treatment more effective. Now we&rsquo;re making things easier by giving our agents more data. These are both ways of making the superiority of the novel treatment easier to see. The easier it is to discern two alternatives, the less our agents need to worry about inquiry being prematurely shut down by the misfortune of misleading data.</p>
<h1 id="agent-smith">Agent Smith</h1>
<p>Last time we saw that the Zollman effect seemed to grow as our network grew, from 3 up to 10 agents. But RBO note that the effect reverses after a while. Let&rsquo;s return to <em>n</em> = 1,000 trials and ϵ = .001, so that we&rsquo;re dealing with a hard problem again. And let&rsquo;s see what happens as the number of agents grows from 3 up to 100.</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/rbo-4.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 4. The Zollman effect eventually shrinks as the number of agents increases
</figcaption>
</figure>
<p>The effect grows from 3 agents up to around 10. But then it starts to shrink again, narrowing to a meagre .01 at 100 agents.</p>
<p>What&rsquo;s happening here? As RBO explain, in the complete network a larger community effectively means a larger sample size at each round. Since the researchers pool their data, a community of 50 will update on the results of 25,000 trials at each round, assuming half the community has credence &gt; 0.5. And a community of 100 people updates on the results of 50,000 trials, etc.</p>
<p>As the pooled sample size increases, so does the probability it will accurately reflect the novel treatment&rsquo;s superiority. The chance of the community being misled drops away.</p>
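<p>A quick way to see this (a sketch, assuming the .501 success chance used throughout): compute the chance that a single round of pooled data points the wrong way, i.e. shows the new treatment succeeding at most half the time.</p>
<pre><code class="language-python">from scipy.stats import binom

# Probability that a round's pooled results are misleading (at most 50% successes)
# even though the true success rate is .501. It shrinks as the pool grows.
for pooled_trials in [1_000, 10_000, 25_000, 50_000]:
    p_misled = binom.cdf(pooled_trials // 2, pooled_trials, 0.501)
    print(f"{pooled_trials:6d} pooled trials: P(misleading round) = {p_misled:.3f}")
</code></pre>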
<h1 id="conclusion">Conclusion</h1>
<p>RBO conclude that the Zollman effect only afflicts epistemically &ldquo;hard&rdquo; problems, where it&rsquo;s difficult to discern the superior alternative from the data. But that doesn&rsquo;t mean it&rsquo;s not an important effect. Its importance just depends on how common it is for interesting problems to be &ldquo;hard.&rdquo;</p>
<p>Do such problems crop up in actual scientific research, and if so how often? It&rsquo;s difficult to say. As RBO note, the model we&rsquo;ve been exploring is both artificially simple and highly idealized. So it&rsquo;s unclear how often real-world problems, which tend to be messier and more complex, will follow similar patterns.</p>
<p>On the one hand, they argue, our confidence that the Zollman effect is important should be diminished by the fact that it&rsquo;s not robust against variations in the parameters. Fragile effects are less likely to come through in messy, real-world systems. On the other hand, they point to some empirical studies where Zollman-like effects seem to crop up in the real world.</p>
<p>So it&rsquo;s not clear. Maybe determining whether Zollman-hard problems are a real thing is itself a Zollman-hard problem?</p>
</description>
</item>
<item>
<title>The Zollman Effect</title>
<link>http://jonathanweisberg.org/post/zollman/</link>
<pubDate>Wed, 28 Oct 2020 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/zollman/</guid>
<description>
<p>I&rsquo;m drafting a new social epistemology section for <a href="https://plato.stanford.edu/entries/formal-epistemology/" target="_blank">the SEP entry on formal epistemology</a>. It&rsquo;ll focus on a series of three papers that study epistemic networks using computer simulations. This post is the first in a series of three explainers, one on each paper.</p>
<ol>
<li><a href="http://jonathanweisberg.org/post/zollman">The Zollman Effect</a></li>
<li><a href="http://jonathanweisberg.org/post/rbo">How Robust is the Zollman Effect?</a></li>
<li><a href="http://jonathanweisberg.org/post/ow">Mistrust &amp; Polarization</a></li>
</ol>
<p>In each post I&rsquo;ll summarize the main ideas and replicate some key results in Python. You can grab <a href="https://github.com/jweisber/sep-sen" target="_blank">the final code from GitHub</a> if you want to play along and tinker.</p>
<h1 id="the-idea">The Idea</h1>
<p>More information generally means a better chance at discovering the truth, at least from an individual perspective. But not as a community, <a href="https://www.doi.org/10.1086/525605" target="_blank">Zollman finds</a>, at least not always. Sharing all our information with one another can make us less likely to reach the correct answer to a question we&rsquo;re all investigating.</p>
<p>Imagine there are two treatments available for some medical condition. One treatment is old, and its efficacy is well known: it has a .5 chance of success. The other treatment is new and might be slightly better or slightly worse: a .501 chance of success, or else .499.</p>
<p>Some doctors are wary of the new treatment, others are more optimistic. So some try it on their patients while others stick to the old ways.</p>
<p>As it happens the optimists are right: the new treatment is superior (chance .501 of success). So as they gather data about the new treatment and share it with the medical community, its superiority will eventually emerge as a consensus, right? At least, if all our doctors see all the evidence and weigh it fairly?</p>
<p>Not necessarily. It&rsquo;s possible that those trying the new treatment will hit a string of bad luck. Initial studies may get a run of less-than-stellar results, which don&rsquo;t accurately reflect the new treatment&rsquo;s superiority. After all, it&rsquo;s only slightly better than the traditional treatment. So it might not show its mettle right away. And if it doesn&rsquo;t, the optimists may abandon it before it has a chance to prove itself.</p>
<p>One way to mitigate this danger, it turns out, is to restrict the flow of information in the medical community. Imagine one doctor gets a run of bad luck&mdash;a string of patients who don&rsquo;t do so well with the new treatment, creating the misleading impression that the new treatment is inferior. If they share this result with everyone, it&rsquo;s more likely the whole community will abandon the new treatment. Whereas if they only share it with a few colleagues, others will keep trying the new treatment a while longer, hopefully giving them time to discover its superiority.</p>
<h1 id="the-model">The Model</h1>
<p>We can test this story by simulation. We&rsquo;ll create a network of doctors, each with their own initial credence that the new treatment is superior. Those with credence &gt; .5 will try the new treatment, others will stick to the old. Doctors directly connected in the network will share results with their neighbours, and everyone will update on whatever results they see using <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem" target="_blank">Bayes&rsquo; theorem</a>.</p>
<p>We&rsquo;ll consider networks of different sizes, from 3 to 10 agents. And we&rsquo;ll try three different network &ldquo;shapes&rdquo;: complete, wheel, and cycle.</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/graph-shapes.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 1. Three network configurations, illustrated here with 6 agents each
</figcaption>
</figure>
<p>These shapes vary in their connectedness. The complete network is fully connected, while the cycle is the least connected. Each doctor only confers with their two neighbours in the cycle. The wheel is in between.</p>
<p>Our conjecture is that the cycle will prove most reliable. A doctor who gets a run of bad luck&mdash;a string of misleading results&mdash;will do the least damage there. Sharing their results might discourage their two neighbours from learning the truth. But the others in the network may keep investigating, and ultimately learn the truth about the new treatment&rsquo;s superiority. The wheel should be more vulnerable to accidental misinformation, however, and the complete network most vulnerable.</p>
<h2 id="nitty-gritty">Nitty Gritty</h2>
<p>Initially, each doctor is assigned a random credence that the new treatment is superior, uniformly from the [0, 1] interval.</p>
<p>Those with credence &gt; .5 will then try the new treatment on 1,000 patients. The number of successes will be randomly determined, according to the <a href="https://en.wikipedia.org/wiki/Binomial_distribution" target="_blank">binomial distribution</a> with probability of success .501.</p>
<p>Each doctor then shares their results with their neighbours, and updates by Bayes&rsquo; theorem on all data available to them (their own + neighbors&rsquo;). Then we do another round of experimenting, sharing, and updating, followed by another, and so on until the community reaches a consensus.</p>
<p>Consensus can be achieved in either of two ways. Either everyone learns the truth that the new treatment is superior: credence &gt; .99 let&rsquo;s say. Alternatively, everyone might reach credence ≤ .5 in the new treatment. Then no one experiments with it further, so it&rsquo;s impossible for it to make a comeback. (The .99 cutoff is kind of arbitrary, but it&rsquo;s very unlikely the truth could be &ldquo;unlearned&rdquo; after that point.)</p>
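<p>Here&rsquo;s a minimal sketch of one run of this model in Python (a condensed version of my own, not the repository code). The two point hypotheses are success chances of .501 and .499, the data are binomial, and each agent updates by Bayes&rsquo; theorem on whatever data they can see.</p>
<pre><code class="language-python">import math
import random

P_GOOD, P_BAD = 0.501, 0.499    # new treatment's success chance if superior / inferior (it is in fact superior)
N_TRIALS = 1000                 # patients treated per doctor per round

def cycle_neighbours(n):
    """Cycle network: each agent sees themselves and their two neighbours."""
    return [{i, (i - 1) % n, (i + 1) % n} for i in range(n)]

def complete_neighbours(n):
    """Complete network: everyone sees everyone."""
    return [set(range(n)) for _ in range(n)]

def update(credence, successes, trials):
    """Bayes' theorem for the two point hypotheses p = .501 vs p = .499."""
    log_lr = successes * math.log(P_GOOD / P_BAD) + (trials - successes) * math.log((1 - P_GOOD) / (1 - P_BAD))
    lr = math.exp(log_lr)
    return credence * lr / (credence * lr + 1 - credence)

def run(neighbours):
    """One run of the model; returns True if the community learns the truth."""
    n = len(neighbours)
    credences = [random.random() for _ in range(n)]     # uniform random priors
    while True:
        if all(c &gt; 0.99 for c in credences):
            return True                                 # consensus on the truth
        if all(c &lt;= 0.5 for c in credences):
            return False                                # new treatment abandoned
        # Doctors with credence above .5 try the new treatment on N_TRIALS patients.
        results = {i: sum(random.random() &lt; P_GOOD for _ in range(N_TRIALS))
                   for i, c in enumerate(credences) if c &gt; 0.5}
        # Everyone updates on all the data they can see (their own + neighbours').
        credences = [update(credences[i],
                            sum(results[j] for j in neighbours[i] if j in results),
                            N_TRIALS * sum(1 for j in neighbours[i] if j in results))
                     for i in range(n)]

# e.g. estimate the chance a 10-agent cycle finds the truth
# (or use complete_neighbours(10) for the fully connected case):
# sum(run(cycle_neighbours(10)) for _ in range(100)) / 100
</code></pre>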
<h1 id="results">Results</h1>
<p>Here&rsquo;s what happens when we run each simulation 10,000 times. Both the shape of the network and the number of agents affect how often the community finds the truth.</p>
<figure>
<img src="http://jonathanweisberg.org/img/sep-sen/zollman.png" />
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 2. Probability of discovering the truth depends on network configuration and number of agents.
</figcaption>
</figure>
<p>The less connected the network, the more likely they&rsquo;ll find the truth. And a bigger community is more likely to find the truth too. Why?</p>
<p>Bigger, less connected networks are better insulated against misleading results. Some doctors are bound to get data that don&rsquo;t reflect the true character of the new treatment once in a while. And when that happens, their misleading results risk polluting the community with misinformation, discouraging others from experimenting with the new treatment. But the more people in the network, the more likely the misleading results will be swamped by accurate, representative results from others. And the fewer people see the misleading results, the fewer people will be misled.</p>
<p>Here&rsquo;s an animated pair of simulations to illustrate the second effect. Here I set the six scientists&rsquo; starting credences to the same, even spread in both networks: .3, .4, .5, .6, .7, and .8. I also gave them the same sequence of random data. Only the connections in the networks are different, and in this case it makes all the difference. Only the cycle learns the truth. The complete network goes dark very early, abandoning the novel treatment entirely after just 26 iterations.</p>
<div style="text-align: center;">
<video width="600" height="300" controls>
<source src="http://jonathanweisberg.org/img/sep-sen/zollman.mp4" type="video/mp4">
</video>
<figcaption style="font-style:italic; font-size: .8em; text-align: center; padding-bottom: .75em;">
Fig. 3. Two networks with identical priors encounter identical evidence, but only one discovers the truth.
</figcaption>
</div>
<p>What saves the cycle network is the agent who starts with .8 credence (bottom left). She starts out optimistic enough to keep going after the group encounters an initial string of dismaying results. In the complete network, however, she receives so much negative evidence early on that she gives up almost right away. Her optimism is overwhelmed by the negative findings of her many neighbours. Whereas the cycle exposes her to less of this discouraging evidence, giving her time to keep experimenting with the novel treatment, ultimately winning over her neighbours.</p>
<p>As <a href="http://doi.org/10.1086/690717" target="_blank">Rosenstock, Bruner, and O&rsquo;Connor</a> put it: sometimes less is more, when it comes to sharing the results of scientific inquiry. But how important is this effect? How often is it present, and is it big enough to worry about in actual practice? Next time we&rsquo;ll follow Rosenstock, Bruner, and O&rsquo;Connor further and explore these questions.</p>
</description>
</item>
<item>
<title>The Beta Prior and the Lambda Continuum</title>
<link>http://jonathanweisberg.org/post/inductive-logic-3/</link>
<pubDate>Tue, 17 Dec 2019 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/inductive-logic-3/</guid>
<description>
<p>In an <a href="http://jonathanweisberg.org/post/inductive-logic">earlier post</a> we met the $\lambda$-continuum, a generalization of <a href="http://jonathanweisberg.org/post/inductive-logic-2">Laplace&rsquo;s Rule of Succession</a>. Here is Laplace&rsquo;s rule, stated in terms of flips of a coin whose bias is unknown.</p>
<dl>
<dt>The Rule of Succession</dt>
<dd><p>Given $k$ heads out of $n$ flips, the probability the next flip will land heads is $$\frac{k+1}{n+2}.$$</p></dd>
</dl>
<p>To generalize we introduce an adjustable parameter, $\lambda$. Intuitively $\lambda$ captures how cautious we are in drawing conclusions from the observed frequency.</p>
<dl>
<dt>The $\lambda$ Continuum</dt>
<dd><p>Given $k$ heads out of $n$ flips, the probability the next flip will land heads is $$\frac{k + \lambda / 2}{n + \lambda}.$$</p></dd>
</dl>
<p>When $\lambda = 2$, this just is the Rule of Succession. When $\lambda = 0$, it becomes the &ldquo;Straight Rule,&rdquo; which matches the observed frequency, $k/n$. The general pattern is: the larger $\lambda$, the more flips we need to see before we tend toward the observed frequency, and away from the starting default value of $1/ 2$.$\newcommand{\p}{P}\newcommand{\given}{\mid}\newcommand{\dif}{d}$</p>
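<p>In code, the whole continuum is one line. A quick sketch:</p>
<pre><code class="language-python">def prob_next_heads(k, n, lam):
    """The lambda continuum: probability of heads next, given k heads in n flips."""
    return (k + lam / 2) / (n + lam)

print(prob_next_heads(8, 10, 2))    # lambda = 2, the Rule of Succession: (8+1)/(10+2) = 0.75
print(prob_next_heads(8, 10, 0))    # lambda = 0, the Straight Rule: 8/10 = 0.8
print(prob_next_heads(8, 10, 20))   # large lambda: still pulled toward the default 1/2, here 18/30 = 0.6
</code></pre>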
<p>So what&rsquo;s so special about $\lambda = 2$? Why did Laplace and others take a special interest in the Rule of Succession? Because it derives from the Principle of Indifference. <a href="http://jonathanweisberg.org/post/inductive-logic/">We saw</a> that setting $\lambda = 2$ basically amounts to assuming all possible frequencies have equal prior probability. Or that all possible biases of the coin are equally likely. The Rule of Succession thus corresponds to a uniform prior.</p>
<p>What about other values of $\lambda$ then? What kind of prior do they correspond to? This question has an elegant and illuminating answer, which we&rsquo;ll explore here.</p>
<ul>
<li><a href="http://jonathanweisberg.org/pdf/inductive-logic-3.pdf">PDF version here</a></li>
</ul>
<h1 id="a-preview">A Preview</h1>
<p>Let&rsquo;s preview the result we&rsquo;ll arrive at. Because, although the core idea isn&rsquo;t very technical, deriving the full result does take some noodling. It will be good to have some sense of where we&rsquo;re going.</p>
<p>Here&rsquo;s a picture of the priors that correspond to various choices of $\lambda$. The $x$-axis is the bias of the coin, the $y$-axis is the probability density.</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/betas.png" alt="" /></p>
<p>Notice how $\lambda = 2$ is a kind of inflection point. The plot goes from being concave up to concave down. When $\lambda &lt; 2$, the prior is U-shaped. Then, as $\lambda$ grows above $2$, we approach a normal distribution centered on $1/ 2$.</p>
<p>So, when $\lambda &lt; 2$, we start out pretty sure the coin is biased, though we don&rsquo;t know in which direction. We&rsquo;re inclined to run with the observed frequency, whatever it turns out to be. If we observe a heads on the first toss, we&rsquo;ll be pretty confident the next toss will land heads too. And the lower $\lambda$ is, the more confident we&rsquo;ll be about that.</p>
<p>Whereas $\lambda &gt; 2$ corresponds to an inclination to think the coin fair, or at least fair-ish. So it takes a while for the observed frequency to draw us away from our initial expectation of $1/ 2$. (Unless the observed frequency is itself $1/ 2$.)</p>
<p>That&rsquo;s the intuitive picture we&rsquo;re working towards. Let&rsquo;s see how to get there.</p>
<h1 id="pseudo-observations">Pseudo-observations</h1>
<p>Notice that the Rule of Succession is the same as pretending we&rsquo;ve already observed one heads and one tails, and then using the Straight Rule. A $3$rd toss landing heads would give us an observed frequency of $2/3$, precisely what the Rule of Succession gives when just $1$ toss has landed heads. If $k = n = 1$, then
$$ \frac{k+1}{n+2} = \frac{2}{3}. $$
So, setting $\lambda = 2$ amounts to imagining we have $2$ observations already, and then using the observed frequency as the posterior probability.</p>
<p>Setting $\lambda = 4$ is like pretending we have $4$ observations already. If we have $2$ heads and $2$ tails so far, then a heads on the $5$th toss would make for an observed frequency of $3/5$. And this is the posterior probability the $\lambda$-continuum dictates for a single heads when $\lambda = 4$:
$$ \frac{k + \lambda/2}{n + \lambda} = \frac{1 + 4/2}{1 + 4} = \frac{3}{5}. $$
In general, even values of $\lambda &gt; 0$ amount to pretending we&rsquo;ve already observed $\lambda$ flips, evenly split between heads and tails, and then using the observed frequency as the posterior probability.</p>
<p>This doesn&rsquo;t quite answer our question, but it&rsquo;s the key idea. We know that the uniform prior distribution gives rise to the posterior probabilities dictated by $\lambda = 2$. We want to know what prior distribution corresponds to other settings of $\lambda$. We see here that, for $\lambda = 4, 6, 8, \ldots$ the relevant prior is the same as the &ldquo;pseudo-posterior&rdquo; we would have if we updated the uniform prior on an additional $2$ &ldquo;pseudo-observations&rdquo;, or $4$, or $6$, etc.</p>
<p>So we just need to know what these pseudo-posteriors look like, and then extend the idea beyond even values of $\lambda$.</p>
<h1 id="pseudo-posteriors">Pseudo-posteriors</h1>
<p>Let&rsquo;s write $S_n = k$ to mean that we&rsquo;ve observed $k$ heads out of $n$ flips. We&rsquo;ll use $p$ for the unknown, true probability of heads on each flip. Our uniform prior distribution is $f(p) = 1$ for $0 \leq p \leq 1$. We want to know what $f(p \given S_n = k)$ looks like.</p>
<p>In <a href="https://jonathanweisberg.org/post/inductive-logic-2/" target="_blank">a previous post</a> we derived a formula for this:
$$ f(p \given S_n = k) = \frac{(n+1)!}{k!(n-k)!} p^k (1-p)^{n-k}. $$
This is the posterior distribution after observing $k$ heads out of $n$ flips, assuming we start with a uniform prior which corresponds to $\lambda = 2$. So, when we set $\lambda$ to a larger even number, it&rsquo;s the same as starting with $f(p) = 1$ and updating on $S_{\lambda - 2} = \lambda/2 - 1$. We subtract $2$ here because $2$ pseudo-observations were already counted in forming the uniform prior $f(p) = 1$.</p>
<p>Thus the prior distribution $f_\lambda$ for a positive, even value of $\lambda$ is:
$$
\begin{aligned}
f_\lambda(p) &amp;= f(p \given S_{\lambda - 2} = \lambda/2 - 1)\\<br />
&amp;= \frac{(\lambda - 1)!}{(\lambda/2 - 1)!(\lambda/2 - 1)!} p^{\lambda/2 - 1} (1-p)^{\lambda/2 - 1}.
\end{aligned}
$$
This prior generates the picture we started with for $\lambda \geq 2$.</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/betas-even.png" alt="" /></p>
<p>As $\lambda$ increases, we move from a uniform prior towards a normal distribution centered on $p = 1/ 2$. This makes intuitive sense: the more we accrue evenly balanced observations, the more our expectations come to resemble those for a fair coin.</p>
<p>So, what about odd values of $\lambda$? Or non-integer values? To generalize our treatment beyond even values, we need to generalize our formula for $f_\lambda$.</p>
<h1 id="the-beta-prior">The Beta Prior</h1>
<p>Recall our formula for $f(p \given S_n = k)$:
$$ \frac{(n+1)!}{k!(n-k)!} p^k (1-p)^{n-k}. $$
This is a member of a famous family of probability densities, the <a href="https://en.wikipedia.org/wiki/Beta_distribution" target="_blank"><em>beta densities</em></a>. To select a member from this family, we specify two parameters $a,b &gt; 0$ in the formula:
$$ \frac{1}{B(a,b)} p^{a-1} (1-p)^{b-1}. $$
Here $B(a,b)$ is the beta function, defined:
$$ B(a,b) = \int_0^1 x^{a-1} (1-x)^{b-1} \dif x. $$
<a href="https://jonathanweisberg.org/post/inductive-logic-2/#the-beta-function" target="_blank">We showed</a> that, when $a$ and $b$ are natural numbers,
$$ B(a,b) = \frac{(a-1)!(b-1)!}{(a+b-1)!}. $$
To generalize our treatment of $f_\lambda$ beyond whole numbers, we first need to do the same for the beta function. We need $B(a,b)$ for all positive real numbers.</p>
<p>As it turns out, this is a matter of generalizing the notion of factorial. The generalization we need is called the gamma function, and it looks like this:</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/gamma.png" alt="" /></p>
<p>The formal definition is
$$ \Gamma(x) = \int_0^\infty u^{x-1} e^{-u} \dif u. $$
The gamma function connects to the factorial function because it has the property:
$$ \Gamma(x+1) = x\Gamma(x). $$
This entails, by induction, that $\Gamma(n) = (n-1)!$ for any natural number $n$.</p>
<p>In fact we can substitute gammas for factorials in our formula for the beta function:
$$ B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}. $$
Proving this formula would require a long digression, so we&rsquo;ll take it for granted here.</p>
<p>Now we can work with beta densities whose parameters are not whole numbers. For any $a, b &gt; 0$, the beta density is
$$ \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1}. $$
We can now show our main result: setting $a = b = \lambda/2$ generates the $\lambda$-continuum.</p>
<h1 id="from-beta-to-lambda">From Beta to Lambda</h1>
<p>We&rsquo;ll write $X_{n+1} = 1$ to mean that toss $n+1$ lands heads. We want to show
$$ \p(X_{n+1} = 1 \given S_n = k) = \frac{k + \lambda/2}{n + \lambda}, $$
given two assumptions.</p>
<ul>
<li>The tosses are independent and identically distributed with probability $p$ for heads.</li>
<li>The prior distribution $f_\lambda(p)$ is a beta density with $a = b = \lambda/2$.</li>
</ul>
<p>We start by applying the Law of Total Probability:
$$
\begin{aligned}
P(X_{n+1} = 1 \given S_n = k)
&amp;= \int_0^1 P(X_{n+1} = 1 \given S_n = k, p) f_\lambda(p \given S_n = k) \dif p\\<br />
&amp;= \int_0^1 p f_\lambda(p \given S_n = k) \dif p.
\end{aligned}
$$
Notice, this is the expected value of $p$, according to the posterior $f_\lambda(p \given S_n = k)$. To analyze it further, we use two facts proved below.</p>
<ol>
<li>The posterior $f_\lambda(p \given S_n = k)$ is itself a beta density, but with parameters $k + \lambda/2$ and $n - k + \lambda/2$.</li>
<li>The expected value of any beta density with parameters $a$ and $b$ is $a/(a+b)$.</li>
</ol>
<p>Thus
$$
\begin{aligned}
P(X_{n+1} = 1 \given S_n = k)
&amp;= \int_0^1 p f_\lambda(p \given S_n = k) \dif p \\<br />
&amp;= \frac{k + \lambda/2}{k + \lambda/2 + n - k + \lambda/2}\\<br />
&amp;= \frac{k + \lambda/2}{n + \lambda}.
\end{aligned}
$$
This is the desired result; we just need to establish Facts 1 and 2.</p>
<h2 id="fact-1">Fact 1</h2>
<p>Here we show that, if $f(p)$ is a beta density with parameters $a$ and $b$, then $f(p \given S_n = k)$ is a beta density with parameters $k+a$ and $n - k + b$.</p>
<p>Suppose $f(p)$ is a beta density with parameters $a$ and $b$:
$$ f(p) = \frac{1}{B(a, b)} p^{a-1} (1-p)^{b-1}. $$
We calculate $f(p \given S_n = k)$ using Bayes&rsquo; theorem:
\begin{align}
f(p \given S_n = k)
&amp;= \frac{f(p) P(S_n = k \given p)}{P(S_n = k)}\\<br />
&amp;= \frac{p^{a-1} (1-p)^{b-1} \binom{n}{k} p^k (1-p)^{n-k}}{B(a,b) P(S_n = k)}\\<br />
&amp;= \frac{\binom{n}{k}}{B(a,b) \p(S_n = k)} p^{k+a-1} (1-p)^{n-k+b-1} .\tag{1}
\end{align}
To analyze $\p(S_n = k)$, we begin with the Law of Total Probability:
$$
\begin{aligned}
P(S_n = k)
&amp;= \int_0^1 P(S_n = k \given p) f(p) \dif p\\<br />
&amp;= \int_0^1 \binom{n}{k} p^k (1-p)^{n-k} \frac{1}{B(a, b)} p^{a-1} (1-p)^{b-1} \dif p\\<br />
&amp;= \frac{\binom{n}{k}}{B(a, b)} \int_0^1 p^{a+k-1} (1-p)^{b+n-k-1} \dif p\\<br />
&amp;= \frac{\binom{n}{k}}{B(a, b)} B(k+a, n-k+b).
\end{aligned}
$$
Substituting back into Equation (1), we get:
$$ f(p \given S_n = k) = \frac{1}{B(k+a, n-k+b)} p^{k+a-1} (1-p)^{n-k+b-1}. $$
So $f(p \given S_n = k)$ is the beta density with parameters $k + a$ and $n - k + b$.</p>
<h2 id="fact-2">Fact 2</h2>
<p>Here we show that the expected value of a beta density with parameters $a$ and $b$ is $a/(a+b)$. The expected value formula gives:
$$
\frac{1}{B(a, b)} \int_0^1 p p^{a-1} (1-p)^{b-1} \dif p\\<br />
= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^1 p^a (1-p)^{b-1} \dif p.
$$
The integrand look like a beta density, with parameters $a+1$ and $b$. So we multiply by $1$ in a form that allows us to pair it with the corresponding normalizing constant:
$$
\begin{aligned}
\begin{split}
\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} &amp; \int_0^1 p^a (1-p)^{b-1} \dif p \\<br />
&amp;= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a + 1)\Gamma(b)} {\Gamma(a + b + 1)}\int_0^1 \frac{\Gamma(a + b + 1)}{\Gamma(a + 1)\Gamma(b)} p^a (1-p)^{b-1} \dif p\\<br />
&amp;= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{\Gamma(a + 1)\Gamma(b)} {\Gamma(a + b + 1)}.
\end{split}
\end{aligned}
$$
Finally, we use the property $\Gamma(a+1) = a \Gamma(a)$ to obtain:
$$
\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \frac{a\Gamma(a)\Gamma(b)} {(a+b) \Gamma(a + b)} = \frac{a} {a+b}.
$$</p>
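<p>With both facts in place, here&rsquo;s a quick numerical sanity check of the main result (a sketch using scipy; the parameter values are just for illustration):</p>
<pre><code class="language-python">from scipy.stats import beta

def lambda_rule(k, n, lam):
    return (k + lam / 2) / (n + lam)

# By Fact 1, the posterior after k heads in n flips, starting from the
# beta(lam/2, lam/2) prior, is beta(k + lam/2, n - k + lam/2); by Fact 2 its
# mean should match the lambda continuum.
k, n, lam = 3, 10, 1    # lam = 1 is the Jeffreys prior
posterior = beta(k + lam / 2, n - k + lam / 2)
print(posterior.mean(), lambda_rule(k, n, lam))    # both equal 3.5 / 11
</code></pre>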
<h1 id="picturing-it">Picturing It</h1>
<p>What do our priors corresponding to $\lambda &lt; 2$ look like? Above we saw that they&rsquo;re U-shaped, approaching a flat line as $\lambda$ increases. Here&rsquo;s a closer look:</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/betas-small.png" alt="" /></p>
<p>We can also look at odd values $\lambda \geq 2$ now, where the pattern is the same as we observed previously.</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/betas-odd.png" alt="" /></p>
<h1 id="what-about-zero">What About Zero?</h1>
<p>What about when $\lambda = 0$? This is a permissible value on the $\lambda$-continuum, giving rise to the Straight Rule as we&rsquo;ve noted. But it doesn&rsquo;t correspond to any beta density. The parameters would be $a = b = \lambda/2 = 0$, whereas we require $a, b &gt; 0$, since the integral
$$ \int_0^1 p^{-1}(1-p)^{-1} \dif p $$
diverges.</p>
<p>In fact no prior can agree with the Straight Rule. At least, not on the standard axioms of probability. The Straight Rule requires $\p(HH \given H) = 1$, which entails $\p(HT \given H) = 0$. By the usual definition of conditional probability then, $\p(HT) = 0$. Which means $\p(HTT \given HT)$ is undefined. Yet the Straight Rule says $\p(HTT \given HT) = 1/ 2$.</p>
<p>We can accommodate the Straight Rule by switching to a nonstandard axiom system, where conditional probabilities are primitive, rather than being defined as ratios of unconditional probabilities. This approach is sometimes called &ldquo;Popper&ndash;Rényi&rdquo; style probability.</p>
<p>Alternatively, we can stick with the standard, Kolmogorov system and instead permit <a href="https://en.wikipedia.org/wiki/Prior_probability#Improper_priors" target="_blank">&ldquo;improper&rdquo; priors</a>: prior distributions that don&rsquo;t integrate to $1$, but which deliver posteriors that do.</p>
<p>Taking this approach, the beta density with $a = b = 0$ is called the <a href="https://en.wikipedia.org/wiki/Beta_distribution#Haldane.27s_prior_probability_.28Beta.280.2C0.29.29" target="_blank">Haldane prior</a>. It&rsquo;s sometimes regarded as &ldquo;informationless,&rdquo; since its posteriors just follow the observed frequencies. But other priors, like the uniform prior, also have some claim to representing perfect ignorance. The <a href="https://en.wikipedia.org/wiki/Jeffreys_prior" target="_blank">Jeffreys prior</a>, which is obtained by setting $a = b = 1/ 2$ (so $\lambda = 1$), is another prior with a similar claim.</p>
<p>That multiple priors can make this claim is a reminder of one of the great tragedies of epistemology: <a href="https://plato.stanford.edu/entries/epistemology-bayesian/" target="_blank">the problem of priors</a>.</p>
<h1 id="acknowledgments">Acknowledgments</h1>
<p>I&rsquo;m grateful to Boris Babic for reminding me of the beta-lambda connection. For more on beta densities I recommend the videos at <a href="http://stat110.net" target="_blank">stat110.net</a>.</p>
</description>
</item>
<item>
<title>Belief in Psyontology</title>
<link>http://jonathanweisberg.org/publication/Belief%20in%20Psyontology/</link>
<pubDate>Tue, 10 Dec 2019 21:59:04 -0500</pubDate>
<guid>http://jonathanweisberg.org/publication/Belief%20in%20Psyontology/</guid>
<description><p>Which is more fundamental, full belief or partial belief? I argue that neither is, ontologically speaking. A survey of some relevant cognitive psychology supports a dualist ontology instead. Beliefs come in two kinds, categorical and graded, with neither kind more fundamental than the other. In particular, the graded kind is no more fundamental. When we discuss belief in on/off terms, we are not speaking coarsely or informally about states that are ultimately credal.</p>
</description>
</item>
<item>
<title>Could've Thought Otherwise</title>
<link>http://jonathanweisberg.org/publication/Couldve%20Thought%20Otherwise/</link>
<pubDate>Tue, 10 Dec 2019 21:58:08 -0500</pubDate>
<guid>http://jonathanweisberg.org/publication/Couldve%20Thought%20Otherwise/</guid>
<description><p>Evidence is univocal, not equivocal. Its implications don&rsquo;t depend on our beliefs or values; the evidence says what it says. But that doesn&rsquo;t mean there&rsquo;s no room for rational disagreement between people with the same evidence. Evaluating evidence is a lot like polling an electorate: getting an accurate reading requires a bit of luck, and even the best pollsters are bound to get slightly different results. So even though evidence is univocal, rationality&rsquo;s requirements are not &ldquo;unique&rdquo;. Understanding this resolves several puzzles to do with uniqueness and disagreement.</p>
</description>
</item>
<item>
<title>Laplace's Rule of Succession</title>
<link>http://jonathanweisberg.org/post/inductive-logic-2/</link>
<pubDate>Tue, 10 Dec 2019 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/inductive-logic-2/</guid>
<description>
<p>The Rule of Succession gives a simple formula for &ldquo;enumerative induction&rdquo;: reasoning from observed instances to unobserved ones. If you&rsquo;ve observed 8 ravens and they&rsquo;ve all been black, how certain should you be the next raven you see will also be black? According to the Rule of Succession, 90%. In general, the probability is $(k+1)/(n+2)$ that the next observation will be positive, given $k$ positive observations out of $n$ total.</p>
<p>When does the Rule of Succession apply, and why is it $(k+1)/(n+2)$? Laplace first derived a special case of the rule in 1774, using certain assumptions. The same assumptions also allow us to derive the general rule, and following the derivation through answers both questions.
$\newcommand{\p}{P}\newcommand{\given}{\mid}\newcommand{\dif}{d}$</p>
<ul>
<li><a href="http://jonathanweisberg.org/pdf/inductive-logic-2.pdf">PDF version here</a></li>
</ul>
<p>As motivation, imagine we&rsquo;re drawing randomly, with replacement, from an urn of marbles some proportion $p$ of which are black. Strictly speaking, $p$ must be a rational number in this setup. But formally, we&rsquo;ll suppose $p$ can be any real number in the unit interval.</p>
<p>If we have no idea what $p$ is, it&rsquo;s natural to start with a uniform prior over its possible values. Formally, $p$ is a random variable with a uniform density on the $[0,1]$ interval. Each draw induces another random variable,
$$
X_i =
\begin{cases}
1 &amp; \text{ if the $i^\text{th}$ draw is black},\newline
0 &amp; \text{ otherwise}.
\end{cases}
$$
We&rsquo;ll define one last random variable $S_n$, which counts the black draws:
$$ S_n = X_1 + \ldots + X_n . $$
Laplace&rsquo;s assumptions are then as follows.</p>
<ol>
<li>Each $X_i$ has the same chance $p$ of being $1$.</li>
<li>That chance is independent of whatever values the other $X_j$&rsquo;s take.</li>
<li>The prior distribution over $p$ is uniform: $f(p) = 1$ for $0 \leq p \leq 1$.</li>
</ol>
<p>Given these assumptions, the Rule of Succession follows:
$$ \p(X_{n+1} = 1 \given S_n = k) = \frac{k+1}{n+2}. $$
We&rsquo;ll start by deriving this result for the special case where all observations are positive, so that $k = n$.</p>
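<p>Before the derivation, a quick Monte Carlo check of the rule under these assumptions (a sketch; the sample size is arbitrary):</p>
<pre><code class="language-python">import random

def estimate_rule(k, n, samples=200_000):
    """Monte Carlo estimate of P(draw n+1 is black | k black in the first n draws),
    under a uniform prior over the urn's composition p."""
    hits = successes = 0
    for _ in range(samples):
        p = random.random()                                      # draw p uniformly from [0, 1]
        if sum(random.random() &lt; p for _ in range(n)) == k:      # keep only runs where S_n = k
            hits += 1
            successes += random.random() &lt; p                     # is draw n+1 black?
    return successes / hits

print(estimate_rule(8, 8))    # should be close to (8+1)/(8+2) = 0.9
print(estimate_rule(3, 10))   # should be close to (3+1)/(10+2) = 1/3
</code></pre>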
<h1 id="laplace-s-special-case">Laplace&rsquo;s Special Case</h1>
<p>When $k = n$, the Rule of Succession says:
$$ \p(X_{n+1} = 1 \given S_n = n) = \frac{n+1}{n+2}. $$
To derive this result, we start with the Law of Total Probability.
\begin{align}
\p(X_{n+1} = 1 \given S_n = n)
&amp;= \int_0^1 \p(X_{n+1} = 1 \given S_n = n, p) f(p \given S_n = n) \dif p\newline
&amp;= \int_0^1 \p(X_{n+1} = 1 \given p) f(p \given S_n = n) \dif p\newline
&amp;= \int_0^1 p \, f(p \given S_n = n) \dif p. \tag{1}
\end{align}
To finish the calculation, we need to compute $f(p \given S_n = n)$: how observing $n$ out of $n$ black marbles changes the probability density over $p$.</p>
<p>For this we turn to Bayes&rsquo; theorem.
$$
\begin{aligned}
f(p \given S_n = n)
&amp;= \frac{ f(p) \p(S_n = n \given p) }{ \p(S_n = n) }\newline
&amp;= \frac{ \p(S_n = n \given p) }{ \p(S_n = n) }\newline
&amp;= \frac{ p^n }{ \p(S_n = n) }\newline
&amp;= c p^n.
\end{aligned}
$$
Here $c$ is an as-yet unknown constant: the inverse of $\p(S_n = n)$, whatever that is. To find $c$, first observe by calculus that:
$$ \int_0^1 c p^n \dif p = \left. \left(\frac{c p^{n+1}}{n+1}\right) \right|_0^1 = \frac{c}{n+1}. $$
Then observe that this quantity must equal $1$, since we&rsquo;ve integrated $f(p \given S_n = n)$, a probability density. Thus $c = n + 1$, and hence
$$ f(p \given S_n = n) = (n+1) p^n. $$
Returning now to finish our original calculation in Equation (1):
$$
\begin{aligned}
\p(X_{n+1} = 1 \given S_n = n)
&amp;= \int_0^1 p \, f(p \given S_n = n) \dif p\newline
&amp;= \int_0^1 p \, (n+1) p^n \dif p\newline
&amp;= (n+1) \int_0^1 p^{n+1} \dif p\newline
&amp;= (n+1) \left. \left(\frac{p^{n+2}}{n+2}\right) \right|_0^1\newline
&amp;= \frac{n+1}{n+2}.
\end{aligned}
$$
This is the Rule of Succession when $k = n$, as desired.</p>
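<p>Before moving on, we can sanity-check this result numerically. The following sketch (my own, with arbitrary parameter choices) integrates $p \, (n+1) p^n$ over $[0,1]$ with a simple midpoint rule and compares the answer to $(n+1)/(n+2)$.</p>
<pre><code># Numeric check of the special case: integrate p * (n+1) * p^n over [0, 1]
# and compare to (n+1)/(n+2).
def special_case_check(n, steps=10_000):
    width = 1 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * width                  # midpoint of the i-th subinterval
        total += p * (n + 1) * p**n * width    # integrand p * f(p | S_n = n)
    return total

print(special_case_check(8))     # approximately 0.9
print((8 + 1) / (8 + 2))         # the exact value, 9/10
</code></pre>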
<h1 id="the-general-case">The General Case</h1>
<p>The proof of the general case starts similarly. We first apply the Law of Total Probability to obtain
$$ \p(X_{n+1} = 1 \given S_n = k) = \int_0^1 p \, f(p \given S_n = k) \dif p. \tag{2} $$
Then we use Bayes&rsquo; theorem to compute $f(p \given S_n = k)$.
\begin{align}
f(p \given S_n = k)
&amp;= \frac{ \p(S_n = k \given p) }{ \p(S_n = k) }\newline
&amp;= \frac{ \binom{n}{k} p^k (1-p)^{n-k} }{ \p(S_n = k) } \tag{3}.
\end{align}
Note that we used the formula for a <a href="https://en.wikipedia.org/wiki/Binomial_distribution#Probability_mass_function" target="_blank">binomial probability</a> here to calculate the numerator $\p(S_n = k \given p)$.</p>
<p>Computing the denominator $\p(S_n = k)$ requires a different approach from the special case. We start with the Law of Total Probability:
\begin{align}
\p(S_n = k)
&amp;= \int_0^1 \p(S_n = k \given p) f(p) \dif p\newline
&amp;= \int_0^1 \p(S_n = k \given p) \dif p \newline
&amp;= \int_0^1 \binom{n}{k} p^k (1-p)^{n-k} \dif p \newline
&amp;= \binom{n}{k} \int_0^1 p^k (1-p)^{n-k} \dif p.
\end{align}
This leaves us facing an instance of a famous function, the <a href="https://en.wikipedia.org/wiki/Beta_function" target="_blank">&ldquo;beta function,&rdquo;</a> which we&rsquo;ll write:
$$ B(a, b) = \int_0^1 x^a (1-x)^{b} \dif x. $$
(The standard convention uses exponents $a - 1$ and $b - 1$; shifting by one is more convenient here.) In our case $a$ and $b$ are natural numbers, so $B(a,b)$ has an elegant formula, which we use now and prove later:
$$
B(a, b) = \frac{a!b!}{(a + b + 1)!}.
$$
For us, $a = k$ and $b = n-k$, so we have
$$ \p(S_n = k) = \binom{n}{k} B(k, n-k) = \binom{n}{k} \frac{k!(n-k)!}{(n + 1)!}. $$
Substituting back into our calculation of $f(p \given S_n = k)$ in Equation (3):
$$
\begin{aligned}
f(p \given S_n = k)
&amp;= \frac{ \binom{n}{k} p^k (1-p)^{n-k} }{ \binom{n}{k} B(k, n-k) }\newline
&amp;= \frac{(n + 1)!}{k!(n-k)!} p^k (1-p)^{n-k} .
\end{aligned}
$$
Then we finish our original calculation from Equation (2):
\begin{align}
\p(X_{n+1} = 1 \given S_n = k)
&amp;= \int_0^1 p \frac{(n + 1)!}{k!(n-k)!} p^k (1-p)^{n-k} \dif p\newline
&amp;= \frac{(n + 1)!}{k!(n-k)!} \int_0^1 p^{k+1} (1-p)^{n-k} \dif p\newline
&amp;= \frac{(n + 1)!}{k!(n-k)!} B(k+1, n-k)\newline
&amp;= \frac{(n + 1)!}{k!(n-k)!} \frac{(k+1)!(n-k)!}{(k+1 + n-k + 1)!}\newline
&amp;= \frac{k+1}{n + 2}.
\end{align}
This is the Rule of Succession, as desired.</p>
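<p>Again we can check the general result numerically. Here&rsquo;s a quick Monte Carlo sketch (my own, not part of the proof): sample $p$ from the uniform prior, simulate $S_n$, condition on $S_n = k$, and see how often the next draw comes up black.</p>
<pre><code>import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of P(X_{n+1} = 1 | S_n = k) under a uniform prior on p.
def rule_of_succession_mc(n, k, trials=500_000):
    p = rng.uniform(size=trials)       # one prior draw of p per trial
    s_n = rng.binomial(n, p)           # S_n for each trial
    x_next = rng.binomial(1, p)        # the (n+1)th draw for each trial
    keep = s_n == k                    # condition on S_n = k
    return x_next[keep].mean()

print(rule_of_succession_mc(10, 3))    # should be close to (3+1)/(10+2) = 1/3
</code></pre>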
<h1 id="the-beta-function">The Beta Function</h1>
<p>Finally, let&rsquo;s derive the formula we used for the beta function:
$$ \int_0^1 x^a (1-x)^{b} \dif x = \frac{a!b!}{(a + b + 1)!}, $$
where $a$ and $b$ are natural numbers. We proceed in two steps: integration by parts, then a proof by induction.</p>
<p>Notice first that when $b = 0$ our integral simplifies and is straightforward:
$$ \int_0^1 x^a \dif x = \frac{1}{a+1}. $$
So let&rsquo;s assume $b &gt; 0$ and pursue integration by parts. If we let
$$ u = (1 - x)^b, \quad \dif v = x^a \dif x, $$
then
$$ \dif u = -b (1 - x)^{b-1} \dif x, \quad v = \frac{x^{a+1}}{a+1}.
$$
So
$$
\begin{aligned}
\int_0^1 x^a (1-x)^{b} \dif x
&amp;= \left. \left(\frac{ x^{a+1} (1 - x)^b }{ a+1 }\right) \right|_0^1 +
\frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x\newline
&amp;= \frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x.
\end{aligned}
$$</p>
<p>Now we use this identity in an argument by induction. We already noted that when $b = 0$ we have $B(a, 0) = 1/(a+1)$. This satisfies the general formula
$$
B(a, b) = \frac{a!b!}{(a+b+1)!}.
$$
By induction on $b &gt; 0$, the formula holds in general: assuming it holds for $b - 1$, we have
$$
\begin{aligned}
B(a, b)
&amp;= \int_0^1 x^a (1-x)^{b} \dif x\newline
&amp;= \frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x\newline
&amp;= \frac{b}{a+1} B(a+1, b-1)\newline
&amp;= \frac{b}{a+1} \frac{(a+1)!(b-1)!}{(a + 1 + b - 1 + 1)!}\newline
&amp;= \frac{a!b!}{(a + b + 1)!}.
\end{aligned}
$$</p>
<h1 id="acknowledgments">Acknowledgments</h1>
<p>Our proof of the special case follows <a href="https://youtu.be/N8O6zd6vTZ8?t=2245" target="_blank">this excellent video</a> by Joe Blitzstein. And our proof of the general case comes from Sheldon Ross&rsquo; classic textbook, <em>A First Course in Probability</em>, Exercise 30 on page 128 of the 7th edition.</p>
</description>
</item>
<item>
<title>Crash Course in Inductive Logic</title>
<link>http://jonathanweisberg.org/post/inductive-logic/</link>
<pubDate>Tue, 19 Nov 2019 00:00:00 -0500</pubDate>
<guid>http://jonathanweisberg.org/post/inductive-logic/</guid>
<description>
<p>There are four ways things can turn out with two flips of a coin:
$$ HH, \quad HT, \quad TH, \quad TT.$$
If we know nothing about the coin&rsquo;s tendencies, we might assign equal probability to each of
these four possible outcomes:
$$ Pr(HH) = Pr(HT) = Pr(TH) = Pr(TT) = 1/ 4. $$
But from another point of view, there are really just three possibilities. If we ignore order,
the possible outcomes are $0$ heads, $1$ head, or $2$ heads. So we might
instead assign equal probability to these three outcomes, then divide
the middle $1/ 3$ evenly between $HT$ and $TH$: $$
Pr(HH) = 1/3 \qquad Pr(HT) = Pr(TH) = 1/6 \qquad Pr(TT) = 1/ 3.
$$</p>
<p>This two-stage approach may seem odd. But it&rsquo;s actually friendlier
from the point of view of inductive reasoning. On the first
scheme, a heads on the first toss doesn&rsquo;t increase the probability of
another heads. It stays fixed at $1/ 2$:
$$
\newcommand{\p}{Pr}
\newcommand{\given}{\mid}
\renewcommand{\neg}{\mathbin{\sim}}
\renewcommand{\wedge}{\mathbin{\text{&amp;}}}
\p(HH \given H) = \frac{1/ 4}{1/ 4 + 1/ 4} = \frac{1}{2}.
$$
Whereas it does increase on the second strategy, from $1/ 2$ to $2/ 3$:
$$ \p(HH \given H) = \frac{1/ 3}{1/ 3 + 1/ 6} = \frac{2}{3}. $$
The two-stage approach thus learns from experience, whereas the
single-step division is skeptical about induction.</p>
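<p>Here&rsquo;s a tiny sketch (the dictionaries and function are mine, just for illustration) that checks these two conditional probabilities.</p>
<pre><code>from fractions import Fraction

# The two prior assignments over the four two-flip sequences.
flat      = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
             "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
two_stage = {"HH": Fraction(1, 3), "HT": Fraction(1, 6),
             "TH": Fraction(1, 6), "TT": Fraction(1, 3)}

def prob_heads_again(prior):
    # P(second toss heads | first toss heads) = P(HH) / (P(HH) + P(HT))
    return prior["HH"] / (prior["HH"] + prior["HT"])

print(prob_heads_again(flat))        # 1/2: the first heads teaches us nothing
print(prob_heads_again(two_stage))   # 2/3: the first heads raises the probability
</code></pre>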
<p>This holds true as we increase the number of flips. If we do three tosses
for example, we&rsquo;ll find that $\p(HHH \given HH) = 3/ 4$ on the
two-stage analysis. Whereas this probability stays stubbornly fixed at
$1/ 2$ on the first approach. It won&rsquo;t budge no matter how many heads
we observe, so we can&rsquo;t learn anything about the coin&rsquo;s bias this way.</p>
<p>This is the difference between Carnap&rsquo;s famous account of induction, from
his 1950 book <em>Logical Foundations of Probability</em>, and the account
he finds <a href="http://www.kfs.org/jonathan/witt/t515en.html" target="_blank">in Wittgenstein&rsquo;s
<em>Tractatus</em></a>. <sup class="footnote-ref" id="fnref:peirce"><a rel="footnote" href="#fn:peirce">1</a></sup> Although
Carnap had actually been scooped by W. E. Johnson, who worked out a similar
analysis about $25$ years earlier.</p>
<p>This is a short explainer on some key elements of inductive logic worked out by
Johnson and Carnap and the place of those ideas in the story of
inductive logic.</p>
<ul>
<li><a href="http://jonathanweisberg.org/pdf/inductive-logic.pdf">PDF version here</a></li>
</ul>
<h1 id="states-structures">States &amp; Structures</h1>
<p>Carnap calls a fine-grained specification like $TH$ a <em>state-description</em>.
The coarser grained &ldquo;$1$ head&rdquo; is a <em>structure-description</em>. A
state-description specifies which flips land heads and which tails,
while a structure-description specifies <em>how many</em> land heads and tails,
without necessarily saying which.</p>
<p>It needn&rsquo;t be coin flips landing heads or tails, of course. The same
ideas apply to any set of objects or events, and any feature they might
have or lack.</p>
<p>Suppose we have two objects $a$ and $b$, each of which might have some
property $F$. Working for a moment as Carnap did, in
first-order logic, here is an example of a structure-description:
$$ (Fa \wedge \neg Fb) \vee (\neg Fa \wedge Fb). $$ But this isn&rsquo;t a
state-description, since it doesn&rsquo;t specify which object has $F$. It
only says how many objects have $F$, namely $1$. One of the disjuncts alone would be a
state-description though:
$$ Fa \wedge \neg Fb. $$</p>
<p>Carnap&rsquo;s initial idea was that all structure-descriptions start out with the
same probability. These probabilities are then divided equally among
the state-descriptions that make up a structure-description.</p>
<p>For example, if we do three flips, there are four
structure-descriptions: $0$ heads, $1$ head, $2$ heads, and $3$ heads.
Some of these have only one state-description. For example, there&rsquo;s only
one way to get $0$ heads, namely $TTT$. So $$ \p(TTT) = 1/ 4. $$ But
others have multiple state-descriptions. There are three ways to get $1$
head for example, so we divide $1/ 4$ between them:
$$ \p(HTT) = \p(THT) = \p(TTH) = 1/ 12. $$</p>
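<p>This two-stage assignment is easy to automate. Here&rsquo;s a sketch (the function is my own, not Carnap&rsquo;s) that splits probability evenly over structure-descriptions and then over the state-descriptions within each.</p>
<pre><code>from itertools import product
from fractions import Fraction

# Two-stage prior over all length-n sequences of H and T:
# each head-count (structure-description) gets 1/(n+1), split evenly
# among the sequences (state-descriptions) that realize it.
def two_stage_prior(n):
    states = ["".join(s) for s in product("HT", repeat=n)]
    by_structure = {}
    for s in states:
        by_structure.setdefault(s.count("H"), []).append(s)
    share = Fraction(1, len(by_structure))
    return {s: share / len(group)
            for group in by_structure.values() for s in group}

prior = two_stage_prior(3)
print(prior["TTT"])                  # 1/4: the lone "0 heads" state-description
print(prior["HTT"], prior["THT"])    # 1/12 each: "1 head" split three ways
</code></pre>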
<p>The effect is that more homogeneous sequences start out more probable.
There&rsquo;s only one way to get all heads, so the $HH$ state-description
inherits the full probability of the corresponding &ldquo;$2$ heads&rdquo;
structure-description. But a $50$-$50$ split has multiple permutations,
each of which inherits only a portion of the same quantum of
probability. A heterogeneous sequence of heads and tails thus starts out
less probable than a homogeneous one.</p>
<p>That&rsquo;s why the two-stage analysis is induction-friendly. It effectively
builds Hume&rsquo;s &ldquo;uniformity of nature&rdquo; assumption into the prior
probabilities.</p>
<h1 id="the-rule-of-succession">The Rule of Succession</h1>
<p>The two-stage assignment also yields a very simple formula for induction:
Laplace&rsquo;s famous Rule of Succession. (Derivation in the Appendix.)</p>
<dl>
<dt>The Rule of Succession</dt>
<dd><p>Given $k$ heads out of $n$ observed flips, the probability of heads
on a subsequent toss is $$\frac{k+1}{n+2}.$$</p></dd>
</dl>
<p>Laplace arrived at this rule about $150$ years earlier by somewhat
different means. But there is a strong similarity.</p>
<p>Laplace supposed that our coin has some fixed, but unknown, chance $p$
of landing heads on each toss. Suppose we regard all possible values
$0 \leq p \leq 1$ as equally likely.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">2</a></sup> If we then update our beliefs
about the true value of $p$ using Bayes&rsquo; theorem, we arrive at the Rule
of Succession. (Proving this is a bit involved. Maybe I&rsquo;ll go over it
another time.)</p>
<p>The two-stage way of assigning prior probabilities is essentially the same
idea, just applied in a discrete setting. By treating all
structure-descriptions as equiprobable, we make all possible frequencies
of heads equiprobable. This is a discrete analogue of treating all
possible values of $p$ as equiprobable.</p>
<h1 id="the-continuum-of-inductive-methods">The Continuum of Inductive Methods</h1>
<p>Both Johnson and Carnap eventually realized that the two methods of assigning priors we&rsquo;ve
considered are just two points on a larger continuum.</p>
<dl>
<dt>The $\lambda$ Continuum</dt>
<dd><p>Given $k$ heads out of $n$ observed flips, the probability of heads
on a subsequent toss is $$\frac{k + \lambda/2}{n + \lambda},$$ for
some $\lambda$ in the range $0 \leq \lambda \leq \infty$.</p></dd>
</dl>
<p>What value should $\lambda$ take here? Notice we get the Rule of
Succession if $\lambda = 2$. And we get inductive skepticism if we let
$\lambda$ approach $\infty$. For then $k$ and $n$ fall away, and the
ratio converges to $1/ 2$ no matter what we observe.</p>
<p>If we set $\lambda = 0$, we get a formula we haven&rsquo;t discussed yet:
$k/n$. Reichenbach called this the Straight Rule. (In modern statistical
parlance it&rsquo;s the &ldquo;maximum likelihood estimate.&rdquo;)<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">3</a></sup></p>
<p>The overall pattern is: the higher $\lambda$, the more &ldquo;cautious&rdquo; our
inductive inferences will be. A larger $\lambda$ means less influence
from $k$ and $n$: the probability of another heads stays closer to the
initial value of $1/ 2$. In the extreme case where $\lambda = \infty$,
it stays stuck at exactly $1/ 2$ forever.</p>
<p>A low value of $\lambda$, on the other hand, will make our inferences
more ambitious. In the extreme case $\lambda = 0$, we jump immediately
to the observed frequency. Our expectation about the next toss is just
$k/n$, the frequency we&rsquo;ve observed so far. If we&rsquo;ve observed only one
flip and it was heads ($k = n = 1$), we&rsquo;ll be certain of heads on
the second toss! <sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">4</a></sup></p>
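<p>In code, the whole continuum is a one-liner. Here&rsquo;s a sketch (my own, with arbitrary example values) showing how different settings of $\lambda$ respond to nine heads in nine flips.</p>
<pre><code># Probability of heads on the next toss, after k heads in n flips, for a given lambda.
def next_heads(k, n, lam):
    return (k + lam / 2) / (n + lam)

for lam in [0.5, 2, 10]:
    # lam = 2 reproduces the Rule of Succession: (9 + 1) / (9 + 2)
    print(lam, next_heads(9, 9, lam))
</code></pre>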
<p>We can illustrate this pattern in a plot. First let&rsquo;s consider what
happens if the coin keeps coming up heads, i.e. $k = n$. As $n$
increases, various settings of $\lambda$ behave as follows.</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/lambda-continuum-k1-n1.png" alt="" /></p>
<p>Now suppose the coin only lands heads every third time, so that
$k \approx n/3$.</p>
<p><img src="http://jonathanweisberg.org/img/inductive-logic/lambda-continuum-k1-n3.png" alt="" /></p>
<p>Notice how lower settings of $\lambda$ bounce around more here before
settling into roughly $1/ 3$. Higher settings approach $1/ 3$ more
steadily, but they take longer to get there.</p>
<h1 id="carnap-s-program">Carnap&rsquo;s Program</h1>
<p>Johnson and Carnap went much further, and others since have gone further still. For
example, we can include more than one predicate, we can use relational
predicates, and much more.</p>
<p>But philosophers aren&rsquo;t too big on this research program nowadays. Why not?</p>
<p>Choosing $\lambda$ is one issue. Once we see that it&rsquo;s more than a
binary choice, between inductive optimism and skepticism, it&rsquo;s hard to
see why we should plump for any particular value of $\lambda$. We could
set $\lambda = 2$, or $\pi$, or $42$. By what criterion could we make
this choice? No clear answer emerged from Carnap&rsquo;s program.</p>
<p>Another issue is Goodman&rsquo;s famous <a href="http://www.wi-phi.com/video/puzzle-grue" target="_blank">grue
puzzle</a>. Suppose we trade our
coin flips for emeralds. We might replace the heads/tails dichotomy with
green/not-green then. But we could instead replace it with
grue/not-grue. The prescriptions of our inductive logic depend on
our choice of predicate&mdash;on the underlying language to which we apply
our chosen value of $\lambda$.</p>
<p>So the Johnson/Carnap system doesn&rsquo;t provide us with rules for inductive
reasoning, more a framework for formulating such rules. We have to
decide which predicates should be projectible by choosing the underlying
language. And then we have to decide how projectible they should be by
choosing $\lambda$. Only then does the framework tell us what
conclusions to draw from a given set of observations.</p>
<p>Personally, I still find the framework useful. It provides a
lovely way to express informal ideas more rigorously. In it we can frame
questions about induction, skepticism, and prior probabilities with
lucidity.</p>
<p>I also like it as a source of toy models. For example, I might test when
a given claim about induction holds and when it doesn&rsquo;t, by playing with
different settings of $\lambda$.</p>
<p>The framework&rsquo;s utility is thus a lot like that of its
deductive cousins. Compare Timothy Williamson&rsquo;s use of modal logic to
create <a href="https://philpapers.org/rec/WILANO-22" target="_blank">models of Gettier cases</a>,
for example, or his model of <a href="https://philpapers.org/rec/WILIKN" target="_blank">improbable
knowledge</a>.</p>
<p>Even in deductive logic, we only get as much out as we put in. We have
to choose our connectives in propositional logic, our accessibility
relation in modal logic, etc. But a flexible system like possible-world
frames still has its uses. We can use it to explore
philosophical options and their interconnections.</p>