<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title></title>
<link href="https://barghouthi.github.io/atom.xml" rel="self"/>
<link href="https://barghouthi.github.io/"/>
<updated>2023-01-25T18:35:45-06:00</updated>
<id>https://barghouthi.github.io</id>
<author>
<name>Aws Albarghouthi</name>
<email>[email protected]</email>
</author>
<entry>
<title>A Quantum Circuit Simulator in 27 Lines of Python</title>
<link href="https://barghouthi.github.io/2021/08/05/quantum/"/>
<updated>2021-08-05T00:00:00-05:00</updated>
<id>https://barghouthi.github.io/2021/08/05/quantum</id>
<content type="html"><p>We’re going to write a quantum circuit interpreter (or <em>simulator</em>) using just 27 lines of Python!</p>
<p>To understand this post, you don’t need to know anything about quantum computing. All you need to know is matrix multiplication! I’ll walk you through the rest! We’re going to treat the operations of a quantum computer as yet another programming language for which we want to build an interpreter. So we won’t get too much into quantum mechanics or fancy quantum algorithms.</p>
<p>You can find the entire simulator code at the <a href="#the-entire-quantum-circuit-simulator">bottom</a> of this post,
or as a <a href="https://colab.research.google.com/drive/1sP64Lt0OFpXZOeycK4MaFpqRPHh79VEU?usp=sharing">notebook</a>.</p>
<hr />
<h2 id="classical-circuits">Classical circuits</h2>
<p>We will begin by writing an interpreter for classical circuits—you know,
good old <em>not</em>, <em>and</em>, <em>or</em>. Then, we will generalize our interpreter to quantum circuits. A classical circuit in our setting applies logical operations to $n$ bits.</p>
<h3 id="classical-state">Classical state</h3>
<p>Let’s begin by representing the state of $n$ bits. We’ll do this in an unusual way. We will define a Boolean vector (i.e., a vector of 0s and 1s) of size $2^n$ where each index of the vector represents one possible state of the $n$ bits.
There will be a <em>single non-zero (1) element</em> in the vector indicating the state.</p>
<p>For $n=1$, we have a vector of size 2, e.g.:</p>
<p><img src="https://barghouthi.github.io/assets/classic1.png" alt="drawing" width="300" /></p>
<p>The vector is in black; the numbers in pink (left) are the indices denoting the state of the bit. The vector above, therefore, represents a bit that is set to 1, since it has element 1 at index 1.
Similarly, the vector
\(\begin{bmatrix}
1\\0
\end{bmatrix}\)
denotes a bit set to 0, because it has element 1 at index 0.</p>
<p>For $n=2$, we have a vector of size 4, like the following, which denotes that both bits are 0, because it has element 1 at index 00.</p>
<p><img src="https://barghouthi.github.io/assets/classic2.png" alt="drawing" width="400" /></p>
<p>You get the idea. It’s a terribly inefficient representation, but I chose it on purpose because we’ll later generalize it to <em>qubits</em>, and quantum simulation is inherently <a href="https://en.wikipedia.org/wiki/BQP">inefficient</a> as far as we can tell.</p>
<p>Here’s a Python class to represent a classical state with $n$ bits.
Note that a <code class="language-plaintext highlighter-rouge">state</code> is initialized to all bits being 0.
Also, note that we’re using numpy, imported as <code class="language-plaintext highlighter-rouge">np</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">class</span> <span class="nc">Cstate</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
        <span class="c1"># number of bits
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">=</span> <span class="n">n</span>
        <span class="c1"># create vector of size 2^n
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
        <span class="c1"># initialize bits to 0s
</span>        <span class="c1"># by setting index 0 of vector to 1
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre></div></div>
<p>For example, the <code class="language-plaintext highlighter-rouge">state</code> field for 2 bits initially looks like this,
denoting the state 00 (just like the 2-bit vector illustrated above):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mi">1</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>
<p>Numpy represents vectors as rows instead of columns like we do—it won’t make a difference for us.</p>
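<p>To make the encoding concrete, here’s a small sketch that converts between bit strings and state vectors. The helper names <code class="language-plaintext highlighter-rouge">bits_to_vector</code> and <code class="language-plaintext highlighter-rouge">vector_to_bits</code> are mine, for illustration only; they are not part of the simulator.</p>

```python
import numpy as np

def bits_to_vector(bits):
    """Return the one-hot vector of size 2**n for a bit string such as '10'."""
    n = len(bits)
    vec = np.zeros(2**n, dtype=int)
    # the bit string, read as a binary number, is the index of the single 1
    vec[int(bits, 2)] = 1
    return vec

def vector_to_bits(vec):
    """Recover the bit string from a one-hot state vector."""
    n = len(vec).bit_length() - 1  # len(vec) == 2**n
    return format(int(np.argmax(vec)), f"0{n}b")

print(bits_to_vector("00"))                    # [1 0 0 0]
print(bits_to_vector("10"))                    # [0 0 1 0]
print(vector_to_bits(np.array([0, 0, 0, 1])))  # 11
```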
<h3 id="flipping-a-bit">Flipping a bit</h3>
<p>Let’s now apply a NOT (negation) operation to a bit.
In a circuit of 1 bit, this looks as follows
(where the pink stuff is an example input/output of the circuit):</p>
<p><img src="https://barghouthi.github.io/assets/not1.png" alt="drawing" width="400" /></p>
<p>NOT is a transformation that takes a bit from one state to another.
We’re going to represent it as a transformation matrix:</p>
<p><img src="https://barghouthi.github.io/assets/notmat.png" alt="drawing" width="300" /></p>
<p>The way to read the matrix is by looking at columns then rows.
Note that each column has a single 1 and the rest of the entries are 0.
The position of the 1 denotes the transformation.
Take the bottom left 1 in the matrix, which is at column 0 and row 1;
this means that a bit that is 0 is transformed into 1.
Take the top right 1 now, at column 1 and row 0;
this means that a bit that is 1 is transformed into 0.</p>
<p>Now to apply this transformation to a state in our representation,
we simply multiply the NOT matrix above with the state vector.
For example, we can apply NOT to a single bit set to 0:</p>
<p><img src="https://barghouthi.github.io/assets/notmult.png" alt="drawing" width="300" /></p>
<p>The above multiplication results in
\(\begin{bmatrix}
0\\1
\end{bmatrix}\), denoting a bit set to 1.</p>
<p>Intuitively, multiplication by a transformation matrix simply moves around the 1 in the input state vector to some other position in the output state vector.</p>
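<p>Here’s the same multiplication as a quick numpy check. This is a standalone sketch using plain arrays rather than the <code class="language-plaintext highlighter-rouge">Cstate</code> class; the variable names are just for illustration.</p>

```python
import numpy as np

NOT = np.array([[0, 1],
                [1, 0]])

zero = np.array([1, 0])  # a bit set to 0
one = np.array([0, 1])   # a bit set to 1

# column j of the matrix is the output state for input state j
print(NOT @ zero)  # [0 1]
print(NOT @ one)   # [1 0]
```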
<h3 id="handling-multiple-bits">Handling multiple bits</h3>
<p>But what happens when we have $n$ bits and we only want to negate
a specific one, say the $i$th one?</p>
<p><img src="https://barghouthi.github.io/assets/noti.png" alt="drawing" width="300" /></p>
<p>We will construct a bigger transformation matrix that only applies
the NOT to the $i$th bit and leaves the rest untouched.
To do so, we will “compose” the NOT matrix with two identity matrices.
One identity matrix will say that all bits before bit $i$ (bits $0$ to $i-1$) are untouched;
the other will say that all bits after bit $i$ (bits $i+1$ to $n-1$) are untouched.</p>
<p>Let’s first do this for the simple case of two bits
where we want to negate the second bit.
We take the <em>Kronecker product</em> ($\otimes$) of the identity matrix (of size 2, denoted $I_2$) with the NOT transformation.</p>
<p><img src="https://barghouthi.github.io/assets/not2.png" alt="drawing" width="300" /></p>
<p>If you haven’t seen the Kronecker product before, don’t be scared;
it just multiplies each element of the left matrix with the entire right matrix.
So in this case, we get a $4 \times 4$ matrix as follows:</p>
<p><img src="https://barghouthi.github.io/assets/not4.png" alt="drawing" width="300" />
Look at the $2 \times 2$ sub-matrix on the top left.
It’s the result of multiplying 1 (the top-left element of $I_2$) with the NOT matrix.</p>
<p>Consider the element highlighted in yellow. It says that if the bits
are 01 (column), then turn them into 00 (row). Observe that the second bit flips, but not the first bit, as desired. The same can be seen with the element highlighted in pink.</p>
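<p>If you want to see this for yourself, here’s a short sketch with numpy’s <code class="language-plaintext highlighter-rouge">kron</code>. It builds the matrix that negates the second of two bits and, for comparison, the one that negates the first bit; the variable names are only for this example.</p>

```python
import numpy as np

NOT = np.array([[0, 1],
                [1, 0]])
I2 = np.eye(2, dtype=int)

# negate the second bit, leave the first untouched
not_second = np.kron(I2, NOT)
print(not_second)
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]
#  [0 0 1 0]]

state_01 = np.array([0, 1, 0, 0])  # bits are 01
print(not_second @ state_01)       # [1 0 0 0], i.e., 00

# swapping the order of the product negates the first bit instead
not_first = np.kron(NOT, I2)
print(not_first @ state_01)        # [0 0 0 1], i.e., 11
```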
<p>Alright, let’s capture this Kronecker product idea in its general form
and implement it.
$I_m$ is the identity matrix of size $m \times m$.</p>
<p><img src="https://barghouthi.github.io/assets/notg.png" alt="drawing" width="500" /></p>
<p>We will implement this as a method of the <code class="language-plaintext highlighter-rouge">Cstate</code> class
that takes an arbitrary transformation matrix <code class="language-plaintext highlighter-rouge">t</code> over contiguous bits and
applies it to all $n$ bits.
<code class="language-plaintext highlighter-rouge">eye</code> is numpy’s identity matrix function,
<code class="language-plaintext highlighter-rouge">kron</code> is Kronecker product,
and <code class="language-plaintext highlighter-rouge">matmul</code> is matrix multiplication.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">op</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="c1"># I_{2^i}
</span>    <span class="n">eyeL</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="n">i</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
    <span class="c1"># I_{2^{n-i-k}}
</span>    <span class="c1"># k = log2(t.shape[0]) is how many bits t applies to
</span>    <span class="c1"># in case of NOT, k == 1
</span>    <span class="n">eyeR</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">-</span> <span class="n">i</span> <span class="o">-</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log2</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))),</span>
                  <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
    <span class="c1"># eyeL ⊗ t ⊗ eyeR
</span>    <span class="n">t_all</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">eyeL</span><span class="p">,</span> <span class="n">t</span><span class="p">),</span> <span class="n">eyeR</span><span class="p">)</span>
    <span class="c1"># apply transformation to state
</span>    <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">t_all</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">NOT</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">not_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">not_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
<p>The method <code class="language-plaintext highlighter-rouge">op</code> takes a transformation matrix <code class="language-plaintext highlighter-rouge">t</code>, e.g., the NOT matrix,
and applies it to the bit <code class="language-plaintext highlighter-rouge">i</code>.
The <code class="language-plaintext highlighter-rouge">NOT</code> method calls <code class="language-plaintext highlighter-rouge">op</code> with the NOT matrix.
Note that <code class="language-plaintext highlighter-rouge">op</code> also works with operations that apply to more than 1 bit,
e.g., <code class="language-plaintext highlighter-rouge">AND</code>, so long as the bits are contiguous.</p>
<h3 id="binary-operations">Binary operations</h3>
<p>Let’s now look at some binary operations.
The following transformation swaps two bits.</p>
<p><img src="https://barghouthi.github.io/assets/swap.png" alt="drawing" width="300" /></p>
<p>We can implement it as follows.
Note that <code class="language-plaintext highlighter-rouge">swap(i)</code> swaps bits <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">i+1</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">swap</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">swap_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">swap_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
<p>As a circuit, a swap is shown like this:</p>
<p><img src="https://barghouthi.github.io/assets/swapc.png" alt="drawing" width="400" /></p>
<p>We can similarly implement AND and OR, where the result is stored
in the first of the two bits.
<img src="https://barghouthi.github.io/assets/andor.png" alt="drawing" width="600" /></p>
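<p>Here’s one way AND and OR could be written down as matrices. This is a sketch under an assumption of mine: the result goes into the first bit and the second bit is left unchanged, which is consistent with the example output further down but may not match the figure entry for entry. The helper <code class="language-plaintext highlighter-rouge">two_bit_gate</code> is hypothetical and not part of the simulator.</p>

```python
import numpy as np

def two_bit_gate(f):
    """Build a 4x4 transformation matrix from a function on two bits.

    f maps (a, b) to the new pair of bits (a', b').
    """
    m = np.zeros((4, 4), dtype=int)
    for a in (0, 1):
        for b in (0, 1):
            out_a, out_b = f(a, b)
            # column = input state, row = output state
            m[2 * out_a + out_b, 2 * a + b] = 1
    return m

# assumption: result stored in the first bit, second bit unchanged
AND = two_bit_gate(lambda a, b: (a and b, b))
OR = two_bit_gate(lambda a, b: (a or b, b))

state_11 = np.array([0, 0, 0, 1])
state_10 = np.array([0, 0, 1, 0])
print(AND @ state_11)  # [0 0 0 1], i.e., 11
print(AND @ state_10)  # [1 0 0 0], i.e., 00
print(OR @ state_10)   # [0 0 1 0], i.e., 10
```

<p>Each column still has a single 1, but two columns of the AND matrix are identical (inputs 00 and 10 both map to 00), so the input cannot be recovered from the output.</p>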
<h3 id="simple-example">Simple example</h3>
<p>Finally, we end our discussion of classical circuits
with a simple circuit that checks if two bits are both zero.
The result is stored in the first bit.</p>
<p><img src="https://barghouthi.github.io/assets/cex.png" alt="drawing" width="400" /></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># initialize state
# Recall that Cstate initializes all bits to 0
</span><span class="n">s</span> <span class="o">=</span> <span class="n">Cstate</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="n">s</span><span class="p">.</span><span class="n">NOT</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># negate first bit
</span><span class="n">s</span><span class="p">.</span><span class="n">NOT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># negate second bit
</span><span class="n">s</span><span class="p">.</span><span class="n">AND</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># AND first and second bits
</span>
<span class="k">print</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">state</span><span class="p">)</span>
</code></pre></div></div>
<p>We get</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>
<p>which means that the final state of the 2 bits is 11.
Since the first bit is 1, this means that
the two bits were initially zero.</p>
<p>In summary, with AND, OR, and NOT we can write an arbitrary Boolean function from $n$ bits to $n$ bits as a sequence of <code class="language-plaintext highlighter-rouge">Cstate</code> operations. Each one of these <code class="language-plaintext highlighter-rouge">Cstate</code> operations is implemented using matrix multiplication,
and the state of the bits is encoded as a vector of size $2^n$
with a single 1 denoting the state.</p>
<h2 id="quantum-circuits">Quantum circuits</h2>
<p>We now generalize the above to quantum circuits.
Instead of bits, we have <em>qubits</em>.
A qubit can be 0, 1, or a superposition of 0 and 1.
So if you write out its vector, it can have numbers in different indices.
E.g.,</p>
<p><img src="https://barghouthi.github.io/assets/qubit.png" alt="drawing" width="300" /></p>
<p>What this says is that if you <em>measure</em> the qubit (read its value), you will read 0 with probability 1/2 and 1 with probability 1/2. The probability is the square of the absolute value of the <em>amplitude</em>, which is $1/\sqrt{2}$ here.
Amplitudes can be complex numbers.
If you sum up the squares of absolute amplitudes, you should get 1,
since they encode a probability distribution. In the above example, we have</p>
\[\left\vert\frac{1}{\sqrt{2}}\right\vert^2 + \left\vert\frac{1}{\sqrt{2}}\right\vert^2 = 1\]
<p>The above qubit vector is usually written with the following notation:
\(\frac{1}{\sqrt{2}} \vert0\rangle + \frac{1}{\sqrt{2}} \vert1\rangle\).
Amplitudes are multiplied by the classical states, 0 and 1, which are wrapped in the notation \(\vert\cdot\rangle\) for historical reasons.</p>
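<p>As a quick sanity check, here’s a sketch that turns amplitudes into measurement probabilities and samples a few measurements. Sampling with numpy’s random generator is my own illustration of what “measuring” means here; it is not part of the simulator.</p>

```python
import numpy as np

# the superposition (1/sqrt(2))|0> + (1/sqrt(2))|1>
state = np.array([1, 1], dtype=complex) / np.sqrt(2)

# probability of each outcome = squared absolute value of its amplitude
probs = np.abs(state) ** 2
print(probs)                         # approximately [0.5 0.5]
print(np.isclose(probs.sum(), 1.0))  # True

# simulate 1000 measurements of the qubit
rng = np.random.default_rng(0)
samples = rng.choice(2, size=1000, p=probs / probs.sum())
print(samples.mean())  # fraction of measurements that read 1
```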
<p>We can easily represent this quantum state by copy-pasting the class definition
of classical states and changing the types from <code class="language-plaintext highlighter-rouge">int</code> to <code class="language-plaintext highlighter-rouge">complex</code>.
Who said copy-paste is bad?
Voila! We now have a quantum state class <code class="language-plaintext highlighter-rouge">Qstate</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Qstate</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">=</span> <span class="n">n</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">complex</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># initialize qubits to 0s
</span></code></pre></div></div>
<h3 id="hadamard-gate">Hadamard gate</h3>
<p>Transformations of quantum states are also matrices,
but they can have complex numbers.
The matrices are <em>unitary</em>, which means that they are invertible
and preserve total probability: the squared absolute values of the amplitudes still sum to 1 after the transformation.
This fact comes from the postulates of quantum mechanics, but doesn’t really concern us when implementing an interpreter.
It’s interesting to note though that AND and OR are not unitary,
because they’re not reversible,
and therefore are not quantum operations.</p>
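<p>We can check unitarity numerically: a matrix $U$ is unitary when its conjugate transpose times $U$ is the identity. The sketch below tests NOT and an AND matrix. The <code class="language-plaintext highlighter-rouge">is_unitary</code> helper is hypothetical, and the AND matrix assumes the result is stored in the first bit with the second bit unchanged.</p>

```python
import numpy as np

def is_unitary(m):
    """Check whether m's conjugate transpose is its inverse."""
    m = np.asarray(m, dtype=complex)
    return np.allclose(m.conj().T @ m, np.eye(m.shape[0]))

NOT = np.array([[0, 1],
                [1, 0]])

# assumed layout: 00->00, 01->01, 10->00, 11->11
AND = np.array([[1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 1]])

print(is_unitary(NOT))  # True
print(is_unitary(AND))  # False
```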
<p>The first transformation we’ll look at is <em>Hadamard</em>, which applies to a single qubit.</p>
<p><img src="https://barghouthi.github.io/assets/hadamard.png" alt="drawing" width="300" /></p>
<p>A Hadamard gate puts a state in superposition.
For example, given the classical state $\vert0\rangle$, i.e., the vector \(\begin{bmatrix}
1\\ 0
\end{bmatrix}\),
it transforms it into the superposition we saw above,</p>
\[\frac{1}{\sqrt{2}} \vert0\rangle + \frac{1}{\sqrt{2}} \vert1\rangle\]
<p>As a circuit, we write this as follows:</p>
<p><img src="https://barghouthi.github.io/assets/hadamard2.png" alt="drawing" width="500" /></p>
<p>Similarly, given the state $\vert1\rangle$, Hadamard
transforms it into the superposition</p>
\[\frac{1}{\sqrt{2}} \vert0\rangle - \frac{1}{\sqrt{2}} \vert1\rangle\]
<p>or equivalently the vector</p>
\[\begin{bmatrix}
\frac{1}{\sqrt{2}} \\ - \frac{1}{\sqrt{2}}
\end{bmatrix}\]
<p>Note the negative amplitude of $\vert1\rangle$.
This is a key property of quantum mechanics that quantum algorithms exploit,
allowing amplitudes to cancel out (interfere), which we cannot achieve with classical randomized algorithms. We won’t get into it, but I recommend taking a look at <a href="https://quantum.country/search">Grover’s algorithm</a> (which is beautiful).</p>
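<p>To see interference on the smallest possible example, here’s a sketch that applies Hadamard twice to $\vert0\rangle$ using plain numpy arrays. After the second application, the two contributions to the amplitude of $\vert1\rangle$ have opposite signs and cancel, so we are back at $\vert0\rangle$. The variable names are just for this example.</p>

```python
import numpy as np

isq2 = 1 / np.sqrt(2)
H = isq2 * np.array([[1, 1],
                     [1, -1]], dtype=complex)

zero = np.array([1, 0], dtype=complex)

plus = H @ zero  # equal superposition of |0> and |1>
back = H @ plus  # amplitude of |1> is 1/2 - 1/2 = 0

print(np.abs(plus) ** 2)         # approximately [0.5 0.5]
print(np.allclose(back, zero))   # True
```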
<p>We will implement Hadamard just as we did with NOT.
I’m using <code class="language-plaintext highlighter-rouge">isq2</code> as a shorthand for $1/\sqrt{2}$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># this function is the same as in the classical case
# the only difference is dtype, which is complex now
</span><span class="k">def</span> <span class="nf">op</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="c1"># I_{2^i}
</span>    <span class="n">eyeL</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="n">i</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">complex</span><span class="p">)</span>
    <span class="c1"># I_{2^{n-i-k}}, where k = log2(t.shape[0]) is how many qubits t applies to
</span>    <span class="n">eyeR</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">-</span> <span class="n">i</span> <span class="o">-</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log2</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))),</span>
                  <span class="n">dtype</span><span class="o">=</span><span class="nb">complex</span><span class="p">)</span>
    <span class="c1"># eyeL ⊗ t ⊗ eyeR
</span>    <span class="n">t_all</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">eyeL</span><span class="p">,</span> <span class="n">t</span><span class="p">),</span> <span class="n">eyeR</span><span class="p">)</span>
    <span class="c1"># apply transformation to state
</span>    <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">t_all</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">hadamard</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">h_matrix</span> <span class="o">=</span> <span class="n">isq2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">h_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="controlled-not-gate">Controlled NOT gate</h3>
<p>Next, we’ll look at the CNOT (controlled NOT) gate, which is a binary gate.</p>
<p><img src="https://barghouthi.github.io/assets/cnot.png" alt="drawing" width="300" /></p>
<p>Classically speaking,
this takes the XOR of two bits and stores the result
in the second bit.
But, as we shall see, it is fundamental in quantum computing,
as it allows us to <em>entangle</em> two qubits (more on this in a bit).
Pictorially, CNOT is denoted as follows:</p>
<p><img src="https://barghouthi.github.io/assets/cnotc.png" alt="drawing" width="300" /></p>
<p>Again, we will implement CNOT just like
in the classical setting:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">cnot</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">cnot_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">cnot_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
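<p>To see the XOR behavior concretely, here is a small standalone sketch (not part of the simulator) that pushes the four classical basis states through the same matrix. The helper name <code class="language-plaintext highlighter-rouge">cnot_on_bits</code> and the convention that the first qubit is the most significant bit of the vector index are assumptions made for this illustration.</p>

```python
import numpy as np

# Same matrix as cnot_matrix above.
CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

def cnot_on_bits(a, b):
    """Apply CNOT to the basis state |ab> and read the result back as two bits."""
    state = np.zeros(4)
    state[2 * a + b] = 1           # |ab> lives at index 2a + b (assumed ordering)
    out = CNOT @ state
    idx = int(np.argmax(out))      # a basis state maps to exactly one basis state
    return idx >> 1, idx & 1

# Truth table: the first bit is unchanged, the second becomes a XOR b.
table = {(a, b): cnot_on_bits(a, b) for a in (0, 1) for b in (0, 1)}
```

<p>Each entry of <code class="language-plaintext highlighter-rouge">table</code> maps <code class="language-plaintext highlighter-rouge">(a, b)</code> to <code class="language-plaintext highlighter-rouge">(a, a ^ b)</code>, which is the classical reading of the gate.</p>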
<p>Qubit swaps are the same as classical bit swaps.</p>
<h3 id="phase-shift-gates">Phase-shift gates</h3>
<p>In the classical setting, AND and NOT suffice to
implement any Boolean function.
In the quantum setting, we’re still missing two single-qubit gates
that give us the full power of a quantum computer.
By full power, I mean a set of gates that can approximate any
unitary transformation to an arbitrary degree of accuracy.</p>
<p>The two missing gates are the $S$ and $T$ gates.
These gates don’t change the measurement probabilities,
but they change the <em>phase</em> of the amplitudes.
This is where complex numbers come into play.</p>
<p>We’ll take a look at the $S$ gate:
<img src="https://barghouthi.github.io/assets/s.png" alt="drawing" width="200" />
Applying $S$ to a state doesn’t change the amplitude of $\vert0\rangle$,
but it multiplies the amplitude of $\vert1\rangle$ by $i$—the imaginary unit. (Remember complex numbers?)
For example, if we apply $S$ to the superposition state, we get:</p>
<p><img src="https://barghouthi.github.io/assets/s2.png" alt="drawing" width="400" />
Check out how the amplitude of $\vert1\rangle$
changed to $i / \sqrt{2}$.
If we turn the amplitude of $\vert 1 \rangle$ into a probability, we get the same probability as before:</p>
\[\left\vert\frac{1}{\sqrt{2}}\right\vert^2 = \left\vert\frac{i}{\sqrt{2}}\right\vert^2 = 1/2\]
<p>So while the amplitude has changed, the probabilities haven’t.</p>
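<p>The same check can be done numerically. The following is a minimal standalone sketch, assuming the superposition state is the vector $(1/\sqrt{2}, 1/\sqrt{2})$; the variable names are made up for this example.</p>

```python
import numpy as np

isq2 = 1.0 / np.sqrt(2.0)

# Superposition state: equal amplitudes on |0> and |1>.
plus = np.array([isq2, isq2], dtype=complex)

# S gate: leaves |0> alone, multiplies the amplitude of |1> by i.
S = np.array([[1, 0],
              [0, 1j]])

after = S @ plus

# A measurement probability is the squared magnitude of an amplitude.
probs_before = np.abs(plus) ** 2
probs_after = np.abs(after) ** 2
```

<p>The amplitude of $\vert1\rangle$ in <code class="language-plaintext highlighter-rouge">after</code> is $i/\sqrt{2}$, while <code class="language-plaintext highlighter-rouge">probs_before</code> and <code class="language-plaintext highlighter-rouge">probs_after</code> agree.</p>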
<p>Here are the $S$ and $T$ gates in code.
Note that Python (and hence numpy) writes the imaginary unit as <code class="language-plaintext highlighter-rouge">j</code> instead of $i$.
The $T$ gate is similar to the $S$ gate, in that it only changes the amplitude of $\vert1\rangle$,
but it multiplies it by $e^{i\pi/4} = (1 + i)/\sqrt{2}$ instead of $i$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># S gate
</span><span class="k">def</span> <span class="nf">s</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">s_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mf">1j</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">s_matrix</span><span class="p">,</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># T gate
</span><span class="k">def</span> <span class="nf">t</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
    <span class="n">t_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="n">isq2</span> <span class="o">+</span> <span class="n">isq2</span> <span class="o">*</span> <span class="mf">1j</span><span class="p">]</span>
    <span class="p">])</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">t_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
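<p>One way to relate the two gates: the entry <code class="language-plaintext highlighter-rouge">isq2 + isq2 * 1j</code> is $e^{i\pi/4}$, an eighth of a full turn, so applying $T$ twice gives $S$. The sketch below checks this on the bare matrices, outside the simulator class; it also checks that both matrices are unitary. The names <code class="language-plaintext highlighter-rouge">S</code>, <code class="language-plaintext highlighter-rouge">T</code> and <code class="language-plaintext highlighter-rouge">is_unitary</code> are chosen for this example only.</p>

```python
import numpy as np

isq2 = 1.0 / (2.0 ** 0.5)

S = np.array([[1, 0],
              [0, 1j]])
T = np.array([[1, 0],
              [0, isq2 + isq2 * 1j]])

def is_unitary(m):
    """A matrix is unitary when its conjugate transpose is its inverse."""
    return np.allclose(m.conj().T @ m, np.eye(m.shape[0]))

t_squared = T @ T                      # two eighth-turns make a quarter-turn
t_phase = np.angle(T[1, 1])            # phase added to |1>, in radians
```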
<h2 id="epr-pairs">EPR pairs</h2>
<p>At this point, we have a full-blown quantum-circuit simulator.
You can always add more gates as needed, but we have already implemented a <em>universal gate set</em>.</p>
<p>We will end with constructing an <em>EPR pair</em>, a special entangled state of two qubits proposed by Einstein, Podolsky and Rosen in 1935 to argue that quantum mechanics is incomplete.</p>
<p>Here’s how the circuit looks. Apply a Hadamard to the first qubit,
putting it in superposition, then apply a CNOT to both qubits.
<img src="https://barghouthi.github.io/assets/epr.png" alt="drawing" width="600" /></p>
<p>This results in the following state, which is called an EPR pair or a <em>Bell state</em>:</p>
\[\frac{1}{\sqrt{2}} \vert00\rangle + \frac{1}{\sqrt{2}} \vert11\rangle\]
<p>In vector notation, an EPR pair is
<img src="https://barghouthi.github.io/assets/epr2.png" alt="drawing" width="200" /></p>
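<p>For readers who want the intermediate step, the circuit can be traced by hand. This is a worked sketch that assumes both qubits start in $\vert00\rangle$:</p>
\[\vert00\rangle \;\xrightarrow{\;H \text{ on the first qubit}\;}\; \frac{1}{\sqrt{2}} \vert00\rangle + \frac{1}{\sqrt{2}} \vert10\rangle \;\xrightarrow{\;\text{CNOT}\;}\; \frac{1}{\sqrt{2}} \vert00\rangle + \frac{1}{\sqrt{2}} \vert11\rangle\]
<p>The CNOT leaves $\vert00\rangle$ alone and flips the second qubit of $\vert10\rangle$, because the first qubit of that term is 1.</p>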
<p>The beauty of this state is that the two qubits are entangled.
This means that if we measure the first qubit, we will get 0 or 1
with equal probability.
But then the other qubit will also <em>collapse</em> to the same
answer that we get.
So if each of us has one of the two entangled qubits,
it appears that we can achieve instantaneous communication!
(It only appears so: each outcome is random, so neither of us can use it to send a message.)
This didn’t sit well with Einstein and his friends.
Indeed, their construction of EPR pairs was meant to demonstrate a paradox.
EPR pairs are key ingredients in <a href="https://quantum.country/teleportation">quantum teleportation</a>, which I encourage you to read about.</p>
<p>Here’s how we construct an EPR pair with our interpreter.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># constructing an EPR pair
</span><span class="n">s</span> <span class="o">=</span> <span class="n">Qstate</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># create a 2-qubit state
</span><span class="n">s</span><span class="p">.</span><span class="n">hadamard</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># hadamard on first qubit
</span><span class="n">s</span><span class="p">.</span><span class="n">cnot</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># CNOT the two qubits
</span>
<span class="k">print</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">state</span><span class="p">)</span>
</code></pre></div></div>
<p>We get the following output,
which I’ve simplified for legibility.
(Note that $1/\sqrt{2} \approx 0.70710678$.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mf">0.70710678</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.70710678</span><span class="p">]</span>
</code></pre></div></div>
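<p>To connect this vector back to measurement outcomes, the sketch below rebuilds the same state with plain numpy (so it runs without the <code class="language-plaintext highlighter-rouge">Qstate</code> class), turns amplitudes into probabilities, and draws some samples. Sampling is not something the simulator does; it is added here only as an illustration, and the seed and sample count are arbitrary.</p>

```python
import numpy as np

isq2 = 1.0 / np.sqrt(2.0)
H = isq2 * np.array([[1, 1], [1, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=complex)

# Start in |00>, apply H to the first qubit, then CNOT.
state = np.zeros(4, dtype=complex)
state[0] = 1
state = CNOT @ (np.kron(H, I2) @ state)

# Probability of each outcome 00, 01, 10, 11 is |amplitude|^2.
probs = np.abs(state) ** 2
probs = probs / probs.sum()            # guard against rounding drift

# Simulated measurements of both qubits (illustration only).
rng = np.random.default_rng(0)
samples = rng.choice(4, size=1000, p=probs)
observed = sorted({format(int(k), "02b") for k in samples})
```

<p>Only the outcomes <code class="language-plaintext highlighter-rouge">00</code> and <code class="language-plaintext highlighter-rouge">11</code> have nonzero probability, so the two measured bits always agree.</p>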
<h2 id="notes">Notes</h2>
<p>That’s it, folks. We’ve implemented a quantum circuit simulator.</p>
<ol>
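<li>Our simulator gets exponentially slower as we add qubits:
a state over $n$ qubits has $2^n$ amplitudes, and <code class="language-plaintext highlighter-rouge">op</code> builds a $2^n \times 2^n$ matrix for every gate.
Storing all $2^n$ amplitudes is sadly unavoidable in this representation. Nonetheless, researchers have come up with lots of techniques
to make quantum simulations faster on classical computers,
e.g., using binary decision diagrams (BDDs) and parallelism.</li>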
<li>Our simulator doesn’t implement measurement,
because it represents the entire probability distribution explicitly.</li>
<li>Our binary gates apply to contiguous (qu)bits.
This makes the <code class="language-plaintext highlighter-rouge">op</code> function simpler to write. We can generalize the <code class="language-plaintext highlighter-rouge">op</code> function to apply binary gates to any pair of (qu)bits, but it gets uglier. Since we have <code class="language-plaintext highlighter-rouge">swap</code>, we can always move qubits next to each other.</li>
<li>As an introduction to quantum computing, I recommend Matuschak and Nielsen’s <a href="https://quantum.country/">Quantum Country</a>.</li>
</ol>
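<p>As an illustration of the swap trick from the third note, here is a standalone sketch that builds a CNOT between qubits 0 and 2 of a three-qubit system using only gates on neighboring qubits. It works on the raw matrices rather than on <code class="language-plaintext highlighter-rouge">Qstate</code>; the names <code class="language-plaintext highlighter-rouge">swap_12</code>, <code class="language-plaintext highlighter-rouge">cnot_01</code>, <code class="language-plaintext highlighter-rouge">cnot_02</code> and <code class="language-plaintext highlighter-rouge">apply_to_bits</code> are invented for this example.</p>

```python
import numpy as np

I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])

# Three qubits; qubit 0 is the leftmost (most significant) bit of the index.
swap_12 = np.kron(I2, SWAP)    # swap qubits 1 and 2
cnot_01 = np.kron(CNOT, I2)    # CNOT with control 0 and target 1

# Move qubit 2 next to qubit 0, apply CNOT, then move it back.
cnot_02 = swap_12 @ cnot_01 @ swap_12

def apply_to_bits(u, bits):
    """Apply the 8x8 matrix u to the basis state given by three bits."""
    index = int("".join(map(str, bits)), 2)
    state = np.zeros(8)
    state[index] = 1
    out = int(np.argmax(u @ state))
    return tuple(int(c) for c in format(out, "03b"))

results = {bits: apply_to_bits(cnot_02, bits) for bits in np.ndindex(2, 2, 2)}
```

<p>Every basis state <code class="language-plaintext highlighter-rouge">(a, b, c)</code> is mapped to <code class="language-plaintext highlighter-rouge">(a, b, c ^ a)</code>. With the simulator, the same effect should come from calling <code class="language-plaintext highlighter-rouge">swap(1)</code>, <code class="language-plaintext highlighter-rouge">cnot(0)</code>, <code class="language-plaintext highlighter-rouge">swap(1)</code> in sequence.</p>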
<p><em>Thanks to John Cyphert for his insightful comments.</em></p>
<h2 id="the-entire-quantum-circuit-simulator">The Entire Quantum Circuit Simulator</h2>
<p>Here are all 27 lines of code.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">isq2</span> <span class="o">=</span> <span class="mf">1.0</span><span class="o">/</span><span class="p">(</span><span class="mf">2.0</span><span class="o">**</span><span class="mf">0.5</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Qstate</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">=</span> <span class="n">n</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">complex</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="c1"># apply transformation t to bit i
</span>    <span class="c1"># (or i and i+1 in case of binary gates)
</span>    <span class="k">def</span> <span class="nf">op</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="c1"># I_{2^i}
</span>        <span class="n">eyeL</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="n">i</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">complex</span><span class="p">)</span>
        <span class="c1"># I_{2^{n-i-1}}
</span>        <span class="c1"># int(t.shape[0]**0.5) is how many bits t applies to (for 2x2 and 4x4 matrices)
</span>        <span class="c1"># in case of NOT, int(t.shape[0]**0.5) == 1
</span>        <span class="n">eyeR</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span> <span class="o">-</span> <span class="n">i</span> <span class="o">-</span> <span class="nb">int</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">**</span><span class="mf">0.5</span><span class="p">)),</span>
                      <span class="n">dtype</span> <span class="o">=</span> <span class="nb">complex</span><span class="p">)</span>
        <span class="c1"># eyeL ⊗ t ⊗ eyeR
</span>        <span class="n">t_all</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">kron</span><span class="p">(</span><span class="n">eyeL</span><span class="p">,</span> <span class="n">t</span><span class="p">),</span> <span class="n">eyeR</span><span class="p">)</span>
        <span class="c1"># apply transformation to state (multiplication)
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">t_all</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">state</span><span class="p">)</span>
    <span class="c1"># Hadamard gate
</span>    <span class="k">def</span> <span class="nf">hadamard</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="n">h_matrix</span> <span class="o">=</span> <span class="n">isq2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">h_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
    <span class="c1"># T gate
</span>    <span class="k">def</span> <span class="nf">t</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="n">t_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="n">isq2</span> <span class="o">+</span> <span class="n">isq2</span> <span class="o">*</span> <span class="mf">1j</span><span class="p">]</span>
        <span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">t_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
    <span class="c1"># S gate
</span>    <span class="k">def</span> <span class="nf">s</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="n">s_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="o">+</span><span class="mf">1j</span><span class="p">]</span>
        <span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">s_matrix</span><span class="p">,</span><span class="n">i</span><span class="p">)</span>
    <span class="c1"># CNOT gate
</span>    <span class="k">def</span> <span class="nf">cnot</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="n">cnot_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
        <span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">cnot_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
    <span class="c1"># Swap two qubits
</span>    <span class="k">def</span> <span class="nf">swap</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
        <span class="n">swap_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
            <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
        <span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">op</span><span class="p">(</span><span class="n">swap_matrix</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
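<p>A side note on <code class="language-plaintext highlighter-rouge">op</code>: the expression <code class="language-plaintext highlighter-rouge">int(t.shape[0]**0.5)</code> gives the right qubit count for the 2x2 and 4x4 matrices used here, but it would give 2 for an 8x8 (three-qubit) matrix. The sketch below is one possible variant, not the code from this post: a hypothetical free function <code class="language-plaintext highlighter-rouge">lift</code> that derives the qubit count from the base-2 logarithm of the matrix size and otherwise builds the same Kronecker product.</p>

```python
import numpy as np

def lift(t, i, n):
    """Expand gate matrix t, acting on qubits i, i+1, ..., to all n qubits.

    Hypothetical helper for illustration; mirrors eyeL (x) t (x) eyeR in op.
    """
    k = t.shape[0].bit_length() - 1          # log2 of the matrix size = qubits t acts on
    if 2 ** k != t.shape[0] or i + k > n:
        raise ValueError("gate does not fit on the requested qubits")
    eye_l = np.eye(2 ** i, dtype=complex)
    eye_r = np.eye(2 ** (n - i - k), dtype=complex)
    return np.kron(np.kron(eye_l, t), eye_r)

isq2 = 1.0 / (2.0 ** 0.5)
H = isq2 * np.array([[1, 1], [1, -1]])
I2 = np.eye(2)

h_on_middle = lift(H, 1, 3)                  # Hadamard on qubit 1 of 3
three_qubit_gate = lift(np.eye(8), 0, 3)     # an 8x8 gate stays 8x8
```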
</content>
</entry>
<entry>
<title>Teaching Your SMT Solver Probability Theory</title>
<link href="https://barghouthi.github.io/2019/07/15/smt-probability/"/>
<updated>2019-07-15T00:00:00-05:00</updated>
<id>https://barghouthi.github.io/2019/07/15/smt-probability</id>
<content type="html"><p>The unexpected rise of SAT and SMT solvers has revolutionized software verification, both the automated and deductive flavors.
Simply, you encode program semantics as logical circuits and ask the SMT solver questions about them. How elegant.
But what happens when your program is randomized? Good luck! The first-order world of SMT solvers does not have the ingredients to sustain your stochastic existence. Go find another home.</p>
<p>In this post, I will show you how to teach your SMT solver probability theory.
The ideas here are a simplified view of a recent <a href="http://pages.cs.wisc.edu/~aws/papers/popl19.pdf">POPL paper</a> by my student <a href="http://pages.cs.wisc.edu/~cjsmith/">Calvin Smith</a>.</p>
<hr />
<h2 id="classical-verification">Classical verification</h2>
<p>To ground things, I will start with a classical (non-probabilistic) verification problem and encode it as a formula in first-order logic.
Take this program:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">z</span> <span class="o">*</span> <span class="mi">2</span>
    <span class="k">return</span> <span class="n">y</span>
</code></pre></div></div>
<p>If we’re working with infinite-precision integers, the following Hoare triple is valid — a positive input results in a positive output.</p>
\[\vdash \{x &gt; 0\} ~f(x)~ \{y &gt; 0\}\]
<p>To formally prove that this Hoare triple holds, we encode the precondition, postcondition, and program semantics as the following formula:</p>
\[(\underbrace{x &gt; 0}_{\text{pre}} \land \underbrace{z = x + 1 \land y = z * 2}_{\text{encoding of } f \ (\text{strongest post})}) \Longrightarrow \underbrace{y &gt; 0}_{\text{post}}\]
<p>If the SMT solver tells you that the formula is valid, then the Hoare triple is valid. If it’s not valid, the SMT solver will give you a counterexample.</p>
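<p>To get a feel for what “valid” and “counterexample” mean here without installing a solver, the sketch below does a bounded brute-force search in plain Python. This is a sanity check over a finite range of inputs and not a proof; an SMT solver reasons about all integers at once. The helper name <code class="language-plaintext highlighter-rouge">find_counterexample</code> and the search range are assumptions of this example.</p>

```python
def f(x):
    z = x + 1
    y = z * 2
    return y

def find_counterexample(pre, post, inputs):
    """Return the first input that satisfies pre but whose output violates post."""
    for x in inputs:
        if pre(x) and not post(f(x)):
            return x
    return None

inputs = range(-1000, 1001)

# The triple {x > 0} f(x) {y > 0}: no counterexample in the searched range.
cex = find_counterexample(lambda x: x > 0, lambda y: y > 0, inputs)

# A weaker precondition, x > -2, fails: x = -1 gives y = 0.
cex_weak = find_counterexample(lambda x: x > -2, lambda y: y > 0, inputs)
```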
<h2 id="randomized-algorithm-example">Randomized algorithm example</h2>
<p>Let’s now look at a very simple randomized program, where <code class="language-plaintext highlighter-rouge">uniform(a,b)</code> returns a sample from the uniform distribution between the values <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">uniform</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">y</span>
</code></pre></div></div>
<p>Say we want to prove the following (probabilistic) Hoare triple:<sup id="fnref:union-bound" role="doc-noteref"><a href="#fn:union-bound" class="footnote" rel="footnote">1</a></sup></p>
\[\vdash_{\color{red}{1/3}} \{x &gt; 0\} ~f(x)~ \{y \geq x\}\]
<p>Let’s unpack this: If $x$ is positive, then $f$ returns a value of $y \geq x$, <em>but</em> there is at most a $\color{red}{1/3}$ probability of failing to satisfy the postcondition.</p>
<p>This Hoare triple is intuitively valid: values of $y$ are uniformly distributed between $0$ and $3x$, so the postcondition $y \geq x$ fails (that is, $y &lt; x$) with probability $x/3x = 1/3$.</p>
<p><img src="https://barghouthi.github.io/assets/probability1.png" alt="Probability density function" /></p>
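<p>The $1/3$ can also be checked empirically. The following is a small Monte Carlo sketch, assuming that <code class="language-plaintext highlighter-rouge">uniform(a,b)</code> behaves like Python’s <code class="language-plaintext highlighter-rouge">random.uniform</code>; the function <code class="language-plaintext highlighter-rouge">estimate_failure</code>, the seed, and the number of trials are made up for this example. It estimates the failure probability; it proves nothing.</p>

```python
import random

def f(x, rng):
    # y is drawn uniformly from [0, 3x]
    return rng.uniform(0, 3 * x)

def estimate_failure(x, trials=100_000, seed=0):
    """Fraction of runs in which the postcondition y >= x is violated."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(trials) if f(x, rng) < x)
    return failures / trials

estimate = estimate_failure(x=5.0)
```

<p>The estimate should land close to $1/3$, whatever positive <code class="language-plaintext highlighter-rouge">x</code> is used.</p>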
<p>Cool. But we want to automatically establish this Hoare triple with an SMT solver.
How? We’ll get rid of probability. Adios!</p>
<h2 id="turning-sampling-into-non-determinism">Turning sampling into non-determinism</h2>
<p>The idea is that the SMT solver only needs to know a few <em>axioms</em> about the probability distributions in order to construct the proof.
In our example, the proof relies on the obvious fact that $y \geq x$ with a probability of $2/3$.</p>
<p>If we know this fact, we can transform the program into a non-deterministic version that <em>tracks probability of failure</em>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f_nondet</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">pick</span> <span class="n">a</span> <span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="n">x</span><span class="p">,...,</span><span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="p">]</span>
    <span class="n">w</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="mi">3</span>
    <span class="k">return</span> <span class="n">y</span><span class="p">,</span><span class="n">w</span>
</code></pre></div></div>
<p>We now have a non-probabilistic program:
we force
<code class="language-plaintext highlighter-rouge">y</code> to receive an arbitrary (non-deterministic) value between <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">3*x</code>;
but we know that this may not be true with a probability of <code class="language-plaintext highlighter-rouge">1/3</code>, so we store this fact in a new <em>ghost</em> variable <code class="language-plaintext highlighter-rouge">w</code>.
The transformation relies on the following insight:</p>
<ol>
<li>make whatever assumptions you want about the value of <code class="language-plaintext highlighter-rouge">y</code></li>
<li><em>but</em> remember the probability with which your assumptions might fail</li>
</ol>
<p>So now we can prove the above Hoare triple \(\vdash_{\color{red}{1/3}} \{x &gt; 0\} ~f(x)~ \{y \geq x\}\) using the transformed, non-deterministic program instead:</p>
\[(\underbrace{x &gt; 0}_{\text{pre}} \land \underbrace{x \leq y \leq 3x \land w = 1/3}_{\text{encoding of } f_\textit{nondet}}) \Longrightarrow (\underbrace{y \geq x}_{\text{post}} \land \underbrace{w \leq \color{red}{1/3}}_{\text{failure prob.}})\]
<h2 id="picking-the-right-axioms">Picking the right axioms</h2>
<p>In our example, we gave the SMT solver exactly the axiom it needs to know about the uniform distribution.
But, in general, we want to automatically discover the right axiom to get the proof to go through.
Calvin’s insight was that we can see this as a <em>program synthesis</em> problem!</p>
<p>The idea is to use an <em>axiom family</em> and synthesize the appropriate axiom from this family.
Check out this parameterized version of <code class="language-plaintext highlighter-rouge">f_nondet</code> above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f_synth</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">pick</span> <span class="n">a</span> <span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="err">?</span><span class="mi">1</span><span class="p">,...,</span><span class="err">?</span><span class="mi">2</span><span class="p">]</span>
    <span class="n">w</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="p">(</span><span class="err">?</span><span class="mi">2</span> <span class="o">-</span> <span class="err">?</span><span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">y</span><span class="p">,</span><span class="n">w</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">?1</code> and <code class="language-plaintext highlighter-rouge">?2</code> are two unknown expressions that we want to synthesize; they define the assumption we are making.
Depending on what we choose, we will <em>incur</em> a different probability of failure <code class="language-plaintext highlighter-rouge">w</code>.</p>
<p><img src="https://barghouthi.github.io/assets/probability2.png" alt="Probability density function of axiom family" /></p>
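<p>To make the dependence of <code class="language-plaintext highlighter-rouge">w</code> on the two unknowns concrete, here is a small sketch that evaluates the failure probability for a few assumed choices of the interval. It uses exact fractions to avoid rounding. Note that in real Python <code class="language-plaintext highlighter-rouge">a / 3*x</code> parses as <code class="language-plaintext highlighter-rouge">(a/3)*x</code>, so the denominator $3x$ is written out explicitly. The function name <code class="language-plaintext highlighter-rouge">failure_prob</code> and the sample value of <code class="language-plaintext highlighter-rouge">x</code> are assumptions of this example.</p>

```python
from fractions import Fraction

def failure_prob(lo, hi, x):
    """w = 1 - (hi - lo) / (3*x): chance that uniform(0, 3*x) lands outside [lo, hi]."""
    assert x > 0 and 0 <= lo <= hi <= 3 * x
    return 1 - Fraction(hi - lo, 3 * x)

x = 4
w_upper_third = failure_prob(2 * x, 3 * x, x)   # assume y in [2x, 3x]
w_whole_range = failure_prob(0, 3 * x, x)       # assume y in [0, 3x]
w_single_point = failure_prob(x, x, x)          # assume y == x exactly
```

<p>The narrower the assumed interval, the larger the failure probability: the three values are $2/3$, $0$, and $1$.</p>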
<p>So now you can use your favorite program synthesis engine to synthesize values for the unknowns such that the postcondition \(y \geq x\) is true
and the failure probability satisfies $w \leq 1/3$.
Say we pick <code class="language-plaintext highlighter-rouge">2*x</code> and <code class="language-plaintext highlighter-rouge">3*x</code> for <code class="language-plaintext highlighter-rouge">?1</code> and <code class="language-plaintext highlighter-rouge">?2</code>.
We get the following program:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f_synth_inst1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">pick</span> <span class="n">a</span> <span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">,...,</span><span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="p">]</span>
    <span class="n">w</span> <span class="o">=</span> <span class="mi">2</span><span class="o">/</span><span class="mi">3</span>
    <span class="k">return</span> <span class="n">y</span><span class="p">,</span><span class="n">w</span>
</code></pre></div></div>
<p>This satisfies our postcondition — that $y \geq x$ — but with a failure probability of $2/3$, higher than our goal of $1/3$.</p>
<p>Now check out this other instantiation where we set <code class="language-plaintext highlighter-rouge">?1</code> to <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">?2</code> to <code class="language-plaintext highlighter-rouge">3*x</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f_synth_inst2</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">pick</span> <span class="n">a</span> <span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">0</span><span class="p">,...,</span><span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="p">]</span>
    <span class="n">w</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">return</span> <span class="n">y</span><span class="p">,</span><span class="n">w</span>
</code></pre></div></div>
<p>This has a 0 probability of failure: <code class="language-plaintext highlighter-rouge">y</code> is always between <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">3*x</code>. But it does not satisfy the postcondition, since <code class="language-plaintext highlighter-rouge">y</code> may very well be less than <code class="language-plaintext highlighter-rouge">x</code>.</p>
<p>The synthesizer should return the program <code class="language-plaintext highlighter-rouge">f_nondet</code> above,
which sets <code class="language-plaintext highlighter-rouge">?1</code> to <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">?2</code> to <code class="language-plaintext highlighter-rouge">3*x</code>.</p>
<h2 id="synthesis-problem">Synthesis problem</h2>
<p>To solve the synthesis problem above with an SMT solver,
we encode the problem in the form of \(\exists.\forall .\varphi\):</p>
\[\exists ?_1, ?_2 . \forall x,y,w .\]
\[(\underbrace{x &gt; 0}_{\text{pre}} \land \underbrace{?_1 \leq y \leq ?_2 \land w = 1 - (?_2-?_1)/3x }_{\text{encoding of } f_\textit{synth}}) \Longrightarrow (\underbrace{y \geq x}_{\text{post}} \land \underbrace{w \leq \color{red}{1/3}}_{\text{failure prob.}})\]
<p>The idea is we want to find ($\exists$) solutions to the unknowns $?_1$ and $?_2$
such that for any execution ($\forall$) where $x&gt;0$ the postcondition holds and the failure probability is no more than $1/3$.</p>
<p>A solution to this problem is one that sets $?_1$ to $x$ and $?_2$ to $3x$,
resulting in the program <code class="language-plaintext highlighter-rouge">f_nondet</code> above, whose correctness implies the Hoare triple \(\vdash_{\color{red}{1/3}} \{x &gt; 0\} ~f(x)~ \{y \geq x\}\).</p>
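<p>The $\exists\forall$ shape can be mimicked on a toy scale by brute force. The sketch below is a stand-in for a real synthesizer, under strong assumptions that are not from the paper: the unknowns range only over the expressions $0$, $x$, $2x$, $3x$, and the universal quantifier is replaced by a bounded check over small integer values of $x$ and $y$. The names <code class="language-plaintext highlighter-rouge">COEFFS</code> and <code class="language-plaintext highlighter-rouge">holds</code> are invented for this example.</p>

```python
from fractions import Fraction
from itertools import product

# Candidate expressions for ?1 and ?2 are c*x for c in COEFFS (assumed search space).
COEFFS = [0, 1, 2, 3]

def holds(c1, c2, xs=range(1, 21)):
    """Bounded check of: x > 0 and c1*x <= y <= c2*x  ==>  y >= x and w <= 1/3."""
    for x in xs:
        lo, hi = c1 * x, c2 * x
        w = 1 - Fraction(hi - lo, 3 * x)       # failure probability of the assumption
        if w > Fraction(1, 3):
            return False
        # the postcondition must hold for every y the assumption allows
        if any(y < x for y in range(lo, hi + 1)):
            return False
    return True

# "Exists": enumerate candidates; "forall": the bounded check inside holds.
solutions = [
    (c1, c2)
    for c1, c2 in product(COEFFS, repeat=2)
    if c1 <= c2 and holds(c1, c2)
]
```

<p>In this tiny search space the only surviving candidate is <code class="language-plaintext highlighter-rouge">(1, 3)</code>, that is, $?_1 = x$ and $?_2 = 3x$.</p>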
<h2 id="conclusion">Conclusion</h2>
<p>That’s it. We’ve thrown probability away.
Now you can reason about randomized algorithms with first-order logic.
But that’s not to say that solving the resulting formulas is easy!</p>
<p>Our <a href="http://pages.cs.wisc.edu/~aws/papers/popl19.pdf">paper</a> gives a full-blown, soundness-police-compliant view of this idea – and a lot of implementation details because some of these formulas involve non-linear arithmetic and quantifier alternation. We manage to automatically prove accuracy properties of some sophisticated algorithms from the differential privacy literature.
It’s really fascinating how far we can take SMT solvers.</p>
<p><em>Thanks to Calvin Smith for comments on an earlier draft. I stole the figures from his slides.</em></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:union-bound" role="doc-endnote">
<p>Notation from union bound logic, a probabilistic Hoare logic due to <a href="https://arxiv.org/abs/1602.05681">Barthe et al.</a> <a href="#fnref:union-bound" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
</li>
</ol>
</div>
</content>
</entry>
<entry>
<title>Differentiable Programming: A Semantics Perspective</title>
<link href="https://barghouthi.github.io/2018/05/01/differentiable-programming/"/>
<updated>2018-05-01T00:00:00-05:00</updated>
<id>https://barghouthi.github.io/2018/05/01/differentiable-programming</id>
<content type="html"><p>So deep learning has taken the world by storm.
Frameworks for training deep neural networks, like <a href="https://www.tensorflow.org/">TensorFlow</a>, allow you to construct so-called <em>differentiable programs</em>.
The idea is that one can compute the derivative of some program (usually some neural net), and then use that to optimize its parameters.</p>
<p>I wrote this post to introduce researchers in the verification and programming languages community to automatic differentiation of programs.
The assumption is that—extrapolating from myself here—you got into this field for the love of logic and discrete math (and an unhealthy aversion to continuous mathematics).</p>
<hr />
<h2 id="programming-language">Programming language</h2>
<p>Let’s consider a very simple programming language where there are no loops or conditions, just a sequence of assignment statements of the form:</p>
\[v_1 \gets c\]
\[v_1 \gets v_2 \times v_3\]
\[v_1 \gets v_2 + v_3\]
\[v_1 \gets cos(v_2)\]
<p>Here, $c$ is a real-valued constant and $v_i$ are real-valued program variables.
Any program $P$ in this language is assumed to have a single special input variable $x$ and an output variable $y$.</p>
<p>$P$ is also assumed to be in <em>static single assignment</em> (SSA) form—i.e., each variable gets assigned to at most once.
This is equivalent to <em>continuation-passing style</em> (CPS). If you’ve used TensorFlow, the <em>computation graph</em> that you construct there is effectively a program in SSA, where each graph node represents one variable’s assignment.</p>
<h2 id="example-program">Example program</h2>
<p>Note that programs in our language are functions in $\mathbb{R} \to \mathbb{R}$.
Consider the function $f(x) = x^2 + cos(x^2)$.
We can write this in our language as the program $P$ below:</p>
\[\begin{align*}
v_1 &amp;\gets x \times x\\
v_2 &amp;\gets cos(v_1)\\
y &amp;\gets v_1 + v_2
\end{align*}\]
<p>If you plot this function, you get the following spooky graph:
<img src="https://barghouthi.github.io/assets/graph.png" alt="Graph of running example" /></p>
<p>If you remember your calculus, the partial derivative of a function $\frac{\partial f}{\partial x}$ is essentially the rate of change of the output $y$ as $x$ changes.
For our function $f$,</p>
\[\frac{\partial f}{\partial x}(x) = 2x - 2x \times sin(x^2)\]
<p>Notice that $\frac{\partial f}{\partial x}(0) = 0$,
since $x = 0$ is a <em>stationary point</em>, so the rate of change at that point is 0.</p>
<p><em>Technically, we’re computing total derivatives in this post, since we only have one input variable $x$, which I enforce for simplicity. The general methodology I lay out here easily extends to functions with multiple input arguments.</em></p>
<h2 id="language-semantics">Language semantics</h2>
<p>The semantics of our little language is standard.
A state $s$ of a program $P$ is a map from variables
to real numbers.
The function $\textit{post}$ below takes a program and a state $s$ and returns the state resulting from executing $P$:</p>
<ol>
<li>$\textit{post}(P_1;P_2, s) \triangleq \textit{post}(P_2,\textit{post}(P_1,s))$</li>
<li>$\textit{post}(v_1 \gets c, s) \triangleq s[v_1 \mapsto c]$</li>
<li>$\textit{post}(v_1 \gets v_2 \times v_3, s) \triangleq s[v_1 \mapsto s(v_2) \times s(v_3)]$</li>
<li>$\textit{post}(v_1 \gets v_2 + v_3, s) \triangleq s[v_1 \mapsto s(v_2) + s(v_3)]$</li>
<li>$\textit{post}(v_1 \gets cos(v_2), s) \triangleq s[v_1 \mapsto cos(s(v_2))]$</li>
</ol>
<p>Above, $P_1;P_2$ denotes sequential composition,
$s(v)$ denotes the value of $v$ in state $s$, and $s[v \mapsto c]$ denotes state $s$ but with $v$ mapping to the value $c$.</p>
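<p>The five rules above can be sketched as a tiny interpreter. The tuple encoding of instructions is an assumption made for illustration; a state is a Python dictionary from variable names to floats.</p>

```python
import math

# A program is a list of instructions, each encoded as a tuple:
#   ("const", v1, c)     encodes  v1 <- c
#   ("mul", v1, v2, v3)  encodes  v1 <- v2 * v3
#   ("add", v1, v2, v3)  encodes  v1 <- v2 + v3
#   ("cos", v1, v2)      encodes  v1 <- cos(v2)

def post(program, state):
    """Return the state reached by running `program` from `state`."""
    s = dict(state)  # copy, so the input state is left unchanged
    for instr in program:  # rule 1: run the instructions in sequence
        op, target = instr[0], instr[1]
        if op == "const":    # rule 2
            s[target] = instr[2]
        elif op == "mul":    # rule 3
            s[target] = s[instr[2]] * s[instr[3]]
        elif op == "add":    # rule 4
            s[target] = s[instr[2]] + s[instr[3]]
        elif op == "cos":    # rule 5
            s[target] = math.cos(s[instr[2]])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return s

# The running example, f(x) = x^2 + cos(x^2).
P = [("mul", "v1", "x", "x"),
     ("cos", "v2", "v1"),
     ("add", "y", "v1", "v2")]

print(post(P, {"x": 0.0})["y"])  # 1.0
```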
<h2 id="forward-differentiation">Forward differentiation</h2>
<p>We will now extend the semantics such that evaluating $P$ on input $x$ not only returns $P(x)$, but also $\frac{\partial P}{\partial x}(x)$, the partial derivative of $P$ w.r.t. the input variable $x$.</p>
<p>Below, we define the new semantics with a function $\partial\textit{post}$, where we keep track of two copies of the program variables, the variables $v_i$ and a new copy $\dot v_i$, which denotes the rate of change of $v_i$ w.r.t. the input $x$, i.e.,</p>
\[\dot v_i = \frac {\partial v_i}{\partial x}(x)\]
<p>Finally, when the program terminates with the new semantics, we can recover the variable $\dot y$, which will hold the value $\frac{\partial P}{\partial x}(x)$.</p>
<p><em>Note that, by definition, $\dot x = 1$.</em></p>
<h3 id="sequential-composition">Sequential composition</h3>
<p>For sequential composition, $P_1;P_2$, $\partial\textit{post}$ behaves just like $\textit{post}$.</p>
<h3 id="constant-assignment">Constant assignment</h3>
<p>For the constant assignment $v_1 \gets c$,
we have</p>
\[\partial\textit{post}(v_1 \gets c, s) \triangleq s[v_1 \mapsto c][ \dot v_1 \mapsto 0]\]
<p>In other words, the rate of change of $v_1$ is zero, since it’s not dependent on $x$ in any way.</p>
<h3 id="addition">Addition</h3>
<p>For addition, we have</p>
\[\partial\textit{post}(v_1 \gets v_2 + v_3, s) \triangleq s[v_1 \mapsto s(v_2) + s(v_3)][ \dot v_1 \mapsto s(\dot v_2) + s(\dot v_3)]\]
<p>That is, the rate of change of $v_1$ is the sum of the rates of change of $v_2$ and $v_3$.</p>
<h3 id="multiplication">Multiplication</h3>
<p>For multiplication,</p>
\[\partial\textit{post}(v_1 \gets v_2 \times v_3, s) \triangleq s[v_1 \mapsto s(v_2) \times s(v_3)][\dot v_1 \mapsto s(\dot v_2) \times s(v_3) + s(v_2) \times s(\dot v_3)]\]
<p>In other words, the rate of change of $v_1$ w.r.t. $x$ is the rate of change of $v_2$, scaled by $v_3$, plus the rate of change of $v_3$, scaled by $v_2$.</p>
<h3 id="trigonometric-functions">Trigonometric functions</h3>
<p>For cosine, we have</p>
\[\partial\textit{post}(v_1 \gets cos(v_2), s) \triangleq s[v_1 \mapsto cos(s(v_2))] [\dot v_1 \mapsto s(\dot v_2) \times -sin(s(v_2))]\]
<p>This follows from the <em>chain rule</em>, which says that the rate of change of $f(u)$ is the rate of change of $f$ scaled by the rate of change of its argument $u$.
You might remember that the derivative of $cos(x)$ is $-sin(x)$, so, following the chain rule, we simply scale $-sin(v_2)$ by $\dot v_2$.</p>
<h2 id="example-continued">Example continued</h2>
<p>Continuing our above example with the program $P$ encoding the function $f(x) = x^2 + cos(x^2)$,
we can now execute $P$ using our new semantics.
Say, we begin executing $P$ from the state where $x = 0$.
At the end of the execution, we will get a state
where $y = 1$, and $\dot y = 0$.</p>
<p>Let’s step through the program one instruction at a time, maintaining both copies of the variables at every point along the way.</p>
\[\begin{align*}
[x = 0, \dot x = 1, \ldots]\\
v_1 &amp;\gets x \times x\\
[v_1 = 0, \dot v_1 = 0, \ldots]\\
v_2 &amp;\gets cos(v_1)\\
[v_2 = 1, \dot v_2 = 0, \ldots]\\
y &amp;\gets v_1 + v_2\\
[y = 1, \dot y = 0, \ldots]\\
\end{align*}\]
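<p>The trace above can be reproduced with a small interpreter for the new semantics. This is a sketch under assumptions: the tuple encoding of instructions and the names <code class="language-plaintext highlighter-rouge">dpost</code> and <code class="language-plaintext highlighter-rouge">run</code> are made up for illustration, and each variable maps to a pair holding its value and its dotted copy.</p>

```python
import math

def dpost(program, state, trace=False):
    """Forward-mode semantics: `state` maps each variable to (value, dot value)."""
    s = dict(state)
    for instr in program:
        op, target = instr[0], instr[1]
        if op == "const":
            s[target] = (instr[2], 0.0)             # constants do not depend on x
        elif op == "add":
            (a, da), (b, db) = s[instr[2]], s[instr[3]]
            s[target] = (a + b, da + db)            # sum rule
        elif op == "mul":
            (a, da), (b, db) = s[instr[2]], s[instr[3]]
            s[target] = (a * b, da * b + a * db)    # product rule
        elif op == "cos":
            a, da = s[instr[2]]
            s[target] = (math.cos(a), da * -math.sin(a))  # chain rule
        else:
            raise ValueError(f"unknown instruction: {op}")
        if trace:
            print(target, s[target])
    return s

# The running example, f(x) = x^2 + cos(x^2).
P = [("mul", "v1", "x", "x"),
     ("cos", "v2", "v1"),
     ("add", "y", "v1", "v2")]

def run(x, trace=False):
    """Run P from input x; by definition the dotted copy of x starts at 1."""
    return dpost(P, {"x": (x, 1.0)}, trace)["y"]

y, dy = run(0.0, trace=True)
assert y == 1.0 and dy == 0.0  # matches the final state in the trace

# Cross-check at another point against a central finite difference.
h = 1e-6
approx = (run(1.5 + h)[0] - run(1.5 - h)[0]) / (2 * h)
assert abs(run(1.5)[1] - approx) < 1e-6
```

<p>Note that the derivative comes out of the same single pass that computes $y$; the finite-difference comparison is only there as an independent check.</p>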
<h2 id="notes">Notes</h2>
<p>I covered the simpler case of forward differentiation, which proceeds by executing the program in a forward manner. For functions with more than one input, it is more efficient to perform backward differentiation, which the popular <em><a href="https://en.wikipedia.org/wiki/Backpropagation">backpropagation</a></em> algorithm is an instance of. Adapting the above semantics to backpropagation is not hard; it’s just messier, as we have to execute the program forward and then backward. Therefore, I decided to illustrate the forward mode only. For more information, I encourage you to read the excellent survey by <a href="https://arxiv.org/abs/1502.05767">Baydin et al.</a>, which heavily influenced my presentation.</p>
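<p>To give a flavor of the backward mode, here is a minimal sketch for the same toy language. It is one possible formulation rather than the semantics developed in this post: the tuple encoding, the function name <code class="language-plaintext highlighter-rouge">backward</code>, and the <code class="language-plaintext highlighter-rouge">adj</code> table (which accumulates the rate of change of $y$ w.r.t. each variable) are assumptions made for illustration, and the code relies on the program being in SSA form.</p>

```python
import math

def backward(program, x):
    """Backward mode: one forward pass, then one pass over the program in reverse."""
    # Forward pass: compute and remember the value of every variable.
    val = {"x": x}
    for op, target, *args in program:
        if op == "const":
            val[target] = args[0]
        elif op == "add":
            val[target] = val[args[0]] + val[args[1]]
        elif op == "mul":
            val[target] = val[args[0]] * val[args[1]]
        elif op == "cos":
            val[target] = math.cos(val[args[0]])
        else:
            raise ValueError(f"unknown instruction: {op}")

    # Backward pass: adj[v] accumulates the derivative of y w.r.t. v.
    adj = {v: 0.0 for v in val}
    adj["y"] = 1.0
    for op, target, *args in reversed(program):
        a = adj[target]
        if op == "add":
            adj[args[0]] += a
            adj[args[1]] += a
        elif op == "mul":
            adj[args[0]] += a * val[args[1]]
            adj[args[1]] += a * val[args[0]]
        elif op == "cos":
            adj[args[0]] += a * -math.sin(val[args[0]])
        # A "const" instruction has no inputs, so nothing flows further back.
    return val["y"], adj["x"]

# The running example, f(x) = x^2 + cos(x^2).
P = [("mul", "v1", "x", "x"),
     ("cos", "v2", "v1"),
     ("add", "y", "v1", "v2")]

print(backward(P, 0.0))  # (1.0, 0.0)
```

<p>With several inputs, the same backward pass would fill in the entry of <code class="language-plaintext highlighter-rouge">adj</code> for every input at once, which is where the efficiency gain comes from.</p>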
<p><em>Thanks to Kartik Agaram, Ben Liblit, and David Cabana for catching typos and errors.</em></p>
</content>
</entry>
<entry>
<title>Fairification: Making Unfair Programs Fair</title>
<link href="https://barghouthi.github.io/2017/05/01/debiasing/"/>
<updated>2017-05-01T00:00:00-05:00</updated>
<id>https://barghouthi.github.io/2017/05/01/debiasing</id>
<content type="html"><p>Over the past year, we have been exploring the notion of <a href="http://pages.cs.wisc.edu/~aws/papers/fatml16.pdf"><em>algorithmic fairness</em> from a PL/verification perspective</a>.
Today I’m going to talk about a new paper we have that is appearing at CAV 2017: <a href="http://pages.cs.wisc.edu/~aws/papers/cav17.pdf"><em>Repairing Decision-making Programs under Uncertainty</em></a>, with Samuel Drews and Loris D’Antoni.</p>
<hr />
<h2 id="algorithmic-fairness">Algorithmic fairness</h2>
<p>With software rapidly overtaking sensitive decision-making processes, like policing and sentencing, many people have been very concerned with unfairness in automated decision-making.
The past year or so has seen lots of attention in this space, for example:</p>
<ul>
<li><a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">ProPublica’s investigation</a> uncovered bias against African Americans in software used for risk assessment in courtrooms (including here in the state of Wisconsin), which judges can use to inform their decisions.</li>
<li>Cathy O’Neil published <a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815"><em>Weapons of Math Destruction</em></a>, an excellent book warning about a world run by unregulated, opaque algorithms.</li>
<li>The pre-Trump White House released a <a href="https://obamawhitehouse.archives.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf">report</a> on AI that explicitly warned about encoding discrimination in automated decision-making.</li>
</ul>
<p>And this is just part of the popular coverage of algorithmic fairness.
In our unpopular (academic) world, the action has been primarily in the machine learning arena, where researchers have been studying ways to learn fair classifiers, for certain definitions of fairness.</p>
<h2 id="unfair-programs-well-fairify-them-for-you">Unfair programs? We’ll fairify them for you!</h2>
<p>While algorithmic unfairness is an alarming
issue with potentially large-scale negative effects,
I believe that the move to algorithmic decision-making
has a silver lining: We can rigorously
reason about programs, debug them, and fix them.</p>
<p>In our work, we went after the following problem:
Say we’re given an unfair program that decides
whether to hire a job applicant
(more on what unfair means in a bit).
By program I mean a piece of code that
is maybe a machine learning model, a
script distilled from the wisdom of a VP,
an SQL query written by a data scientist, whatever!
Our view is that such a program is probably
not designed to be blatantly unfair.
So what we’d like to do is to tweak it a little
bit and make it fair—I like to (unofficially)
call this process <em>fairification</em>.</p>
<p>The main question that I’ve avoided
so far is <em>how do you formalize fairness?!</em>
This is a deep philosophical problem.
But computer scientists love to formalize
the unformalizable!
Recent work in the area has proposed
several definitions.
Let’s look at the one from <a href="https://arxiv.org/pdf/1412.3756.pdf">Feldman et al.</a>,
which formalizes the 80%
rule of thumb from the Equal Employment Opportunity
Commission here in the US:</p>
\[\frac{Pr [hire | minority]}
{Pr [hire | \neg minority]} &gt; 0.8\]
<p>The intuition is simple: the probability
of hiring from the minority applicant pool
is at least 80% that of hiring from the non-minority pool—assuming a binary split
of the population.</p>
<p>OK, great. We’ve fully formalized
fairness, leaving no room for philosophy.</p>
<p>For an ultra-simple illustration,
say the program we have is the following:
It takes only one fact about the applicant,
the rank of the college they attended.
If the applicant attended a top-ten
school, then, good for them, they get hired!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">hire</span><span class="p">(</span><span class="n">urank</span><span class="p">):</span>
<span class="k">return</span> <span class="mi">1</span> <span class="o">&lt;=</span> <span class="n">urank</span> <span class="o">&lt;=</span> <span class="mi">10</span>
</code></pre></div></div>
<p>Is this program fair?
Well, it depends on the population!
We will represent the population as
a probabilistic model:
10% of the population are minorities;
non-minorities go to schools ranked 10
on average; minorities go to schools
ranked 15 on average (here $min$
is 1 for minority and 0 for non-minority).</p>
\[min \sim Bernoulli(0.1)\\
urank \sim Gaussian(10 + 5*min, 10)\]
<p>With this population model,
this program is unfair; on the above
fairness definition, the ratio
evaluates to ~0.6.
How do we fix it?</p>
<p>Our approach proceeds like this:
First, we characterize a class
of programs using a <em>sketch</em>
(I talked about sketches in the last <a href="/2017/04/24/synthesis-primer/">post</a>).
One possible sketch here is the following,
where <code class="language-plaintext highlighter-rouge">??</code> are unknowns.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">hire</span><span class="p">(</span><span class="n">urank</span><span class="p">):</span>
<span class="k">return</span> <span class="err">??</span> <span class="o">&lt;=</span> <span class="n">urank</span> <span class="o">&lt;=</span> <span class="err">??</span>
</code></pre></div></div>
<p>Essentially, the sketch characterizes
a family of programs
(ML people call this a hypothesis class).
In this case, we’ve knocked out
the constants in the program and
we’re hoping to replace them with new ones
to make it fair.
The same idea can be extended to not only
constants, but also instructions and branching.
The sketch encodes our <em>repair model</em>,
the various ways in which we can tweak
the original program.</p>
<p>Now, we want to find a completion of this
sketch such that</p>
<ul>
<li>1) The completion is <em>semantically close</em> to the original program.
(Semantic closeness just means that
the two programs agree on most inputs.)</li>
<li>2) The completion is fair according
to the definition above.</li>
</ul>
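<p>Both criteria can be sketched numerically for completions of this interval shape. This is not how our tool works; it is a closed-form calculation that assumes the second parameter of the Gaussian in the population model is a standard deviation (under other readings the numbers change). The function names and the naive one-rank-at-a-time search are hypothetical.</p>

```python
import math

def normal_cdf(t, mean, std):
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

def pr_hire(lo, hi, minority, std=10.0):
    """Pr[lo <= urank <= hi | group], with urank ~ Gaussian(10 + 5*min, std)."""
    mean = 10.0 + 5.0 * minority
    return normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)

def fairness_ratio(lo, hi):
    """Pr[hire | minority] / Pr[hire | not minority]."""
    return pr_hire(lo, hi, 1) / pr_hire(lo, hi, 0)

def disagreement(hi_old, hi_new, p_min=0.1):
    """Probability that two programs sharing a lower bound give different answers."""
    # They differ exactly when urank falls between the two upper bounds.
    return sum(weight * pr_hire(hi_old, hi_new, group)
               for group, weight in ((1, p_min), (0, 1.0 - p_min)))

def repair(lo=1, hi=10, threshold=0.8, max_hi=100):
    """Raise the upper bound one rank at a time until the ratio exceeds threshold."""
    for candidate in range(hi, max_hi + 1):
        if fairness_ratio(lo, candidate) > threshold:
            return candidate
    return None  # no fair completion found within the search range

print(fairness_ratio(1, 10) > 0.8)  # original program: False under this model
new_hi = repair()
print(new_hi, disagreement(10, new_hi))
```

<p>The search stops at the first bound that passes, so it favors completions that disagree with the original program on as little of the population as possible.</p>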
<p>The idea is that we want to give the program a small <em>nudge</em> to make it fair.
One possible completion is the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">hire</span><span class="p">(</span><span class="n">urank</span><span class="p">):</span>
<span class="k">return</span> <span class="mi">1</span> <span class="o">&lt;=</span> <span class="n">urank</span> <span class="o">&lt;=</span> <span class="mi">15</span>
</code></pre></div></div>
<p>This program happens to be fair, per the above definition, and is semantically close to the original program.
In a sense, we kept increasing the upper bound on the college ranking until we got a fair program. Our tool would find such a completion.