<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>A/B Testing</title>
    <meta name="author" content="James Ha">

    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

    <link rel="stylesheet" href="css/reveal.min.css">
    <link rel="stylesheet" href="custom/css/foo.min.css">
    <link rel="stylesheet" href="css/theme/custom.css" id="theme">

    <!-- Theme used for syntax highlighting of code -->
    <link rel="stylesheet" href="lib/css/zenburn.css">

    <!-- Printing and PDF exports -->
    <script>
        var link = document.createElement('link');
        link.rel = 'stylesheet';
        link.type = 'text/css';
        link.href = window.location.search.match(/print-pdf/gi) ? 'css/print/pdf.css' : 'css/print/paper.css';
        document.getElementsByTagName('head')[0].appendChild(link);
    </script>
</head>
<body>
<div class="reveal">
    <div class="slides">
        <section
                data-transition="fade-in">
            <h1>A/B Testing</h1>
            <ul>
                <li class="fragment">Controlled experiments to determine a causal relationship between changes to an
                    application and their influence on observable behavior.
                </li>
            </ul>
            <aside class="notes">
                <p>These are my notes</p>
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Kinds of A/B Tests</h2>
            <ul>
                <li class="fragment">Split Tests, a.k.a. <strong>Hypothesis Testing</strong></li>
                <li class="fragment"><strong>Multivariate Tests</strong>
                    <ul>
                        <li><em>Which really aren't A/B tests, but get grouped in with them anyway</em></li>
                    </ul>
                </li>
                <li class="fragment"><strong>A/A Tests</strong> - to make sure the service isn&rsquo;t fake news</li>
                <li class="fragment">Bayesian craziness, a.k.a. <strong>Simulations</strong></li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">Introduction</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Yeah, but why?</h2>
            <ul>
                <li>Improves User Experience</li>
                <li>Accelerates Innovation</li>
                <li><strong style="font-family: didot;">CASH MONEY $$$ SYNERGY ROI</strong></li>
                <li><strong style="font-family: didot;">THE BLOCKCHAIN</strong></li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">Introduction</p>
            </div>
            <aside class="notes" data-markdown>
                * **Improves User Experience** - we pick the variant that users prefer
                * **Accelerates Innovation** - instead of wondering for ourselves which variant is better, we just test, and then have data to back up our choices
                * Do you know that part on Amazon where it says, "people who bought this item also bought...."
                * That part is the result of an A/B Test, implemented by a guy named Greg Linden.
                * Actually, a marketing senior vice president was strongly against it because he thought the extra information would distract users who should be focused on checking out.
                * So Greg Linden stuck the code in there in an A/B Test
                * It turns out that implementing that feature made them a ton of money
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Concerns in A/B Tests</h2>
            <ul>
                <li class="fragment">How can I measure and analyze my test?</li>
                <li class="fragment">How big does my test need to be?</li>
                <li class="fragment">What can I test?</li>
                <li class="fragment">What counts as a test?</li>
                <li class="fragment">What are "statistics?"</li>
                <li class="fragment">What is "probability?"</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">Introduction</p>
            </div>
            <aside class="notes" data-markdown>
                * If the test is too small, the data might just be noise.
                * If the test is too big, we will move too slowly
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Presentation Overview</h2>
            <ol>
                <li>Define Probability and Statistics</li>
                <li>Frequentist Statistics</li>
                <li>Bayesian Statistics</li>
                <li><span style="text-decoration: line-through;">Multivariate Testing</span></li>
                <li>Implementation</li>
                <li>Faults in A/B Tests</li>
                <li>A picture of a red panda</li>
            </ol>
            <div class="slide-footer-left">
                <p class="footer-text">Introduction</p>
            </div>
            <aside class="notes" data-markdown>
                * Sorry, I'm gonna skip multivariate testing (or at least the math for it) because it is nuts
                complicated
                * Everything that makes A/B testing complicated is orders of magnitude crazier when you introduce
                multi-dimensional parameters
                * Toward the end of the Bayesian statistics part I'll briefly mention a family of tools called Markov Chain
                Monte Carlo simulations, which is one of the major elements of multivariate testing.
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h1>1. Define Probability & Statistics</h1>
            <video data-autoplay loop>
                <source data-src="custom/media/adventure-time.mp4" type="video/mp4"/>
            </video>
        </section>
        <section
                data-background-video="custom/media/adventure-time.mp4"
                data-background-video-loop="true"
                data-transition="fade-in">
            <aside class="notes">
                <ul>
                    <li>So I'm gonna try to cram a bunch of statistics in the next hour</li>
                    <li>I really do think it's important to understand why or how decisions in A/B testing are made.
                    </li>
                    <li>We could skip all this and just say, okay show me the formulas and I'll plug the A/B test data
                        in
                    </li>
                    <li>I hope someday we can implement that for our applications, but very critical business and
                        technical decisions can hinge on A/B testing analysis, so we have to be sure of the
                        reasoning
                    </li>
                    <li>I'm gonna throw a lot of math at you guys and it's not too important to memorize the formulas or
                        proofs, but getting the general concepts and approaches to thinking about statistics is key.
                    </li>
                    <li>I know there might be a lot of questions, so I guess stop me if I totally lose you.</li>
                </ul>
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>What&rsquo;s the point?</h2>
            <ol>
                <li>If I flip a coin 10 times and it lands heads 7 times, is the coin rigged? \(P(X = 7)\)</li>
                <li class="fragment">10% of contributors who debrief a session will create an annotation.
                    <ul>
                        <li>A change is made to the application, and 8 of the next 50 debriefing contributors create
                            annotations.
                        </li>
                        <li>Did the change have an effect?</li>
                        <li>Is a sample size \(n=50\) good enough?</li>
                    </ul>
                </li>
            </ol>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * If you can solve the first problem, you are well on your way to solving the second problem
                * This will be a pattern for most of the talk.
                * I'll introduce an example using coin flips or dice throws
                * And we'll use the solution to those examples to solve real A/B testing problems
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Probability 101</h2>
            <ul>
                <li>What is Probability \((P)\)? A measure of the likelihood an event will occur.</li>
                <li>\(P(A)\) - The probability of event A</li>
                <li>\(p=0\) - the event will not occur</li>
                <li>\(p=1\) - the event will occur</li>
                <li>\(P(X = x) = p(x)\)</li>
            </ul>
            <aside class="notes">
                <ul>
                    <li>Depending on which textbook or paper you read, different authors use different notations.</li>
                    <li>The notation I'll be using is the one I see most commonly, and I'll try to be consistent</li>
                    <li>Some people write the variance symbol as sigma squared, and some use capital V-A-R</li>
                    <li>Probability of an event is capital P</li>
                    <li>Events are always uppercase letters</li>
                    <li>Lowercase p is the probability function, used when you can express the probability of
                        something as a formula
                    </li>
                </ul>
            </aside>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Conditional Probability</h2>
            <ul>
                <li>
                    <strong>Conditional Probability</strong> \(P(A|B)\) - the probability of event A occurring, given B
                    has occurred
                    <span class="math-formula">\(P(A|B) = \frac{P(A \cap B)}{P(B)}\)</span>
                </li>
                <li>
                    <strong>Multiplication Rule</strong>
                    <span class="math-formula">\(P(A \cap B) = P(A|B) \cdot P(B)\)</span>
                </li>
                <li>
                    <p><strong>Independent Events</strong></p>
                    <span class="math-formula">\(P(A \cap B) = P(A) \times P(B)\)</span>
                </li>
            </ul>
            <aside class="notes">
                <ul>
                    <li>upside-down U is intersection</li>
                </ul>
            </aside>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Conditional Probability</h2>
            <p>What is the probability of drawing 2 diamonds in a row from a pack of cards?</p>
            <span class="math-formula fragment">\(P(D_2 \cap D_1)\)</span>
            <span class="math-formula fragment">= \(P(D_2|D_1) \times P(D_1)\)</span>
            <div class="fragment">
                <p class="math-formula">\(P(D_1) = \frac{13}{52} = \frac{1}{4}\)</p>
                <p class="math-formula">\(P(D_2|D_1) = \frac{12}{51}\)</p>
            </div>
            <p class="math-formula fragment">\(P(D_2 \cap D_1) = \frac{1}{4} \times \frac{12}{51} = \frac{3}{51} = \frac{1}{17}\)</p>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>Ahem, what is the probability that the second card drawn from a pack of cards is a diamond,
                        given that the first card drawn was a diamond, <span>\(P(D_2|D_1)\)</span>?
                    </li>
                    <li>Probability of drawing the first diamond is 13/52</li>
                    <li>Since there is 1 less card, the probability of drawing the second diamond given that a diamond
                        has been drawn is 12/51
                    </li>
                </ul>
            </aside>
        </section>
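        <section
                data-transition="fade-in">
            <div data-markdown>
## Two Diamonds With Code

A minimal sketch of the card example, assuming a standard 52-card deck with 13 diamonds. The variable names are made up for illustration, and exact fractions keep the arithmetic easy to check by hand.

```python
from fractions import Fraction

# Standard 52-card deck with 13 diamonds (assumed, as in the card example).
deck_size = 52
diamonds = 13

# P(D1): the first card is a diamond.
p_first = Fraction(diamonds, deck_size)

# P(D2 | D1): one diamond and one card are gone after the first draw.
p_second_given_first = Fraction(diamonds - 1, deck_size - 1)

# Multiplication rule: P(D2 and D1) = P(D2 | D1) * P(D1).
p_both = p_second_given_first * p_first

print(p_first, p_second_given_first, p_both)
# 1/4 4/17 1/17
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>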
        <section
                data-transition="fade-in">
            <h2>Sally Clark</h2>
            <p></p>
            <ul>
                <li><em>What is the probability that 2 babies from one mother die of SIDS?</em></li>
                <li><em>Are these events dependent or independent?</em></li>
                <li class="fragment">Given that a mother's first baby died of SIDS, what is the probability that her
                    second baby will also die of SIDS? \(P(B_2|B_1)\)
                </li>
            </ul>
            <ul class="fragment">
                <li>Expert testimony from Dr. Roy Meadow: \(P(B)^2 = (1/8543)^2 \approx 1/73000000\)</li>
                <li>Convicted in Nov. 1999 on this evidence</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside data-markdown class="notes">
                * What is the probability that a mother's first 2 babies die of SIDS? Are these events dependent or
                independent?
                * Given that a mother's first baby died of SIDS, what is the probability that her second baby will also
                die of SIDS?
                * Professor Sir Roy Meadow incorrectly concluded that these were 2 independent events, and famously said
                it was a "1 in 73 million" chance.
                * This false statistic led to her wrongful conviction in November 1999.
                * Probability and statistics got distorted - people took "1 in 73 million" to be the probability that
                she was innocent.
                * Also the pathologist hid knowledge that one of the babies had a staph infection
                * She was released in January 2003, but died in March 2007 from alcohol poisoning.
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Discrete Random Variables</h2>
                <ul>
                    <li><strong>Random Variable</strong> - a variable whose value is determined by a chance event.</li>
                    <li>A <strong>discrete sample space</strong> \(\Omega\) is a finite or countably infinite set of
                        outcomes \(\{\omega_1, \omega_2, \ldots\}\). The probability of outcome \(\omega\) is \(P(\omega)\).
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div data-markdown>
                ## Discrete Probability Distribution

                *a distribution that can be represented by a table.*
            </div>

            <div class="fragment">
                <div data-markdown>
                    * Example: there are only 4 outcomes from flipping a coin 2 times.
                </div>
                <table>
                    <tr>
                        <th>Number of Heads, \(x\)</th>
                        <th>Probability \(P(x)\)</th>
                    </tr>
                    <tr>
                        <td>0</td>
                        <td>0.25</td>
                    </tr>
                    <tr>
                        <td>1</td>
                        <td>0.5</td>
                    </tr>
                    <tr>
                        <td>2</td>
                        <td>0.25</td>
                    </tr>
                </table>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Note that the sum of all these probabilities equals 1.
            </aside>
        </section>
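        <section
                data-transition="fade-in">
            <div data-markdown>
## Distribution Table With Code

A rough sketch of how the two-flip table could be built by brute force, assuming a fair coin. It lists every outcome, counts heads, and converts the counts to probabilities. The names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 4 equally likely outcomes of 2 fair coin flips.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes produce each number of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Turn counts into probabilities to get the distribution table.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(heads_counts.items())
}

for heads, probability in distribution.items():
    print(heads, float(probability))
# 0 0.25
# 1 0.5
# 2 0.25

# A valid distribution sums to 1.
print(sum(distribution.values()))
# 1
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>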
        <section
                data-transition="fade-in">
            <div>
                <h2>Discrete Functions</h2>
                <ul>
                    <li><strong>Probability Mass Function (pmf)</strong> - the function that gives the probability
                        that a discrete random variable \(X\) takes exactly the value \(a\)<br/>
                        <span class="math-formula">$$p(a) = P(X = a)$$ $$0 \le p(a) \le 1$$</span>
                    </li>
                    <li><strong>Cumulative Distribution Function (cdf)</strong> - the function that gives the total
                        probability from \(-\infty\) to \(a\)<br/>
                        <span class="math-formula">$$F(a) = P(X \le a)$$</span>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * You can often determine which one you need by listening to the question.
                * If someone asks what is the probability of getting exactly blah heads in so many coin flips, it's a
                pmf
                * If someone asks what is the probability of getting at most blah heads in so many coin flips, it's a
                cdf (for "at least", use 1 minus the cdf)
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Example: Discrete Functions</h2>
                <ul>
                    <li>Let the sample space \(\Omega\) be 2 dice rolls</li>
                    <li>Random variable \(M\) is the maximum of 2 dice rolls.
                        <span class="math-formula">$$M(1,4) = 4$$</span>
                    </li>
                </ul>
                <div class="fragment">
                    <table>
                        <tr>
                            <th>value</th>
                            <th>\(a\)</th>
                            <th>1</th>
                            <th>2</th>
                            <th>3</th>
                            <th>4</th>
                            <th>5</th>
                            <th>6</th>
                        </tr>
                        <tr>
                            <td>pmf</td>
                            <td>\(p(a)\)</td>
                            <td>1/36</td>
                            <td>3/36</td>
                            <td>5/36</td>
                            <td>7/36</td>
                            <td>9/36</td>
                            <td>11/36</td>
                        </tr>
                        <tr>
                            <td>cdf</td>
                            <td>\(F(a)\)</td>
                            <td>1/36</td>
                            <td>4/36</td>
                            <td>9/36</td>
                            <td>16/36</td>
                            <td>25/36</td>
                            <td>36/36</td>
                        </tr>
                    </table>
                </div>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
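        <section
                data-transition="fade-in">
            <div data-markdown>
## Dice Maximum With Code

One way to sanity-check the pmf and cdf rows for the maximum of 2 dice, assuming fair six-sided dice. This is only a sketch that enumerates all 36 rolls. The names `rolls`, `pmf` and `cdf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Sample space: all 36 ordered pairs from 2 fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# Random variable M maps each outcome to the larger of the two dice.
counts = Counter(max(roll) for roll in rolls)

values = sorted(counts)
# pmf: p(a) = P(M = a)
pmf = [Fraction(counts[a], len(rolls)) for a in values]
# cdf: F(a) = P(M at most a), a running total of the pmf
cdf = list(accumulate(pmf))

for a, p_a, f_a in zip(values, pmf, cdf):
    # Print numerators over 36 so the rows read like a pmf/cdf table.
    print(a, f"{p_a * 36}/36", f"{f_a * 36}/36")
# first line: 1 1/36 1/36
# last line:  6 11/36 36/36
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>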
        <section
                data-transition="fade-in">
            <div>
                <h2>Discrete Mean &amp; Variance</h2>
                <div>
                    <ul>
                        <li><strong>Discrete Mean</strong> - the average of a discrete random variable \(X\) a.k.a.
                            <strong>expected value</strong> of \(X\)
                            <span class="math-formula">$$E(X) = \mu_{x} = \sum_{i=1}^n p(x_{i}){x_{i} }$$</span>
                        </li>
                    </ul>
                </div>
                <div class="fragment">
                    <ul>
                        <li><strong>Discrete Variance</strong> - a measure of how much the probability mass is spread
                            out around \(\mu\).
                            <span class="math-formula">$$Var(X) = E((X- \mu )^{2}) = \sum_{i=1}^n p(x_{i})(x_{i}- \mu )^{2}$$</span>
                        </li>
                    </ul>
                </div>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Discrete Standard Deviation</h2>
                <div>
                    <ul>
                        <li><strong>Discrete Standard Deviation \(\sigma\)</strong> - a measure of the spread, expressed
                            in the same units as the expected value
                            <span class="math-formula">$$\sigma = \sqrt{Var(X)}$$</span>
                            <span class="math-formula fragment">$$Var(X) = \sigma^2$$</span></li>
                    </ul>
                </div>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>Expected Value is written as capital E of x</li>
                    <li>Mean is usually written with lowercase Greek &mu;</li>
                </ul>
            </aside>
        </section>
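        <section
                data-transition="fade-in">
            <div data-markdown>
## Mean, Variance & Spread With Code

A small sketch applying the mean, variance and standard deviation formulas to the number of heads in 2 fair coin flips. The distribution is typed in by hand here as an assumption, not computed.

```python
import math

# Number of heads in 2 fair coin flips: value -> probability.
distribution = {0: 0.25, 1: 0.5, 2: 0.25}

# E(X) = sum of p(x_i) * x_i
mean = sum(p * x for x, p in distribution.items())

# Var(X) = sum of p(x_i) * (x_i - mean)^2
variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 4))
# 1.0 0.5 0.7071
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>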
        <section
                data-transition="fade-in">
            <div data-markdown>
                ## Bernoulli Trial

                * An experiment that only has two possible results - *success* and *failure* - is a **Bernoulli Trial**
                if:
                * The results are mutually exclusive,
                * The probabilities of these two results do not change each time the experiment is repeated
            </div>
            <span class="math-formula">$$X =\begin{cases}1 & \text{success}\\0 & \text{failure}\end{cases}$$</span>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Bernoulli Distribution</h2>
                <ul>
                    <li>Success:<br/>
                        <span class="math-formula">\(P(X = 1) = p\)</span>
                    </li>
                    <li>Failure:<br/>
                        <span class="math-formula">\(P(X = 0) = 1-p\)</span>
                    </li>
                    <li>Expected Value:<br/>
                        <span class="math-formula">\(E(X) = \mu = 0(1-p) + 1(p) = p\)</span>
                    </li>
                    <li>Variance:<br/>
                        <span class="math-formula">\(\sigma_x^2 = p(1-p)\)</span>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>If we say a success is rolling a die for 3 or higher, then the probability of success is 0.67,
                        and the probability of failure is 0.33.
                    </li>
                </ul>
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Binomial Distribution</h2>
                <p><em>Lots and lots of Bernoulli Trials</em></p>
                <ul class="fragment">
                    <li><strong>Binomial Coefficient</strong> - the number of ways to &ldquo;choose&rdquo; \(k\)
                        unordered outcomes from \(n\) possibilities
                        <span class="math-formula">$$_nC_k =  \begin{pmatrix}n \\k \end{pmatrix} = \frac{n!}{k!(n - k)!}$$</span>
                    </li>
                    <li class="fragment">Binomial probability mass function:
                        <span class="math-formula">$$P(X = k) = \begin{pmatrix}n \\k \end{pmatrix}p^{k}(1-p)^{n-k}$$</span>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * A Bernoulli trial is one user coming to your website: he either buys or he doesn't buy
                * A binomial distribution is 1000 users coming to your site. Some buy and some don't buy
                * For example, how many ways can you choose 2 heads from flipping 4 coins? it would be 4! divided by 2!
                times 2! which equals 6
            </aside>
        </section>
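        <section
                data-transition="fade-in">
            <div data-markdown>
## Binomial PMF With Code

A hedged sketch of the binomial coefficient and pmf using only the standard library. `binomial_pmf` is a hypothetical helper written for this slide, not a library function.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n Bernoulli trials."""
    # math.comb(n, k) is the binomial coefficient n! / (k! * (n - k)!).
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Ways to choose 2 heads out of 4 coin flips.
print(math.comb(4, 2))
# 6

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))
# 0.375
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>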
        <section
                data-transition="fade-in">
            <div>
                <h2>Binomial Math</h2>
                <ul>
                    <li>Mean:<br/>
                        <span class="math-formula">\(\mu_x = np\)</span>
                    </li>
                    <li>Variance:<br/>
                        <span class="math-formula">\(\sigma_x^2 = np(1-p)\)</span>
                    </li>
                    <li>Standard deviation:<br/>
                        <span class="math-formula">\(\sigma_x = \sqrt{np(1-p)}\)</span>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * So if I have 100 coins and they are all fair, the expected value of heads is 100 * 0.5 = 50
            </aside>
        </section>
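        <section
                data-transition="fade-in">
            <div data-markdown>
## Binomial Math With Code

A quick check, assuming 100 fair coin flips, that the shortcuts for the binomial mean and variance agree with the general definitions. Treat it as a sketch rather than a proof.

```python
import math

n, p = 100, 0.5  # 100 fair coin flips

# Binomial pmf for every possible number of heads, 0 through n.
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance computed directly from the definitions.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum(pk * (k - mean) ** 2 for k, pk in enumerate(pmf))

# They should agree with n*p and n*p*(1-p), up to rounding.
print(round(mean, 6), n * p)
# 50.0 50.0
print(round(variance, 6), n * p * (1 - p))
# 25.0 25.0
print(round(math.sqrt(variance), 6))
# 5.0
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>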
        <section
                data-transition="fade-in">
            <div>
                <h2>Binomial Coins</h2>
                <p><em>What is the probability of getting 3 heads in 5 coin flips?</em></p>
                <span class="math-formula fragment">$${P(X = 3) = \begin{pmatrix}5 \\3 \end{pmatrix}0.5^{3}(1-0.5)^2 = 31.25\%}$$</span>
                <p class="fragment"><em>What is the cdf of 3 out of 5?</em></p>

                <table class="fragment" style="font-size: 75%;">
                    <tr>
                        <th>Value</th>
                        <th>\(a\)</th>
                        <th>0</th>
                        <th>1</th>
                        <th>2</th>
                        <th>3</th>
                        <th>4</th>
                        <th>5</th>
                    </tr>
                    <tr>
                        <td>pmf</td>
                        <td>\(p(a)\)</td>
                        <td>\(p^5\)</td>
                        <td>\(5p^5\)</td>
                        <td>\(10p^5\)</td>
                        <td>\(10p^5\)</td>
                        <td>\(5p^5\)</td>
                        <td>\(p^5\)</td>
                    </tr>
                    <tr>
                        <td>pmf</td>
                        <td>\(p(a)\)</td>
                        <td>\(0.03125\)</td>
                        <td>\(0.15625\)</td>
                        <td>\(0.3125\)</td>
                        <td>\(0.3125\)</td>
                        <td>\(0.15625\)</td>
                        <td>\(0.03125\)</td>
                    </tr>
                    <tr>
                        <td>cdf</td>
                        <td>\(F(a)\)</td>
                        <td>\(0.03125\)</td>
                        <td>\(0.1875\)</td>
                        <td>\(0.5\)</td>
                        <td>\(0.8125\)</td>
                        <td>\(0.96875\)</td>
                        <td>\(1\)</td>
                    </tr>
                </table>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
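        <section
                data-transition="fade-in">
            <div data-markdown>
## Binomial Coins With Code

A sketch that computes the pmf and cdf for 5 fair coin flips from the binomial formula, so the values can be checked without hand arithmetic. The variable names are illustrative.

```python
import math
from itertools import accumulate

n, p = 5, 0.5  # 5 fair coin flips

# pmf: probability of exactly a heads, for a = 0..5
pmf = [math.comb(n, a) * p**a * (1 - p) ** (n - a) for a in range(n + 1)]
# cdf: probability of at most a heads, a running total of the pmf
cdf = list(accumulate(pmf))

print(pmf)
# [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(cdf)
# [0.03125, 0.1875, 0.5, 0.8125, 0.96875, 1.0]
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>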
        <section
                data-transition="fade-in">
            <div>
                <h2>Binomial Views</h2>
                <p><em>90% of students will view their scores if released to them.</em></p>
                <p><em>What's the probability all of a class of 20 will view?</em></p>
                <span class="math-formula fragment">$${P(X = 20) = \begin{pmatrix}20 \\20 \end{pmatrix}0.9^{20}(1-0.9)^0 = 12.16\%}$$</span>
                <p class="fragment"><em>What's the probability at least 90% will view?</em></p>

                <span class="fragment">$${P(X \ge 18) = P(X=18) + P(X=19) + P(X=20) = 67.69\%}$$</span>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>Like the coin flipping table, we could add up the probabilities for 0 students, 1 student, 2
                        students and so on, but that takes forever
                    </li>
                    <li>We can take advantage of the fact that all probabilities sum to 1: the probability of at least
                        18 is 1 minus the probability of 17 or fewer</li>
                    <li>Here it's faster still to add the probabilities of 18, 19 and 20 students directly</li>
                </ul>
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Binomial Views With Code</h2>
                <p><em>Ain&rsquo;t nobody got time to math.</em></p>

                <pre><code data-trim data-noescape>
                    from scipy.stats import binom

                    sample_size = 20
                    p = 0.9

                    # all 20 will view score
                    all_probability = binom.pmf(20, sample_size, p)
                    print(all_probability)
                    # 0.121576654591

                    # at least 18 students will view score: P(X >= 18) = P(X > 17)
                    cumulative_probability = binom.sf(17, sample_size, p)
                    print(cumulative_probability)
                    # approximately 0.6769
                </code></pre>
            </div>

        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Continuous Random Variables</h2>

                <ul>
                    <li>A random variable \(X\) is <strong>continuous</strong> if there is a function \(f(x)\), called
                        the probability density function (pdf), such that for any \(c \le d\)
                        <span class="math-formula">$$P(c \leq X \leq d) = \int_c^d f(x)dx$$</span>
                        <span class="math-formula">$${P(-\infty \leq X \leq \infty) = \int_{-\infty}^\infty f(x) dx = 1}$$</span>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>So note that when dealing with discrete distributions we are looking for a probability mass
                        function.
                    </li>
                    <li>For continuous distribution, we are looking for a probability density function.</li>
                    <li>Mass is the integral of density</li>
                    <li>If you take the integral of the pdf over all values, the total area (all of the probability) is 1</li>
                </ul>
            </aside>
        </section>
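        <section
                data-transition="fade-in">
            <div data-markdown>
## Checking a pdf With Code

A rough sketch of the two integrals for a continuous random variable, using only the standard library. The standard normal density is an assumed stand-in for `f(x)`, and the helper names `normal_pdf` and `integrate` are made up for this example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a normal distribution with mean mu and s.d. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, c, d, steps=10_000):
    """Approximate the integral of f from c to d with the midpoint rule."""
    width = (d - c) / steps
    return sum(f(c + (i + 0.5) * width) for i in range(steps)) * width

# Probability that X lands between c = -1 and d = 1
prob_within_one = integrate(normal_pdf, -1.0, 1.0)
# Total area; +/- 10 standard deviations stands in for +/- infinity
total_area = integrate(normal_pdf, -10.0, 10.0)
print(prob_within_one, total_area)
```

The first number is an area under part of the curve, the second is the area under (nearly) all of it.
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability &amp; Statistics</p>
            </div>
        </section>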
        <section
                data-transition="fade-in">
            <div>
                <h2>Continuous Distributions</h2>

                <table>
                    <tr>
                        <td>Uniform</td>
                        <td>angle of a dart throw</td>
                        <td>\(U(a,b)\)</td>
                    </tr>
                    <tr>
                        <td>Beta</td>
                        <td>batting average</td>
                        <td>\(beta(a,b)\)</td>
                    </tr>
                    <tr>
                        <td>Exponential</td>
                        <td>Finding an Uber</td>
                        <td>\(exp(\lambda)\)</td>
                    </tr>
                    <tr>
                        <td>Normal</td>
                        <td>IQ</td>
                        <td>\(N(\mu,\sigma^2)\)</td>
                    </tr>
                    <tr>
                        <td>Student \(t\)</td>
                        <td>Guinness Beer</td>
                        <td>\(t(df)\)</td>
                    </tr>
                </table>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>The Boring Distribution</h2>

            <p><em>a.k.a. &ldquo;Bell Curve&rdquo;, Gaussian Distribution</em></p>
            <span class="math-formula">$$f(x) = \frac{1}{\sigma \sqrt{2 \pi } }e^{\frac{-(x - \mu)^2}{2\sigma^2}}$$</span>

            <ul class="fragment">
                <li>Defined by mean and standard deviation \(N(\mu,\sigma^2)\)</li>
                <li>Curve is symmetrical, more data at center vs. edges</li>
                <li>Mean = Median = Mode</li>
                <li>Don't assume everything is normally distributed</li>
            </ul>

        </section>
        <section
                data-transition="fade-in">
            <h2>Standard Z</h2>
            <p><em>A normal distribution where \(\mu\ = 0\), \(\sigma = 1\), a.k.a. \(N(0,1)\)</em></p>
            <span class="math-formula">$$Z = \frac{X -  \mu }{ \sigma }$$</span>
            <ul class="fragment">
                <li>If \(Z=1\), then \(X\) is 1 standard deviation above the mean</li>
                <li>pdf: \(f(z) = \phi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\)</li>
                <li>\(P(-1 \le Z \le 1) \approx 0.6827\)</li>
                <li>\(P(-2 \le Z \le 2) \approx 0.9545\)</li>
                <li>\(P(-3 \le Z \le 3) \approx 0.9973\)</li>
            </ul>
            <aside class="notes">
                <ul>
                    <li>Draw out the percentages!</li>
                    <li>The probability that Z is between -1 and 1 is 68%</li>
                </ul>
            </aside>
        </section>
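        <section
                data-transition="fade-in">
            <div data-markdown>
## Standard Z With Code

A small sketch of standardizing a value and looking up the 1, 2 and 3 sigma probabilities. It uses `statistics.NormalDist` from the standard library; the IQ scale (mean 100, s.d. 15) is an assumption picked for illustration, and `standardize` and `prob_within` are hypothetical helper names.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def standardize(x, mu, sigma):
    """Convert a raw value x to a z-score."""
    return (x - mu) / sigma

def prob_within(k):
    """Probability that Z lands between -k and k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# An IQ of 130 on a scale assumed to have mu = 100, sigma = 15
z = standardize(130, mu=100, sigma=15)
print(z)  # 2.0

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```
            </div>
        </section>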
        <section
                data-transition="fade-in">
            <h2>Expected Value of Continuous Distributions</h2>
            <p><em>Expected Value measures central tendency</em></p>
            <span class="math-formula">$$E(X) =  \int_a^b xf(x)dx$$</span>
            <div class="fragment">
                <p>Expected Value of a standard z distribution</p>

                <span class="math-formula">$$E(Z) = \int_{-\infty}^{\infty} z\,\phi(z)\,dz = -\frac{1}{ \sqrt{2 \pi } }e^{-z^2/2} \Big|_{-\infty}^{\infty} = 0$$</span>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Statistics vs. Probability</h2>
            <ul>
                <li><strong>Statistic</strong>
                    <ul>
                        <li>Mean of 100 dice rolls</li>
                        <li>Number of times a 5 was rolled</li>
                    </ul>
                </li>
                <li><strong>Probability</strong>
                    <ul>
                        <li>Likelihood of rolling a 4</li>
                        <li>Likelihood of rolling a 6 three times</li>
                    </ul>
                </li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <!--
        <section
                data-transition="fade-in">
            <div data-markdown>
                ## Statistics

                * What is **Statistic**? Information that can be computed from data
                  * **Pure Statistics** - a single value computed from data, such as a sample average
                  * **Interval Statistics** - an interval [*a, b*] computed from data.
                * **Statistical Inference** - draw conclusions about a large data set by analyzing smaller sets of data
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        -->
        <section
                data-transition="fade-in">
            <div data-markdown>
                ## Inferential Statistics


                * **Parameter estimation** - some value that determines the properties of the distribution, such as
                \\(\mu\\) or \\(\sigma\\)
                * **Data Prediction** - use information about sample to predict a random selection
                * **Model comparison** - selecting a model which best explains the observed data, something that
                postulates the relationship between factors and the data
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Central Limit Theorem</h2>

                <p><em>Without it, we can't make math happen!</em></p>

                <span class="math-formula">$$\frac{\overline{X}_n - \mu}{\sigma / \sqrt{n}} \rightarrow N(0,1) \text{ as } n \rightarrow \infty$$</span>

                <ul>
                    <li>The bigger the sample, the closer the mean and variance of the sample get to those of the
                        population
                    </li>
                    <li>The means of many samples are approximately normally distributed</li>
                    <li>The means of those means approximates the mean of the population</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>The normal distribution just doesn't happen everywhere, contrary to what we think</li>
                    <li>Suppose you're researching penguins. Today you grab 10 penguins and get their average weight.
                        Tomorrow you grab another 10 penguins and get their average weight. You do this over and over
                        again and if you graph all those averages, you get a normal distribution.
                    </li>
                    <li>Experimentation is what makes the normal distribution</li>
                </ul>
            </aside>
        </section>
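        <section
                data-transition="fade-in">
            <div data-markdown>
## Central Limit Theorem With Code

A quick simulation sketch: dice rolls are about as non-normal as it gets, yet the averages of many samples pile up around 3.5 in a bell shape. The sample size (30), the number of experiments (5,000) and the seed are arbitrary choices for this example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def sample_mean(sample_size):
    """Average of `sample_size` rolls of a fair six-sided die."""
    return statistics.fmean(random.randint(1, 6) for _ in range(sample_size))

# 5,000 experiments, each averaging 30 rolls
means = [sample_mean(30) for _ in range(5_000)]

mean_of_means = statistics.fmean(means)  # should land near 3.5
spread = statistics.stdev(means)         # should land near 1.708 / sqrt(30)

# Share of sample means within one standard deviation of the center
within_one_sd = sum(spread >= abs(m - mean_of_means) for m in means) / len(means)
print(mean_of_means, spread, within_one_sd)
```

If the sample means are roughly normal, the last number should sit somewhere near 68%.
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability &amp; Statistics</p>
            </div>
        </section>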
        <!--
        <section
                data-transition="fade-in">
            <div>
                <h2>Sample vs Population</h2>

                <ul>
                    <li>Let \(S_n\) the the sum of \(x_1, x_2, ..., x_n\) random variables</li>
                    <li>Each has a mean \(\mu\) and a standard deviation \(\sigma\).</li>
                    <li>Then the weighted average of the random variables is:</li>
                </ul>
                <span class="math-formula">$$\overline{X}_{n} = \frac{S_{n}}{n} = \frac{X_{1} + ... + X_{n}}{n} =  \big(\sum_{i=1}^nX_{i}\big)/n$$</span>

            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        -->
        <section
                data-transition="fade-in">
            <div>
                <h2>Sample Statistics</h2>

                <p><em>Let \(S_n\) be the sum of random variables \(X_1, X_2, ..., X_n\), each with mean \(\mu\) and standard deviation \(\sigma\)</em></p>

                <table>
                    <tr>
                        <td>Mean</td>
                        <td>\(E( \overline{X}_{n}) =\mu\)</td>
                        <td>\(E(S_{n}) = n\mu\)</td>
                    </tr>
                    <tr>
                        <td>Variance</td>
                        <td>\(Var( \overline{X}_{n}) =\frac{\sigma^2}{n}\)</td>
                        <td>\(Var(S_{n}) = n\sigma^2\)</td>
                    </tr>
                    <tr>
                        <td>S.D.</td>
                        <td>\(\sigma_{\overline{X}_{n}} =\frac{\sigma}{\sqrt{n}}\)</td>
                        <td>\(\sigma_{S_{n}} =\sqrt{n}\sigma\)</td>
                    </tr>
                </table>

                <p class="fragment">As \(n\) gets larger, the standard deviation of the sample mean gets smaller</p>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
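        <section
                data-transition="fade-in">
            <div data-markdown>
## Sample Statistics With Code

A tiny sketch of the S.D. row of the table: the sample mean tightens as the sample grows while the sample total spreads out. A fair die (variance 35/12) is the assumed population, and the function names are made up for this example.

```python
import math

def sample_mean_sd(sigma, n):
    """Standard deviation of the mean of n independent draws."""
    return sigma / math.sqrt(n)

def sample_sum_sd(sigma, n):
    """Standard deviation of the sum of n independent draws."""
    return math.sqrt(n) * sigma

die_sigma = math.sqrt(35 / 12)  # s.d. of one roll of a fair six-sided die

for n in (1, 4, 16, 64, 256):
    print(n, round(sample_mean_sd(die_sigma, n), 4), round(sample_sum_sd(die_sigma, n), 4))
```

Quadrupling the sample size only halves the standard deviation of the mean.
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability &amp; Statistics</p>
            </div>
        </section>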
        <section
                data-transition="fade-in">
            <h2>Theory Time!<br/>Bayesian vs. Frequentist</h2>
            <p><em>Statistics require making lots of inferences</em></p>
            <ul>
                <li class="fragment"><strong>Bayesian</strong>: probability is subjective and deduced from prior
                    knowledge
                </li>
                <li class="fragment"><strong>Frequentist</strong>: probability is objective and obtained through
                    experimentation
                </li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
            <aside class="notes">
                <ul>
                    <li>So honestly, when you ask "what is Probability?" you can't even get a unified answer.</li>
                    <li>The reason is that statistics requires dealing with a lot of unknowns. The process by which we
                        obtain knowledge from unknown information has two main approaches
                    </li>
                    <li>I'm going to give you a very simplified definition. If a real statistician were sitting here,
                        that person would cringe a little at my trying to compress an entire discipline into a few sentences.
                    </li>
                    <li>The first school of thought is Bayesian Statistics, which was popular until the 20th century
                    </li>
                    <li>Bayesian: probability is subjective and deduced from prior knowledge</li>
                    <li>The second school of thought is Frequentist Statistics, which is the predominant approach to
                        statistics since the 20th century.
                    </li>
                    <li>Frequentist: probability is objective and obtained from experimentation</li>
                    <li>The funny thing is, in this 21st century, Bayesian inference is making a huge comeback with
                        regards to machine learning and data analysis
                    </li>
                    <li>The main point to remember is neither one is inherently wrong or superior, it's just that the
                        two have different goals when computing data.
                    </li>
                </ul>
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Bayesian vs Frequentist</h2>

                <p><em>A coin is flipped 100 times and it lands heads 60 times.</em></p>

                <ul>
                    <li><strong>Frequentist</strong>: What's the probability of getting 60 or more heads \(P(X > 59)\)
                        if the coin is fair?
                    </li>
                    <li><strong>Bayesian</strong>: Is the coin fair?</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
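        <section
                data-transition="fade-in">
            <div data-markdown>
## 60 Heads With Code

A sketch of the frequentist side of the coin question only: how likely are 60 or more heads in 100 flips if the coin is fair? The tail is summed exactly with `math.comb`; `binomial_tail` is a hypothetical helper name.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )

# Chance of 60 or more heads in 100 flips of a fair coin
p_value = binomial_tail(60, 100, 0.5)
print(p_value)
```

The answer describes the data under the assumption of a fair coin. It does not say how probable it is that the coin is fair, which is the Bayesian question.
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability &amp; Statistics</p>
            </div>
        </section>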
        <section
                data-transition="fade-in">
            <div>
                <h2>What about A/B Testing?</h2>

                <ul>
                    <li><strong>Frequentist</strong>: Do the results of the new variant support rejecting the current
                        variant?
                    </li>
                    <li><strong>Bayesian</strong>: What's the probability the new variant is better than the current
                        variant?
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div class="full-page-background"
                 style="background-image: url('custom/media/xkcd-1132.png');height: 680px;"></div>
            <aside class="notes" data-markdown>
                * Of course there's a relevant XKCD. This one is number 1132
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h1>Frequentist Statistics</h1>
            <p><em>Probabilities represent long-term frequencies of repeatable random experiments.</em></p>
            <ul>
                <li>Conclusions are objective</li>
                <li>Approach problems like a flowchart</li>
                <li>Assign probability to the data, not degrees of belief</li>
                <li>Dominant approach for 20th century</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Whenever you hear some study where half the participants were assigned the drug, and the other half
                were taking the placebo, then you are likely reading about a study using Frequentist tools
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Frequentist Weaknesses</h2>
            <ul>
                <li>Can't do small samples</li>
                <li>People make dumb conclusions</li>
                <li>So many false positives</li>
                <li>Counter-intuitive way to reason about probability</li>
                <li><em>The p-value is the probability of seeing your result if \(H_0\) were true.</em> Wait, what?</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Researchers will typically assign a 5% significance level, which is another way of saying that when
                the null hypothesis is true, you will still see a false positive 5% of the time.
                * It's another way of admitting that about 1 in every 20 tests of a true null hypothesis gets it wrong.
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>R. A. Fisher</h2>
            <img src="custom/media/fisher.jpg" style="height: 45vh;"/>
            <p>A guy wearing a college professor costume</p>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * R.A. Fisher is known as the father of modern statistics
                * He may get too much credit versus other statisticians such as William Gosset, but he's still the go-to
                guy when you think about hypothesis testing and p-values
                * He read some books on Bayesian statistics and thought it was garbage.
                * His career was spent researching genetics, and the tools he developed to analyze and model his testing
                grew into modern statistics.
                * He published "The Design of Experiments" in 1935 which laid the foundations for modern experimental
                design
                * He wasn't all cool. He was a strong believer in eugenics and used statistics to show there were
                differences between races

            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Null Hypothesis Significance Testing</h2>

            <ul>
                <li class="fragment"><strong>Null Hypothesis</strong> \(H_0\) - Currently accepted value for a
                    parameter.
                </li>
                <li class="fragment"><strong>Alternate Hypothesis</strong> \(H_A\) - Claim to be tested</li>
                <li class="fragment">\(H_0\) and \(H_A\) are opposites. They must represent all possible answers.</li>
                <li class="fragment"><strong>Neyman-Pearson Paradigm</strong> - If the data is well outside what is
                    expected under the null hypothesis, then reject the null hypothesis.
                </li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * **Alternate Hypothesis** is also called &ldquo;research hypothesis&rdquo;
                * In A/B Testing, the null hypothesis is called the control, the Alternate hypothesis is called the
                Treatment
                * The most important thing here, and the thing that everyone gets wrong - you are not trying to prove
                your alternate hypothesis. You are checking whether the data give enough evidence to reject the status quo.
                * It's very counter-intuitive
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div data-markdown>
                ## Innocent until proven guilty

                *We assume the null hypothesis is true.*

                * \\(H_0\\) - defendant is innocent
                * \\(H_A\\) - defendant is guilty
                * If evidence proves otherwise, then we will ***reject*** the null hypothesis (guilty verdict)
                * If there isn't evidence to prove otherwise, then we will ***fail to reject*** the null hypothesis (not
                guilty)

            </div>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Analogy to US courts: you are presumed to be innocent.
                * It is up to the evidence to prove guilt.
                * You don't have to prove innocence (aka null hypothesis)
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Standardization</h2>
                <p><em>Map the sample to the Z distribution to make the math easier</em></p>
                <span class="math-formula">$$Z = \frac{ \overline{x} -  \mu_{0}}{  \sigma/\sqrt{n} }$$</span>

            </div>
            <div class="slide-footer-left">
                <p class="footer-text">1. Probability & Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Critical Value</h2>
            <p><em>Probability that a result falls within \(\pm\) the critical value, measured in \(\sigma\) from the mean</em></p>
            <table>
                <tr>
                    <th>Probability %</th>
                    <th>Critical Value</th>
                    <th>\(\alpha/2\)</th>
                </tr>
                <tr>
                    <td>68.27%</td>
                    <td>1.000</td>
                    <td>\(z_{0.159}\)</td>
                </tr>
                <tr>
                    <td>80%</td>
                    <td>1.282</td>
                    <td>\(z_{0.1}\)</td>
                </tr>
                <tr>
                    <td>90%</td>
                    <td>1.645</td>
                    <td>\(z_{0.05}\)</td>
                </tr>
                <tr>
                    <td>95%</td>
                    <td>1.960</td>
                    <td>\(z_{0.025}\)</td>
                </tr>
                <tr>
                    <td>98%</td>
                    <td>2.326</td>
                    <td>\(z_{0.01}\)</td>
                </tr>
            </table>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
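        <section
                data-transition="fade-in">
            <div data-markdown>
## Critical Value With Code

A sketch of where the rows of a critical value table come from: take the central probability, split what is left over between the two tails, and invert the standard normal cdf. It uses `statistics.NormalDist` from the standard library; `critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_value(central_probability):
    """Return (alpha / 2, z) where Z lands within +/- z with the given probability."""
    alpha = 1.0 - central_probability
    tail = alpha / 2.0
    return tail, NormalDist().inv_cdf(1.0 - tail)

for probability in (0.6827, 0.80, 0.90, 0.95, 0.98):
    tail, z = critical_value(probability)
    print(f"{probability:.2%}  alpha/2 = {tail:.3f}  z = {z:.3f}")
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>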
        <section
                data-transition="fade-in">
            <h2>Finding Critical Value</h2>

            <pre><code data-trim data-noescape>
                    from scipy.stats import norm

                    zscore_input = 'What is the z-score? '
                    zscore = float(input(zscore_input))

                    # two-tailed probability of a result at least this extreme
                    two_tail = 2 * (1 - norm.cdf(abs(zscore)))
                    print('2-tail: {0}'.format(two_tail))
                </code></pre>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Finding Z-Score</h2>

            <pre><code data-trim data-noescape>
                    from scipy.stats import norm

                    significance_input = 'What is the significance level (alpha)? '
                    alpha = float(input(significance_input))

                    # one-tailed critical value; use 1 - alpha / 2 for two tails
                    z_score = norm.ppf(1 - alpha)
                    print('Z-score: {0}'.format(z_score))
                </code></pre>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Significance vs Power</h2>
            <ul>
                <li><strong>Significance Level</strong> \(\alpha\) - probability of rejecting \(H_0\) even though it&rsquo;s
                    true, a.k.a. a &ldquo;false positive&rdquo;.
                </li>
                <li><strong>Power Level</strong> \(\beta\) - probability of rejecting \(H_0\) when it&rsquo;s false
                    (many texts write this as \(1-\beta\)).
                </li>
                <li class="fragment">Ideally we want to conduct a test with a low significance and a high power</li>
                <li class="fragment">&ldquo;Statistically significant&rdquo; rarely means what people think it means.
                    Inconceivable!
                </li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Testing Errors</h2>


            <table>
                <tr>
                    <td></td>
                    <td>\(H_0\) is true IRL</td>
                    <td>\(H_A\) is true IRL</td>
                </tr>
                <tr>
                    <td>&ldquo;Reject&rdquo; \(H_0\)</td>
                    <td>Type I error \(P = \alpha\)</td>
                    <td>Correct Decision</td>
                </tr>
                <tr>
                    <td>&ldquo;Don&rsquo;t reject&rdquo; \(H_0\)</td>
                    <td>Correct Decision</td>
                    <td>Type II error \(P = 1-\beta\)</td>
                </tr>
            </table>

            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Modeling a Test</h2>

            <ul>
                <li>Pick the null hypothesis \(H_0\)</li>
                <li class="fragment">Decide if the alternate hypothesis \(H_A\) is one- or two-sided
                    <ul>
                        <li>one-sided: new app version gets more clicks</li>
                        <li>two-sided: new app version has different click rate</li>
                    </ul>
                </li>
                <li class="fragment">Pick a test statistic, e.g., sample mean, sample total</li>
                <li class="fragment">Pick a significance Level \(\alpha\)</li>
                <li class="fragment">Determine the power \(\beta\)</li>
            </ul>

            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Typically we will pick 0.05 significance level
                * and a 0.80 power level
                * Draw the distribution curve to illustrate tails
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>One Tail or Two Tails?</h2>
            <ul>
                <li>Don't assume the new variant will either have no difference or outperform the current variant
                    (one-tail). It might be worse.
                </li>
                <li>A two-tail test splits the significance level \(\alpha\) between both tails (\(\alpha/2\) each).
                    The critical value gets larger, so it is harder to reject \(H_0\) even if \(H_A\) is better.
                </li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>WTF is a <span style="text-transform: lowercase;">p</span>-value?</h2>
            <p><em>How rare is the result if the null hypothesis is true?</em></p>

            <ul class="fragment">
                <li>The p-value is the result of the test</li>
                <li>It&rsquo;s the probability of seeing a result at least as extreme as yours if the null
                    hypothesis \(H_0\) were true.
                </li>
                <li>If it is less than the significance level \(\alpha\), we can reject \(H_0\)</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * 30 / 214 is 14% so at first glance it seems like an improvement
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Testing a population Proportion</h2>
            <ul>
                <li>Historically, 12% of participants will upload a profile pic</li>
                <li>We&rsquo;ve updated the UI. From a sample of 214 new participants, 30 uploaded a profile picture
                </li>
                <li class="fragment">With \(\alpha = 0.05\), can we reject \(H_0\)? (\(z_{\alpha=0.05} = 1.645\))
                    <span class="fragment">\({Z = \frac{ \overline{x} -  \mu_{0}}{  \sigma/\sqrt{n} } = \frac{ \frac{k}{n} -  \mu_{0}}{  \sqrt{\frac{p(1-p)}{n}} } = \frac{ \frac{30}{214} - 0.12}{  \sqrt{\frac{0.12(1 - 0.12)}{214}} } = 0.909}\)</span>
                </li>
                <li class="fragment">Since \(Z_{H_A} = 0.909 < Z_{H_0} = 1.645\), we do not reject \(H_0\)</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
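        <section
                data-transition="fade-in">
            <div data-markdown>
## Proportion Test With Code

A sketch of the profile picture example as a one-sided z-test, using only the standard library. The function name `proportion_z_test` and its return shape are made up for this example; the numbers (30 of 214, 12% baseline, alpha of 0.05) come from the example.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05):
    """Upper-tail z-test of H0: p = p0 against HA: p > p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    p_value = 1 - NormalDist().cdf(z)           # upper-tail probability
    return z, critical, p_value

z, critical, p_value = proportion_z_test(30, 214, 0.12)
print(round(z, 3), round(critical, 3))  # 0.909 1.645
print("reject H0" if z > critical else "do not reject H0")  # do not reject H0
```

Comparing `p_value` to alpha gives the same decision as comparing `z` to the critical value.
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>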
        <section
                data-transition="fade-in">
            <h2>Testing a population mean</h2>
            <ul>
                <li>A manufacturer claims its thermostats last more than 10,000 days. In a sample of 30, the mean
                    time before everyone froze was 9,900 days with \(\sigma = 120\).
                </li>
                <li class="fragment">With \(\alpha = 0.05\), can we reject \(H_0\)? (\(z_{\alpha=0.05} = -1.645\))
                    <span class="fragment">$${Z = \frac{ \overline{x} -  \mu_{0}}{  \sigma/\sqrt{n} } = \frac{ 9900 -10000}{  120 / \sqrt{30}  } = -4.564}$$</span>
                </li>
                <li class="fragment">Since \(Z_{H_A} = -4.564 < Z_{H_0} = -1.645\), we reject \(H_0\)</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
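        <section
                data-transition="fade-in">
            <div data-markdown>
## Mean Test With Code

A sketch of the thermostat example as a lower-tail z-test, with the standard error computed as sigma divided by the square root of n. The helper name `mean_z_statistic` is an assumption for illustration.

```python
import math
from statistics import NormalDist

def mean_z_statistic(sample_mean, mu0, sigma, n):
    """z statistic for a sample mean against a claimed population mean mu0."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

z = mean_z_statistic(9900, 10000, 120, 30)
critical = NormalDist().inv_cdf(0.05)  # lower-tail critical value for alpha = 0.05
print(round(z, 3), round(critical, 3))  # -4.564 -1.645
print("reject H0" if critical > z else "do not reject H0")  # reject H0
```
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>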
        <section
                data-transition="fade-in">
            <h2>Sampling Size</h2>
            <p><em>How big does a sample size need to be before you can trust the data not to be <span
                    style="text-decoration: line-through">total garbage</span> just noise?</em></p>
            <ul class="fragment">
                <li>More variance = More data required.</li>
                <li>Things with unknown variance (e.g., purchase $ amount) require t-test.</li>
                <li>Binomial distributions typically have less variance than normal distributions</li>
                <li>Conversions require the smallest sampling size</li>
                <li>Smaller sample sizes = more testing</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <h2>Sample Size For Proportions</h2>
            <p><em>How much data do I need to collect in order to<br/>measure an \(x\%\) change in conversion rate?</em>
            </p>
            <span>$$n = (Z_{\alpha/2}+Z_{1-\beta})^{2} \cdot \frac{p_1(1-p_1)+p_2(1-p_2) }{(p_1-p_2)^2}$$</span>
        </section>
        <section
                data-transition="fade-in">
            <h2>Measuring Uplift</h2>
            <ul>
                <li>The feature currently has a 10% conversion rate</li>
                <li>Detect at least a 25% uplift, with \(\alpha = 0.05\) and \(\beta = 0.80\)
                    <span class="math-formula fragment">$$p_{H_0} = 0.1, p_{H_A} = 1.25p_{H_0} = 0.125$$</span>
                    <span class="math-formula fragment">$${n = (1.960 + 0.842)^2 \cdot \frac{0.199}{(-0.025)^2} \approx 2504}$$</span>
                </li>
                <li class="fragment">Was that bigger than you thought?</li>
            </ul>
        </section>
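        <section
                data-transition="fade-in">
            <div data-markdown>
## Sample Size With Code

A sketch of the sample size formula for proportions, using the deck's convention that the power level is 0.80. Reading the result as the size of each variant is the usual interpretation of this formula, stated here as an assumption; `sample_size_per_variant` is a hypothetical name.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Observations needed in each variant to detect a change from p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided, about 1.960
    z_power = NormalDist().inv_cdf(power)          # about 0.842
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)

baseline = 0.10
n_needed = sample_size_per_variant(baseline, baseline * 1.25)
print(n_needed)  # 2504
```

Try a 10% uplift instead of 25% to see how fast the requirement grows as the effect shrinks.
            </div>
        </section>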
        <section
                data-transition="fade-in">
            <h2>Peeking</h2>
            <p><em>In Frequentist testing, always determine the sample size beforehand.</em></p>
            <ul>
                <li>Never check progress even if one variant looks to be outperforming the other!!!</li>
                <li>You want garbage? This is how you get Type I errors.</li>
            </ul>


            <table class="fragment" style="font-size: 0.6em;margin-top: 0.5em">
                <tr>
                    <th></th>
                    <th>A/B Test 1</th>
                    <th>A/B Test 2</th>
                    <th>A/B Test 3</th>
                    <th>A/B Test 4</th>
                </tr>
                <tr>
                    <td>1000 observations</td>
                    <td>Don't reject <em>H<sub>0</sub></em></td>
                    <td>Don't reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                </tr>
                <tr>
                    <td>2000 observations</td>
                    <td>Don't reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                    <td>Don't reject <em>H<sub>0</sub></em></td>
                </tr>
                <tr>
                    <td>End Result</td>
                    <td>Don't reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                    <td bgColor="#CCFFCC">Reject <em>H<sub>0</sub></em></td>
                </tr>
            </table>

            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Maybe after 2000 observations there's some regression
                * This is a major problem with so many A/B testing services. They allow you to track progress.
                * They kinda have to. It's what the client wants. They can't just say, okay run your test, check back in
                a few weeks.
                * That is the dilemma. I'm sure all those A/B testing services have Ph.D. statisticians working for them,
                and they are all probably telling the marketing team: STOP PEEKING, but the marketing team can't
                just sell a mystery black box to clients.
            </aside>
        </section>
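        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
A rough simulation of why peeking hurts, assuming A/A tests (both variants convert at the same 10%) checked with a pooled two-proportion z-test. The checkpoints, trial count, and function names are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05


def rejects(x1, x2, n):
    """Pooled two-proportion z-test with n observations in each arm."""
    pooled = (x1 + x2) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return se > 0 and abs(x1 - x2) / n / se > Z_CRIT


def simulate(trials=300, rate=0.10, checkpoints=(500, 1000, 1500, 2000), seed=1):
    """Run A/A tests (no real difference) and count false positives."""
    rng = random.Random(seed)
    peeking = final_only = 0
    for _trial in range(trials):
        x1 = x2 = seen = 0
        looks = []
        for n in checkpoints:
            for _obs in range(n - seen):
                x1 += rng.random() < rate
                x2 += rng.random() < rate
            seen = n
            looks.append(rejects(x1, x2, n))
        peeking += any(looks)    # declare a winner at any "significant" look
        final_only += looks[-1]  # look once, at the planned sample size
    return peeking / trials, final_only / trials


peek_rate, fixed_rate = simulate()
print(f"peeking: {peek_rate:.3f}, fixed sample size: {fixed_rate:.3f}")
```

Every rejection at the final look also counts for the peeker, so the peeking rate can never be lower, and it usually lands well above the nominal 5%.
            </script>
        </section>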
        <section
                data-transition="fade-in">
            <h2>Power Analysis</h2>
            <p><em>How to compare two population proportions</em></p>
            <span class="math-formula">$$Z= \frac{X - \mu}{\sigma} = \frac{\widehat{p}_1-\widehat{p}_2}{\sqrt{\widehat{p}(1-\widehat{p}) \left ( \frac{1}{n_1} + \frac{1}{n_2} \right )  }}$$</span>
            <span class="math-formula">$${\widehat{p}_1 = \frac{x_1}{n_1} , \widehat{p}_2 = \frac{x_2}{n_2} , \widehat{p} = \frac{x_1+x_2}{n_1+n_2} }$$</span>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * Okay this is the season finale of Frequentist testing.
                * It's finally the formula you can plug all the numbers into, and not have to worry where the formula
                came from
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <h2>Raw Fish</h2>
            <ul>
                <li>Of the 50 times I've placed an order for Sakedon under my name &ldquo;Jimmy,&rdquo; I&rsquo;ve
                    gotten extra salmon 23 times.
                </li>
                <li>Of the 70 times using the alias, &ldquo;Hot Carlos,&rdquo; I&rsquo;ve gotten extra salmon 43
                    times.
                </li>
                <li><strong>Should I continue to let them think I'm Jimmy?</strong></li>
                <li class="fragment">Test Statistic
                    <span style="font-size: 0.85em;">\(X = \widehat{p}_1-\widehat{p}_2 = \frac{23}{50}-\frac{43}{70} \approx -0.1543\)</span>
                </li>
                <li class="fragment">Z-score
                    <span style="font-size: 0.85em;">\(Z = \frac{-0.1543}{0.0921} \approx -1.6749\)</span></li>
                <li class="fragment"> \(p\)-value: \(P(Z < -1.6749) \approx 0.0470\) or 4.7%</li>
                <li class="fragment">If \(H_0\) is true, then there&rsquo;s a 4.7% chance of seeing \(\widehat{p}_1-\widehat{p}_2 \le -0.1543\).</li>
            </ul>
            <div class="slide-footer-left">
                <p class="footer-text">2. Frequentist Statistics</p>
            </div>
            <aside class="notes" data-markdown>
                * In other words, if calling myself Hot Carlos made no difference, a gap this big in his favor would only show up 4.7% of the time
            </aside>
        </section>
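        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
The salmon arithmetic can be checked with a short Python sketch of the pooled two-proportion z-test. The function name is made up, and the p-value is the one-sided (lower tail) probability used on the previous slide.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (Z, one-sided p-value P(Z <= z)) for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)


z, p_value = two_proportion_z(23, 50, 43, 70)  # Jimmy vs. Hot Carlos
print(f"Z = {z:.4f}, p-value = {p_value:.4f}")
```

This should land on the same Z of about -1.6749 and p-value of about 0.047.
            </script>
        </section>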
        <section
                data-background-video="custom/media/flamingo.mp4"
                data-background-video-loop="true"
                data-transition="fade-in">
        </section>
        <section
                data-transition="fade-in">
            <h1>3. Bayesian Statistics</h1>
            <p><em>This section is punted to the <a href="/custom/pages/bayesian-statistics.html">Bayesian Slides</a>.</em></p>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h1><span style="text-decoration: line-through;">4. Multivariate Testing</span></h1>
                <p><em>Sorry, this is a whole talk unto itself.</em></p>

                <ul>
                    <li>Full vs. Fractional Factorial</li>
                    <li>Chi-Squared Tests</li>
                    <li>Analysis of Covariance</li>
                    <li>Multilevel Modeling</li>
                    <li>Markov Chain Monte Carlo (MCMC)</li>
                </ul>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h1>5. Implement&shy;ation</h1>
                <ul>
                    <li>We should totally do it.</li>
                    <li>(Sorry, this section will be very high level.)</li>
                    <li>Perhaps an A/B Testing Part II talk?</li>
                </ul>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Can we pay someone else?</h2>
                <p><em>There are two major companies....</em></p>
                <ol>
                    <li><strong>Optimizely</strong> - Sequential Probability Ratio (Frequentist)</li>
                    <li><strong>VWO</strong> - Monte Carlo Simulations (Bayesian)</li>
                </ol>
                <div class="fragment">
                    <p><em>... and there&rsquo;s everyone else</em></p>
                    <ul>
                        <li><strong>Google Experiments</strong> - ties in with Google Analytics</li>
                        <li><strong>Adobe Target</strong> - ties in with Adobe Analytics, what?</li>
                        <li><strong>Qubit</strong> - strong focus on segmentation (Frequentist)</li>
                    </ul>
                </div>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Do 3rd-party services just lie?</h2>
                <p><em>With the exception of the companies mentioned, everything else looks like quackery.</em></p>

                <ul>
                    <li>There's a disconnect between what statistics can do, and what customers expect to happen.</li>
                    <li>They also expect that every customer is just looking to increase <strong
                            style="font-family: didot;">CASH MONEY $$$ ROI SYNERGY BLOCKCHAIN</strong>, so they cater to
                        that.
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
            <aside class="notes" data-markdown>
                * I didn't mention a lot of other companies, like "Convert Experiences," "AB Tasty," "Sentient Ascend,"
                "SiteSpect," "Conversion Sciences," but if you go to those websites, it's trash.
                * A guy named Peter Borden tried to get smart with Optimizely. He conducted an A/A test (both variants
                were the same) and Optimizely told him that Variant A was 100% guaranteed to be 18% better than Variant
                A
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>R vs. Python</h2>

                <table style="font-size: 0.75em;">
                    <thead>
                    <tr>
                        <th></th>
                        <th style="text-align: center;vertical-align: middle;"><img src="custom/media/r-logo.png" class="no-style"
                                                             style="height: 2em;"/></th>
                        <th style="text-align: center;vertical-align: middle;"><img src="custom/media/python-logo.png" class="no-style"
                                                             style="height: 2em;"/></th>
                    </tr>
                    </thead>
                    <tbody>
                    <tr>
                        <td>Purpose</td>
                        <td>Specifically for statistics</td>
                        <td>General whatever</td>
                    </tr>
                    <tr>
                        <td>Good for</td>
                        <td>Ad-hoc analysis</td>
                        <td>Repeated Tasks</td>
                    </tr>
                    <tr>
                        <td>Learning</td>
                        <td>Steep curve</td>
                        <td><code>import scipy</code></td>
                    </tr>
                    <tr>
                        <td>Usage</td>
                        <td>Standalone computing</td>
                        <td>Integrate into apps</td>
                    </tr>
                    <tr>
                        <td>Tasks</td>
                        <td>A few lines</td>
                        <td>OOP fun</td>
                    </tr>
                    <tr>
                        <td>Packages</td>
                        <td>CRAN</td>
                        <td>PyPI</td>
                    </tr>
                    <tr>
                        <td>Commonalities</td>
                        <td colspan="2">Interpreted Language, StackOverflow support</td>
                    </tr>
                    </tbody>
                </table>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Allocation</h2>
                <p><em>Which users see what?</em></p>
                <ul>
                    <li class="fragment"><strong>Batch</strong> - resolve to a fixed and known set of users
                        <ul>
                            <li>Relatively easier to implement</li>
                            <li>What about new users?</li>
                            <li>Will they even see the feature?</li>
                        </ul>
                    </li>
                    <li class="fragment"><strong>Real Time</strong> - eligibility criteria along with interaction
                        <ul>
                            <li>Split users on strata (e.g. demographics)</li>
                            <li>Latency a real concern</li>
                            <li>Can't estimate length of test</li>
                        </ul>
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
            <aside class="notes" data-markdown>
                * **Batch Allocation** - resolve to a fixed and known set of users, very straightforward
                  * Decide users with last names A through M see Control, and N-Z see Treatment
                  * Cons:
                  * can't allocate new users
                  * can't allocate based on real-time user behavior
                  * can't guarantee the user will ever access the feature being tested - so how much of the data is even useful?
                * **Real Time allocation** - user is eligible if they meet certain criteria as they interact with the application
                  * split users on user classification, or device, or geographic region
                  * cons: latency from making additional calls
                  * solution: make the call in parallel with other work the app is already doing
                  * Can't estimate how long it will take to have sufficient data
            </aside>
        </section>
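        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
A small Python sketch of the two allocation styles. The batch rule is the last-name split from the speaker notes; the real-time version assumes a hypothetical eligibility predicate and uses hashing, which is just one possible way to split users without storing anything.

```python
import hashlib


def batch_allocate(last_name):
    """Batch: a fixed rule over known users (A-M control, N-Z treatment)."""
    return "control" if last_name[:1].upper() <= "M" else "treatment"


def realtime_allocate(user, test_name, eligible):
    """Real time: check eligibility on interaction, then bucket by hash.

    `eligible` is a hypothetical predicate supplied by the caller.
    """
    if not eligible(user):
        return None
    key = f"{test_name}:{user['id']}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 2
    return "control" if bucket == 0 else "treatment"


def is_mobile(user):
    """Example stratum: only mobile users take part in the test."""
    return user["device"] == "mobile"


print(batch_allocate("Ha"))
print(realtime_allocate({"id": 42, "device": "desktop"}, "new-nav", is_mobile))
print(realtime_allocate({"id": 42, "device": "mobile"}, "new-nav", is_mobile))
```

The desktop user gets `None` because they never become eligible, which is the "will they even see the feature?" question in reverse.
            </script>
        </section>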
        <section
                data-transition="fade-in">
            <div>
                <h2>A/B Testing Workflow</h2>

                <ol>
                    <li>User accesses front end, which calls the API with a session and user info payload</li>
                    <li>The backend calls the A/B testing service</li>
                    <li>The A/B service calls the DB and retrieves all allocated tests</li>
                    <li>If using real-time allocation, it determines the correct one</li>
                    <li>Respond to app backend with test info</li>
                    <li>Deliver to front end whatever is necessary</li>
                </ol>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
            <aside class="notes" data-markdown>
                * Special thanks to Netflix

                1. User accesses a feature on the front end
                2. The front end calls the main application API, delivering a payload containing session and user information to the main application backend
                3. The back end calls the A/B testing service
                * The service hits the A/B database and retrieves all tests to which this user is already allocated
                * For users with batch allocation, the appropriate test is known
                * Otherwise the server determines the correct (real-time) allocation
                4. A/B service passes the test info to the backend
                5. With the test info, the backend fetches and delivers whatever media is necessary back to the user front end
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>A/B Testing Backend</h2>

                <ul>
                    <li>Which users are in which tests at what time?</li>
                    <li>Make sure each user sees a consistent experience</li>
                    <li>Prevent users from being part of antagonistic tests</li>
                    <li>Have some good caching</li>
                    <li>Feed data to a separate front end for internal analysis</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
        </section>
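        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
One way to picture the backend's bookkeeping, as a Python sketch. `AllocationStore` is a hypothetical in-memory stand-in for the A/B database, and the test names are invented; a real service would persist and cache this.

```python
import random


class AllocationStore:
    """In-memory stand-in for the A/B database (hypothetical)."""

    def __init__(self, conflicts, seed=None):
        self.conflicts = conflicts  # test name -> set of antagonistic tests
        self.assignments = {}       # (user id, test name) -> variant
        self.rng = random.Random(seed)

    def tests_for(self, user_id):
        """Which tests is this user already part of?"""
        return {test for (uid, test) in self.assignments if uid == user_id}

    def assign(self, user_id, test, variants=("control", "treatment")):
        key = (user_id, test)
        if key in self.assignments:  # consistent experience on every visit
            return self.assignments[key]
        if self.conflicts.get(test, set()) & self.tests_for(user_id):
            return None              # already in an antagonistic test
        self.assignments[key] = self.rng.choice(variants)
        return self.assignments[key]


conflicts = {"new-cart": {"new-checkout"}, "new-checkout": {"new-cart"}}
store = AllocationStore(conflicts, seed=7)
first = store.assign("u1", "new-cart")
again = store.assign("u1", "new-cart")
blocked = store.assign("u1", "new-checkout")
print(first == again, blocked)
```

The second call returns the stored variant, and the third returns `None` because the user is already in a conflicting test.
            </script>
        </section>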
        <section
                data-transition="fade-in">
            <div>
                <h2>The front end</h2>

                <pre class="fat-code"><code>const myComponent = ({}) =&gt; {
    return (
        &lt;Experiment ref="experiment" name="My Example"&gt;
            &lt;Variant name="control"&gt;
                &lt;button&gt;Click Me&lt;/button&gt;
            &lt;/Variant&gt;
            &lt;Variant name="treatment"&gt;
                &lt;button&gt;Click Me&lt;/button&gt;
            &lt;/Variant&gt;
        &lt;/Experiment&gt;
    );
};
emitter.testListener((experimentName, variantName) =&gt; {
    // record test initiation for experimentName, variantName
});
emitter.conversionListener((experimentName, variantName) =&gt; {
    // record event for experimentName, variantName
});
                </code></pre>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Rules Engine</h2>
                <p><em>Two variants is easy, what about multivariate testing?</em></p>

                <ul>
                    <li>A long chain of conditionals inside the components (to determine which UI to render for whom) puts too much unnecessary logic into rendering.</li>
                    <li class="fragment"><strong>Production Rule System</strong> - a set of rules with conditions + actions that can be evaluated in any order
                        <ul>
                            <li>Improves template legibility by abstracting rules out to a separate file, and encourages re-use</li>
                        </ul>
                    </li>
                </ul>
                <aside class="notes" data-markdown>
                    * Suppose we are testing the page title, form buttons, and help text on a view. We could easily have half a dozen different variations.
                    * We'll need a way to nest all those conditionals, for the different versions, and a way to identify which user sees which
                    * A rules engine, also known as a Production Rule System, is something that Martin Fowler has written about.
                </aside>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">5. Implementation</p>
            </div>
        </section>
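        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
A toy production rule system in Python, to show the shape of the idea. The `Rule` class, the user fields, and the UI keys are all invented for illustration; each rule touches its own key, so the evaluation order does not matter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # does the rule apply to this user?
    action: Callable[[dict], None]     # what it changes in the UI plan


def run_rules(rules, user):
    """Fire every rule whose condition holds; return the resulting UI plan."""
    ui = {}
    for rule in rules:
        if rule.condition(user):
            rule.action(ui)
    return ui


RULES = [
    Rule("short title", lambda u: u["title_variant"] == "B",
         lambda ui: ui.update(title="Sign up")),
    Rule("big button", lambda u: u["button_variant"] == "B",
         lambda ui: ui.update(button="large")),
    Rule("help text", lambda u: u["is_new"],
         lambda ui: ui.update(help="Need a hand?")),
]

user = {"title_variant": "B", "button_variant": "A", "is_new": True}
plan = run_rules(RULES, user)
print(plan)
```

The component then renders from `plan` instead of carrying its own chain of conditionals.
            </script>
        </section>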
        <section
                data-transition="fade-in">
            <div>
                <h1>6. Faults in A/B Tests</h1>
                <p><em>This is the part where I make my sales pitch look bad.</em></p>
                <aside class="notes" data-markdown>
                </aside>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>An A/B Test doesn't tell us everything</h2>
                <p><em>We can determine which variant is better, but not why.</em></p>

                <ul>
                    <li class="fragment">Letting A/B tests dictate the UI might lead to an incoherent design</li>
                    <li class="fragment">We really don't know what is worth testing, so we will waste time trying everything everywhere.</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>What can throw off a test?</h2>

                <ul>
                    <li><strong>Short vs. long term</strong> - <em>(latent conversion)</em> there may be a lag between
                        the time a user is exposed to something and when they take action.
                    </li>
                    <li><strong>Familiarity</strong> - users may be less efficient at doing something until they learn
                        how to do it.
                    </li>
                    <li><strong>Novelty</strong> - users may want to click on everything if they see something new,
                        instead of using it for its purpose.
                    </li>
                    <li><strong>Seasonal effects</strong> - e.g., our app usage in the summer.
                    </li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
            <aside class="notes" data-markdown>
                * Example: you have an IRA service and you change the feature for grabbing tax-related documents. Nobody
                is gonna use it until tax season.
                * Example: introducing a new navigation
            </aside>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>We might be responsible for messing it up</h2>

                <ul>
                    <li><strong>Chatter</strong> - users might find out others experience something different.</li>
                    <li><strong>Bugs</strong> - can different variants work simultaneously?</li>
                    <li><strong>Hawthorne Effect</strong> - users may alter their behavior if they know they&rsquo;re being studied.</li>
                    <li><strong>Consistency</strong> - make sure users see the same variant for the duration of the test.</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div class="full-page-background"
                 style="background-image: url('custom/media/xkcd-882.png');height: 680px;"></div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
            <aside class="notes" data-markdown>
                * The joke is that they run 20 tests at a significance level of 0.05, so on average they should expect 1 false positive even when nothing is going on.
            </aside>
        </section>
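        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
The jelly bean arithmetic as a few lines of Python, assuming the 20 tests are independent and every null hypothesis is true. The function name is made up; the Bonferroni line is one common (and conservative) correction, shown only for comparison.

```python
def false_positive_odds(tests, alpha=0.05):
    """Expected false positives and P(at least one) when every H0 is true."""
    expected = tests * alpha
    at_least_one = 1 - (1 - alpha) ** tests
    return expected, at_least_one


expected, at_least_one = false_positive_odds(20)
print(f"expected: {expected:.1f}, P(at least one): {at_least_one:.3f}")
print(f"Bonferroni per-test alpha: {0.05 / 20}")
```

So one bogus "winner" is the expectation, and getting at least one is more likely than not.
            </script>
        </section>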
        <section
                data-transition="fade-in">
            <div>
                <h2>Pre- &amp; post-test segmentation</h2>

                <p><em>Stratified sampling before/after the test for more insight</em></p>

                <ul>
                    <li><strong>Before testing</strong>: split users by strata to ensure proportionate representation</li>
                    <li><strong>After testing</strong>: measure which sub-population had the best conversion for each variant
                        <ul>
                            <li>Get ready for more Type I errors</li>
                        </ul>
                    </li>
                </ul>

            </div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
        </section>
        <section
                data-transition="fade-in">
            <div>
                <h2>Regression to the mean</h2>
                <ul>
                    <li>Suppose you had a bunch of users use coin flips to answer a T/F test, and you picked the users with the highest scores.</li>
                    <li>Give them the same T/F test again and have them pick answers by coin flip. Would they still score highly?</li>
                    <li class="fragment">If you rerun a bunch of winning A/B tests to find the &ldquo;best of the best,&rdquo; you&rsquo;ll get false negatives.</li>
                </ul>
            </div>
            <div class="slide-footer-left">
                <p class="footer-text">6. Faults in A/B Tests</p>
            </div>
        </section>
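        <section
                data-transition="fade-in"
                data-markdown>
            <script type="text/template">
The coin-flip thought experiment is easy to simulate. This Python sketch assumes 1000 users, a 20-question T/F test, and a top 50 cutoff, all picked arbitrarily.

```python
import random
from statistics import mean


def coin_flip_score(rng, questions=20):
    """Number of correct answers when every answer is a coin flip."""
    return sum(rng.random() < 0.5 for _ in range(questions))


rng = random.Random(0)
first = [coin_flip_score(rng) for _ in range(1000)]

# Pick the 50 "best" users, then make them take the test again.
top = sorted(range(1000), key=lambda i: first[i], reverse=True)[:50]
retest = [coin_flip_score(rng) for _ in top]

first_mean = mean(first[i] for i in top)
retest_mean = mean(retest)
print(f"top 50 first try: {first_mean:.1f}, retest: {retest_mean:.1f}")
```

The "winners" were only lucky, so on the retest their average drifts back toward 10 out of 20.
            </script>
        </section>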
        <section
                data-transition="fade-in">
            <h2>The End</h2>
            <img src="custom/media/red-panda.jpg" style="height: 45vh;"/>
            <p>Is it lunch yet? Who's hungry?</p>
        </section>
    </div>
</div>

<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>

<script>
    // More info about config & dependencies:
    // - https://github.com/hakimel/reveal.js#configuration
    // - https://github.com/hakimel/reveal.js#dependencies
    Reveal.initialize({
        slideNumber: 'c/t',
        showSlideNumber: 'speaker',
        controls: false,
        progress: false,
        history: true,
        autoPlayMedia: null,
//        center: true,
        width: 1080,
        height: 720,
//        margin: 0,
        math: {
            // mathjax: 'http://cdn.mathjax.org/mathjax/latest/MathJax.js',
            config: 'TeX-AMS_HTML-full'
        },
        dependencies: [
            {src: 'plugin/markdown/marked.js'},
            {src: 'plugin/markdown/markdown.js'},
            {src: 'plugin/math/math.js', async: true},
            {src: 'plugin/notes/notes.js', async: true},
            {
                src: 'plugin/highlight/highlight.js', async: true, callback: function () {
                hljs.initHighlightingOnLoad();
            }
            }
        ]
    });
</script>
</body>
</html>