-
Notifications
You must be signed in to change notification settings - Fork 0
/
reward-maximisation.html
402 lines (389 loc) · 22.4 KB
/
reward-maximisation.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
<!DOCTYPE html>
<html lang="en">
<head>
<!-- ## for client-side less
<link rel="stylesheet/less" type="text/css" href="/theme/css/style.less">
<script src="http://cdnjs.cloudflare.com/ajax/libs/less.js/1.7.3/less.min.js" type="text/javascript"></script>
-->
<link rel="stylesheet" type="text/css" href="/theme/css/style.css">
<link rel="stylesheet" type="text/css" href="/theme/css/pygments.css">
<link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=PT+Sans|PT+Serif|PT+Mono">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="author" content="Gaël">
<meta name="description" content="Posts and writings by Gaël">
<meta name="keywords" content="">
<title>
And yet it moves
– Reward maximisation </title>
</head>
<body>
<aside>
<div id="user_meta">
<div id="metalogo">
<a href="/">
<img src="/images/logo.svg" alt="logo" id="logo">
</a>
<a href="/" id="title">And yet it moves</a>
</div>
<p></p>
<ul>
<li><a href="/category/data-processing.html" class="active">Data processing</a></li>
</ul>
</div>
</aside>
<main>
<header>
<p>
<a href="/">Index</a> ¦ <a href="/archives.html">Archives</a>
</p>
</header>
<article>
<div class="series_links">
<a style="float: left;" href='/basics.html'>
<< First article of the series
</a>
</div>
<div class="article_meta">
<p style="text-align: right; margin: 0;">Posted on: Fri, 10 Jun 2016</p>
</div>
<div class="article_title">
<h3><a href="/reward-maximisation.html">Reward maximisation</a></h3>
</div>
<div class="article_text">
<div class="section" id="effect-of-the-sample-size">
<h2 id="effect-of-the-sample-size">Effect of the sample size</h2>
<p>I showed in the previous section that the A/B testing data-gathering
method by itself has some advantages over the multi-armed bandit strategy.</p>
<p>But how does this translates to <em>what really matters</em>, clicks?</p>
<p>I first simulated a large number of complete campaigns (i.e. exploration
stage followed by an exploitation stage using the testing results).
Each time, I could get the total number of clicks yielded in our simulated
website lifetime. However the total number of click is not that convenient:
the simulated population in that particular example was composed of
<span class="math">\(20000\)</span> visitors. But what if I chose a different size? The total number
of clicks would not be directly comparable. That is why I decided to calculate
the (simulated) total click-through rate: the number of clicks divided by the
number of visitors.</p>
<p>In this particular example, I used the two-stepped variant of
the multi-armed bandit strategy provided that the set-and-forget mode is
reached when the sample size is equal to the population size. I obtained the
following figure:</p>
<div class="figure">
<img alt="" src="/images/clicks_vs_sample_size.svg"/>
<p class="caption">The total number of click was calculated for different sample sizes
(in a population of <span class="math">\(20000\)</span> visitors). Both methods were applied on
those samples only. The best-performing variant was served to all the
remaining visitors. Each time, the overall number of clicks was calculated
and divided by the population size. The <em>set-and-forget</em> mode is achieved
when the sample size equals the population size.</p>
<div class="legend">
<p><span class="math">\(P(O|A)\)</span>: (hidden) real click-through rate of the <span class="math">\(A\)</span>
(<object class="align-middle" data="/images/orange_btn.svg" type="image/svg+xml">orangeBtn</object>) website variant used as an input in the simulation.</p>
<p><span class="math">\(P(O|B)\)</span>: (hidden) real click-through rate of the <span class="math">\(B\)</span>
(<object class="align-middle" data="/images/green_btn.svg" type="image/svg+xml">greenBtn</object>) website variant used as an input in the simulation.</p>
</div>
</div>
<ul class="simple">
<li>We can see that in both the 2-staged cases the total click-through rate rises
quickly at smaller
sample sizes (left of the figure), yet A/B testing systematically
yield higher click rates in that part of the figure. This is a direct
consequence of what we showed earlier: A/B testing requires fewer trials to
reach a given confidence.</li>
<li>A maximum is then reached in both methods: that maximum is reached
at smaller sample sizes in the A/B testing framework.</li>
<li>Finally, the total click rate decreases linearly toward the final
value in each case (which corresponds to the set-and-forget
scenario). In A/B testing, it is roughly the arithmetic mean of the
individual click rates while in the multi-armed bandit strategy, it
is a weighted mean.</li>
</ul>
<p>But let’s zoom-in a little bit to talk about the domain of superiority of each methods.</p>
</div>
<div class="section" id="domains-of-superiority">
<h2 id="domains-of-superiority">Domains of superiority</h2>
<p>Here is a close-up of the previous figure:</p>
<div class="figure">
<img alt="" src="/images/closer_clicks_vs_sample_size.svg"/>
<p class="caption">Close-up of the previous figure. There are 3 domains of superiority:
first the multi-armed bandit method used in the set-and-forget mode,
then A/B testing, then the two-staged multi-armed bandit strategy
(i.e. the strategy is used only on a limited sample and the best variant
is then served to all the remaining visitors).</p>
<div class="legend">
<p><span class="math">\(P(O|A)\)</span>: (hidden) real click-through rate of the <span class="math">\(A\)</span>
(<object class="align-middle" data="/images/orange_btn.svg" type="image/svg+xml">orangeBtn</object>) website variant used as an input in the simulation.</p>
<p><span class="math">\(P(O|B)\)</span>: (hidden) real click-through rate of the <span class="math">\(B\)</span>
(<object class="align-middle" data="/images/green_btn.svg" type="image/svg+xml">greenBtn</object>) website variant used as an input in the simulation.</p>
</div>
</div>
<p>This illustrates the fact that a properly designed A/B testing campaign can
beat the multi-armed bandit strategy on its own grounds: maximizing the reward.
As it turned out, saying that this multi-armed bandit strategy was designed to
maximize the reward is not rigorously correct: A/B testing seems to be able to
provide just as many clicks if carefully setup. What the multi-armed bandit
strategy is really good at instead is <em>maintaining a high click-through rate over
a large period of time</em>.</p>
<p>But you could rightfully argue that these results are really local: what if we
used another population size instead? or another <span class="math">\(\varepsilon\)</span>?
or other hidden click-through rates? <em>etc</em>. Would the A/B testing
data-gathering strategy still perform well? In what circumstances? This is why
I am going to calculate the same kind</p>
</div>
<div class="section" id="population-size">
<h2 id="population-size">Population size</h2>
<p>In the example above, the sample size was chosen equal to <span class="math">\(20000\)</span>.
But what would happen to the total click-through rate if that population
size was different?</p>
<p>I performed the exact same kind of Monte-Carlo simulation as above.
But this time,
I decided to represent the difference in overall click-through rates: by using
a convenient colour scheme, we can directly see which algorithm yielded the
highest total click-through rate (and therefore the greatest number of clicks)
given a population size, a sample size, an <span class="math">\(\varepsilon\)</span> for a given
hidden truth:</p>
<div class="figure">
<img alt="" src="/images/diff_vs_sample_size_vs_tot.svg"/>
<p class="caption">Difference in the total click-through rates calculated in the <strong>2-staged mode</strong>
for both A/B testing and the multi-armed bandit data gathering strategies.
For each population size, there is a range of sample sizes where the A/B
testing data-gathering strategy performs as well or better on average
than the multi-armd bandit strategy. At their respective peak, both methods
reach roughly the same maximal value.</p>
<div class="legend">
<p>Blue patches: A/B testing provides roughly significantly better
click-through rates on average, in these conditions.</p>
<p>Green patches: the multi-armed bandit strategy in its two-staged mode
provides roughly significantly better click-through rates.</p>
<p>White patches: the difference on average is not significant.</p>
</div>
</div>
<p>It appears clearly that A/B testing is systematically better at lower
sample sizes. As the population increases, the range of sample sizes in
which A/B testing performs better becomes wider. Though not represented
on the figure, the difference in the maximum click-through rate between
the methods is negligible. This means that A/B testing can
systematically reach the same number of clicks as the other strategy.
Conversely, that two-staged multi-armed bandit strategy can systematically
reach the same number of clicks as A/B testing.</p>
<p>This can be explained by the asymptotic behaviour of the two methods as
illustrated previously: as the population size increases, the worse-case
scenario (i.e. the set-and-forget mode where all the time is spent in
the exploration stage) is reached at higher and higher sample sizes resulting
in a much gentler slope in both methods. However, as the slope in the
A/B testing method was much steeper to begin with, the improvement is
more visible there, resulting in a wider range of superiority.</p>
</div>
<div class="section" id="set-and-forget-vs-two-staged">
<h2 id="set-and-forget-vs-two-staged">Set-and-forget vs. two-staged</h2>
<p>You could wonder why we compared the A/B testing data-gathering method
with a two-staged multi-armed bandit strategy and not its set-and-forget variant:</p>
<div class="figure">
<img alt="" src="/images/diff_vs_sample_size_vs_tot_forget.svg"/>
<p class="caption">Difference in total click-through rates obtained using the two-staged
A/B testing data-gathering method vs. the multi-armed bandit strategy in
its <strong>set-and-forget mode</strong>.
The superiority range of the A/B testing strategy is much wider: the
set-and-forget mode performs worse than the two-staged one.
The sample size is defined for A/B testing only: the multi-armed bandit
being used in the set-and-forget mode, its sample size corresponds to the
whole population size.</p>
</div>
<p>The first reason is that in spite of its being far more complicated to setup,
the two-staged mode yields greater click-through rates on a wider range of
sample sizes. So the two-staged mode could be considered better-performing
than the set-and-forget mode.</p>
<p>The second reason is that a pure set-and-forget mode does not seem desirable in
practice in most cases: do you really want to provide each variant you ever
wanted to test for the whole life of your website? Most certainly not: at some
point you would stop and only provide whichever variant you choose.</p>
<p>Once again, we see that the multi-armed bandit strategy is getting further and
further away from being <em>always</em> (or even <em>generally</em>) better than A/B testing
considering what really counts for a business getting its money from clicks:
the overall click-through rate.</p>
</div>
<div class="section" id="mab-asymmetry">
<h2 id="mab-asymmetry"><span class="caps">MAB</span> asymmetry</h2>
<p>As I have explained earlier, the core of the <span class="caps">MAB</span> strategy consists in
favouring the best-performing variant to keep on testing while gathering more
clicks. The magnitude of that imbalance is
quantified by a variable called <span class="math">\(\varepsilon\)</span> (the Greek letter “epsilon”).</p>
<p>For instance, when <span class="math">\(\varepsilon = 0\%\)</span>, the <span class="caps">MAB</span> strategy favours the
seemingly best-performing variant by <span class="math">\(0\%\)</span>: this is equivalent to
a proper A/B testing data-gathering strategy (no variant is actually ever
favored). When <span class="math">\(\varepsilon = 100\%\)</span>, the algorithm is trapped:
whichever variant was found the best at the very first step will be the only
one served ever.</p>
<div class="figure">
<img alt="" src="/images/epsilon_vs_sample_size_vs_tot.svg"/>
<p class="caption">The overall click-through rate as a function of the sample size
and <span class="math">\(\varepsilon\)</span>, the magnitude by which seemingly best variant
at some point is favoured (i.e. presented more often).</p>
</div>
<p>At <span class="math">\(\varepsilon = 0\%\)</span>, no strategy is really favoured: this was
expected as they are strictly equivalent.</p>
<p>Then as <span class="math">\(\varepsilon\)</span> increases, the difference between the strategies
becomes more and more apparent (the patches become darker and darker).</p>
<p>Regarding the ranges of superiority: it is at it’s widest for the
multi-armed bandit strategy around <span class="math">\(50\%\)</span>. As <span class="math">\(\varepsilon\)</span>
goes away from that value, the A/B testing data-gathering strategy seems
to be better.</p>
<p><em>Is that :math:`varepsilon = 50%` value always the same as other parameters
vary?</em> I don’t know, and if I had to guess I would say <em>no</em>.
However the existence of such a parameter maximizing the
range of superiority of the multi-armed bandit strategy seems guaranteed:</p>
<ul class="simple">
<li>at low <span class="math">\(\varepsilon\)</span>, the two strategy are practically equivalent;</li>
<li>at intermediary <span class="math">\(\varepsilon\)</span>, there is a sample size beyond which
both strategy will be mostly correct yet the multi-armed bandit method
does so while favouring one (which should mostly be the best);</li>
<li>at high <span class="math">\(\varepsilon\)</span>, the multi-armed bandit method has fewer
and fewer opportunities to estimate the click-through rate and
can be trapped in a non-optimal setup for quite some time.</li>
</ul>
</div>
<div class="section" id="offset-in-the-hidden-click-through-rate">
<h2 id="offset-in-the-hidden-click-through-rate">Offset in the (hidden) click-through rate</h2>
<p>Until now, I have basically always defined one click-through rate at
<span class="math">\(10\%\)</span> and the other at <span class="math">\(15\%\)</span> (corresponding to a
difference of <span class="math">\(15 - 10 = 5\%\)</span>).
We could wonder whether offsetting these two values upward
(while keeping the difference constant) would change
anything in the range of superiority of both our data-gathering strategies:</p>
<div class="figure">
<img alt="" src="/images/POXorig_vs_sample_size_vs_tot.svg"/>
<p class="caption">Effect of offsetting both the real (and hidden) click-through rate
values: the range of superiority of the A/B testing data-gathering strategy
is maximized for values centered around <span class="math">\(50\%\)</span> (which corresponds to
an offset of <span class="math">\(\sim 35\%\)</span>).</p>
</div>
<p>It appears very clearly that the largest range of A/B testing data-gathering
superiority is obtained for an offset of <span class="math">\(\sim 35\%\)</span>: this
corresponds to the actual click-through rates <span class="math">\(45\%\)</span> and
<span class="math">\(60\%\)</span>, i.e. almost centered on <span class="math">\(50\%\)</span>.</p>
<p>This result is actually caused by a change in the relative power of the
two data-gathering strategies: when the click-through rates are around
<span class="math">\(50\%\)</span>, the A/B testing method finds the correct variant even
faster than in other cases. At the same time, the multi-armed bandit strategy
does not really benefits from this because it is generally going to
favour one variant at the expense of the others, thus not visibly increasing
its power as quickly.</p>
</div>
<div class="section" id="difference-in-click-through-rates">
<h2 id="difference-in-click-through-rates">Difference in click-through rates</h2>
<p>The last parameter I wanted to study is the difference itself between
the click-through rates. This is what I represented in the following
figure (the <span class="math">\({\Delta}P(O|X)\)</span> notation represents this difference):</p>
<div class="figure">
<img alt="" src="/images/POXdiff_vs_sample_size_vs_tot.svg"/>
<p class="caption">Effect of the difference in click-through rate on the overall observed
click-through rate. The bigger the difference, the widest the range of
superiority of the multi-armed bandit strategy.</p>
</div>
<p>It appears clearly that the multi-armed bandit strategy has the widest range of
superiority (i.e. greater number of clicks over a broader sample size range)
as the difference increases.</p>
<p>Yet A/B testing does not become irrelevant: as the difference gets bigger, one
needs far smaller sample sizes to get relevant information. This means that the
blue patches (showing the superiority range of the A/B testing data-gathering
strategy) should still be there but at even smaller sample sizes than the
minimum of <span class="math">\(80\)</span> samples that we represented here.</p>
</div>
<div class="section" id="summary-what-is-the-mab-strategy-good-at">
<h2 id="summary-what-is-the-mab-strategy-good-at">Summary: what is the <span class="caps">MAB</span> strategy good at?</h2>
<p>In practice, saying that this particular implementation of the
multi-armed bandit strategy has been developed to maximise the number of
clicks appears misleading. This is <strong>not</strong> what it does. What it was
actually designed for (and does very well) is keeping that total number
of clicks very high in a larger range of samples sizes. See the difference?</p>
<p>This is why the multi-armed bandit strategy can be used in a set-and-forget
mode. This is also why we observe such a wide superiority range as the
difference in click-through rate increases. And yet this is why there is always
a range of sample sizes in which the A/B testing strategy will be as good on average.</p>
<p>The multi-armed bandit method is also quite good at doing damage control
in poorly set up data-gathering campaigns. Such a campaign requires that you
set the minimum difference to be found. The sample size is then calculated
depending on this difference threshold. If you expected a difference of
<span class="math">\(1\%\)</span> but actually got a difference of <span class="math">\(50\%\)</span>, chances are high
that you spent far too much time in the exploration stage for the A/B testing
data-gathering strategy to be competitive. In the same setup, the multi-armed
bandit strategy would quickly favour the best variant and avoid such a loss.</p>
<p><em>Does that make the multi-armed bandit method the way to go for web
optimisation?</em> Not necessarily: the next (and last) post of the series
is dedicated to giving elements to answer that question.</p>
</div>
<script type='text/javascript'>if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>
</div>
<div class="series_links">
<a style="float: left;" href='/pros-of-AB-testing.html'>
< Part 3: The pros of the A/B testing method
</a>
</div>
<div class="series_links">
<a style="float: right;" href='/challenging-conclusion.html'>
Part 5: Challenging the claims and conclusion >
</a>
</div>
<div style="clear: both;"></div>
</article>
<div id="ending_message">
<p>© Gaël. Built using <a href="http://getpelican.com" target="_blank">Pelican</a>. Theme by Giulio Fidente on <a href="https://github.com/gfidente/pelican-svbhack" target="_blank">github</a>. </p>
</div>
</main>
</body>
</html>