comparisonError.tex

\section{Batch comparison audits of a tolerable overstatement in votes}
\label{sec:comparisonError}

In this section we expand previous comparison auditing work to handle two new requirements.  
The first relates to partitioning the permissible overstatement
through the parameters $\lambda_s$, as discussed in section~\ref{sec:hybrid}. 
The second handles batch-level comparison audits.

To test whether the overstatement of any margin
(in votes) exceeds some fraction $\lambda$ of the overall margin $V_{w\ell}$ between
reported winner $w$ and reported loser $\ell$.
If the stratum contains all the ballots cast in the contest, then for $\lambda = 1$, this 
would confirm the election outcome.
For stratified audits, we might want to test other values of $\lambda$, as described in Section~\ref{sec:hybrid}.

This section also addresses the second requirement by deriving a method for batches of arbitrary size, which might be useful
 to audit contests that include CVR counties and legacy counties.
We keep the \emph{a priori} error bounds tighter than the ``super-simple'' 
method~\cite{stark10d}.
To keep the notation simpler, we consider only a single contest, but the 
MACRO test statistic \cite{stark09c,stark10d} automatically extends the result to 
auditing $C>1$ contests simultaneously.
The derivation is for plurality contests, including ``vote-for-$k$'' plurality contests.
Majority and super-majority contests are a minor 
modification~\cite{stark08a}.\footnote{%
  So are some forms of preferential and approval voting, such as Borda count, and
  proportional representation contests, such as D'Hondt~\cite{starkTeague14}.
  See \url{https://github.com/pbstark/S157F17/blob/master/audit.ipynb} for a derivation
  of ballot-level comparison risk-limiting audits for super-majority contests. (Last visited 14 May 2018.)
  Changes for IRV/STV are more complicated.
}

\subsection{Notation}
\begin{itemize}
    \item  $\mathcal{W}$: the set of reported winners of the contest
    \item  $\mathcal{L}$: the set of reported losers of the contest
    \item  $N_s$ ballots were cast in all in the stratum. (The contest might not appear on all $N_s$ ballots.)
    \item  $P$ ``batches'' of ballots are in stratum $s$. A batch contains one or more ballots. Every ballot in stratum $s$ is in exactly one batch.
    \item  $n_p$: number of ballots in batch $p$. $N_s = \sum_{p=1}^P n_p$.
    \item  $v_{pi} \in \{0, 1\}$: the reported votes for candidate $i$ in batch $p$
    \item  $a_{pi} \in \{0, 1\}$: actual votes for candidate $i$ in batch $p$. 
                  If the contest does not appear on any ballot in batch $p$, then $a_{pi} = 0$.
                  
    \item  $V_{w\ell,s} \equiv \sum_{p=1}^P (v_{pw} - v_{p\ell})$: 
Reported margin in stratum $s$ of reported winner $w \in \mathcal{W}$ over reported loser 
$\ell \in \mathcal{L}$, in votes.

    \item  $V_{w\ell}$: 
Overall reported margin of reported winner $w \in \mathcal{W}$ over reported loser 
$\ell \in \mathcal{L}$, in votes, for the entire contest (not just stratum $s$)

% If I'm not punch drunk, we really did manage to keep generalizations across multiple contests out of 
% the subsequent exposition. However, we still use V
%
%    \item  $V_s$: smallest reported margin in the stratum among all $C$ contests audited using the same sample:
%$V_s \equiv \min_{w \in \mathcal{W}, \ell \in \mathcal{L}} V_{w \ell, s}$
%
%  \item  $V$: smallest reported overall margin among all $C$ contests audited using the same sample:
% $V \equiv \min_{w \in \mathcal{W}, \ell \in \mathcal{L}} V_{w \ell}$
    \item  $V$: smallest reported overall margin between any reported winner and reported loser:
$V \equiv \min_{w \in \mathcal{W}, \ell \in \mathcal{L}} V_{w \ell}$

    \item  $A_{w\ell,s} \equiv \sum_{p=1}^P (a_{pw} - a_{p\ell})$: 
actual margin in the stratum of reported winner $w \in \mathcal{W}$ over reported loser 
$\ell \in \mathcal{L}$, in votes

    \item  $A_{w\ell}$: 
actual margin of reported winner $w \in \mathcal{W}$ over reported loser 
$\ell \in \mathcal{L}$, in votes, for the entire contest (not just in stratum $s$)

\end{itemize}


\subsection{Reduction to maximum relative overstatement}
If the contest is entirely contained in stratum $s$, then
the reported winners of the contest are the actual winners if
$$ 
   \min_{w \in \mathcal{W}, \ell \in \mathcal{L}} A_{w\ell,s} > 0.
$$
Here, we address the case that the contest may include a portion outside the stratum.
To combine independent samples in different strata, it is convenient
to be able to test whether the net overstatement error in a stratum exceeds a given threshold.

Instead of testing that condition directly, we will test a condition that is sufficient 
but not necessary for the inequality to hold, to get a computationally simple test that
is still conservative (i.e., the risk is not larger than its nominal value).

For every winner, loser pair $(w, \ell)$, we want to test
whether the overstatement error exceeds some threshold, generally
one tied to the reported margin between $w$ and $\ell$.
For instance, for a stratified hybrid audit, we set the threshold to be
$\lambda_s V_{w\ell}$.

We want to test whether
$$
   \sum_{p=1}^P (v_{pw}-a_{pw} - v_{p\ell} + a_{p\ell})/V_{w\ell} \ge \lambda_s.
$$
The maximum of sums is not larger than the sum of the maxima; that is,
$$
\max_{w \in \mathcal{W},  \ell \in \mathcal{L}}
   \sum_{p=1}^P (v_{pw}-a_{pw} - v_{p\ell} + a_{p\ell})/V_{w\ell}
   \le
  \sum_{p=1}^P  \max_{w \in \mathcal{W},  \ell \in \mathcal{L}} 
  (v_{pw}-a_{pw} - v_{p\ell} + a_{p\ell})/V_{w\ell}.
$$

Define 
$$
  e_p \equiv \max_{w \in \mathcal{W} \ell \in \mathcal{L}} 
     (v_{pw}-a_{pw} - v_{p\ell} + a_{p\ell})/V_{w\ell}.
$$
Then no reported margin is overstated by a fraction $\lambda_s$ or more
if 
$$ 
  E \equiv \sum_{p=1}^P e_p < \lambda_s.
$$

Thus if we can reject the hypothesis $E \ge \lambda_s$, we can conclude that
no pairwise margin was overstated by as much as a fraction $\lambda_s$.

Testing whether $E \ge \lambda_s$ would require a very large sample if we knew nothing at
all about $e_p$ without auditing batch $p$: a single large value of $e_p$ could make
$E$ arbitrarily large.
But there is an \emph{a priori} upper bound for $e_p$.
Whatever the reported votes $v_{pi}$ are in batch~$p$, we can find the
potential values of the actual votes $a_{pi}$ that would make the
error $e_p$ largest, because $a_{pi}$ must be between 0 and $n_p$,
the number of ballots in batch~$p$:
$$
    \frac{v_{pw}-a_{pw} - v_{p\ell} + a_{p\ell}}{V_{w\ell}} \le 
    \frac{v_{pw}- 0 - v_{p\ell} + n_p}{V_{w\ell}}.
$$
Hence,
\begin{equation} \label{eq:uDef}
    e_p \le \max_{w \in \mathcal{W}, \ell \in \mathcal{L}} 
    \frac{v_{pw} - v_{p\ell} + n_p}{V_{w\ell}} \equiv u_p.
\end{equation}

Knowing that $e_p \le u_p$ might let us conclude reliably that $E < \lambda_s$
by examining only a small number of batches---depending on the 
values $\{ u_p\}_{p=1}^P$ and on the values of $\{e_p\}$ for the audited batches.

To make inferences about $E$, it is helpful to work with the \emph{taint} 
$t_p \equiv \frac{e_p}{u_p} \le 1$.
Define $U \equiv \sum_{p=1}^P u_p$.
Suppose we draw batches at random with replacement, with probability $u_p/U$
of drawing batch $p$ in each draw, $p = 1, \ldots, P$.
(Since $u_p \ge 0$, these are all positive numbers, and they sum to 1,
so they define a probability distribution on the $P$ batches.)

Let $T_j$ be the value of $t_p$ for the batch $p$ selected in the $j$th draw.
Then $\{T_j\}_{j=1}^n$ are IID, $\mathbb{P} \{T_j \le 1\} = 1$, and
$$
  \mathbb{E} T_1 = \sum_{p=1}^P \frac{u_p}{U} t_p =
  \frac{1}{U}\sum_{p=1}^P u_p \frac{e_p}{u_p} = 
  \frac{1}{U} \sum_{p=1}^P e_p = E/U.
$$
Thus $E = U \mathbb{E} T_1$. 
So, if we have strong evidence that
$\mathbb{E} T_1 < \lambda_s/U$, we have
strong evidence that $E < \lambda_s$.

This approach can be simplified even further by noting that $u_p$ has
a simple upper bound that does not depend on $v_{pi}$.
At worst, the reported result for batch $p$ shows $n_p$ votes for the 
``least-winning'' apparent winner of the contest with the smallest margin, 
but a hand interpretation would show that all $n_p$ ballots in the batch 
had votes for the runner-up in that contest.
Since $V_{w\ell} \ge V\equiv \min_{w \in \mathcal{W}, \ell \in \mathcal{L}} V_{w \ell}$ and $0 \le v_{pi} \le n_p$,
$$ 
    u_p =  \max_{w \in \mathcal{W}, \ell \in \mathcal{L}} 
    \frac{v_{pw} - v_{p\ell} + n_p}{V_{w\ell}}
    \le  \max_{w \in \mathcal{W}, \ell \in \mathcal{L}} 
    \frac{n_p - 0 + n_p}{V_{w\ell}}
    \le \frac{2n_p}{V}.
$$
Thus if we use $2n_p/V$ in lieu of $u_p$, we still get conservative results.
(We also need to re-define $U$ to be the sum of those upper bounds.)
An intermediate, still conservative approach would be to use this upper bound for
batches that consist of a single ballot, but use the sharper bound (\ref{eq:uDef})
when $n_p > 1$.
Regardless, for the new definition of $u_p$ and $U$,
$\{T_j\}_{j=1}^n$ are IID, $\mathbb{P} \{T_j \le 1\} = 1$,
and
$$
  \mathbb{E} T_1 = \sum_{p=1}^P \frac{u_p}{U} t_p =
  \frac{1}{U}\sum_{p=1}^P u_p \frac{e_p}{u_p} = 
  \frac{1}{U} \sum_{p=1}^P e_p = E/U.
$$

So, if we have evidence that $\mathbb{E} T_1 < \lambda_s/U$, we have evidence that 
$E < \lambda_s$.

\subsection{Testing $\mathbb{E} T_1 \ge \lambda_s/U$}

A variety of methods are available to test whether $\mathbb{E} T_1 < \lambda_s/U$.
One particularly ``clean'' sequential method is based on Wald's Sequential Probability
Ratio Test (SPRT) (\cite{wald45}).
Harold Kaplan pointed out this method on a website that no longer exists.
A derivation of this ``Kaplan-Wald'' method is given in Appendix A of~\cite{starkTeague14};
to apply the method here, take $t = \lambda_s$ in their equation~18.

A different sequential method, the Kaplan-Markov method (also due to Harold Kaplan), 
is given in~\cite{stark09b}.