% importance of privacy and its formal analysis
Privacy has been a hotly debated issue since the late nineteenth
century~\cite{WB:90:RP}. With the advent of social media, privacy is
perhaps one of the most pressing topics of the digital age as
well. Indeed, numerous privacy measures for data publishing
have been proposed~\cite{S:2002:KAMPP,FWC:10:PPDP}. Whether these
measures are applied properly and effectively concerns every
individual. The importance of privacy research has thus been recognized
by the formal methods community as well~\cite{TW:09:FMP}.
% differential privacy
Differential privacy is a framework for the design of privacy
measures~\cite{D:06:DP,DR:14:AFDP}. In this framework, data publishing
mechanisms are formalized as randomized algorithms. On any input data
set, such a mechanism returns randomized answers to data users' queries.
Output distributions naturally correspond to the information released by
data publishing mechanisms. To preserve privacy, it suffices for data
curators to ensure that mechanisms yield similar output distributions
on similar input data sets. Thanks to this simple
formalization, differential privacy moreover allows data curators
to evaluate privacy and utility quantitatively. The framework
has subsequently attracted much attention from both academia and industry.
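To make the "similar output distributions" requirement concrete, the following sketch checks the standard $\epsilon$-differential-privacy ratio bound for the truncated $\alpha$-geometric mechanism on a counting query with range $\{0,\dots,n\}$; the function names are illustrative, not from the paper.

```python
import math

def truncated_geometric(x, n, alpha):
    """Output distribution of the truncated alpha-geometric mechanism on {0..n}.

    Interior outputs get mass proportional to alpha^|y - x|; the leftover
    tail mass is folded onto the two boundary outputs 0 and n.
    """
    probs = {}
    for y in range(n + 1):
        if y == 0:
            probs[y] = alpha ** x / (1 + alpha)
        elif y == n:
            probs[y] = alpha ** (n - x) / (1 + alpha)
        else:
            probs[y] = (1 - alpha) / (1 + alpha) * alpha ** abs(y - x)
    return probs

def is_eps_dp(n, alpha, eps):
    """Brute-force check of the eps-DP ratio bound over all adjacent inputs."""
    bound = math.exp(eps)
    for x in range(n):  # adjacent inputs x and x + 1 (sensitivity-1 query)
        p = truncated_geometric(x, n, alpha)
        q = truncated_geometric(x + 1, n, alpha)
        for y in range(n + 1):
            # small tolerance so exact boundary ratios are accepted
            if p[y] > bound * q[y] + 1e-12 or q[y] > bound * p[y] + 1e-12:
                return False
    return True
```

With $\alpha = e^{-\epsilon}$, the check succeeds for $\epsilon = \ln 2$ but fails for any smaller budget, matching the known tightness of the mechanism.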
% pufferfish privacy
Pufferfish is a more recent privacy framework which subsumes
differential privacy~\cite{KM:14:PFMPD}. Differential privacy assumes
that there is no correlation among entries in data sets. On data sets
with correlated entries, differentially private mechanisms can still leak
noticeable private information~\cite{KM:11:NFLDP}. The no free lunch
theorem in data privacy shows that prior knowledge about data sets is
crucial to privacy analysis. The Pufferfish privacy framework allows
data curators to analyze privacy against prior knowledge about data
sets. Under this Bayesian privacy framework, differential privacy is
shown to preserve privacy when there is no correlation
among entries in data sets (Theorem~2 in~\cite{KM:14:PFMPD}).
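The effect of correlation can be seen in a small numerical sketch (illustrative, not from the paper): with $n$ perfectly correlated binary entries, a noisy count answered by the two-sided geometric mechanism distinguishes the two possible worlds with likelihood ratio $e^{n\epsilon}$, far beyond the per-entry $e^{\epsilon}$ guarantee.

```python
import math

def geometric_pmf(true_count, y, alpha):
    # two-sided geometric noise: Pr[output = y | count] proportional to alpha^|y - count|
    return (1 - alpha) / (1 + alpha) * alpha ** abs(y - true_count)

n, eps = 4, math.log(2)
alpha = math.exp(-eps)
# fully correlated prior: all entries are 0 or all are 1, so the count is 0 or n
ratio = geometric_pmf(n, n, alpha) / geometric_pmf(0, n, alpha)
# at output y = n the likelihood ratio is alpha^(-n) = e^(n * eps), here 2^4 = 16
```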
% importance of formal methods in privacy
Although privacy frameworks help data curators design data publishing
mechanisms, they do not automatically yield privacy-respecting
mechanisms. In differential and Pufferfish privacy, data publishing
mechanisms are analyzed with sophisticated mathematical tools.
Misinterpretation of privacy proofs can lead to incorrect
modifications or generalizations of privacy-respecting mechanisms.
Indeed, several published variants of differentially private
mechanisms have been shown to violate privacy~\cite{DWWZK:18:DVDP}. The
formal methods community has accordingly started to develop techniques for
checking differentially private
mechanisms~\cite{TKD:11:FVDPIS,GHHNP:13:LDTDP,BDGKZ:13:VCDP}.
% markov models as formal models for privacy analysis
% hidden markov models
In this work, we focus on a lightweight yet automatic technique for
checking Pufferfish privacy. Our goal is to develop a general
verification technique for the Bayesian privacy framework. To this end,
we propose a formal model for data publishing mechanisms and
investigate the corresponding verification problem for Pufferfish privacy.
In~\cite{LWZ:18:MCDPP}, the authors use Markov chains and Markov
decision processes to model data publishing mechanisms; several known
mechanisms are formalized as such Markov models and verified to
satisfy differential privacy.
We instead propose to formalize Pufferfish privacy on hidden Markov models. A
data publishing mechanism is again the underlying Markov
chain of a hidden Markov model, and attackers' prior
knowledge is modeled by the initial state distribution. Based on
this formalization, we give a formal model for the geometric mechanism
and analyze it with respect to Pufferfish privacy. Interestingly, hidden Markov
models allow us to explain the subtleties between differential privacy and
Pufferfish privacy concisely. This strongly suggests that our formalization
is the right abstraction for Pufferfish privacy. We believe this work will
help data curators understand the Bayesian privacy framework.
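As a minimal sketch of the modeling idea (names and numbers are illustrative assumptions, not the paper's construction): hidden states encode the secret data, the initial distribution plays the role of the attacker's prior, and the mechanism's published outputs are the observations. The standard forward algorithm then yields the probability of an observation sequence under a given prior.

```python
def forward(prior, trans, obs_prob, observations):
    """Probability of an observation sequence under an HMM (forward algorithm).

    prior    : initial state distribution (the attacker's prior knowledge)
    trans    : trans[s][t] = Pr[next state t | current state s]
    obs_prob : obs_prob[s][o] = Pr[observe o | state s]
    """
    n = len(prior)
    alpha = [prior[s] * obs_prob[s][observations[0]] for s in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[s] * trans[s][t] for s in range(n)) * obs_prob[t][o]
                 for t in range(n)]
    return sum(alpha)

# two secrets, a point-mass prior on secret 0, and a noisy one-bit release
prior = [1.0, 0.0]
trans = [[1.0, 0.0], [0.0, 1.0]]          # secrets do not change over time
obs_prob = [[0.75, 0.25], [0.25, 0.75]]   # output agrees with the secret w.p. 0.75
```

Comparing such observation probabilities under priors conditioned on rival secrets is exactly the kind of quantity a Pufferfish analysis bounds.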
We furthermore investigate the verification problem for Pufferfish
privacy on hidden Markov models. By a reduction from the
Boolean satisfiability problem (SAT), we show that the verification
problem is NP-hard. Nevertheless, using Satisfiability Modulo Theories
(SMT) solvers, we design a verification algorithm and automatically
verify Pufferfish privacy with our implementation in \zpython.
The rest of the paper is organized as follows. Preliminaries are given in
Section~\ref{section:preliminaries}. Sections~\ref{section:pufferfish}
and~\ref{section:hmm} discuss the Pufferfish privacy framework
and model several privacy examples as hidden Markov models.
Sections~\ref{section:complexity} and~\ref{section:checking-pufferfish}
investigate the complexity of the verification problem for Pufferfish
privacy and present an automatic verification algorithm.
A classical case study is given in Section~\ref{section:noisy-max},
and conclusions are drawn in Section~\ref{section:conclusions}.
% results