\chapter{Probability, Inference, and Thermodynamics}\label{sec:variationalMethods}
\section{Introduction}
We need a systematic way to make inferences.
This entails using probabilities to model our knowledge of the world.
\section{Probability}
Probability theory studies the consistency of the various statements that can be made of the world.
The statements may or may not be true;
where there is uncertainty in some event,
each possible outcome gives rise to a valid statement.
%there are many valid statements that can be made of the world, one for each possible outcome.
Statements may be atomic, in that they reference no other event,
or they may be compound, in that they depend upon other, simpler, statements.
Given a set of outcomes to an uncertain event,
an individual may ascribe each statement with the degree to which it is to be believed.
Probability theory determines how to combine the beliefs of simpler statements
into the belief of their compound in a consistent manner,
and describes how beliefs should change in order to remain consistent when new information becomes available.
Whether the original assignments are based on experience, prejudice or mere caprice,
and whether they coincide with the assignments of any other individual does not matter.
Probability theory concerns itself only with maintaining a self-consistent set of beliefs.
%that depend upon an \apriori\ model that defines the set of possible outcomes.
%The formulation of a set of possible outcomes and the assignment of their beliefs constitutes a model of the uncertain event.
A model first defines the set of possible outcomes and second assigns the \apriori\ beliefs.
It defines how one interprets the world.
An experimental test can make the parameters of the model more precise,
or one may find that experimental data better supports an alternative model,
but the model is always the starting point.
In this thesis, therefore, an individual's view of the world is characterised by the totality of the models that they use to describe the world.
The subjectivity intrinsic to the possibly capricious formulation of a model may be disconcerting.
A consistent and agreed upon science can result, however,
when there exists a nature to which differing models may be compared.
Consensus can result by subjecting a model to experimental test and by sharing successful models.
%One last comment is in order, however,
%and that is the inherent lack of knowledge involved even when carrying out a measurement.
%By this I do not mean the banal problems of finite experimental precision,
%the {\em experimental noise} that introduces uncertainty into the quantity measured.
%Rather, I refer to the difficulties in determining how distances and times are to be measured at all.
%If a signal, such as light or sound, is used to locate a faraway entity
%then the true location of the entity is no longer measurable.
%The experimenter has no knowledge of what happens to that signal after it has been sent and before it returns:
%both the signal's route and speed are unknown.
%Assuming the signal's route to be direct and speed to be constant throughout is a common and necessary convention, for otherwise no definition of measurement is possible.
%It must, however, be emphasised that in reality the speed of the signal is no more attainable than its path.
%Much more will be said on this matter in \chapref{measurement}.
%For now we just note that our knowledge of the world, through our models and experiments,
%is, and is set to remain, remote from the true natural order of things.
Before moving on,
the world view adopted by probability theory should be located in its philosophical context.
In this regard the closeness of Wittgenstein's {\em Tractatus Logico-Philosophicus} to modern probability theory should be noted.
This thesis shall not attempt to unpick where Wittgenstein's views agree with, and where they differ from, modern probability theory
(for such an attempt see \cite{WittgensteinLattice}),
and a discussion on the relative merits of other world views is not considered to be within the scope of this thesis.
Nevertheless, %to aid the interested reader in locating the philosophy of probability theory,
the reader is invited to keep the following quotes in mind during this chapter.
Firstly, regarding the primacy of models when conceiving the world, Wittgenstein writes
\begin{quote}
The world is the totality of facts, not of things. For the totality of facts determines both what is the case, and also all that is not the case. Every thing is, as it were, in a space of possible atomic facts. I can think of this space as empty, but not of the thing without the space. (1.1, 1.12, 2.013)
\end{quote}
Indeed, the facts have a definite structure that is determined by how their compound is formed,
and this structure is assumed to be what is shared with nature:
\begin{quote}
Atomic facts are independent of one another. We make to ourselves pictures of facts. The picture is a model of reality.
That the elements of the picture are combined with one another in a definite way, represents that the things are so combined with one another.
This connexion of the elements of the picture is called its structure, and the possibility of this structure is called the form of representation of the picture. (2.061, 2.1, 2.12, 2.15)
\end{quote}
Finally, Wittgenstein expresses very directly the difficulty in articulating any innate truth that is external to a model of the world:
\begin{quote}
The world and life are one. I am my world. (The microcosm)... [A]t death the world does not alter, but comes to an end. (5.621, 5.63, 6.431)
\end{quote}
The natural world exists, but is mystical because it can only be guessed at from a constructed world view,
\begin{quote}
Not how the world is, is the mystical, but that it is.
The contemplation of the world sub specie aeterni is its contemplation as a limited whole.
The feeling that the world is a limited whole is the mystical feeling. (6.44, 6.45)
\end{quote}
% This thesis shall not attempt to unpick where Wittgenstein views agree and differ from modern probability theory
% (for such an attempt see \cite{WittgensteinLattice}).
% In the presentation that follows, however,
% propositions from the {\em Tractatus} will periodically be referenced where we believe the viewpoint is the same.
% I hope that this is of interest to the reader.
% and that is the difficulties inherent in carrying out measurements.
% To measure the world, one must decide how to measure the world,
% and in particular, how to measure distances and times.
% Einstein introduced the {\em convension} of using light to carry out measurements.
% However, as is argued in \chapref{measurement},
% other convensions are possible.
% In ultrasound physics, measurements are carried out with and indeed more natural for certain measurements.
% For example
% prior to any experiment,
% Probability theory enables the model to be updated consistently in the light of new experimental evidence,
% and informs us that an experiment agrees with one model better than another.
% An individuals view of the world therefore,
% In this thesis, therefore, an individual's view of the world is characterised by the totality of the models that they use to describe the world.
% The questions that different individuals puts to the world may be different,
% and so their chosen models will be different too.
% To test the quality of the model, the world must be put to experimental test.
% However, in order to do so, a model is also required for time and space.
% It is impossible to measure beyond these.
% the degree to which each statement should be believed will often be the subject of disagreement.
% Nevertheless, whether the degrees of belief are
% The outcome of an uncertain event
% Individuals may disagree with the likelihood of the various outcomes of an uncertain event.
% Everybody is able to ascribe a degree to which the various outcomes of an uncertain event should be believed.
% but whether based on experience or caprice,
% each can ascribed a degree of belief.
% %but are formed with greater or lesser certainty out of all the possible outcomes to their subject.
% %they are ascribed a degree of belief
% %A degree of belief, whether based on experience or prejudice, can therefore be ascribed to a statement.
\subsection{The Lattice of Statements}
Statements can be ordered in a lattice in a very natural way.
\subsection{Measure}
\subsection{Divergence}
\section{Thermodynamics}
Probability distributions can be used to represent our knowledge of the world.
For example, the value of an experimentally obtained variable will in general
fluctuate around its average.
If different runs of the experiment are independent
then the distribution of obtained values fully describes the experiment.
What is learned from a given experiment is then characterised by how the probability distributions
that represent our knowledge change.
If a hypothesis, $\H$, is that a set of experimental data, $\vx = \{x_n|n=1,\ldots,N\}$, should conform to a model with a set of parameters, $\vw = \{w_i| i=1,\ldots,I\}$,
then our full knowledge of the system is given by the joint probability distribution
\begin{align}
P\lr{\vx,\vw,\H}.
\label{eqn:fullJointDist}
\end{align}
Of greater importance than \eqnref{fullJointDist}, however,
is to determine how our knowledge of the model changes when we collect the experimental data.
This can be found from \eqnref{fullJointDist} by splitting the joint distribution into its conditional probabilities.
\begin{align}
P\lr{\vx,\vw,\H} = P\lr{\vw|\vx,\H}P\lr{\vx|\H}
= P\lr{\vx|\vw,\H} P\lr{\vw|\H}.
\end{align}
from which it follows that
\begin{align}
P\lr{\vw|\vx,\H} = \frac{P\lr{\vx|\vw,\H} P\lr{\vw|\H}}{P\lr{\vx|\H}}.
\label{eqn:BayesTheorem}
\end{align}
Equation \Eqnref{BayesTheorem} is Bayes' theorem.
It states that the probability of the model's parameters, {\em given the data},
can be determined from the probability of the data when the parameters are known, and the probability of the parameters {\em before the data were known}.
It describes exactly the process of inference.
The term $P\lr{\vx|\vw,\H}$ is the likelihood function.
It evaluates the degree to which the model with a given set of parameters agrees with the experimental data.
If it is assumed that every data point is independent, and that each datum should agree with the prediction of the model, $t_n$,
to within Gaussian noise
then the likelihood function would be,
\begin{align}
P\lr{\vx|\vw,\H} = \prod_{n=1}^N \sqrt{\frac{\gamma}{2\pi}}e^{-0.5\gamma\lr{x_n-t_n}^2}.
\end{align}
The variable $\gamma$ is the precision - the inverse of the variance - and is one of the set $\{w_i\}$.
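Purely for illustration, the log of this likelihood is simple to evaluate numerically. The following Python sketch does so for a handful of invented data points and arbitrary choices of the predictions $t_n$ and the precision $\gamma$; none of the numbers carry any significance beyond the example.
\begin{verbatim}
import numpy as np

x = np.array([1.02, 0.95, 1.10, 0.88])  # observed data (invented)
t = np.array([1.00, 1.00, 1.00, 1.00])  # model predictions t_n
gamma = 25.0                             # precision (inverse variance)

# log P(x|w,H) = sum_n [ 0.5 log(gamma/2pi) - 0.5 gamma (x_n - t_n)^2 ]
log_likelihood = np.sum(0.5*np.log(gamma/(2*np.pi))
                        - 0.5*gamma*(x - t)**2)
print(log_likelihood)
\end{verbatim}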
The term $P\lr{\vw|\H}$ in \eqnref{BayesTheorem} is independent of the experimental data $\{x_n\}$.
It represents our knowledge of the parameters before the experiment was carried out.
It could be that the parameters are already known to great precision -
in which case the probability distribution would tend towards a delta function.
Alternatively, it could be that the a priori knowledge of the precision, say,
does not extend beyond the requirement that the precision is positive.
In this case the prior would be represented by a scale invariant distribution over positive values.
One such example is the Gamma distribution,
\begin{align}
P(\gamma|s,c) = \frac{1}{\Gamma(c)s}\lr{\frac{\gamma}{s}}^{c-1}\exp\lr{-\frac{\gamma}{s}},
\label{eqn:Gamma}
\end{align}
in the limit such that $sc = 1$ and $c\rightarrow 0$ \cite{MacKay2003}.
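To see why this limit is scale invariant, substitute $s = 1/c$ and keep only the factors that depend on $\gamma$:
\begin{align}
P(\gamma|s,c) \propto \gamma^{c-1}\exp\lr{-c\gamma} \rightarrow \frac{1}{\gamma}\quad\text{as } c\rightarrow 0,
\end{align}
which is flat in $\ln\gamma$, so that no scale for the precision is preferred over any other (at the price of the prior being improper).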
The hypothesis, $\H$, encompasses all of the assumptions that go into the inference.
These include the choice of the model that is fitted to the data,
the prior probabilities assigned to the model variables and
the noise model described by the likelihood function.
These assumptions are inevitable - they reflect the uncertainty
that prompts the experiment in the first place.
However,
since many different hypotheses can be dreamed up,
it is important to be able to evaluate how each is supported by the experimental data.
For this, Bayes' theorem can be applied a second time:
the probability of the hypothesis, given the data, is
\begin{align}
P\lr{\H | \vx } = \frac{P\lr{\vx|\H}P\lr{\H}}{P\lr{\vx}}.
\label{eqn:BayesHyp}
\end{align}
Since the probability of the data, $P\lr{\vx}$,
is independent of the hypothesis
it can be eliminated when comparing two hypotheses, $\H_1$ and $\H_2$,
\begin{align}
\frac{P\lr{\H_1 | \vx }}{P\lr{\H_2 | \vx }} = \frac{P\lr{\vx|\H_1}}{P\lr{\vx|\H_2}}\frac{P\lr{\H_1}}{P\lr{\H_2}}.
\label{eqn:ModelCmp}
\end{align}
The second of the ratios on the right-hand-side of \eqnref{ModelCmp}
gives an opportunity, if desired, to prefer one model over another irrespective of any data collected.
The first quotient is determined from the experimental data.
The term $P\lr{\vx|\H}$ is called the evidence and it is the partition function of \eqnref{BayesTheorem}.
A model that is highly constrained will be inflexible in the range of predictions it can make,
whereas a model that has many free parameters will be able to predict a vast number of possible outcomes.
The more constrained model will therefore have a smaller set of likely outcomes,
but each of these will have a much greater probability than the many possible outcomes of the less constrained model.
The right-hand-side of \eqnref{ModelCmp} therefore directly and quantitatively embodies Occam's razor,
the rule of thumb that states that `simpler' models should be favoured over more complicated models.
For a more detailed discussion of model comparison and Occam's razor see \cite[Chapter 28]{MacKay2003}.
To evaluate the evidence, the numerator in equation \eqnref{BayesTheorem} must be integrated over the entire parameter space,
\begin{align}
P\lr{\vx|\H} = \int_\vw d\vw\, P\lr{\vx|\vw,\H} P\lr{\vw|\H}.
\end{align}
In general this cannot be done analytically.
However, it is often the case that the probability density is tightly peaked about its maximum.
In this case the evidence may be evaluated by approximating the peak with a Gaussian, which can be integrated.
This is the saddle point approximation.
Expanding the logarithm of the unnormalised posterior, $P^\ast\lr{\vw} = P\lr{\vx|\vw,\H} P\lr{\vw|\H}$,
around its maximum, $\vw_0$,
gives
\eq{
\ln P^\ast\lr{\vw} \approx \ln P^\ast(\vw_0) - \frac{1}{2}\lr{\vw-\vw_0}^T \vA\lr{\vw-\vw_0 }
}
where
\eq{
\vA = A_{ij} = -\left.\frac{\d^2}{\d w_i\d w_j} \ln P^\ast(\vw)\right|_{\vw=\vw_0}
}
is the negative of the Hessian matrix at the maximum.
The numerator of \eqnref{BayesTheorem} is therefore approximated by the multidimensional Gaussian
\begin{align}
P^\ast\lr{\vw} \approx P^\ast(\vw_0) \exp \lr{- \frac{1}{2}\lr{\vw-\vw_0}^T \vA\lr{\vw-\vw_0 }},
\end{align}
for which the normalisation constant, the evidence, is
\begin{align}
P\lr{\vx|\H} \approx P^\ast(\vw_0) \sqrt{\frac{\lr{2\pi}^K}{\det \vA}},
\end{align}
where $K$ is the number of parameters.
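To make the procedure concrete, the following Python sketch applies the saddle point approximation to a one-parameter model with a Gaussian likelihood and a broad Gaussian prior. The data, the prior width, and the use of a numerical optimiser and finite differences are arbitrary choices made for the illustration only.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.9, 1.1, 1.3, 0.8, 1.2])   # observed data (invented)
gamma = 4.0                                 # known noise precision

def neg_log_Pstar(w):
    """-log P*(w) = -log[ P(x|w,H) P(w|H) ] for a scalar mean w."""
    log_lik = np.sum(0.5*np.log(gamma/(2*np.pi)) - 0.5*gamma*(x - w)**2)
    log_prior = -0.5*np.log(2*np.pi*100.0) - 0.5*w**2/100.0  # N(0, 10^2)
    return -(log_lik + log_prior)

w0 = minimize_scalar(neg_log_Pstar).x       # maximum of P*(w)

# A = -d^2/dw^2 log P*(w) at w0, by a central finite difference.
eps = 1e-4
A = (neg_log_Pstar(w0 + eps) - 2*neg_log_Pstar(w0)
     + neg_log_Pstar(w0 - eps)) / eps**2

# Evidence: P(x|H) ~ P*(w0) * sqrt((2 pi)^K / det A), here with K = 1.
log_evidence = -neg_log_Pstar(w0) + 0.5*np.log(2*np.pi) - 0.5*np.log(A)
print(w0, log_evidence)
\end{verbatim}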
\subsection{Conjugate Exponential Variables}
If the conditional distribution of a variable $X$ given its parent $Y$ can be written in the exponential family form
\begin{align}
\ln P(X|Y) = \phi(Y) u(X) + f(X) + g(Y),
\end{align}
then conjugacy implies that the distribution of a child variable $W$, viewed as a function of $X$, takes the form
\begin{align}
\ln P(W|X) = \tilde\phi(W) u(X) + h(W),
\end{align}
so that the variable $X$ enters both distributions through the same sufficient statistic $u(X)$.
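As an illustrative example, take the Gaussian likelihood introduced earlier in this section with a single datum $x$ and prediction $t$, and treat the precision $\gamma$ as the variable $X$. Viewed as a function of $\gamma$,
\begin{align}
\ln P(x|\gamma) = -\half\lr{x-t}^2 \gamma + \half\ln\gamma - \half\ln 2\pi,
\end{align}
which is linear in $u(\gamma) = \lr{\gamma, \ln\gamma}$, as is the logarithm of the Gamma prior \eqnref{Gamma}. The two distributions are therefore conjugate: multiplying them simply adds their natural parameters, and the posterior over $\gamma$ remains a Gamma distribution.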
\section{Variational Approach}
\section{Kullback-Leibler divergence}
Variational methods can be used to approximate a probability distribution, $P$, that is impossible to evaluate exactly,
with a probability distribution that is more malleable.
The approximation is varied so that it matches the original distribution as closely as possible.
The amount of information that is lost when a distribution $Q$ is used in place of the distribution $P$ is measured by the relative entropy,
a quantity known as the Kullback-Leibler divergence.
%The Kullback Leibler divergence gives a measure of the similarity of two distributions,
It is defined,
\begin{align}
\KLD{Q}{P} &= \int_\vH Q(\vH|\H) \log\frac{Q(\vH|\H)}{P(\vH|\vD,\H)} d\vH,
\end{align}
where $P$ and $Q$ are probability distributions that model a hypothesis, $\H$.
$\vH$ is a set of unknown variables that form the model and $\vD$ is a set of known variables.
From Gibbs inequality it follows that
\begin{align}
\KLD{Q}{P} \ge 0
\end{align}
with equality if and only if $P=Q$.
That is, knowledge of the system is always lost when it is approximated.
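As a concrete example, for two univariate Gaussians $Q$ and $P$ with means $\mu_q$, $\mu_p$ and variances $\sigma_q^2$, $\sigma_p^2$, the divergence can be evaluated in closed form,
\begin{align}
\KLD{Q}{P} = \ln\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + \lr{\mu_q - \mu_p}^2}{2\sigma_p^2} - \half,
\end{align}
which is zero when the two distributions coincide and positive otherwise.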
The Kullback-Leibler divergence will be minimised in two different ways in this thesis.
\section{Statistical Mechanics}
Let
\begin{align}
P(x) = \frac{1}{Z} e^{-\beta E(x) },
\end{align}
where $E(x)$ is the energy of the state $x$.
Then the variational free energy of an approximating distribution $Q$ is
\begin{align}
\beta \tilde{F} &= \int Q(x) \ln \frac{Q(x)}{\exp\lr{-\beta E(x) }} dx \\
&= \int Q(x) \ln \frac{Q(x)}{P(x)} dx - \ln Z \\
&= \KLD{Q}{P} - \ln Z \\
&= \KLD{Q}{P} + \beta F,
\end{align}
where $F \equiv - \beta^{-1}\ln Z$ is the free energy and
\begin{align}
Z = \int e^{-\beta E(x)} dx
\end{align}
is the partition function.
Since the divergence is non-negative, $\tilde{F} \ge F$, with equality only when $Q = P$:
minimising the variational free energy therefore bounds the true free energy from above.
\section{Variational Ensemble Learning}
The non-negativity of the Kullback-Leibler divergence makes it a useful function to minimise.
Indeed, using $P(\vH,\vD|\H) = P(\vH|\vD,\H) P(\vD|\H)$,
we may write,
\begin{align}
\KLD{Q}{P} % &=\int_\vH Q(\vH) \log\frac{Q(\vH)}{P(\vH|\vD)} d\vH \\
&= \int_\vH Q(\vH|\H) \log\frac{Q(\vH|\H)}{P(\vH,\vD|\H)} d\vH + \log P(\vD|\H) \\
&= -S_Q - \int_\vH Q(\vH|\H) \log P(\vH,\vD|\H) d\vH + \log P(\vD|\H)
\end{align}
where $S_Q = - \int_\vH Q(\vH|\H) \log Q(\vH|\H) d\vH$ is the entropy given the hypothesis%
\footnote{
\begin{quote}
Consider, for example, a crystal of Rochelle salt.
For one set of experiments on it, we work with temperature, pressure and volume.
The entropy can therefore be expressed as some function $S_e(T,P)$.
For another set of experiments on the same crystal,
we work with temperature, the component $e_{xy}$ of the strain tensor,
and the component $P_z$ of the electric polarisation;
the entropy as found in these experiments is a function $S_e(T, e_{xy}, P_z)$.
It is clearly meaningless to ask ``What is the entropy of the crystal?''
unless we first specify the set of parameters which define its thermodynamic state.%
One might reply that in each of the experiments cited,
we have used only part of the degrees of freedom of the system,
and there is a ``true'' entropy which is a function of all these parameters simultaneously.
However we can always introduce as many parameters as we please...
There is no end to this search for the ultimate ``true'' entropy until we have reached the point where we control
the location of each atom independently.
But just at that point the notion of entropy collapses, and we are no longer talking thermodynamics!
From this we see that entropy is an anthropomorphic concept,
not only in the well known statistical sense that it measures the extent of human ignorance as to the microstate.
{\em Even at the purely phenomenological level, entropy is an anthropomorphic concept.}
For it is a property, not of the physical system,
but of the particular experiments that you or I choose to perform on it.
\flushright Edwin T. Jaynes\cite{Jaynes1965}
\end{quote}
}.
Define the cost function
\begin{align}
\L = \int_\vH Q(\vH) \log P(\vH,\vD) \, d\vH + S_Q
\end{align}
From which it follows that
\begin{align}
\L &= \log P(\vD|\H) - \KLD{Q}{P} \\
&\le \log P(\vD|\H)
\end{align}
The probability of the model, given the data, is then bounded from below,
\begin{align}
P(\H | \vD) &= \frac{P(\vD| \H) P(\H)}{P(\vD)}\\
&\ge \frac{e^{\L(Q)}P(\H)}{P(\vD)},
\end{align}
since $\L \le \log P(\vD|\H)$.
Assuming that the variables are independent gives
\begin{align}
Q\lr{\vH} = \prod_n^N Q_n\lr{H_n}
\end{align}
where $Q_n$ is the independent distribution for the $n$th variable.
Then
\begin{align}
\L&= \int_\vH \prod_n^NQ_n(H_n) \log P(\vH,\vD) \, d\vH - \sum_n^N \int_{H_n} Q_n(H_n) \log Q_n(H_n) \, dH_n
\end{align}
Separating out the $j$th element gives
\begin{align}
\L &= \int_\vH Q_j(H_j)\prod_{n\ne j}^NQ_n(H_n) \log P(\vH,\vD) \, d\vH + S_{Q_j} + \sum_{n\ne j}^N S_{Q_{n}}
\\ &= \int_{H_j} Q_j(H_j) \multi{\log P(\vH,\vD)}{\prod_{i\ne j} Q_i\lr{H_i}} dH_j +S_{Q_j} + \sum_{n\ne j}^N S_{Q_{n}}
\end{align}
Introducing
\begin{align}
Q^\ast_j = \frac{1}{Z}e^{\multi{\log P(\vH,\vD)}{\prod_{i\ne j} Q_i\lr{H_i}}}
\end{align}
gives
\begin{align}
\L &= \int_{H_j} Q_j(H_j) \log Q^\ast_j \, dH_j + \log Z +S_{Q_j} + \sum_{n\ne j}^N S_{Q_{n}}
\\ &= -\KLD{Q_j}{Q^\ast_j} + \log Z + \sum_{n\ne j}^N S_{Q_{n}},
\end{align}
which is maximal with respect to $Q_j$ when $Q_j = Q^\ast_j$, and so the bound is maximised when
\begin{align}
\log Q^\ast_j = \multi{\log P(\vH,\vD)}{\prod_{i\ne j} Q_i\lr{H_i}} + \const.
\end{align}
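To make the update cycle concrete, the following Python sketch applies these coordinate updates to the textbook problem of inferring the mean $\mu$ and precision $\tau$ of a Gaussian from data, with $Q(\mu,\tau) = Q(\mu)Q(\tau)$, a Gaussian prior on $\mu$ with mean $\mu_0$ and precision $\lambda_0\tau$, and a Gamma prior on $\tau$ with shape $a_0$ and rate $b_0$. The hyperparameter values and the data are choices made for the illustration; the update formulae below are the standard ones for this particular conjugate model.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 0.5, size=200)           # invented data
N, xbar = len(x), x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3     # assumed prior hyperparameters

# Factors Q(mu) = N(muN, 1/lamN) and Q(tau) = Gamma(aN, bN), initialised crudely.
muN, lamN, aN, bN = xbar, 1.0, a0, b0

for _ in range(50):
    # Q*(mu): depends on Q(tau) only through <tau> = aN/bN.
    E_tau = aN / bN
    muN = (lam0*mu0 + N*xbar) / (lam0 + N)
    lamN = (lam0 + N) * E_tau
    # Q*(tau): depends on Q(mu) through <mu> = muN and Var(mu) = 1/lamN.
    E_sq = np.sum((x - muN)**2) + N/lamN
    aN = a0 + 0.5*(N + 1)
    bN = b0 + 0.5*(E_sq + lam0*((muN - mu0)**2 + 1/lamN))

print(muN, aN/bN)        # approximate posterior means of mu and tau
\end{verbatim}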
Now, for a directed graphical model the joint distribution factorises over the parents of each variable,
\begin{align}
P(X_1, X_2, \ldots, X_N) = \prod_i^N P(X_i|\parents{i})
\end{align}
so
\begin{align}
\ln Q^\ast_j(H_j) = \multi{\ln P\lr{H_j|\parents{j}} + \sum_{i \in \children{j}} \ln P\lr{X_i| H_j, \coparents{j}}}{\sim Q(H_j)} + \const
\end{align}
If the models are conjugate exponential then
\begin{align}
\ln Q^\ast_j(H_j) &= \multi{\phi(\parents{j}) u(H_j) + f(H_j) + g(\parents{j}) }{\sim Q(H_j)}
\nonumber \\
&+\multi{\sum_{i \in \children{j}} \tilde\phi(X_i,\coparents{j}) u(H_j) + h(X_i,\coparents{j}) }{\sim Q(H_j)} + \const\\
&= \multi{\phi(\parents{j}) + \sum_{i \in \children{j}} \tilde\phi(X_i,\coparents{j}) }{\sim Q(H_j)} u(H_j)+ f(H_j)+\const
\end{align}
from which it follows that
\begin{align}
\phi^\ast_j = \scalar{\phi(\parents{j})} + \sum_{i \in \children{j}} \scalar{\tilde\phi(X_i,\coparents{j}) }
\end{align}
where expectations are with respect to $Q$.
The message from a variable node to a function node is
\begin{align}
m_{X_i\rightarrow f_j} = \Moments{Q_i}
\end{align}
The message from a function node to a variable node is
\begin{align}
m_{f_i\rightarrow X_j} = \Natural{\multi{f_i(\neighbour{i})}{Q_{{\neighbour{i} \bs X_j}}}}
\end{align}
The natural parameters of the updated variable node are then given by
\begin{align}
\Natural{Q^\ast(X_i)} = \sum_{j\in\neighbour{i}} m_{f_j \rightarrow X_i}
\end{align}
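As an illustration of these messages (assuming the usual exponential family parameterisation of the Gaussian, so that the sufficient statistics of a variable $x$ are $u(x) = \lr{x, x^2}^T$), consider a Gaussian factor with mean $m$ and precision $\gamma$, each carrying its own factor in $Q$. Its logarithm, $\gamma m x - \half\gamma x^2 + \ldots$, is linear in $u(x)$, and so the messages are
\begin{align}
m_{f\rightarrow x} =
\begin{pmatrix} \scalar{\gamma}\scalar{m} \\ -\half\scalar{\gamma} \end{pmatrix},
\qquad
m_{x\rightarrow f} =
\begin{pmatrix} \scalar{x} \\ \scalar{x^2} \end{pmatrix},
\end{align}
with the expectations taken under the current factors of $Q$. Summing the natural parameter messages arriving at $x$ then gives the updated Gaussian $Q^\ast(x)$.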
\section{Independent Components of the pulses}
Assume that the differing bubble sources contribute as independent components to the recorded pulses,
\begin{align}
\vx_t = \vA \vs_t
\end{align}
One model for the noise is a Gaussian,
\begin{align}
P(\vx_t| \vA, \vs_t, \Lambda) = \G(\vx_t; \vA\vs_t, \Lambda)
\end{align}
However,
from \figref{} it is seen that this time-domain Gaussian noise model is not good.
A better alternative is to use Fourier decomposition,
\begin{align}
\vx_\omega = \vA \vs_\omega
\end{align}
such that
\begin{align}
P(\vx_\omega| \vA, \vs_\omega, \Lambda_\omega) = \G( \vx_\omega ; \vA\vs_\omega, \Lambda_\omega)
\end{align}
\begin{align}
P(\vH|\vD) = G(\vH)
\end{align}
\begin{align}
\KLD{Q}{P} &= \int_\vH Q(\vH) \log\frac{Q(\vH)}{P(\vH|\vD)} d\vH \\
&= \int_\vH Q(\vH) \log\frac{Q(\vH)}{P(\vH,\vD)} d\vH + \int_\vH Q(\vH) \log P(\vD) d\vH\\
&= \int_\vH Q(\vH) \log Q(\vH) d\vH - \int_\vH Q(\vH) \log P(\vH,\vD) \, d\vH + \log P(\vD)
\end{align}
Define the cost function
\begin{align}
\L = \int_\vH Q(\vH) \log P(\vH,\vD) \, d\vH - \int_\vH Q(\vH) \log Q(\vH) \, d\vH.
\end{align}
From which it follows that
\begin{align}
\L &= \log P(\vD) - \KLD{Q}{P} \\
&\le \log P(\vD)
\end{align}
The probability of the model, given the data, is then bounded from below,
\begin{align}
P(\H| \vD) &= \frac{P(\vD| \H) P(\H)}{P(\vD)}\\
&\ge \frac{e^{\L(Q)}P(\H)}{P(\vD)}.
\end{align}
\subsubsection{The model}
\begin{align}
P(s_{m\omega}| \H) &= \sum_{c=1}^{N_c} \pi_{mc}\G(s_{m\omega};0,\beta_{\omega c})\\
P(\beta_{\omega c}|\H) &= \GammaDistr(\beta_{\omega c} ; b^{(\beta)}, c^{(\beta)})\\
P(\{\pi_{mc}\}_{c=1}^{N_c}|\H) &= \Dirichlet\lr{ \{\pi_{mc}\}_{c=1}^{N_c} | c^{(\pi)}}
\end{align}
The mixing matrix and its scale hyperparameters have the priors
\begin{align}
P(A_{nm}|\H) &= \G(A_{nm}; 0,\alpha_m)\\
P(\alpha_m| \H) &= \GammaDistr(\alpha_m; b^{(\alpha)},c^{(\alpha)})
\end{align}
and the noise precision has a Gamma prior,
\begin{align}
P(\Lambda_{\omega}|\H) = \GammaDistr(\Lambda_\omega;b^{(\Lambda)},c^{(\Lambda)} )
\end{align}
The approximating distribution is assumed to factorise,
\begin{align}
Q\lr{\vs, \vA, \pi, \beta, \alpha, \Lambda} = Q\lr{s_{\omega m}}Q\lr{A_{nm}}Q\lr{\pi}Q\lr{\beta}Q\lr{\alpha}Q\lr{\Lambda}
\end{align}
\begin{align}
Q\lr{s_{m\omega}} = \G(s_{m \omega};\hat{s}_{m\omega}, \tilde{s}_{m\omega})\\
Q\lr{A_{nm}} = \G(A_{nm};\hat{A}_{nm}, \tilde{A}_{nm})\\
Q\lr{\beta_{\omega c}} = \GammaDistr(\beta_{\omega c};\hat{\beta}_{\omega c}, \tilde{\beta}_{\omega c})
\end{align}
\section{Density Functional Theory}\label{app:DFT}
\subsection{Introduction}
\Dft\ relaxes the capillary approximation used in \cnt.
The density of the nucleated bubble is not assumed to be that of the bulk,
and the interface is not assumed to be macroscopic and planar\cite{Oxtoby1992, Oxtoby1998}.
\Dft\ therefore does a much better job at modelling the interface than \cnt.
Rather than it being a sudden boundary,
there is a finite interval over which the density varies from that of the fluid to that of the vapour
and the bubble is modelled for what it is - a fluctuation in density -
rather than a vapour entrapped within a flexible boundary.
If spherical symmetry is assumed then
the bubble boundary is defined by its radius.
The critical radius is such that\cite{Oxtoby1992,Oxtoby1998}
\begin{align}
\frac{d \Omega}{d a} =0,\quad\text{at $a = \astar$} \label{eqn:DFT:astarR}
\end{align}
where $\Omega$ is the {\em grand potential}.
The grand potential in \eqnref{DFT:astarR} is difficult to evaluate, however,
as it is a functional of the
phase space positions of all $N$ molecules in the system.
Specifically,
\begin{align}
\Omega = -\beta^{-1}\ln \Xi.
\end{align}
where $\Xi$ is the grand partition function
\begin{align}
\Xi = \Tr \exp\lr{-\beta \lr{H_N - \mu N}}. \label{eqn:nuc:GPF}
\end{align}
$\Tr$ denotes the trace operator
\begin{align}
\Tr \equiv \sum_{N=0}^\infty \frac{1}{h^{3N}N!} \iint d\cx_1 d\cp_1
\end{align}
and we have compacted the integral by writing
\begin{align}
d\cx_n &\equiv dr_n dr_{n+1}\ldots dr_N, &&\quad\text{and}&
d\cp_n &\equiv dp_n dp_{n+1}\ldots dp_N.
\label{eqn:dshorthand}
\end{align}
In \eqnref{nuc:GPF} $\mu$ denotes the chemical potential and $\H$ denotes the Hamiltonian of the molecules.
The difficulty in evaluating $\Omega$ comes from the interactions between the molecules.
In order to consider the couplings explicitly we split $\H$ into
\sub{
\begin{align}
% \begin{array}{ll}
\KE &= \sum_i^N \frac{p_i^2}{2m}, && \text{which is the kinetic energy,}\label{eqn:nuc:Kinetic}\\
\UE &= \UE(\cx_1), && \text{ the internal energy and}\\
\VE &= \sum_i^N V_\ext(r_i) && \text{the external potential.}
% \end{array}
\end{align}
}
so that the overall Hamiltonian can be written %in terms of the intrinsic potentials, $\H_\in$ and external potential, $\H_\ext$,
\begin{align}
\H = %\H_\in + \H_\ext =
\KE + \UE + \VE.
\end{align}
Here we have extended the shorthand employed in \eqnref{dshorthand} so that
\begin{align}
\cx_n \equiv r_n,r_{n+1},\ldots,r_N, \quad\text{and}\quad
\cp_n \equiv p_n,p_{n+1},\ldots, p_N.
\end{align}
%The coupled terms are therefore the internal energy, $\UE$.
Separating the Hamiltonian in this way lets us split the grand partition function
\begin{align}
\Xi = \Tr e^{-\beta (\KE -\mu N)}e^{-\beta(\UE +\VE)}= \frac{1}{N!}Z_\KE Z_{\UE+\VE}, \label{eqn:XiSeparate}
\end{align}
with the second equality following because $\KE$ is a function of only the particle momenta,
and $\UE$ and $\VE$ are functions of positions.
%At equilibrium the joint probability density of the distribution is the Boltzmann distribution
%\begin{align}
% p_0(\cx_1, \cp_1) = \Xi^{-1} \exp\lr{-\beta \lr{H_N - \mu N}}.
%\end{align}
%which can be derived with the Maximum entropy principle\cite{}, for example.
%Here were have extended our shorthand of \eqnref{dshorthand} so that
%\begin{align}
%\cx_n \equiv r_n,r_{n+1},\ldots,r_N, \quad\text{and}\quad
%\cp_n \equiv p_n,p_{n+1},\ldots, p_N.
%\end{align}
The two factors on the right of \eqnref{XiSeparate} can be considered separately:
\nlist{
\item The momentum integrals in \eqnref{XiSeparate} form the partition function of an ideal gas,
with
\begin{align}
Z_\KE = \int d \cp_1 e^{-\beta \lr{\sum_i^N \frac{p_i^2}{2m}-\mu N}} = \lr{\frac{m}{2\pi\hbar^2 \beta}}^{3N/2} \equiv n_Q^N.
\end{align}
The term $n_Q$ is sometimes known as the {\em quantum concentration} and is related to the {\em thermal de Broglie wavelength}, $\lambda_T$, by $n_Q = 1/\lambda_T^3$.
We demote the derivation of this standard result to \appref{DFT}.
\item
The remaining factor, $Z_{\UE+\VE}$, is the partition function of the joint probability distribution of the molecular positions,
\begin{align}
p_0(\cx) = \frac{1}{N!Z_{\UE+\VE} } e^{-\beta\lr{\UE+\VE}}
\end{align}
To make progress we must approximate the coupled interaction term, $\UE$.
%so that \eqnref{XiSeparate} can be solved.
Here we assume that only the two particle interactions are important
and write
\begin{align}
\UE(\cx) \approx \Phi(\cx) = \sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j),
\end{align}
where $\phi(\vr_i, \vr_j)$ is the two particle potential between a particle at $r_i$ and $r_j$.
%At equilibrium the true joint probability density is the Boltzmann distribution
%\begin{align}
% p_0(\cx_1, \cp_1) &= \Xi^{-1} \exp\lr{-\beta \lr{\H_N - \mu N}}
%\end{align}
%
The approximate Hamiltonian is then $H \equiv \KE + \Phi + \VE$,
and is described by the approximate probability density, $p$,
\begin{align}
p(\cx) = \frac{1}{N!Z_{\UE+\VE} } e^{-\beta\sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j) -\beta\sum_i^N V_\ext(\vr_i)} \label{eqn:pspatial}
\end{align}
Marginalising equation \eqnref{pspatial} for the 1-particle distribution gives
\begin{align}
p^{(1)}(\vr_1) = \frac{N}{N! Z_{\UE+\VE}} \int e^{-\beta\sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j)-\beta\sum_i^N V_\ext(\vr_i)} d\cx_2. \label{eqn:ponespatial}
\end{align}
The 2-particle density is
\begin{align}
p^{(2)}(\vr_1, \vr_2) = \frac{N(N-1)}{N!Z_{\UE+\VE}}\int e^{-\beta\sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j)-\beta\sum_i^N V_\ext(\vr_i)} d\cx_3.
\end{align}
The approximate number density, $\rho(\vr)$, is such that
\begin{align}
\int \rho(\vr) d\vr = N .
\end{align}
It follows that
\begin{align}
\rho(\vr) = N! p^{(1)}(\vr_1). \label{eqn:rhoone}
\end{align}
From \eqnref{rhoone} and \eqnref{ponespatial} we find that the {\em density is a functional of the external potential.}
}
The converse is also true:
{\em the external potential is uniquely determined by the density},
a result known as the Hohenberg-Kohn theorem.
The probability density is then determined by the external potential,
from which it follows that the probability density is a unique functional of the density.
We outline a proof of the Hohenberg-Kohn theorem in \appref{Hohenberg_Kohn}.
It is thereby permissible to work with the mass density rather than the probability density when considering the thermodynamics of the bubble.
Since the density is the quantity of interest in bubble nucleation, the density functional approach is much more direct.
%Re-expressing the grand potential as a functional of mass density rather than probability density
%does not get us any closer to being able to evaluate $\Omega$, however.
%So far the argument is standard from statistical physics.
%To make progress we must approximate the coupled interaction term, $\UE$.
%so that \eqnref{XiSeparate} can be solved.
%Here we assume that only the two particle interactions are important
%and write
%\begin{align}
%% \UE(\cx) \approx \Phi(\cx) = \sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j),
%\end{align}
%where $\phi(\vr_i, \vr_j)$ is the two particle potential between a particle at $r_i$ and $r_j$.
%At equilibrium the true joint probability density is the Boltzmann distribution
%\begin{align}
% p_0(\cx_1, \cp_1) &= \Xi^{-1} \exp\lr{-\beta \lr{\H_N - \mu N}}
%\end{align}
%
%The approximate Hamiltonian is then $H \equiv \KE + \Phi + \VE$,
%and is described by the approximate probability density, $p$.
The approximate density function is $\rho$, which defines an approximate grand potential $\Omega_V\lrs{\rho}$.
The task is then to find the distribution $\rho$ that comes closest to approximating $\rho_0$.
The {\em relative entropy} or {\em Kullback-Leibler divergence} gives the amount of information lost
when using the approximate distribution $p$ rather than the correct distribution $p_0$,
and is defined
\begin{align}
\KLD{p}{p_0} = \Tr p \log \frac{p}{p_0} \label{eqn:nuc:KLD}
\end{align}
$\KLD{p}{p_0} \ge 0$, which follows from Gibbs' inequality, with equality if and only if $p=p_0$.
We may therefore define
\begin{align}
\Omega_V\lrs{\rho} \equiv \beta^{-1}\KLD{p}{p_0}+ \Omega\lrs{\rho_0},
\end{align}
The approximate grand potential approaches the true value as the divergence vanishes,
and it is stationary with respect to $\rho$
at thermodynamic equilibrium,
which occurs at the critical radius.
Therefore,
condition \eqnref{DFT:astarR}
may be expressed\cite{Oxtoby1992}
\begin{align}
\frac{\delta \Omega_V}{\delta \rho} =0,\quad\text{at $\rho = \rhostar$.} \label{eqn:DFT:astar}
\end{align}
More generally, from this definition and \eqnref{nuc:KLD} we have
\begin{align}
\Omega_V\lrs{p} &= \beta^{-1} \Tr p \log \frac{p}{p_0} - \beta^{-1}\ln \Xi \\
&= \beta^{-1} \Tr p \log \frac{p}{e^{-\beta\lr{\H - \mu N}}}\\
% &= T S + \Tr p \lr{H_N - \mu N} \\
&= - T S_p + \H_p - \mu N_p \\
&= F_p - \mu N_p\\
&= \F + \int V_\ext d\rho - \int \mu d \rho.
\end{align}
where the subscript $p$ indicates an average with respect to the distribution $p$ so that $S_p$ is the entropy with respect to $p$,
\begin{align}
\F\lrs {\rho} = \beta^{-1}\Tr p \log\frac{p}{e^{-\beta \H_\in}} = \KE_p + \Phi_p - TS_p
\end{align}
and the labels `$\in$' and `$ext$' indicate the intrinsic and external parts of the Hamiltonian.
The energy $\Phi$ when $\rho = \rhostar$ (thermodynamic equilibrium)
may be evaluated through the functional derivative with respect to the pair potential,
\begin{align}
\frac{\delta \Phi}{\delta \phi(\vr_1, \vr_2) } &= - \beta^{-1} \frac{\delta \ln Z_{\Phi+\VE}}{\delta \phi(\vr_1, \vr_2)} \\
&= \frac{N(N-1)}{2 Z_{\Phi+\VE}}\int d\cx_1\, \phi(\vr_1,\vr_2)\, e^{-\beta\sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j)-\beta\sum_i^N V_\ext(\vr_i)} \\
&= \half \iint d\vr_1\, d\vr_2\, \phi(\vr_1,\vr_2)\, p^{(2)}(\vr_1,\vr_2)
\end{align}
%\begin{align}
% &= \frac{1}{N!}\frac{e^{-\beta (\KE -\mu N)}}{Z_\KE}\frac{e^{-\beta(\UE +\VE)}}{Z_{\UE+\VE}} \label{eqn:Jointpzero} %\equiv p_\KE p_{\lr{\UE + VE}}
%\end{align}
%and introduce radial distribution function
%\begin{align}
% g(r_{12}) = \frac{V^2}{N^2} P_2(r_1, r_2)
%\end{align}
%Since $\UE$ is a measured quantity, it is averaged equilibrium joint probability distribution, $p_0$,
%where $p_0$ is a Boltzmann distribution,
%\begin{align}
% p_0(\cx_1, \cp_1) = \Xi^{-1} \exp\lr{-\beta \lr{H_N - \mu N}}. \label{nuc:pzero}
%\end{align}
%Decoupling the interactions in $\H$ implies that the approximate Hamiltonian, $H$, is described by some likewise decoupled approximate probability distribution, $p$.
%The task is then to vary $p$ so that it matches $p_0$ as closely as possible. %, given its new structure, so that $H$ approaches $\H$.
%From \eqnref{nuc:pzero} $p_0$ is explicitly a function of $\V$.
% This is the
% When considering the thermodynamics of the bubble
% The density functional approach is entirely analogous to this next step but works directly with an approximate mass density, $\rho$, rather than the approximate probability density, $p$.
% The mass density is then varied directly to find the functional form that best matches the equilibrium density, $\rho_0$, and whence the equilibrium Hamiltonian, $\H$.
% Since the density is the term of interest bubble nucleation, the density functional approach is much more direct.
% \Dft\ works at all because
% \nlist{
% \item
% The mass density is a functional of the probability density, $\rho = \rho[p]$.
% This follows almost trivially from the fact that the equilibrium density is a measured quantity,
% and therefore an average over $p_0$.
% Denoting the average
% \begin{align}
% \rho\lrs{p} = \scalar{\rho(\cx_1, \cp_1) }_{p} \equiv \Tr p(\cx_1, \cp_1) \rho(\cx_1, \cp_1),
% \end{align}
% we find that $\rho\lrs{p_0} = \scalar{\rho(\cx_1, \cp_1) }_{p_0} = \rho_0$.
% Since the density is a functional of $p_0$, which is in turn a function of $\VE$,
% it follows that the {\em density is functional of the external potential.}
% \item
% The converse is also true:
% the external potential is uniquely determined by the density,
% a result known as the Hohenberg-Kohn theorem.
% The probability density is then determined by the external potential,
% from which it follows that the probability density is a unique functional of the density.
% We outline a proof of the Hohenberg-Kohn theorem in \appref{Hohenberg_Kohn}.
% %It is deferred to the appendix because the proof is not constructive.
% }
% While many approximations can be made to the internal energy,
% we here consider only
% A number of approximat
% The approximate Hamiltonian we there
%which we denote
%\begin{align}
% \UE = \scalar{U(\cx)}_{\rho_0} \equiv Tr p_0(\cx_1, \cp_1) \rho_0(\cx_1, \cp_1)
%\end{align}
%over the full joint distribution of
\subsection{Bubble Nucleation}
The density $\rho(r)$ should not be constrained other than to require that it approaches the bulk vapour density at large distances.
Then
\begin{align}
\frac{\delta \Omega_V}{\delta \rho(r)} = 0
\end{align}
at $\rho(r) = \rho^\ast(r)$.
The multidimensional free energy surface has a minimum at the uniform vapour density,
and a second, lower, minimum at the uniform liquid density. Between these lies a saddle point, found by setting the functional derivative to zero.
The matrix of second derivatives contains a negative eigenvalue corresponding to the direction of motion over the barrier.
The equilibrium gas-liquid interface is similar, except that the eigenvalue is zero rather than negative.
References on saddle points in functional space are given in \cite{Shen2003}.
Sufficiently far from coexistence the density in the bubble differs appreciably from that of the stable vapour,
by at least an order of magnitude (ref.~28 in \cite{Shen2003}).
The predicted energy barrier agrees well with \cnt\ in the vicinity of phase coexistence but vanishes at the spinodal.
For the nucleation theorem see ref.~74 in \cite{Shen2003}.
We have
\begin{align}
\Omega_V = F - \mu N = F - \mu \int dr \rho(r).
\end{align}
Then \begin{align}
\frac{\delta F}{\delta \rho(r)} = \mu
\end{align}
at $\rho(r) = \rho^\ast(r)$.
To make further progress we need to write the grand potential $\Omega$ as a functional of the density.
\subsection{Background}
To do so we consider Hamiltonians that are separated in terms of their intrinsic and external contributions
\begin{align}
\H = \H_\in + \H_\ext = \lr{\KE + \UE} + \VE
\end{align}
where
\sub{
\begin{align}
% \begin{array}{ll}
\KE &= \sum_i^N \frac{p_i^2}{2m} && \text{is the kinetic energy,}\label{eqn:nuc:Kinetic}\\
\UE &= U(\cx) && \text{is the internal energy, and}\\
\VE &= \sum_i^N V_\ext(r_i) && \text{is the external potential.}
% \end{array}
\end{align}
}
The internal energy depends upon the locations of the particles, which are denoted with
\sub{
\begin{align}
\cx_n &\equiv r_n,r_{n+1},\ldots,r_N, \quad\text{the set of $N-n+1$ particle spatial positions.}
\intertext{Similarly, the momentums of the particles are denoted}
\cp_n &\equiv p_n,p_{n+1},\ldots, p_N.
\end{align}
}
This notation is usefully extended by defining
\sub{
\begin{align}
d\cx_n &\equiv dr_n dr_{n+1}\ldots dr_N, &&\quad\text{and}&
d\cp_n &\equiv dp_n dp_{n+1}\ldots dp_N.
\end{align}
}
Both the energy, $\H$, and the equilibrium density, $\rho_0$, are measured quantities
and, as such, both are averages over the probability of the phase space locations of the $N$ particles, $p_0 = p_0(\cx,\cp)$.
The average with respect to $p_0$ is defined
\begin{align}
\rho_0 = \scalar{\rho_0(\cx_1, \cp_1) }_{p_0} \equiv \Tr p_0(\cx_1, \cp_1) \rho_0(\cx_1, \cp_1)
\end{align}
where $\Tr$ denotes the trace operator
\begin{align}
\Tr \equiv \sum_{N=0}^\infty \frac{1}{h^{3N}N!} \iint d\cx_1 d\cp_1.
\end{align}
The average energy is defined similarly.
The equilibrium joint probability density is the Boltzmann distribution
\begin{align}
p_0(\cx_1, \cp_1) = \Xi^{-1} \exp\lr{-\beta \lr{H_N - \mu N}}.
\end{align}
where
\begin{align}
\Xi = \Tr \exp\lr{-\beta \lr{H_N - \mu N}}. \label{eqn:nuc:GPF}
\end{align}
is called the {\em grand partition function}.
The grand potential then follows according to
\begin{align}
\Omega = -\beta^{-1}\ln \Xi.
\end{align}
We may eliminate the momentum terms from the grand partition function, equation \eqnref{nuc:GPF} immediately
\begin{align}
\Xi = \frac{n_Q^N}{N!} Z_U Z_V Z_N
\end{align}
where $n_Q = \lr{\frac{m}{2\pi\hbar^2 \beta}}^{3/2}$ is the {\em quantum concentration}
and
\begin{align}
Z_U Z_V Z_N = e^{\beta\mu N}\int d\cx_1\, e^{-\beta\lr{\UE+\VE}}
\end{align}
is the remaining configurational factor.
Since the density is a functional of $p_0$, which is in turn a function of $V_\ext$,
it follows that the density is a functional of the external potential.
The converse is also true:
the external potential is uniquely determined by the density,
a result known as the Hohenberg-Kohn theorem.
The probability density is then determined by the external potential,
from which it follows that the probability density is a unique functional of the density.
We outline a proof of the Hohenberg-Kohn theorem in \appref{Hohenberg_Kohn}.
It is deferred to the appendix because the proof is not constructive.
The approximate grand potential can therefore be expressed as a unique functional of the density,
\begin{align}
\Omega_V\lrs{\rho_0} &= F - \mu N \\
&= \lr{\KE + \UE - TS} + \int d\rho\lr{ V_\ext -\mu }\\
&= \F + \int d\rho\lr{ V_\ext -\mu }
\end{align}
where $\F$ is the intrinsic Helmholtz free energy.
The grand potential, through $U$, is a function of $p_0(\cx)$, the probability describing the locations of all $N$ particles.
The associated multi-particle interactions are complicated and difficult to model.
We decouple these interactions by introducing the approximate probability distribution $ p = p(\vr_i, \vr_j)$
- dependent now only on two particle interactions -
to evaluate our thermodynamic variables.
This in turn reduces $U$ to two particle interactions,
\begin{align}
\Phi = \sum_{j>i} \sum_i^N \phi(\vr_i, \vr_j).
\end{align}
Furthermore, we assume that the external potential $\VE$ influences each particle equally.
Therefore, our approximate Hamiltonian is
\begin{align}
\H = \KE + \Phi + NV_\ext.
\end{align}
The {\em relative entropy} or {\em Kullback-Leibler divergence} gives the amount of information lost
when using the approximate distribution $p$ rather than the correct distribution $p_0$,
and is defined
\begin{align}
\KLD{p}{p_0} = \Tr p \log \frac{p}{p_0} \label{eqn:nuc:KLD}
\end{align}
$\KLD{p}{p_0} \ge 0$, which follows from Gibbs inequality with equality if and only if $p=p_0$.
It is convenient for $p$ to be evaluated via a variational principle and so we define $\Omega_V\lrs{\rho}$ according to
\begin{align}
\Omega_V\lrs{\rho} = \beta^{-1}\KLD{p}{p_0}+ \Omega,
\end{align}
so that the approximate grand potential approaches the true value on application of a variational principle.
It then follows that
\begin{align}
\Omega_V\lrs{\rho} &= \beta^{-1} \Tr p \log \frac{p}{p_0} - \beta^{-1}\ln \Xi \\
&= \beta^{-1} \Tr p \log \frac{p}{e^{-\beta\lr{\H - \mu N}}}\\
% &= T S + \Tr p \lr{H_N - \mu N} \\
&= - T S_p + \H_p - \mu N_p \\
&= F_p - \mu N_p\\
&= \F + \int V_\ext d\rho - \int \mu d \rho.
\end{align}
where
\begin{align}
\F\lrs{\rho_0} = \beta^{-1}\Tr p \log\frac{p}{e^{-\beta\H_\in}} = \KE_p + \UE_p - TS_p
\end{align}
where the subscript $p$ indicates that the thermodynamic quantities are evaluated with the approximate $p$ rather than $p_0$.
We have
\begin{align}
F = \int dr \rho_0 V_\ext + \F\lrs{\rho_0}
\end{align}
and
\begin{align}
V_\ext + \mu_\in\lrs{\rho_0} = \mu
\end{align}
where
\begin{align}
\mu_\in \equiv \deltarho \F.
\end{align}
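As a standard special case for orientation (quoted here rather than derived), for a non-interacting system the intrinsic free energy is that of the ideal gas,
\begin{align}
\F_{\rm id}\lrs{\rho} = \beta^{-1}\int d\vr\, \rho(\vr)\lrs{\ln\lr{\rho(\vr)\lambda_T^3} - 1},
\end{align}
so that $\mu_\in = \beta^{-1}\ln\lr{\rho(\vr)\lambda_T^3}$, and the condition $V_\ext + \mu_\in = \mu$ recovers the barometric profile $\rho(\vr) = \lambda_T^{-3}\exp\lrs{\beta\lr{\mu - V_\ext(\vr)}}$.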
Integration of interaction potential.