Merge pull request #5 from shirtsgroup/MRS_revising

First pass at revising alchemistry.org
shirtsgroup · Sep 3, 2024 · 8091e46 · 8091e46
2 parents b3bb25d + 177ecab
commit 8091e46
Show file tree

Hide file tree

Showing 6 changed files with 102 additions and 40 deletions.
diff --git a/alchemistry/fundamentals/BennetAcceptanceRatio.ipynb b/alchemistry/fundamentals/BennetAcceptanceRatio.ipynb
@@ -29,11 +29,11 @@
     "\n",
     "$\\Delta A_{ij} = -k_{B} T \\ln \\frac{Q_j}{Q_i} = k_{B}T \\ln \\frac{\\left\\langle\\alpha(\\vec{q}) \\exp[-\\beta U_{i}(\\vec{q})]\\right\\rangle_j}{\\left\\langle\\alpha(\\vec{q}) \\exp[-\\beta U_{j}(\\vec{q})]\\right\\rangle_i}$\n",
     "\n",
-    "which is true for any $\\alpha(\\vec{q})>0$ for all $\\vec{q}$. This was Bennett's start point and he then used variational calculus to select the value pf $\\alpha(\\vec{q})$ minimizing the variance of the free energy. The end result is an implicit function of the free energy, and the number of samples at each state, $n_i$ and $n_j$, and is\n",
+    "which is true for any function $\\alpha(\\vec{q})$ as long as $\\alpha(\\vec{q})>0$ for all $\\vec{q}$. This was Bennett's start point and he then used variational calculus to select the value pf $\\alpha(\\vec{q})$ minimizing the variance of the free energy. The end result is an implicit function of the free energy $\\Delta A$ and the number of samples at each state, $n_i$ and $n_j$, and can be written as is\n",
     "\n",
-    "$\\sum_{i=1}^{n_i} \\frac{1}{1 + \\exp(\\ln(n_i/n_j) + \\beta \\Delta U_{ij} - \\beta \\Delta A))} - \\sum_{j=1}^{n_j} \\frac{1}{1 + \\exp(\\ln(n_j/n_i) - \\beta \\Delta U_{ji} + \\beta \\Delta A))} = 0$\n",
+    "$\\sum_{i=1}^{n_i} \\frac{1}{1 + \\exp\\left(\\ln\\left(\\frac{n_i}{n_j}\\right) + \\beta \\Delta U_{ij} - \\beta \\Delta A)\\right)} - \\sum_{j=1}^{n_j} \\frac{1}{1 + \\exp\\left(\\ln\\left(\\frac{n_j}{n_i}\\right) - \\beta \\Delta U_{ji} + \\beta \\Delta A)\\right)} = 0$\n",
     "\n",
-    "which must be solved numerically. This is the full BAR equation and its full derivation can be found in Bennett's paper.{cite}`Bennett1976` To get the equation above from Bennett's paper, you will have to take the exponential of both sides of his self-consistent Equation 12, then a bit of algebra to get the end result. This equation can also be derived from a maximum likelihood approach.{cite}`Shirts2003a`  There are a number of other equivalent ways to express this implicit equation."
+    "which must be solved numerically. This is the full BAR equation and its full derivation can be found in Bennett's paper{cite}`Bennett1976`. To get the equation above from Bennett's paper, you will have to take the exponential of both sides of his self-consistent Equation 12, then a bit of algebra to get the end result. This equation can also be derived from a maximum likelihood approach.{cite}`Shirts2003a`  There are a number of other equivalent ways to express this implicit equation."
    ]
   },
   {
@@ -57,7 +57,7 @@
     "## Estimating Accurate Free Energies with BAR\n",
     "BAR has a very straightforward method to calculate the free energy difference between two states. It still suffers from the same [general problems](IS_rules) as the other methods do with regards to variance along the pathway, what parameters are changing, sequence of change etc. and must still be treated with care. BAR can estimate free energies between many states, however, it must do this a pair at a time. As such, BAR requires information collected from $k-1$, $k$, and $k+1$ state for each configuration stored.\n",
     "\n",
-    "BAR require an iterative solution but all the samples will be correlated since results are taken from adjacent states. Unlike in [TI](TI), however, the results are not so clean to get an error estimate. More indirect methods, such as [Bootstrap Sampling](bootstrap) are required to get variances. Furthermore, subsampling is needed since each state can have different correlation times."
+    "BAR require an iterative solution but the estimates of free energies between states $1 \\rightarrow 2$, $2 \\rightarrow 3$, $2 \\rightarrow 3$ and so forth will be correlated, because the free energy difference calculated between state $i-1$ to $i$ and the free energy difference between $i$ and $i+1$ both use the same samples from state $i$. Therefore, unlike [TI](TI), it is not so clean to get an error estimate. More indirect methods, such as [Bootstrap Sampling](bootstrap) are required to get variances. Furthermore, subsampling is needed since each state can have different correlation times."
    ]
   },
   {
@@ -86,7 +86,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.5"
   }
  },
  "nbformat": 4,

diff --git a/alchemistry/fundamentals/ExponentialAveraging.ipynb b/alchemistry/fundamentals/ExponentialAveraging.ipynb
@@ -14,7 +14,7 @@
    "id": "303403d9-8584-4d41-bfe7-4d8a4daf4522",
    "metadata": {},
    "source": [
-    "Exponential Averaging (EXP) is one of the earliest free energy methods available to researchers. Mostly used to evaluate between only two states of interest, it has an exact solution where as many other methods require approximations. Other names for this technique are the \"Zwanzig Relationship\" (named after the person who first derived it){cite}`Zwanzig1954` and \"free energy perturbation (FEP).\" To avoid confusion, the latter term should be avoided as it can easily be confused with other definitions in the scientific field.\n",
+    "Exponential Averaging (EXP) is one of the earliest free energy methods available to researchers. Mostly used to evaluate between only two states of interest, it has an exact solution where as many other methods require approximations. Other names for this technique are the \"Zwanzig relationship\" (named after the person who first derived it){cite}`Zwanzig1954` and \"free energy perturbation (FEP).\" To avoid confusion, the latter term should be avoided as it can easily be confused with other definitions in the scientific field.\n",
     "\n",
     "## Derivation\n",
     "Starting from the [statistical relation for free energy](core_eq) of\n",
@@ -23,15 +23,39 @@
     "\n",
     "the free energy difference between two states with potentials $U_0(\\vec{q})$ and $U_1(\\vec{q})$ is simply\n",
     "\n",
-    "$ \\displaystyle A_1 - A_0 = -\\beta^{-1} \\left[\\mathrm{ln}(Q_1) - \\mathrm{ln}(Q_0)\\right] = -\\beta^{-1} \\mathrm{ln}\\left[\\frac{Q_1}{Q_0}\\right]  $.\n",
+    "$ \\displaystyle \\Delta A_{10} = A_1 - A_0 = -\\beta^{-1} \\left[\\mathrm{ln}(Q_1) - \\mathrm{ln}(Q_0)\\right] = -\\beta^{-1} \\mathrm{ln}\\left[\\frac{Q_1}{Q_0}\\right]  $.\n",
     "\n",
-    "By then adding and subtracting $e^{-\\beta U_0(\\vec{q})}$ from the integral in the partition function in the numerator, we get\n",
+    "By then adding and subtracting $e^{-\\beta U_0(\\vec{q})}$ from the integral in the partition function in the numerator, we get:\n",
     "\n",
-    ":$ \\Delta A_{10} = -\\beta^{-1} \\mathrm{ln} \\left[\\frac{\\int e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})+U_0(\\vec{q})\\right)} d\\vec{q}}{Q_0}\\right] = -\\beta^{-1} \\mathrm{ln} \\left[\\frac{\\int e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})\\right)} e^{-\\beta U_0(\\vec{q})} d\\vec{q}}{Q_0}\\right]$\n",
+    "$ \\Delta A_{10} = -\\beta^{-1} \\mathrm{ln} \\left[\\frac{\\int e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})+U_0(\\vec{q})\\right)} d\\vec{q}}{Q_0}\\right] = -\\beta^{-1} \\mathrm{ln} \\left[\\frac{\\int e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})\\right)} e^{-\\beta U_0(\\vec{q})} d\\vec{q}}{Q_0}\\right]$\n",
     "\n",
     "which gives the final relationship of:\n",
     "\n",
-    "$ \\displaystyle \\Delta A_{10} = -\\beta^{-1} \\mathrm{ln} \\left\\langle e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})\\right)} \\right\\rangle_0 = -\\beta^{-1} \\mathrm{ln} \\left\\langle e^{-\\beta \\Delta U(\\vec{q})} \\right\\rangle_0 $"
+    "$ \\displaystyle \\Delta A_{10} = -\\beta^{-1} \\mathrm{ln} \\left\\langle e^{-\\beta \\left(U_1(\\vec{q})-U_0(\\vec{q})\\right)} \\right\\rangle_0 = -\\beta^{-1} \\mathrm{ln} \\left\\langle e^{-\\beta \\Delta U(\\vec{q})} \\right\\rangle_0 $\n",
+    "\n",
+    "\n",
+    "This can also be derived from the technique of importance sampling. \n",
+    "\n",
+    "First, the concept of Monte Carlo integration states that if we want to estimate some multidimensional expectation value $\\langle O \\rangle_i = \\int O(\\vec{q})p(\\vec{q}) d\\vec{q}$, where $p(\\vec{q})$ is a normalized distribution, it can be estimated from $N$ samples from the probability distribution $p(\\vec{q})$ as  $\\sum_{i=1}^{N} O(q_i)$.  Note that this works if we do not know the normalizing constant; we only need to draw samples proportional to $p\\vec{q}$. \n",
+    "\n",
+    "Importance sampling states that if we have two distributions $p_1(\\vec{q})$ and $p_2(\\vec{q})$, then we can write the integral as:\n",
+    "\n",
+    "$ \\displaystyle \\int O(\\vec{q}) p_1(\\vec{q}) d\\vec{q} = \\int O(\\vec{q}) p_2(\\vec{q}) \\left(\\frac{p_1(\\vec{q})}{p_2(\\vec{q})}\\right) d\\vec{q} $  \n",
+    "Which can be approximated by the sum $\\sum_{i=1}^{N} O(\\vec{q}_i) \\left(\\frac{p_1(\\vec{q})}{p_2(\\vec{q})}\\right)$, which is a estimate of the expectation value $\\langle O \\rangle_i$, but estimated with samples from $p_2$.  \n",
+    "\n",
+    "If we now have two normalized Boltzmann distributions $\\rho_1(\\vec{q}) = \\frac{e^{-\\beta U_1(\\vec{q})}}{Q_1} = e^{-\\beta A_1 - \\beta U_1}$ and $\\rho_2(\\vec{q}) = \\frac{e^{\\beta U_2(\n",
+    "\\vec{q})}}{Q_2} = e^{\\beta A_1 - \\beta U_1}$, then we can write:\n",
+    "\n",
+    "$ \\displaystyle \\langle O(\\vec{q}) \\rangle_1 = \n",
+    "\\left\\langle O(\\vec{q}) \\frac{e^{\\beta A_1 -\\beta U_1(\\vec{q})}}{e^{\\beta A_2 -\\beta U_2(\\vec{q})}} \\right\\rangle_2 = \\left\\langle O(\\vec{q}) e^{\\beta A_1-A_2 -\\beta U_1(\\vec{q})-U_2(\\vec{q})}\\right\\rangle_2$\n",
+    "\n",
+    "This is a general formula for estimating the expectation value of $\\langle O(\\vec{q} \\rangle_1$ with samples from $p_2$, but does not tell us how what the value of $\\Delta A = A_1-A_2$ is.  However $\\Delta A$ is independent of the position.  Let us use the special case of $O(\\vec{q})= 1$ (i.e. the constant 1).  Clearly, $\\langle 1 \\rangle_1 =1$ -- the average must be 1 because every single value is 1!  We then get: \n",
+    "\n",
+    "$ \\displaystyle 1 =  \\left \\langle e^{\\beta \\Delta A -\\beta (U_1(\\vec{q})-U_2)(\\vec{q})}\\right\\rangle_2$\n",
+    "\n",
+    "$ \\displaystyle e^{-\\beta \\Delta A} =  \\left \\langle  e^{-\\beta (U_1(\\vec{q})-U_2(\\vec{q})}\\right\\rangle_2 \\approx \\sum_{i=1}^N e^{-\\beta (U_1(\\vec{q})-U_2(\\vec{q}))}$, where the $\\vec{q}$ are sampled from $p_2$. \n",
+    "\n",
+    "This is of course symmetric, so $A_2-A_1 = -\\Delta A$ can be computed from samples from $p_1$, but recomputing the energies from state 2. "
    ]
   },
   {
@@ -87,7 +111,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.5"
   }
  },
  "nbformat": 4,

diff --git a/alchemistry/fundamentals/FreeEnergyFundamentals.ipynb b/alchemistry/fundamentals/FreeEnergyFundamentals.ipynb
@@ -46,9 +46,9 @@
     "    * Harmonic bond angle terms\n",
     "    * Periodic dihedral terms\n",
     "    * Non-bonded terms made up of point charges and Lennard-Jones repulsion/dispersion terms\n",
-    "* Constant temperature is maintained: (discussed [below](facts_split))\n",
-    "**Masses do not alchemically change**. If one wishes to do this, substitute all potential energies, $U$, with the more general Hamiltonian, $\\mathcal{H}$.\n",
-    "* **QM/MM will not be considered** for fundamental; there are number of other more complicated factors to consider that are beyond the scope of this discussion. {cite}`woods2008`"
+    "* **The temperatur is constant** during the transforamation: (discussed [below](facts_split))\n",
+    "* **Masses do not alchemically change**. If one wishes to change the masses as well, substitute all potential energies, $U$, with the more general Hamiltonian, $\\mathcal{H}$.  Note that in may cases, the free energy differences will be the same regardless of whether or not the masses change.  This is because the kinetic energy distribution will be independent of what the masses are.  \n",
+    "* **QM/MM will not be considered** in this description of fundamentals; there are number of other more complicated factors to consider for QM/MM simulations that are beyond the scope of this discussion. {cite}`woods2008`"
    ]
   },
   {
@@ -65,11 +65,11 @@
    "id": "442e3abe-9bda-40ce-ae21-4af5c0c28593",
    "metadata": {},
    "source": [
-    "Most free energy calculations and methods started from a single core equation derived from statistical mechanics. The free energy difference between two states is directly related to the ratio of probabilities of those states through the partition functions. This free energy difference for an NVT ensemble is\n",
+    "Most free energy calculations and methods started from a single core equation that defined free energy differences.The free energy difference between two states is directly related to the ratio of probabilities of those states through the partition functions. This free energy difference for an NVT ensemble is\n",
     "\n",
     "$\\displaystyle \\Delta A_{ij} = -k_B T \\ln \\frac{Q_j}{Q_i} = -k_B T \\ln \\frac{\\int_{\\Gamma_j} e^{-\\frac{U_j(\\vec{q})}{k_B T}} d\\vec{q}}{\\int_{\\Gamma_i} e^{-\\frac{U_i(\\vec{q})}{k_B T}}d\\vec{q}}$\n",
     "\n",
-    "where $\\Delta A_{ij}$ is the Helmholtz free energy difference between state $j$ and state $i$, $k_B$ the Boltzmann constant, $Q$ the canonical partition function, $T$ is the temperature, $U_i$ and $U_j$ are the potential energies as a function of the coordinates and momenta $\\vec{q}$ for two states, and $\\Gamma_i$ and $\\Gamma_j$ are the *phase space volumes* of $\\vec{q}$ over which we sample, or the total set of all allowed positions and momenta of the system.\n",
+    "where $\\Delta A_{ij}$ is the Helmholtz free energy difference between state $j$ and state $i$, $k_B$ the Boltzmann constant, $Q$ the canonical partition function, $T$ is the temperature, $U_i$ and $U_j$ are the potential energies as a function of the coordinates and momenta $\\vec{q}$ for two states, and $\\Gamma_i$ and $\\Gamma_j$ are the *phase space volumes* of $\\vec{q}$ over which we sample, i.e. the total set of all allowed positions and momenta of the system.\n",
     "\n",
     "From this equation, all free energy calculations are derived."
    ]
@@ -90,11 +90,11 @@
    "source": [
     "There are a number of key notes we can learn from the definition of free energy differences. Each of these can be important  in interpreting simulation results.\n",
     "\n",
-    "* **Only free energy *differences* are ever calculated.** There is never a calculation where absolute free energies are needed (and rarely can they be calculated at all) as all of the biological or thermodynamical quantities of interest are based on a free energy difference. As such, there must always be a minimum of two defined thermodynamic states.\n",
-    "    * Even *absolute* free energies of binding are still free energy differences between two states: the ligand restricted to the binding site, and the ligand free to explore all other configurations.\n",
+    "* **Only free energy *differences* are ever calculated.** There is never a calculation where absolute free energies are needed (and rarely can they be calculated at all) as all of the biological or thermodynamical quantities of interest are related to free energy differences. As such, there must always be a minimum of two defined thermodynamic states.\n",
+    "    * Even when we talk about *absolute* free energies of binding, these are actually free energy differences between two states: the ligand restricted to the binding site, and the ligand free to explore all other configurations.\n",
     "* **Free energy differences between states at different temperatures are usually not what you want to be calculating** for problems of interest.  If it did, you would get $\\Delta A_{ij} = -k_B T_j \\ln Q_j + k_B T_i \\ln Q_i$, which is no longer a ratio calculation and not needed for biological systems of interest. Temperature dependence on free energy is more likely to be \"what is $ \\Delta A_{ij} $ change at two different temperatures?\"\n",
-    "* **There are two different phase space volumes.** $\\Gamma_i$ and $\\Gamma_j$ are often the same, but they are not required to be. The methods presented here almost always assumes that the two phase spaces overlap. However, when the spaces do not overlap, these methods break down and it is difficult to identify this problem without in depth knowledge of your system. Consider the example of a hard sphere solute with radius $\\sigma$ at state $i$ and a Lennard Jones repulsion/dispersion potential, with the same $\\sigma$ at state $j$. Since $\\Gamma _i$ will not have molecules at a distance less than $\\sigma$, but $\\Gamma_j$ will, the two phase spaces are not the same and these methods will either break down or return very error-prone results.\n",
-    "    * The degree to which the phase spaces are shared is called the \"phase space overlap\". Efficient free energy calculations require significant phase space overlap. There are  [a number of strategies to address](Intermediate_States) lack of overlap between target spaces, but determining the best way for any given situation is still a research question.\n",
+    "* **There are two different phase space volumes.** $\\Gamma_i$ and $\\Gamma_j$ are often the same, but they are not required to be. The methods presented here almost always assumes that the two phase spaces overlap. However, when the spaces do not overlap, these methods break down and it is difficult to identify this problem without in depth knowledge of your system. Consider the example of a hard sphere solute with radius $\\sigma$ at state $i$ and a Lennard Jones repulsion/dispersion potential, with the same $\\sigma$ at state $j$. Since $\\Gamma _i$ will not have molecules at a distance less than $\\sigma$, but $\\Gamma_j$ will, the two phase spaces are not the same and these alchemical methods will either break down or return very error-prone results.\n",
+    "    * The degree to which the phase spaces are shared by the two end state Hamiltonians is called the \"phase space overlap\". Efficient free energy calculations require significant phase space overlap. There are  [a number of strategies to address](Intermediate_States) lack of overlap between target spaces, but determining the best approach for any given situation is still a research question.\n",
     "    * It should also be noted that \"near zero probability\" and \"always zero probability\" are two distinct things when considering phase space. So long as there is a chance for an observation to be made, no matter how small, it is considered part of the phase space."
    ]
   },
@@ -123,7 +123,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.5"
   }
  },
  "nbformat": 4,

diff --git a/alchemistry/fundamentals/ImprovingSampling.ipynb b/alchemistry/fundamentals/ImprovingSampling.ipynb
@@ -114,7 +114,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.5"
   }
  },
  "nbformat": 4,