
Commit

Apply suggestions from code review by HM
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
janosg and hmgaudecker authored Nov 4, 2024
1 parent f99d483 commit 7cb889d
Showing 1 changed file with 14 additions and 14 deletions.
28 changes: 14 additions & 14 deletions docs/source/how_to/how_to_algorithm_selection.ipynb
@@ -15,7 +15,7 @@
"\n",
"- There is no optimizer that works well for all problems \n",
"- Making the right choice can lead to enormous speedups\n",
"- Making the wrong choice can mean that you don't solve your problem at all; Sometimes, \n",
"- Making the wrong choice can mean that you don't solve your problem at all. Sometimes, \n",
"optimizers fail silently!\n",
"\n",
"\n",
@@ -24,20 +24,20 @@
"Algorithm selection is a mix of theory and experimentation. We recommend the following \n",
"for steps:\n",
"\n",
"1. **Theory**: Select three to 5 candidate algorithms based on the properties \n",
"of your problem. Below we provide a simple decision tree for this step.\n",
"1. **Theory**: Based on the properties of your problem, start with 3 to 5 candidate algorithms. \n",
"You may use the [decision tree below](link)\n",
"2. **Experiments**: Run the candidate algorithms for a small number of function \n",
"evaluations. As a rule of thumb, use between `n_params` and `10 * n_params`\n",
"evaluations. \n",
"3. **Comparison**: Compare the results in a *criterion plot*.\n",
"4. **Optimization**: Re-run the algorithm with the best results until \n",
"convergence. Use the best parameter vector from the experiments as starting point.\n",
"convergence. Use the best parameter vector from the experiments as start parameters.\n",
"\n",
"These steps work well for most problems. Sometimes you need [variations](four-steps-variations).\n",
"\n",
"## An example problem\n",
"\n",
"As an example we use the Trid function. The Trid function has no local minimum except \n",
"As an example we use the [Trid function](https://www.sfu.ca/~ssurjano/trid.html). The Trid function has no local minimum except \n",
"the global one. It is defined for any number of dimensions, we will pick 20. As starting \n",
"values we will pick the vector [0, 1, ..., 19]. \n",
"\n",
@@ -86,13 +86,13 @@
"source": [
"## Step 1: Theory\n",
"\n",
"The below decision tree offers a practical guide on how to narrow down the set of algorithms to experiment with, based on the theoretical properties of your problem:\n",
"This is a practical guide for narrowing down the set of algorithms to experiment with:\n",
"\n",
"```{mermaid}\n",
"graph LR\n",
" classDef highlight fill:#FF4500;\n",
" A[\"Do you have<br/>nonlinear constraints?\"] -- yes --> B[\"differentiable?\"]\n",
" B[\"differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr', ...\"]\n",
" B[\"Is your objective function differentiable?\"] -- yes --> C[\"'ipopt', 'nlopt_slsqp', 'scipy_trust_constr', ...\"]\n",
" B[\"differentiable?\"] -- no --> D[\"'scipy_cobyla', 'nlopt_cobyla', ...\"]\n",
"\n",
" A[\"Do you have<br/>nonlinear constraints?\"] -- no --> E[\"Can you exploit<br/>a least-squares<br/>structure?\"]\n",
@@ -108,9 +108,9 @@
"\n",
"Let's go through the steps for the Trid function:\n",
"\n",
"1. There are no nonlinear constraints our solution needs to satisfy\n",
"2. There is no least-squares structure we can exploit \n",
"3. The function is differentiable and we have a closed form gradient that we would like \n",
"1. **No** nonlinear constraints our solution needs to satisfy\n",
"2. **No** no least-squares structure we can exploit \n",
"3. **Yes**, the function is differentiable and we have a closed form gradient that we would like \n",
"to use. \n",
"\n",
"We therefore end up with the candidate algorithms `scipy_lbfgsb`, `nlopt_lbfgsb`, and \n",
@@ -263,13 +263,13 @@
"metadata": {},
"source": [
"We can see that our chosen optimizer solves the problem with less than 35 function \n",
"evaluations. At this time, the two gradient free optimizers have not even started to \n",
"make significant progress. Cobyla gets reasonably close to an optimum after about 4k \n",
"evaluations. Neldermead gets stuck after 8k evaluations and fails to solve the problem. \n",
"evaluations. At this point, the two gradient-free optimizers have not yet made \n",
"significant progress. CoByLA gets reasonably close to an optimum after about 4k \n",
"evaluations. Nelder-Mead gets stuck after 8k evaluations and fails to solve the problem. \n",
"\n",
"This example shows not only that the choice of optimizer is important but that the commonly \n",
"held belief that gradient free optimizers are generally more robust than gradient based \n",
"ones is dangerous! The Neldermead algorithm did \"converge\" and reports success, but\n",
"ones is dangerous! The Nelder-Mead algorithm did \"converge\" and reports success, but\n",
"did not find the optimum. It did not even get stuck in a local optimum because we know \n",
"that the Trid function does not have local optima except the global one. It just got \n",
"stuck somewhere. "
