From 1856a26d8bce827f7fa97176a9cd1b2effbf0863 Mon Sep 17 00:00:00 2001
From: Ronny Bergmann
Date: Tue, 16 Apr 2024 20:34:51 +0200
Subject: [PATCH] Add a structured list of solvers (#380)

* Start sketching the list, keeping a temporary note of which ones still need sorting in.
* Finish a list of algorithms grouped by features.
* Add two symbols as indicators for state-of-the-art and robust.
* Add a missing description, change menu entry for the list of solvers.

---------

Co-authored-by: Mateusz Baran
---
 Changelog.md                      |   1 +
 docs/make.jl                      |   2 +-
 docs/src/solvers/index.md         | 140 +++++++++++++++++++++++-------
 src/solvers/FrankWolfe.jl         |   4 +-
 src/solvers/LevenbergMarquardt.jl |   2 +-
 5 files changed, 115 insertions(+), 34 deletions(-)

diff --git a/Changelog.md b/Changelog.md
index 4a534fb056..90c4fcba5e 100644
--- a/Changelog.md
+++ b/Changelog.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

 * Tests now also use `Aqua.jl` to spot problems in the code, e.g. ambiguities.
+* Introduce a feature-based list of solvers and reduce the details in the alphabetical list

 ### Fixed

diff --git a/docs/make.jl b/docs/make.jl
index 30d8e57087..c70e7e25b3 100755
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -166,7 +166,7 @@ makedocs(;
         "About" => "about.md",
         (tutorials_in_menu ?
[tutorials_menu] : [])...,
         "Solvers" => [
-            "Introduction" => "solvers/index.md",
+            "List of Solvers" => "solvers/index.md",
             "Adaptive Regularization with Cubics" => "solvers/adaptive-regularization-with-cubics.md",
             "Alternating Gradient Descent" => "solvers/alternating_gradient_descent.md",
             "Augmented Lagrangian Method" => "solvers/augmented_Lagrangian_method.md",
diff --git a/docs/src/solvers/index.md b/docs/src/solvers/index.md
index 78fed17003..c601f37b8a 100644
--- a/docs/src/solvers/index.md
+++ b/docs/src/solvers/index.md
@@ -1,43 +1,123 @@
-# List of available solvers
+# Available solvers in Manopt.jl

 ```@meta
 CurrentModule = Manopt
 ```

-Solvers can be applied to [`AbstractManoptProblem`](@ref)s with solver
-specific [`AbstractManoptSolverState`](@ref).
+Optimisation problems can be classified with respect to several criteria.
+In the following we provide a grouping of the algorithms with respect to the “information”
+available about your optimisation problem
-# List of algorithms
+
+```math
+\operatorname*{arg\,min}_{p∈\mathcal M} f(p)
+```
+
+Within the groups we provide short notes on advantages of the individual solvers, pointing out properties the cost ``f`` should have.
+We use 🏅 to indicate state-of-the-art solvers that usually perform best in their corresponding group, and 🫏 for a method that is maybe not so fast and maybe not so state-of-the-art, but nevertheless gets the job done most reliably.
+
+## Derivative Free
+
+Derivative-free methods use only function evaluations of ``f``.
+
+* [Nelder-Mead](NelderMead.md) is a simplex-based method that uses ``d+1`` points, where ``d`` is the dimension of the manifold.
+* [Particle Swarm](particle_swarm.md) 🫏 uses the evolution of a set of points, called a swarm, to explore the domain of the cost and find a minimizer.
+* [CMA-ES](cma_es.md) uses a stochastic evolutionary strategy to perform minimization that is robust to local minima of the objective.
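A hedged sketch of how such a derivative-free solver is called; the manifold, the reference point `q`, and the cost below are illustrative assumptions, not part of the package documentation:

```julia
# Sketch: derivative-free solvers only ever evaluate the cost f.
# The sphere, the point q, and the cost are illustrative assumptions.
using Manopt, Manifolds

M = Sphere(2)                   # the unit sphere S² ⊂ ℝ³
q = [0.0, 0.0, 1.0]             # assumed reference point
f(M, p) = distance(M, p, q)^2   # squared Riemannian distance to q

p_nm = NelderMead(M, f)         # simplex-based, uses d+1 = 3 points on S²
p_ps = particle_swarm(M, f)     # 🫏 swarm-based exploration of the domain
```

Both calls return a candidate minimizer; since these methods only see function values, the result may depend on the (random) initial simplex or swarm.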
+
+## First Order
+
+### Gradient
+
+* [Gradient Descent](gradient_descent.md) uses the gradient of ``f`` to determine a descent direction. The direction can also be modified, for example to an averaged or momentum-based direction, or one following Nesterov's rule.
+* [Conjugate Gradient Descent](conjugate_gradient_descent.md) uses information from the previous descent direction to improve the current (gradient-based) one; several such update rules are available.
+* The [Quasi-Newton Method](quasi_Newton.md) 🏅 uses gradient evaluations to approximate the Hessian, which is then used in a Newton-like scheme; both a limited-memory and a full Hessian approximation are available, with several different update rules.
+* The [Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) solves a constrained quadratic problem on a tangent space.
+
+### Subgradient
+
+The following methods require the Riemannian subgradient ``∂f`` to be available.
+While the subgradient might be set-valued, the function should return one of its elements.
+
+* The [Subgradient Method](subgradient.md) takes the negative subgradient as a step direction and can be combined with a step size.
+* The [Convex Bundle Method](convex_bundle_method.md) (CBM) uses a collection of subgradients from previous iterates and iterate candidates to build a local approximation of ``f``, which is minimized in every iteration by solving a quadratic problem in the tangent space.
+* The [Proximal Bundle Method](proximal_bundle_method.md) works similarly to CBM, but solves a proximal map-based problem in every iteration.
+
+## Second Order
+
+* [Adaptive Regularisation with Cubics](adaptive-regularization-with-cubics.md) 🏅 locally builds a cubic model to determine the next descent direction.
+* The [Riemannian Trust-Regions Solver](trust_regions.md) builds a quadratic model within a trust region to determine the next descent direction.
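The gradient-based solvers above share one calling convention; as a hedged sketch, assuming a squared-distance cost on the sphere (cost, gradient, and start point are illustrative, and the Riemannian gradient of ``\mathrm{dist}(p,q)^2`` is ``-2\log_p q``):

```julia
using Manopt, Manifolds

M = Sphere(2)
q = [0.0, 0.0, 1.0]
f(M, p) = distance(M, p, q)^2
grad_f(M, p) = -2 * log(M, p, q)   # Riemannian gradient of the squared distance

p0 = [1.0, 0.0, 0.0]
p_gd = gradient_descent(M, f, grad_f, p0)  # plain gradient descent
p_qn = quasi_Newton(M, f, grad_f, p0)      # 🏅 limited-memory approximation by default
```

Keyword arguments such as `stepsize=` or `stopping_criterion=` can be added to either call to tune the run.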
+
+## Splitting based
+
+Splitting-based methods split the cost into different parts, usually into a sum of two or more summands.
+This is usually well suited for non-smooth objectives.
+
+### Smooth
+
+The following methods require the splitting to be smooth, in the sense that the gradient of every summand of the cost exists everywhere.
+
+* [Levenberg-Marquardt](LevenbergMarquardt.md) minimizes the squared norm of ``f: \mathcal M→ℝ^d``, provided the gradients of the component functions, in other words the Jacobian of ``f``.
+* [Stochastic Gradient Descent](stochastic_gradient_descent.md) is based on a splitting of ``f`` into a sum of several components ``f_i`` whose gradients are provided. Steps are performed according to gradients of randomly selected components.
+* The [Alternating Gradient Descent](@ref solver-alternating-gradient-descent) alternates gradient descent steps on the components of the product manifold. All these components should be smooth, so that the gradient exists, and (locally) convex.
+
+### Nonsmooth
+
+If the gradient does not exist everywhere, that is, if the splitting yields summands that are nonsmooth, usually methods based on proximal maps are used.
+
+* The [Chambolle-Pock](ChambollePock.md) algorithm uses a splitting ``f(p) = F(p) + G(Λ(p))``,
+  where ``G`` is defined on a manifold ``\mathcal N`` and the proximal map of its Fenchel dual is required. Both these functions can be non-smooth.
+* The [Cyclic Proximal Point](cyclic_proximal_point.md) 🫏 method uses the proximal maps of the summands ``f_i`` of a splitting of ``f``.
+* The [Difference of Convex Algorithm](@ref solver-difference-of-convex) (DCA) uses a splitting of the (nonconvex) function ``f = g - h`` into a difference of two functions; the gradient of ``g`` and the subgradient of ``h`` are required to state a sub problem to be solved in every iteration.
+* The [Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) algorithm uses a splitting of the (nonconvex) function ``f = g - h`` into a difference of two functions; provided the proximal map of ``g`` and the subgradient of ``h``, the next iterate is computed. Compared to DCA, the corresponding sub problem is here written in a form that yields the proximal map.
+* [Douglas—Rachford](DouglasRachford.md) uses a splitting ``f(p) = F(p) + G(p)`` and the proximal maps of ``F`` and ``G`` to compute a minimizer of ``f``, which can be non-smooth.
+* The [Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) extends Chambolle-Pock and additionally requires the differentials of the proximal maps.
+
+## Constrained
+
+Constrained problems are of the form
+
+```math
+\begin{align*}
+\operatorname*{arg\,min}_{p∈\mathcal M}& f(p)\\
+\text{such that } & g(p) \leq 0\\&h(p) = 0
+\end{align*}
+```
+
+For these you can use
+
+* The [Augmented Lagrangian Method](augmented_Lagrangian_method.md) (ALM), where both `g` and `grad_g` as well as `h` and `grad_h` are keyword arguments, and one of these pairs is mandatory.
+* The [Exact Penalty Method](exact_penalty_method.md) (EPM), which uses a penalty term instead of augmentation, but has the same interface as ALM.
+* The [Frank-Wolfe algorithm](FrankWolfe.md), where besides the gradient of ``f`` either a closed-form solution or a (maybe even automatically generated) sub problem solver for ``\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩`` is required, where ``p_k`` is a fixed point on the manifold (changed in every iteration).
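To sketch the ALM keyword interface described above: the cost, its gradient, and the single inequality constraint in this hedged example are illustrative assumptions:

```julia
using Manopt, Manifolds

M = Sphere(2)
q = [0.0, 0.0, 1.0]
f(M, p) = distance(M, p, q)^2
grad_f(M, p) = -2 * log(M, p, q)

# assumed inequality constraint g(p) ≤ 0: keep the third coordinate at most 1/2
g(M, p) = [p[3] - 0.5]
# Riemannian gradient of g₁: project the Euclidean gradient e₃ onto T_pM
grad_g(M, p) = [project(M, p, [0.0, 0.0, 1.0])]

p0 = [1.0, 0.0, 0.0]
p_alm = augmented_Lagrangian_method(M, f, grad_f, p0; g=g, grad_g=grad_g)
```

The `h`/`grad_h` pair for equality constraints works the same way, and EPM accepts the identical keywords.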
-The following algorithms are currently available
+# Alphabetical list of algorithms

-| Solver | Function & State | Objective |
+| Solver | Function | State |
 |:---------|:----------------|:---------|
-[Alternating Gradient Descent](@ref solver-alternating-gradient-descent) | [`alternating_gradient_descent`](@ref) [`AlternatingGradientDescentState`](@ref) | ``f=(f_1,\ldots,f_n)``, ``\operatorname{grad} f_i`` |
-[Augmented Lagrangian Method](augmented_Lagrangian_method.md) | [`augmented_Lagrangian_method`](@ref), [`AugmentedLagrangianMethodState`](@ref) | ``f``, ``\operatorname{grad} f``, ``g``, ``\operatorname{grad} g_i``, ``h``, ``\operatorname{grad} h_j`` |
-[Chambolle-Pock](ChambollePock.md) | [`ChambollePock`](@ref), [`ChambollePockState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}``, ``\operatorname{prox}_{τ G^*}``, ``Λ`` |
-[Conjugate Gradient Descent](conjugate_gradient_descent.md) | [`conjugate_gradient_descent`](@ref), [`ConjugateGradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f``
-[Convex Bundle Method](convex_bundle_method.md) | [`convex_bundle_method`](@ref), [`ConvexBundleMethodState`](@ref) | ``f``, ``\partial f``
-[Cyclic Proximal Point](cyclic_proximal_point.md) | [`cyclic_proximal_point`](@ref), [`CyclicProximalPointState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
-[Difference of Convex Algorithm](@ref solver-difference-of-convex) | [`difference_of_convex_algorithm`](@ref), [`DifferenceOfConvexState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
-[Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) | [`difference_of_convex_proximal_point`](@ref), [`DifferenceOfConvexProximalState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
-[Douglas—Rachford](DouglasRachford.md) | [`DouglasRachford`](@ref), [`DouglasRachfordState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
-[Exact Penalty Method](exact_penalty_method.md) | [`exact_penalty_method`](@ref), [`ExactPenaltyMethodState`](@ref) | ``f``, ``\operatorname{grad} f``, ``g``, ``\operatorname{grad} g_i``, ``h``, ``\operatorname{grad} h_j`` |
-[Frank-Wolfe algorithm](FrankWolfe.md) | [`Frank_Wolfe_method`](@ref), [`FrankWolfeState`](@ref) | sub-problem solver |
-[Gradient Descent](gradient_descent.md) | [`gradient_descent`](@ref), [`GradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f`` |
-[Levenberg-Marquardt](LevenbergMarquardt.md) | [`LevenbergMarquardt`](@ref), [`LevenbergMarquardtState`](@ref) | ``f = \sum_i f_i`` ``\operatorname{grad} f_i`` (Jacobian)|
-[Nelder-Mead](NelderMead.md) | [`NelderMead`](@ref), [`NelderMeadState`](@ref) | ``f``
-[Particle Swarm](particle_swarm.md) | [`particle_swarm`](@ref), [`ParticleSwarmState`](@ref) | ``f`` |
-[Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) | [`primal_dual_semismooth_Newton`](@ref), [`PrimalDualSemismoothNewtonState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}`` & diff., ``\operatorname{prox}_{τ G^*}`` & diff., ``Λ``
-[Proximal Bundle Method](proximal_bundle_method.md) | [`proximal_bundle_method`](@ref), [`ProximalBundleMethodState`](@ref) | ``f``, ``\partial f``
-[Quasi-Newton Method](quasi_Newton.md) | [`quasi_Newton`](@ref), [`QuasiNewtonState`](@ref) | ``f``, ``\operatorname{grad} f`` |
-[Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) | [`truncated_conjugate_gradient_descent`](@ref), [`TruncatedConjugateGradientState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |
-[Subgradient Method](subgradient.md) | [`subgradient_method`](@ref), [`SubGradientMethodState`](@ref) | ``f``, ``∂ f`` |
-[Stochastic Gradient Descent](stochastic_gradient_descent.md) | [`stochastic_gradient_descent`](@ref), [`StochasticGradientDescentState`](@ref) | ``f = \sum_i f_i``, ``\operatorname{grad} f_i`` |
-[The Riemannian Trust-Regions Solver](trust_regions.md) | [`trust_regions`](@ref), [`TrustRegionsState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |
-
-Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref sec-debug) and [recording values](@ref sec-record). This is done using the `debug=` and `record=` keywords in the function calls. Similarly, since Manopt.jl 0.4 a (simple) [caching of the objective function](@ref subsection-cache-objective) using the `cache=` keyword is available in any of the function calls..
+| [Adaptive Regularisation with Cubics](adaptive-regularization-with-cubics.md) | [`adaptive_regularization_with_cubics`](@ref) | [`AdaptiveRegularizationState`](@ref) |
+| [Augmented Lagrangian Method](augmented_Lagrangian_method.md) | [`augmented_Lagrangian_method`](@ref) | [`AugmentedLagrangianMethodState`](@ref) |
+| [Chambolle-Pock](ChambollePock.md) | [`ChambollePock`](@ref) | [`ChambollePockState`](@ref) |
+| [Conjugate Gradient Descent](conjugate_gradient_descent.md) | [`conjugate_gradient_descent`](@ref) | [`ConjugateGradientDescentState`](@ref) |
+| [Convex Bundle Method](convex_bundle_method.md) | [`convex_bundle_method`](@ref) | [`ConvexBundleMethodState`](@ref) |
+| [Cyclic Proximal Point](cyclic_proximal_point.md) | [`cyclic_proximal_point`](@ref) | [`CyclicProximalPointState`](@ref) |
+| [Difference of Convex Algorithm](@ref solver-difference-of-convex) | [`difference_of_convex_algorithm`](@ref) | [`DifferenceOfConvexState`](@ref) |
+| [Difference of Convex Proximal Point](@ref solver-difference-of-convex-proximal-point) | [`difference_of_convex_proximal_point`](@ref) | [`DifferenceOfConvexProximalState`](@ref) |
+| [Douglas—Rachford](DouglasRachford.md) | [`DouglasRachford`](@ref) | [`DouglasRachfordState`](@ref) |
+| [Exact Penalty Method](exact_penalty_method.md) | [`exact_penalty_method`](@ref) | [`ExactPenaltyMethodState`](@ref) |
+| [Frank-Wolfe algorithm](FrankWolfe.md) | [`Frank_Wolfe_method`](@ref) | [`FrankWolfeState`](@ref) |
+| [Gradient Descent](gradient_descent.md) | [`gradient_descent`](@ref) | [`GradientDescentState`](@ref) |
+| [Levenberg-Marquardt](LevenbergMarquardt.md) | [`LevenbergMarquardt`](@ref) | [`LevenbergMarquardtState`](@ref) |
+| [Nelder-Mead](NelderMead.md) | [`NelderMead`](@ref) | [`NelderMeadState`](@ref) |
+| [Particle Swarm](particle_swarm.md) | [`particle_swarm`](@ref) | [`ParticleSwarmState`](@ref) |
+| [Primal-dual Riemannian semismooth Newton Algorithm](@ref solver-pdrssn) | [`primal_dual_semismooth_Newton`](@ref) | [`PrimalDualSemismoothNewtonState`](@ref) |
+| [Proximal Bundle Method](proximal_bundle_method.md) | [`proximal_bundle_method`](@ref) | [`ProximalBundleMethodState`](@ref) |
+| [Quasi-Newton Method](quasi_Newton.md) | [`quasi_Newton`](@ref) | [`QuasiNewtonState`](@ref) |
+| [Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) | [`truncated_conjugate_gradient_descent`](@ref) | [`TruncatedConjugateGradientState`](@ref) |
+| [Subgradient Method](subgradient.md) | [`subgradient_method`](@ref) | [`SubGradientMethodState`](@ref) |
+| [Stochastic Gradient Descent](stochastic_gradient_descent.md) | [`stochastic_gradient_descent`](@ref) | [`StochasticGradientDescentState`](@ref) |
+| [Riemannian Trust-Regions](trust_regions.md) | [`trust_regions`](@ref) | [`TrustRegionsState`](@ref) |
+
+Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref sec-debug) and [recording values](@ref sec-record). This is done using the `debug=` and `record=` keywords in the function calls.
Similarly, a `cache=` keyword is available in any of the function calls, which wraps the [`AbstractManoptProblem`](@ref) in a cache for certain parts of the objective.

 ## Technical details

diff --git a/src/solvers/FrankWolfe.jl b/src/solvers/FrankWolfe.jl
index d7cc482ab8..30e716c597 100644
--- a/src/solvers/FrankWolfe.jl
+++ b/src/solvers/FrankWolfe.jl
@@ -19,7 +19,7 @@ It comes in two forms, depending on the realisation of the `subproblem`. The sub
 task requires a method to solve

 ```math
-    \operatorname*{argmin}_{q∈\mathcal M} ⟨X, \log_p q⟩,\qquad \text{ where }X=\operatorname{grad} f(p)
+    \operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩.
 ```

 # Constructor
@@ -135,7 +135,7 @@ where the main step is a constrained optimisation is within the algorithm, that is the sub problem (Oracle)

 ```math
-    q_k = \operatorname{arg\,min}_{q ∈ C} ⟨\operatorname{grad} F(p_k), \log_{p_k}q⟩.
+    q_k = \operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩.
 ```

 for every iterate ``p_k`` together with a stepsize ``s_k≤1``, by default ``s_k = \frac{2}{k+2}``.

diff --git a/src/solvers/LevenbergMarquardt.jl b/src/solvers/LevenbergMarquardt.jl
index b73bbe486f..298b650d42 100644
--- a/src/solvers/LevenbergMarquardt.jl
+++ b/src/solvers/LevenbergMarquardt.jl
@@ -4,7 +4,7 @@ Solve an optimization problem of the form

 ```math
-\operatorname{arg\,min}_{p ∈ \mathcal M} \frac{1}{2} \lVert f(p) \rVert^2,
+\operatorname*{arg\,min}_{p ∈ \mathcal M} \frac{1}{2} \lVert f(p) \rVert^2,
 ```

 where ``f: \mathcal M → ℝ^d`` is a continuously differentiable function,