Ronny Bergmann 2023-06-06
This example illustrates how the 📖 Rosenbrock problem can be rephrased as a difference of convex problem and with a new metric on Euclidean space. This example is the code that produces the results in [BFSS23], Section 7.2.
Both the Rosenbrock problem
\[ \operatorname*{argmin}_{x\in ℝ^2} a\bigl( x_1^2-x_2\bigr)^2 + \bigl(x_1-b\bigr)^2,\]
where $a,b>0$ and usually $b=1$ and $a \gg b$, we know the minimizer $x^* = (b,b^2)^\mathrm{T}$, and also the (Euclidean) gradient
\[\nabla f(x) =
+ \begin{pmatrix}
+ 4a(x_1^2-x_2)\\ -2a(x_1^2-x_2)
+ \end{pmatrix}
+ +
+ \begin{pmatrix}
+ 2(x_1-b)\\ 0
+ \end{pmatrix}.\]
They are even available already here in ManifoldExamples.jl
, see RosenbrockCost
and RosenbrockGradient!!
.
Furthermore, the RosenbrockMetric
can be used on $ℝ^2$, that is
\[⟨X,Y⟩_{\mathrm{Rb},p} = X^\mathrm{T}G_pY, \qquad
+G_p = \begin{pmatrix}
+ 1+4p_1^2 & -2p_1 \\
+ -2p_1 & 1
+\end{pmatrix},\]
In this example we want to explore four different approaches to minimizing the Rosenbrock example, that are all based on first-order methods, i.e. using a gradient but not a Hessian.
- The Euclidean Gradient
- The Riemannian gradient descent with respect to the
RosenbrockMetric
- The Euclidean Difference of Convex Algorithm
- The Riemannian Difference of Convex Algorithm respect to the
RosenbrockMetric
Where we obtain a difference of convex problem by writing
\[f(x) = a\bigl( x_1^2-x_2\bigr)^2 + \bigl(x_1-b\bigr)^2
+ = a\bigl( x_1^2-x_2\bigr)^2 + 2\bigl(x_1-b\bigr)^2 - \bigl(x_1-b\bigr)^2\]
that is
\[g(x) = a\bigl( x_1^2-x_2\bigr)^2 + 2\bigl(x_1-b\bigr)^2 \quad\text{ and }\quad h(x) = \bigl(x_1-b\bigr)^2\]
using LinearAlgebra, Random, Statistics
+using Manifolds, Manopt, ManoptExamples
+using NamedColors, Plots
+import Manopt: set_manopt_parameter!
+Random.seed!(42)
paul_tol = load_paul_tol()
+indigo = paul_tol["mutedindigo"]
+green = paul_tol["mutedgreen"]
+sand = paul_tol["mutedsand"]
+teal = paul_tol["mutedteal"]
+grey = paul_tol["mutedgrey"]
To emphasize the effect, we choose a quite large value of a
.
a = 2*10^5
+b = 1
and use the starting point and a direction to check gradients
p0 = [0.1, 0.2]
For the Euclidean gradient we can just use the same approach as in the Rosenbrock example
M = ℝ^2
+f = ManoptExamples.RosenbrockCost(M; a=a, b=b)
+∇f!! = ManoptExamples.RosenbrockGradient!!(M; a=a, b=b)
define a common debug vector
debug_vec = [
+ (:Iteration, "# %-8d "),
+ (:Cost, "F(x): %1.4e"),
+ " ",
+ (:Change, "|δp|: %1.4e | "),
+ (:GradientNorm, "|grad f|: %1.6e"),
+ :Stop,
+ "\n",
+ ]
and call the gradient descent algorithm
Eucl_GD_state = gradient_descent(M, f, ∇f!!, p0;
+ evaluation=InplaceEvaluation(),
+ debug=[debug_vec...,10^7],
+ stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
+ record=[:Iteration, :Cost],
+ return_state=true,
+)
Initial F(x): 7.2208e+03
+# 10000000 F(x): 8.9937e-06 |δp|: 1.3835e+00 | |grad f|: 8.170355e-03
+# 20000000 F(x): 2.9474e-09 |δp|: 6.5764e-03 | |grad f|: 1.419191e-04
+# 30000000 F(x): 9.8376e-13 |δp|: 1.1918e-04 | |grad f|: 2.526295e-06
+# 40000000 F(x): 3.2830e-16 |δp|: 2.1773e-06 | |grad f|: 4.526313e-08
+# 50000000 F(x): 1.0154e-19 |δp|: 3.9803e-08 | |grad f|: 6.838240e-10
+The algorithm performed a step with a change (0.0) less than 1.0e-16.
+
+# Solver state for `Manopt.jl`s Gradient Descent
+After 53073227 iterations
+
+## Parameters
+* retraction method: ExponentialRetraction()
+
+## Stepsize
+ArmijoLinesearch() with keyword parameters
+ * initial_stepsize = 1.0
+ * retraction_method = ExponentialRetraction()
+ * contraction_factor = 0.95
+ * sufficient_decrease = 0.1
+
+## Stopping criterion
+
+Stop When _one_ of the following are fulfilled:
+ Max Iteration 100000000: not reached
+ |Δp| < 1.0e-16: reached
+Overall: reached
+This indicates convergence: Yes
+
+## Debug
+ :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 10000000]
+ :Stop = :Stop
+
+## Record
+(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)
For the Riemannian case, we define
M_rb = MetricManifold(M, ManoptExamples.RosenbrockMetric())
MetricManifold(Euclidean(2; field=ℝ), ManoptExamples.RosenbrockMetric())
and the gradient is now adopted to the new metric
function grad_f!(M, X, p)
+ ∇f!!(M, X, p)
+ riemannian_gradient!(M, X, p, X)
+ return X
+end
+function grad_f(M, p)
+ X = zero_vector(M, p)
+ return grad_f!(M, X, p)
+end
R_GD_state = gradient_descent(M_rb, f, grad_f!, p0;
+ evaluation=InplaceEvaluation(),
+ debug=[debug_vec...,10^6],
+ stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
+ record=[:Iteration, :Cost],
+ return_state=true,
+)
Initial F(x): 7.2208e+03
+# 1000000 F(x): 1.3571e-09 |δp|: 9.1006e-01 | |grad f|: 1.974939e-04
+# 2000000 F(x): 2.7921e-18 |δp|: 3.6836e-05 | |grad f|: 9.240792e-09
+The algorithm performed a step with a change (0.0) less than 1.0e-16.
+
+# Solver state for `Manopt.jl`s Gradient Descent
+After 2443750 iterations
+
+## Parameters
+* retraction method: ExponentialRetraction()
+
+## Stepsize
+ArmijoLinesearch() with keyword parameters
+ * initial_stepsize = 1.0
+ * retraction_method = ExponentialRetraction()
+ * contraction_factor = 0.95
+ * sufficient_decrease = 0.1
+
+## Stopping criterion
+
+Stop When _one_ of the following are fulfilled:
+ Max Iteration 100000000: not reached
+ |Δp| < 1.0e-16: reached
+Overall: reached
+This indicates convergence: Yes
+
+## Debug
+ :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 1000000]
+ :Stop = :Stop
+
+## Record
+(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)
For the convex case, we have to first introduce the two parts of the cost.
f1(M, p; a=100, b=1) = a * (p[1]^2 - p[2])^2;
+f2(M, p; a=100, b=1) = (p[1] - b[1])^2;
+g(M, p; a=100, b=1) = f1(M, p; a=a, b=b) + 2 * f2(M, p; a=a, b=b)
+h(M, p; a=100, b=1) = f2(M, p; a=a, b=b)
and their (Euclidan) gradients
function ∇h!(M, X, p; a=100, b=1)
+ X[1] = 2*(p[1]-b)
+ X[2] = 0
+ return X
+end
+function ∇h(M, p; a=100, b=1)
+ X = zero(p)
+ ∇h!(M, X, p; a=a, b=b)
+ return X
+end
+function ∇g!(M, X, p; a=100, b=1)
+ X[1] = 4*a*(p[1]^2-p[2])*p[1] + 2*2*(p[1]-b)
+ X[2] = -2*a*(p[1]^2-p[2])
+ return X
+end
+function ∇g(M, p; a=100, b=1)
+ X = zero(p)
+ ∇g!(M, X, p; a=a, b=b)
+ return X
+end
and we define for convenience
docE_g(M, p) = g(M, p; a=a, b=b)
+docE_f(M,p) = docE_g(M,p) - h(M, p; a=a, b=b)
+docE_∇h!(M, X, p) = ∇h!(M, X, p; a=a, b=b)
+docE_∇g!(M, X, p) = ∇g!(M, X, p; a=a, b=b)
+function docE_∇f!(M, X, p)
+ Y = zero_vector(M, p)
+ docE_∇g!(M, X, p)
+ docE_∇h!(M, Y, p)
+ X .-= Y
+ return X
+end
Then we call the difference of convex algorithm on Eucldiean space $ℝ^2$.
E_doc_state = difference_of_convex_algorithm(
+ M, docE_f, docE_g, docE_∇h!, p0;
+ gradient=docE_∇f!,
+ grad_g = docE_∇g!,
+ debug=[debug_vec..., 10^4],
+ evaluation=InplaceEvaluation(),
+ record=[:Iteration, :Cost],
+ stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
+ sub_hess=nothing, # Use gradient descent
+ sub_stopping_criterion=StopAfterIteration(2000) | StopWhenGradientNormLess(1e-16),
+ return_state=true,
+)
Initial F(x): 7.2208e+03
+# 10000 F(x): 2.9705e-09 |δp|: 1.3270e+00 | |grad f|: 1.388203e-04
+# 20000 F(x): 3.3302e-16 |δp|: 1.2173e-04 | |grad f|: 4.541087e-08
+The algorithm performed a step with a change (0.0) less than 1.0e-16.
+
+# Solver state for `Manopt.jl`s Difference of Convex Algorithm
+After 26549 iterations
+
+## Parameters
+* sub solver state:
+ | # Solver state for `Manopt.jl`s Gradient Descent
+ | After 2000 iterations
+ |
+ | ## Parameters
+ | * retraction method: ExponentialRetraction()
+ |
+ | ## Stepsize
+ | ArmijoLinesearch() with keyword parameters
+ | * initial_stepsize = 1.0
+ | * retraction_method = ExponentialRetraction()
+ | * contraction_factor = 0.95
+ | * sufficient_decrease = 0.1
+ |
+ | ## Stopping criterion
+ |
+ | Stop When _one_ of the following are fulfilled:
+ | Max Iteration 2000: reached
+ | |grad f| < 1.0e-16: not reached
+ | Overall: reached
+ | This indicates convergence: No
+
+## Stopping criterion
+
+Stop When _one_ of the following are fulfilled:
+ Max Iteration 100000000: not reached
+ |Δp| < 1.0e-16: reached
+Overall: reached
+This indicates convergence: Yes
+
+## Debug
+ :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 10000]
+ :Stop = :Stop
+
+## Record
+(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)
We first have to again defined the gradients with respect to the new metric
function grad_h!(M, X, p; a=100, b=1)
+ ∇h!(M, X, p; a=a, b=b)
+ riemannian_gradient!(M, X, p, X)
+ return X
+end
+function grad_h(M, p; a=100, b=1)
+ X = zero(p)
+ grad_h!(M, X, p; a=a, b=b)
+ return X
+end
+function grad_g!(M, X, p; a=100, b=1)
+ ∇g!(M, X, p; a=a,b=b)
+ riemannian_gradient!(M, X, p, X)
+ return X
+end
+function grad_g(M, p; a=100, b=1)
+ X = zero(p)
+ grad_g!(M, X, p; a=a, b=b)
+ return X
+end
While the cost of the subgradient can be infered automaticallty, we also have to provide the gradient of the sub problem. For $X \in ∂h(p^{(k)})$ the sunproblem top determine $p^{(k+1)}$ reads
\[\operatorname*{argmin}_{p\in\mathcal M} g(p) - \langle X, \log_{p^{(k)}}p\rangle\]
for which usually the cost and gradient functions are computed automatically in the difference of convex algorithm. However, in our case first the closed form solution for the adjoint differential of the logaithmic map is complicated to compute and second the gradint can even be given in a nicer form. We can first simplify in our case with $X = \operatorname{grad} h(p^{(k)})$ that
\[\phi(p) = g(p) - \langle X, \log_{p^{(k)}}p\rangle
+= a\bigl( p_{1}^2-p_{2}\bigr)^2
+ + 2\bigl(p_{1}-b\bigr)^2 - 2(p^{(k)}_1-b)p_1 + 2(p^{(k)}_1-b)p^{(k)}_1,\]
its Euclidean gradient reads
\[\operatorname{grad}\phi(p) =
+ \nabla \varphi(p)
+ = \begin{pmatrix}
+ 4a p_1(p_1^2-p_2) + 4(p_1-b) - 2(p^{(k)}_1-b)\\
+ -2a(p_1^2-p_2)
+ \end{pmatrix}\]
where we can again employ the gradient conversion from before to obtain the Riemannian gradient.
mutable struct SubGrad{P,T,V}
+ pk::P
+ Xk::T
+ a::V
+ b::V
+end
+function (ϕ::SubGrad)(M, p)
+ X = zero_vector(M, p)
+ ϕ(M, X, p)
+ return X
+end
+function (ϕ::SubGrad)(M, X, p)
+ X .= [
+ 4 * ϕ.a * p[1] * (p[1]^2 - p[2]) + 4 * (p[1] - ϕ.b) - 2 * (ϕ.pk[1] - ϕ.b),
+ -2 * ϕ.a * (p[1]^2 - p[2]),
+ ]
+ riemannian_gradient!(M, X, p, X) # convert
+ return X
+end
And in orer to update the subsolvers gradient correctly, we have to overwrite
set_manopt_parameter!(ϕ::SubGrad, ::Val{:p}, p) = (ϕ.pk .= p)
+set_manopt_parameter!(ϕ::SubGrad, ::Val{:X}, X) = (ϕ.Xk .= X)
And we again introduce for ease of use
docR_g(M, p) = g(M, p; a=a, b=b)
+docR_f(M, p) = docR_g(M, p) - h(M, p; a=a, b=b)
+docR_grad_h!(M, X, p) = grad_h!(M, X, p; a=a, b=b)
+docR_grad_g!(M, X, p) = grad_g!(M, X, p; a=a, b=b)
+function docR_grad_f!(M, X, p)
+ Y = zero_vector(M, p)
+ docR_grad_g!(M, X, p)
+ docR_grad_h!(M, Y, p)
+ X .-= Y
+ return X
+end
+docR_sub_grad = SubGrad(copy(M, p0), zero_vector(M, p0), a, b)
Then we can finally call the last of our four algorithms to compare, the difference of convex algorithm with the Riemannian metric.
R_doc_state = difference_of_convex_algorithm(
+ M_rb, docR_f, docR_g, docR_grad_h!, p0;
+ gradient=docR_grad_f!,
+ grad_g = docR_grad_g!,
+ debug=[debug_vec..., 10^6],
+ evaluation=InplaceEvaluation(),
+ record=[:Iteration, :Cost],
+ stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
+ sub_grad=docR_sub_grad,
+ sub_hess = nothing, # Use gradient descent
+ sub_stopping_criterion=StopAfterIteration(2000) | StopWhenGradientNormLess(1e-16),
+ return_state=true,
+)
Initial F(x): 7.2208e+03
+The algorithm performed a step with a change (0.0) less than 1.0e-16.
+
+# Solver state for `Manopt.jl`s Difference of Convex Algorithm
+After 1235 iterations
+
+## Parameters
+* sub solver state:
+ | # Solver state for `Manopt.jl`s Gradient Descent
+ | After 2000 iterations
+ |
+ | ## Parameters
+ | * retraction method: ExponentialRetraction()
+ |
+ | ## Stepsize
+ | ArmijoLinesearch() with keyword parameters
+ | * initial_stepsize = 1.0
+ | * retraction_method = ExponentialRetraction()
+ | * contraction_factor = 0.95
+ | * sufficient_decrease = 0.1
+ |
+ | ## Stopping criterion
+ |
+ | Stop When _one_ of the following are fulfilled:
+ | Max Iteration 2000: reached
+ | |grad f| < 1.0e-16: not reached
+ | Overall: reached
+ | This indicates convergence: No
+
+## Stopping criterion
+
+Stop When _one_ of the following are fulfilled:
+ Max Iteration 100000000: not reached
+ |Δp| < 1.0e-16: reached
+Overall: reached
+This indicates convergence: Yes
+
+## Debug
+ :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 1000000]
+ :Stop = :Stop
+
+## Record
+(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)
fig = plot(;
+ legend=:topright,
+ xlabel=raw"Iterations $k$ (log. scale)", ylabel=raw"Cost $f(x)$ (log. scale)",
+ yaxis=:log,
+ ylims=(1e-16, 5*1e5),
+ xaxis=:log,
+ xlims=(1,10^8),
+)
+scatter!(fig, [1,], [f(M,p0),], label=raw"$f(p_0)$", markercolor=grey)
+egi = get_record(Eucl_GD_state, :Iteration, :Iteration)[1:10000:end] #5308 entries
+egc = get_record(Eucl_GD_state, :Iteration, :Cost)[1:10000:end] #5308 entries
+plot!(fig, egi, egc, color=teal, label="Euclidean GD")
+#
+rgi = get_record(R_GD_state, :Iteration, :Iteration)[1:1000:end] # 2444 entries
+rgc = get_record(R_GD_state, :Iteration, :Cost)[1:1000:end] # 2444 entries
+plot!(fig, rgi, rgc, color=indigo, label="Riemannian GD")
+#
+edi = get_record(E_doc_state, :Iteration, :Iteration) #26549 entries
+edc = get_record(E_doc_state, :Iteration, :Cost) #26549 entries
+plot!(fig, edi, edc, color=sand, label="Euclidean DoC")
+#
+rdi = get_record(R_doc_state, :Iteration, :Iteration) # 1235 entries
+rdc = get_record(R_doc_state, :Iteration, :Cost) # 1235 entries
+plot!(fig, rdi, rdc, color=green, label="Riemannian DoC")
And we can see that using difference of convex outperforms gradient descent, and using the Riemannian approach required less iterations than their Euclidean counterparts.
- [BFSS23]
R. Bergmann, O. P. Ferreira, E. M. Santos and J. C. Souza.
The difference of convex algorithm on Hadamard manifolds. Preprint (2023),
arXiv:2112.05250.