
Problems with quasi-newton and PositiveDefinite matrix #401

Closed
benjione opened this issue Jul 19, 2024 · 9 comments

@benjione

I am optimizing over positive definite matrices, and with gradient descent everything works quite well. I tried to switch to quasi-Newton and it works most of the time, but sometimes I get a PosDefException in the line search.
I tried changing the line search, but I also get it sometimes with AdaptiveWNGradient, and I run into similar problems with a constant step size.

I could imagine this happens because the eigenvalues are too close to 0 and the Cholesky factorization fails, even though the matrix is, strictly speaking, still positive definite.
Does anyone have insights on this issue?
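The suspected failure mode can be reproduced in isolation. A hedged sketch in Python/NumPy (independent of the Julia code in question): a symmetric matrix whose smallest eigenvalue has drifted just below zero through rounding makes the Cholesky factorization fail, even though it is "almost" positive definite.

```python
import numpy as np

def is_numerically_spd(A):
    """Return True if Cholesky succeeds, i.e. A is numerically SPD."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

# An orthogonal basis and two spectra: one safely positive, one with a
# tiny negative eigenvalue, mimicking rounding drift off the SPD manifold.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
good = Q @ np.diag([1.0, 0.5, 0.1, 1e-2, 1e-3]) @ Q.T
bad = Q @ np.diag([1.0, 0.5, 0.1, 1e-2, -1e-12]) @ Q.T

print(is_numerically_spd(good))  # True
print(is_numerically_spd(bad))   # False: one eigenvalue below zero
```

Note that a perturbation of order `-1e-12` is far above the rounding noise of the factorization itself (~`1e-16` here), so the failure is deterministic, not a fluke of this particular seed.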

@kellertuer
Member

Without an MWE (a minimal (non-)working example), or at least the error message and where it appears, I can only ask more questions.

@mateuszbaran
Member

To start with, could you attach the stack trace for that exception? Maybe there is a relatively simple solution. Our SPD implementation already does some numerical stabilization so maybe we missed a few spots where it would be beneficial to do it.

@benjione
Author

I will try to come up with a minimal working example. Currently the code I am using is here.
I am optimizing functions f_A(x) = v(x)^T A v(x) with A being positive definite, using different divergences. So far I got the error for quasi-Newton with the KL divergence and the chi2 divergence, but with gradient descent it works; it just does not converge fast and I stop it before it has fully converged. Also, I do not get the error every time, maybe every 10th time.

  • the objective is smooth (in the sense of being differentiable)
  • it is convex, but I don't know if it is geodesically convex; I will read up on it
  • I used the WolfePowellLinesearch before
  • the sizes I tested with were 21×21 matrices
  • I am using the default AffineInvariantMetric
  • I use the default update rule; I believe this is BFGS
  • I am not using the cautious update; I will try that

The stacktrace is here: stacktrace.txt
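For concreteness, the kind of objective described above can be sketched as follows (illustrative Python; the feature map `v` and its dimensions are hypothetical placeholders, the real definitions live in the linked code):

```python
import numpy as np

def v(x):
    """Hypothetical feature map, e.g. monomials up to degree 2."""
    return np.array([1.0, x, x**2])

def f(A, x):
    """f_A(x) = v(x)^T A v(x); positive for every x whenever A is SPD."""
    vx = v(x)
    return vx @ A @ vx

A = np.diag([1.0, 2.0, 3.0])  # a simple SPD example
print(f(A, 0.5))  # 1*1 + 2*0.25 + 3*0.0625 = 1.6875
```

Objectives of this quadratic-in-`v` form are nonnegative on the SPD cone, which is why the optimization is naturally posed over positive definite `A`.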

@kellertuer
Member

kellertuer commented Jul 20, 2024

v^T A v itself would be (locally, geodesically) convex, but in such long code I do not directly see how far you deviate from just that; sums of these are of course also fine.

Matrix size is okay, it is not too large.

Concerning the point that gradient descent “stops before converged” – why does it stop? (You can set debug=[:Stop], then it tells you.) You could also modify the stopping criterion; the default is a trade-off between not too many iterations and getting reasonably close to a minimiser (stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)).

At first glance from the error (distance, line 73, on the input q) it seems your q is already non-SPD at that point (otherwise cholesky would not complain), so we would have to narrow down why that is the case.

In such long code I also do not directly see KL and chi ;) But I can try to find time this evening to take a closer look.

Edit: I narrowed it down further; it seems to be the locking condition check (which is not called when alpha is zero). I can only check this further on a concrete (failing) example, but it might be that we already need a tolerance for the alpha. Though it is interesting that this did not cause problems until now.

@mateuszbaran
Member

I tried figuring it out too but I can't tell where the numerical error could have been introduced.

  • I use the default update rule, I believe this is BFGS

The default one is QuasiNewtonLimitedMemoryDirectionUpdate. Its default memory size is 20, but for my problems it usually worked best with lower memory sizes (between 5 and 10). A lower memory size also means fewer potential sources of numerical error, so it might help. You can just pass the memory_size argument to quasi_Newton to try it.
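As a minimal sketch of what the memory size controls (illustrative Python, not the Manopt.jl implementation): a limited-memory quasi-Newton update keeps only the last memory_size pairs of step and gradient differences (s, y), so a smaller memory also means fewer stored quantities through which numerical error can accumulate.

```python
from collections import deque

# Hypothetical bookkeeping of a limited-memory update: only the last
# `memory_size` curvature pairs (s_k, y_k) are retained.
memory_size = 5
pairs = deque(maxlen=memory_size)

for k in range(12):               # pretend these are 12 iterations
    s_k, y_k = f"s{k}", f"y{k}"   # placeholders for step / gradient diffs
    pairs.append((s_k, y_k))      # oldest pair is evicted automatically

print(len(pairs))   # 5
print(pairs[0])     # ('s7', 'y7') -- oldest pair still retained
```

The search direction is then rebuilt from just these pairs each iteration (e.g. via the two-loop recursion), which is why shrinking the memory both cheapens the update and discards potentially noisy old curvature information.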

@benjione
Author

benjione commented Aug 1, 2024

Sorry for the late answer.

  • Even after 10000 iterations, gradient descent stops because of the iteration limit.
  • I reduced the memory size to 5 and only got the error once more (in maybe 100 runs). Increasing it to 100, the problem appears much more often.
  • The cautious update did not make a difference.
  • I have not come up with a simple working example yet. But I figured out that the problem mostly appears with the complex distributions I try to estimate (I am using an iterative estimation approach, where this can get quite complex). Also, when adding regularization to the matrix, the problem appears less often.

So in the end I think this is mostly a problem of numerical errors in my loss formulation, but I would also guess that Manopt should keep the matrix positive definite even in the presence of numerical errors (?).
With JuMP, I have not had these problems so far.

I will try to get a simple failing example running and check for numerical problems in my formulation. But I am on holiday for the next two weeks, so it will take some time.

@kellertuer
Member

Thanks for coming back to this

So in the end I think this is mostly a problem of numerical errors in my loss formulation, but I would also guess that Manopt should keep the matrix positive definite even in the presence of numerical errors (?).
With JuMP, I have not had these problems so far.

I have two questions on this.

  1. How do you expect positive definiteness to be enforced? We can not “project back” to the manifold, because the set of SPD matrices is open (arbitrarily small but positive eigenvalues are still fine).
    So if we encounter zero or negative eigenvalues, I could only offer to error (or warn) – for that you could check out the ValidationManifold at https://juliamanifolds.github.io/ManifoldsBase.jl/stable/manifolds/#A-manifold-for-validation

But if you have an idea how one could keep them SPD, sure, we can check how we can introduce that.
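One numerical safeguard one could imagine, sketched in Python/NumPy (`clamp_spd` is a hypothetical helper, not a Manopt.jl function): clamp eigenvalues below a small tolerance up to that tolerance. This is not a projection in the strict sense, since the SPD set is open and has no closest SPD point for a non-SPD matrix, but it is a common stabilization against rounding drift.

```python
import numpy as np

def clamp_spd(A, eps=1e-12):
    """Symmetrize A and push any eigenvalue below `eps` up to `eps`.

    Not a true projection -- the SPD set is open -- but a common
    numerical safeguard to keep iterates factorizable.
    """
    A = (A + A.T) / 2                 # symmetrize first
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.maximum(w, eps)) @ V.T

# Rounding pushed one eigenvalue slightly negative:
drifted = np.diag([1.0, 0.5, -1e-15])
fixed = clamp_spd(drifted)
print(np.linalg.eigvalsh(fixed).min() > 0)  # True
```

The trade-off is that clamping silently alters the iterate, which can mask genuine modeling errors, so an error or warning (as via ValidationManifold) is arguably the more honest default.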

  2. Comparison to JuMP:
    Can you be a bit more specific about what you are comparing to?
    Besides the fact that Manopt can be used within JuMP, I assume you do some kind of constrained optimisation?
    Then you are comparing apples and oranges:

a) constrained optimisation on the set of matrices (JuMP)
b) unconstrained optimisation on the manifold of SPD matrices

While I have a personal preference for b) for a few arguments, both are valid approaches.
a) might be a higher-dimensional problem, and with the constraints maybe a bit more complicated in the algorithm to use (maybe IPNewton or so?)
b) might be lower-dimensional, but the single operations (retractions, the exponential map, or parallel transports) might be more involved (than just + in a)) – and sometimes maybe have different (for better or worse) numerical errors.

So if you conclude a) works better in your scenario, that is for sure also just fine.

@benjione
Author

Thanks a lot for this information.
Manopt is intriguing to me precisely for working with more data and/or going to higher dimensions. But I think for now I will stick with option a).

@kellertuer
Member

Sure, both are valid ways to go, and it might depend on the actual application which option performs better. Thanks anyway for the interest in Manopt.jl :)
