
Problems with quasi-newton and PositiveDefinite matrix #401

Closed
benjione opened this issue Jul 19, 2024 · 9 comments

@benjione

I am optimizing over positive definite matrices, and with gradient descent everything works quite well. I tried to switch to quasi-Newton and it works most of the time, but sometimes I get a PosDefException in the line search.
I tried changing the line search, but I also get it sometimes with AdaptiveWNGradient, and I run into similar problems with a constant step size.

I could imagine this happens because the eigenvalues are too close to 0 and the Cholesky factorization fails, even though the matrix is, strictly speaking, still positive definite.
Does anyone have insights on this issue?
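The suspected failure mode can be reproduced in isolation. A hedged sketch in Python/NumPy (independent of the Julia code in question): a symmetric matrix whose smallest eigenvalue has drifted just below zero through rounding makes the Cholesky factorization fail, even though it is "almost" positive definite.

```python
import numpy as np

def is_numerically_spd(A):
    """Return True if Cholesky succeeds, i.e. A is numerically SPD."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

# An orthogonal basis and two spectra: one safely positive, one with a
# tiny negative eigenvalue, mimicking rounding drift off the SPD manifold.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
good = Q @ np.diag([1.0, 0.5, 0.1, 1e-2, 1e-3]) @ Q.T
bad = Q @ np.diag([1.0, 0.5, 0.1, 1e-2, -1e-12]) @ Q.T

print(is_numerically_spd(good))  # True
print(is_numerically_spd(bad))   # False: one eigenvalue below zero
```

Note that a perturbation of order `-1e-12` is far above the rounding noise of the factorization itself (~`1e-16` here), so the failure is deterministic, not a fluke of this particular seed.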

@kellertuer
Member

Without an MWE (a minimal (non-)working example), or at least the error message and where it appears, I can only ask more questions.

@mateuszbaran
Member

To start with, could you attach the stack trace for that exception? Maybe there is a relatively simple solution. Our SPD implementation already does some numerical stabilization so maybe we missed a few spots where it would be beneficial to do it.

@benjione
Author

I will try to come up with a minimal working example. Currently the code I am using is here.
I am optimizing functions f_A(x) = v(x)^T A v(x) with A being positive definite, using different divergences. So far I got the error for quasi-Newton with the KL divergence and the chi2 divergence, but with gradient descent it works; it just does not converge fast and I stop it before it has fully converged. Also, I do not get the error every time, maybe every 10th time.

  • the objective is smooth (in the sense of being differentiable)
  • it is convex, but I don't know if it is geodesically convex; I will read up on it
  • I used the WolfePowellLinesearch before
  • the sizes I tested with were 21×21 matrices
  • I am using the default AffineInvariantMetric
  • I use the default update rule; I believe this is BFGS
  • I am not using the cautious update; I will try that

The stacktrace is here: stacktrace.txt
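For concreteness, the kind of objective described above can be sketched as follows (illustrative Python; the feature map `v` and its dimensions are hypothetical placeholders, the real definitions live in the linked code):

```python
import numpy as np

def v(x):
    """Hypothetical feature map, e.g. monomials up to degree 2."""
    return np.array([1.0, x, x**2])

def f(A, x):
    """f_A(x) = v(x)^T A v(x); positive for every x whenever A is SPD."""
    vx = v(x)
    return vx @ A @ vx

A = np.diag([1.0, 2.0, 3.0])  # a simple SPD example
print(f(A, 0.5))  # 1*1 + 2*0.25 + 3*0.0625 = 1.6875
```

Objectives of this quadratic-in-`v` form are nonnegative on the SPD cone, which is why the optimization is naturally posed over positive definite `A`.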

@kellertuer
Member

kellertuer commented Jul 20, 2024

v^T A v itself would be (locally, geodesically) convex, but in such long code I do not directly see how far you deviate from just that; sums of these are of course also fine.

Matrix size is okay, it is not too large.

Concerning the point that gradient descent “stops before converged” – why does it stop? (You can set debug=[:Stop], then it tells you.) You could also modify the stopping criterion; the default is a trade-off between not too many iterations and getting reasonably close to a minimiser (stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)).

At first glance from the error (distance, line 73, on the input q) it seems your q is already non-SPD at that point (otherwise cholesky would not complain), so we would have to narrow down why that is the case.

In such long code I also do not directly see KL and chi ;) But I can try to find time this evening to take a closer look.

Edit: I narrowed it down further; it seems to be the locking condition check (which is not called when alpha is zero). I can only check this further on a concrete (failing) example, but it might be that we already need a tolerance for the alpha. Though it is interesting that this did not cause problems until now.

@mateuszbaran
Member

I tried figuring it out too but I can't tell where the numerical error could have been introduced.

  • I use the default update rule, I believe this is BFGS

The default one is QuasiNewtonLimitedMemoryDirectionUpdate. Its default memory size is 20, but for my problems it usually worked best with lower memory sizes (between 5 and 10). A lower memory size also means fewer potential sources of numerical error, so it might help. You can just pass the memory_size argument to quasi_Newton to try it.
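As a minimal sketch of what the memory size controls (illustrative Python, not the Manopt.jl implementation): a limited-memory quasi-Newton update keeps only the last memory_size pairs of step and gradient differences (s, y), so a smaller memory also means fewer stored quantities through which numerical error can accumulate.

```python
from collections import deque

# Hypothetical bookkeeping of a limited-memory update: only the last
# `memory_size` curvature pairs (s_k, y_k) are retained.
memory_size = 5
pairs = deque(maxlen=memory_size)

for k in range(12):               # pretend these are 12 iterations
    s_k, y_k = f"s{k}", f"y{k}"   # placeholders for step / gradient diffs
    pairs.append((s_k, y_k))      # oldest pair is evicted automatically

print(len(pairs))   # 5
print(pairs[0])     # ('s7', 'y7') -- oldest pair still retained
```

The search direction is then rebuilt from just these pairs each iteration (e.g. via the two-loop recursion), which is why shrinking the memory both cheapens the update and discards potentially noisy old curvature information.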

@benjione
Author

benjione commented Aug 1, 2024

Sorry for the late answer.

  • Even after 10000 iterations, gradient descent stops because of the iteration limit.
  • I reduced the memory size to 5 and only got the error once more (in maybe 100 runs). Increasing it to 100, the problem appears much more often.
  • The cautious update did not make a difference.
  • I have not come up with a simple working example yet. But I figured out that the problem mostly appears with the complex distributions I try to estimate (I am using an iterative estimation approach, where this can get quite complex). Also, when adding regularization to the matrix, the problem appears less often.

So in the end I think this is mostly a problem of numerical errors in my loss formulation, but I would also guess that Manopt should keep the matrix positive definite even in the presence of numerical errors (?).
With JuMP, I have not had these problems so far.

I will try to get a simple failing example running and check for numerical problems in my formulation. But I am on holiday for the next two weeks, so it will take some time.

@kellertuer
Member

Thanks for coming back to this

So in the end I think this is mostly a problem of numerical errors in my loss formulation, but I would also guess that Manopt should keep the matrix positive definite even in the presence of numerical errors (?).
With JuMP, I have not had these problems so far.

I have two questions on this.

  1. How do you expect positive definiteness to be enforced? We can not “project back” to the manifold, because the set of SPD matrices is open (arbitrarily small but positive eigenvalues are still fine).
    So if we encounter zero or negative eigenvalues, I could only offer to error (or warn) – for that you could check out the ValidationManifold at https://juliamanifolds.github.io/ManifoldsBase.jl/stable/manifolds/#A-manifold-for-validation

But if you have an idea how one could keep them SPD, sure, we can check how we can introduce that.
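One numerical safeguard one could imagine, sketched in Python/NumPy (`clamp_spd` is a hypothetical helper, not a Manopt.jl function): clamp eigenvalues below a small tolerance up to that tolerance. This is not a projection in the strict sense, since the SPD set is open and has no closest SPD point for a non-SPD matrix, but it is a common stabilization against rounding drift.

```python
import numpy as np

def clamp_spd(A, eps=1e-12):
    """Symmetrize A and push any eigenvalue below `eps` up to `eps`.

    Not a true projection -- the SPD set is open -- but a common
    numerical safeguard to keep iterates factorizable.
    """
    A = (A + A.T) / 2                 # symmetrize first
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.maximum(w, eps)) @ V.T

# Rounding pushed one eigenvalue slightly negative:
drifted = np.diag([1.0, 0.5, -1e-15])
fixed = clamp_spd(drifted)
print(np.linalg.eigvalsh(fixed).min() > 0)  # True
```

The trade-off is that clamping silently alters the iterate, which can mask genuine modeling errors, so an error or warning (as via ValidationManifold) is arguably the more honest default.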

  2. Comparison to JuMP:
    Can you be a bit more specific about what you are comparing to?
    Besides the fact that Manopt can be used within JuMP, I assume you do some kind of constrained optimisation?
    Then you are comparing apples and oranges:

a) constrained optimisation on the set of matrices (JuMP)
b) unconstrained optimisation on the manifold of SPD matrices

While I have a personal preference for b) for a few arguments, both are valid approaches.
a) might be a higher-dimensional problem, and with the constraints maybe a bit more complicated in the algorithm to use (maybe IPNewton or so?)
b) might be lower-dimensional, but the single operations (retractions, the exponential map, or parallel transports) might be more involved (than just + in a)) – and sometimes maybe have different (for better or worse) numerical errors.

So if you conclude a) works better in your scenario, that is for sure also just fine.

@benjione
Author

Thanks a lot for this information.
Manopt is intriguing to me precisely for working with more data and/or going to higher dimensions. But I think for now I will stick with option a).

@kellertuer
Member

Sure, both are valid ways to go, and it might depend on the actual application which option performs better. Thanks anyway for the interest in Manopt.jl :)
