Issue with convergence when using PRAXIS #723

Open
liamlundy opened this issue Sep 29, 2023 · 4 comments

@liamlundy

Models run with PRAXIS as the optimizer do not terminate. In the simple example below, the fit spins indefinitely, but if it is stopped manually it returns the expected coefficients. BOBYQA works correctly in this case.

Version Info:

Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores
Environment:
  JULIA_PROJECT = /Users/liam/Development/attribution-engine/ma-model/SignConstrainedMixedModels

Example code:

using MixedModels, DataFrames, Random

# Arrange
id1 = Int[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
var1 = Float64[
    -0.968,
    -1.568,
    -4.101,
    2.616,
    2.658,
    1.352,
    3.836,
    -0.234,
    -1.257,
    -0.482,
    3.810,
    -2.812,
    1.382,
    -4.764,
    3.977,
    1.302,
    2.186,
    -4.237,
    4.803,
    -1.026,
]
var2 = Float64[
    2.633,
    2.565,
    2.685,
    2.661,
    0.048,
    1.285,
    2.127,
    1.687,
    1.542,
    0.932,
    1.650,
    2.239,
    0.239,
    0.247,
    0.989,
    1.800,
    0.667,
    1.412,
    2.497,
    2.004,
]
var3 = Float64[
    -4.805,
    -4.048,
    -2.160,
    -0.233,
    -2.655,
    -2.062,
    -0.395,
    -2.706,
    -3.509,
    -2.800,
    -2.927,
    -1.843,
    -2.401,
    -3.332,
    -1.413,
    -4.351,
    -4.917,
    -0.977,
    -0.303,
    -1.072,
]
vol = [
    0.25 * var1[1:5] + 0.3 * var2[1:5] + -0.6 * var3[1:5]
    0.15 * var1[6:10] + 0.3 * var2[6:10] + -0.6 * var3[6:10]
    0.05 * var1[11:15] + 0.3 * var2[11:15] + -0.6 * var3[11:15]
    -0.05 * var1[16:20] + 0.3 * var2[16:20] + -0.6 * var3[16:20]
]

noise = rand(MersenneTwister(5234325), size(vol, 1))
vol = vol .+ noise
data = DataFrames.DataFrame(id1 = id1, var1 = var1, var2 = var2, var3 = var3, vol = vol)

f = @formula(vol ~ 1 + var1 + var3 + var2 + (0 + var1 | id1))

model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_PRAXIS
model_fit = fit!(model)
@palday
Member

palday commented Oct 3, 2023

Thanks for providing an MWE!

Looking at the NLopt docs, I'm not sure PRAXIS is an algorithm I'd recommend:

This algorithm was originally designed for unconstrained optimization. In NLopt, bound constraints are "implemented" in PRAXIS by the simple expedient of returning infinity (Inf) when the constraints are violated (this is done automatically—you don't have to do this in your own function). This seems to work, more-or-less, but appears to slow convergence significantly. If you have bound constraints, you are probably better off using COBYLA or BOBYQA.

The optimization problem here is indeed bounded (we don't allow variances to be negative). :LN_COBYLA and :LN_NELDERMEAD also converge for this example and, in my limited testing, seem to work well in cases where BOBYQA fails.
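
For instance, a minimal sketch of switching the MWE above to one of those algorithms (only the optimizer symbol changes; everything else is exactly as in the original example):

# refit the MWE with an NLopt algorithm that handles bound constraints well
model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_COBYLA   # or :LN_NELDERMEAD
model_fit = fit!(model)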

@liamlundy
Author

@palday thanks for getting back to me about this! I definitely appreciate your work maintaining this project. For our use case, we use BOBYQA for all of our final models but PRAXIS for preliminary testing of variables. Our datasets can become very large, and PRAXIS outperforms BOBYQA by a wide margin (at least in our test cases). Do you know if there is a way to get PRAXIS working by modifying the stopping condition?

@palday
Member

palday commented Oct 25, 2023

The problem in the MWE is that the optimizer winds up oscillating around the optimum.

[UnicodePlots trace of the objective over the final ~30 evaluations: the values oscillate within a narrow band around 11.0467.]

[UnicodePlots trace of theta over the same evaluations: the values oscillate between roughly 0.5517 and 0.552.]

Examining the change in values, I get

julia> minimum(abs, diff(obj))
2.0605739337042905e-13

julia> minimum(abs, diff(theta))
1.0706571629270911e-7

This suggests that you could achieve convergence by setting either xtol_abs = [1e-6] or ftol_abs = 1e-12 in the optsum. (See here for the list of settings you can change.) Since you're using this for an initial exploration, the logic would be similar to the motivation for the reduced-precision bootstrap.

However, setting these for :LN_PRAXIS seems to have no effect; I have no idea why. maxfeval does seem to work, though a good value for it will depend on your real data.
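
As a sketch of what that could look like for the MWE above (the OptSummary field names are standard, but the tolerance settings may be ignored by :LN_PRAXIS as noted, and the maxfeval value here is only a placeholder to tune on your real data):

model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_PRAXIS
model.optsum.xtol_abs = [1e-6]    # absolute tolerance on theta (per element)
model.optsum.ftol_abs = 1e-12     # absolute tolerance on the objective
model.optsum.maxfeval = 100       # placeholder cap on function evaluations
fit!(model)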

@dmbates
Collaborator

dmbates commented Feb 7, 2024

To follow up on this issue and #742: using PRIMA.bobyqa produces

julia> x, info = bobyqa(objective!(model), model.optsum.initial; xl=model.optsum.lowerbd)
([0.5518592883381098], PRIMA.Info(11.04666952249386, 15, PRIMA.SMALL_TR_RADIUS, 0.0, Float64[], Float64[]))

In other words, it took only 15 evaluations to converge to what seems to be a good optimum.
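
For anyone wanting to reproduce this, a minimal sketch of the setup behind that call, assuming PRIMA.jl is installed and model is the (unfitted) MWE model from above:

using MixedModels, PRIMA

# objective!(model) returns the profiled objective as a function of theta;
# PRIMA's bobyqa takes the objective, a starting point, and the lower bounds
x, info = bobyqa(objective!(model), model.optsum.initial; xl = model.optsum.lowerbd)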
