Replies: 1 comment 1 reply
On Sun, Jun 5, 2022 at 9:17 AM Jan Kadlec ***@***.***> wrote:
Hi,
I have encountered this quite unexpected and, to me, annoying issue. I have some points to be fitted by a function, and it has happened to me several times that after fitting, I got back the same parameters I used for initialisation. Although I understand that this is my fault, the fact that it went under the radar is quite alarming. What I mean is that if one changes the initial guess just slightly, the convergence to some params progresses, but as shown below, for some cases it does not. And when I was checking whether the fit succeeded, it passed (result.success) although in reality it failed. Besides that, it seems like I'm not the only one encountering this issue
<https://stackoverflow.com/questions/71443125/lmfit-not-properly-fitting-where-scipy-does-with-same-starting-parameter-values>.
I have three questions:
1. How is this possible? How can it happen?
Fitting is an iterative process for refining values. The fitting methods
start with your initial parameter values, calculate your model function,
then make small changes (like at the 1.e-7 level) to those values,
calculate your function again, and see if and by how much that improves
the fit. It is certainly possible to start with values that are so far
from reasonable that the small change makes no difference in the result.
At that point, the algorithm does not know which way to go. If you imagine
data with a Gaussian-like peak centered at x=70000 and with a width of 10
to 20 x-value units, and then try to fit that data with a Gaussian function
with initial values for center and width of 1 and 3, the fit will simply
fail. Those are just absurdly bad initial values.
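To make that concrete, here is a minimal sketch of that x=70000 scenario (not the original poster's code; the synthetic data and the GaussianModel are my own stand-ins):

```python
import numpy as np
from lmfit.models import GaussianModel

# synthetic data: a Gaussian-like peak centered near x=70000 with width ~15
rng = np.random.default_rng(0)
x = np.linspace(69900, 70100, 201)
y = 5.0 * np.exp(-(x - 70000.0)**2 / (2 * 15.0**2)) + rng.normal(0, 0.05, x.size)

model = GaussianModel()
# absurdly bad initial values: center=1, sigma=3 for a peak at x=70000
params = model.make_params(amplitude=1.0, center=1.0, sigma=3.0)
result = model.fit(y, params, x=x)

print(result.success)       # can still be True: the algorithm finished without error
print(result.nfev)          # only a handful of function evaluations
print(result.fit_report())  # parameters left at initial values, no uncertainties
```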
In fitting, the initial values always matter. By "always", I do not mean
99.9999% of the time. I mean 100% of the time. Initial values are never
optional; there are no reasonable default values. They always depend on
the details of the model or objective function *and* on the data.
So the short answer is that your guess of mu=0.3 is just a bad guess. If
you had plotted your data and your cdf_lognormal() function with your x
values and mu=0.3, sigma=0.3, you would have seen that. Plotting your data
and doing interactive data exploration and analysis is always a good idea.
If you had printed out and read the fit report, it would have told you that
the fit made only 3 function evaluations, that uncertainties could not be
estimated, and that both mu and sigma were at their initial values. Reading
the fit report, especially when something seems wrong, is always a good
idea.
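In code, those two checks look roughly like this (continuing the sketch above, so x, y, model, params, and result are the same hypothetical names):

```python
import matplotlib.pyplot as plt

# evaluate the model at the initial parameter values and compare to the data;
# a plot like this makes a hopeless starting guess obvious before any fitting
plt.plot(x, y, 'o', label='data')
plt.plot(x, model.eval(params, x=x), '-', label='model at initial values')
plt.legend()
plt.show()

# and always read the fit report, especially when something looks off
print(result.fit_report())
```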
The "success" attribute really encodes whether the fitting algorithm thinks
it finished without error. It doesn't really encode whether a fit is
"good". For that, you need to look at many of the values in the fit
report, including things like whether uncertainties could be calculated
and whether there are warnings about parameters being at initial values or
stuck at boundaries. The fit statistics like chi-square can also help
decide whether a fit is "good", but the library does not know how you have
scaled your data. That is, it cannot tell from the value of chi-square
alone whether a fit is good.
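If you want a programmatic check that goes beyond result.success, something along these lines is a reasonable start (again continuing the sketch above; the exact thresholds are up to you):

```python
# result.success only says the algorithm finished without error;
# these extra checks help catch the "stuck at the initial values" case
looks_ok = (
    result.success
    and result.errorbars                       # uncertainties were estimated
    and result.nfev > 2 * len(result.params)   # more than a couple of probes per parameter
)
print(looks_ok, result.chisqr, result.redchi)
```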
2. What to do with this? Imagine having a for loop with different x
and y values and just checking whether the fit spits out something (is
successful). How do I know that it actually did the work and didn't get
stuck like this?
Well, give better initial values. Make representative plots of data and
model. Checking `result.errorbars` will tell you whether uncertainties
were calculated. If that is False, then something probably went wrong.
I strongly recommend reading the fit report.
If you are doing this in a loop, then you probably need to ensure that the
data is somehow "correct" or "as expected", and you might need to adjust
the starting values based on the data itself. It might be that your "x"
arrays are pretty stable, so a single value would work... until it didn't.
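A sketch of what that can look like in a loop, with starting values derived from each dataset rather than hard-coded (the GaussianModel and the synthetic datasets here are stand-ins for whatever model and data you actually have):

```python
import numpy as np
from lmfit.models import GaussianModel

# three synthetic peaks at different centers, standing in for "different x and y values"
rng = np.random.default_rng(1)
datasets = []
for center in (20.0, 45.0, 70.0):
    x = np.linspace(0, 100, 201)
    y = np.exp(-(x - center)**2 / (2 * 4.0**2)) + rng.normal(0, 0.02, x.size)
    datasets.append((x, y))

model = GaussianModel()
for x, y in datasets:
    # starting values estimated from the data itself, not fixed constants
    params = model.make_params(
        amplitude=y.max() * (x.max() - x.min()) / 10.0,
        center=x[np.argmax(y)],
        sigma=(x.max() - x.min()) / 10.0,
    )
    result = model.fit(y, params, x=x)
    if not (result.success and result.errorbars):
        print("suspect fit -- read the report:")
        print(result.fit_report())
```

For lmfit's built-in models, `model.guess(y, x=x)` will make this kind of data-driven estimate for you; a custom Model has no guess method unless you write one.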
3. The last two fits show another issue -- they are different, quite
substantially. I guess that the way around this would be to refit a few
times, but that is already happening within the Model module, no? So what
do you recommend as best practice?
Yeah, when initial values are very far off, it can sometimes be a
challenge to get stable results. And to be clear, both mu=0.4 and mu=0.5
are terrible guesses. Honestly, I think it is kind of impressive that
they got close.
You have x values that range from 10 to 80, and mu is used as
`erfc(-(np.log(x) - mu)/scale)`. Your log(x) ranges from 2.3 to 4.3. Even
mu=1 is a bad guess, but there will be a small effect at x=10, so a good
fit can be found. And, yes, I do mean that "1" is a "not great" initial
value and 0.3 is very bad, because you are using log(x) and your data
starts at x=10. Context like that always matters. Using something like
`mu=log(x).mean()` or `mu=log(x.mean())` (i.e., something in the middle of
your data range) would probably be reasonable and reliable.
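For instance (this cdf_lognormal is my reconstruction of the erfc form quoted above, not the poster's exact code, and the data here is synthetic):

```python
import numpy as np
from scipy.special import erfc
from lmfit import Model

def cdf_lognormal(x, mu, sigma):
    # lognormal CDF written with erfc, following the expression quoted above
    return 0.5 * erfc(-(np.log(x) - mu) / (sigma * np.sqrt(2)))

x = np.linspace(10, 80, 71)                      # x range from 10 to 80, as above
y = cdf_lognormal(x, mu=3.5, sigma=0.4)          # synthetic "data" for the sketch

model = Model(cdf_lognormal)
params = model.make_params(mu=np.log(x).mean(),  # middle of log(x), about 3.7 here
                           sigma=0.5)
result = model.fit(y, params, x=x)
print(result.fit_report())
```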
One more note -- I would report this as an issue because I think this
should not happen, but I was quite puzzled by the message one gets when
trying to submit an issue. What is the correct approach in this case?
We try to use GitHub Issues for problems that may require changes to the
code. We don't normally consider bad fits to be Issues. Well, a bad fit
might be evidence of an issue, but it is not an issue by itself: it is
always possible to find starting values that make a fit fail - guessing
all values to be -np.Inf usually works, if that is the goal ;).
--Matt