Replies: 1 comment 1 reply
On Sun, Jun 5, 2022 at 9:17 AM Jan Kadlec ***@***.***> wrote:
Hi,
I have encountered this quite unexpected and, to me, annoying issue. I have some points to be fitted by a function, and it has happened to me several times that after fitting, I got back the same parameters I used for initialisation. Although I understand that this is my fault, the fact that it went under the radar is quite alarming. What I mean is that if one changes the initial guess just slightly, the convergence to some params progresses, but as shown below, for some cases it does not. And when I was checking whether the fit succeeded, it passed (result.success) although in reality it failed. Besides that, it seems like I'm not the only one encountering this issue
<https://stackoverflow.com/questions/71443125/lmfit-not-properly-fitting-where-scipy-does-with-same-starting-parameter-values>.
I have three questions:
1. How is this possible? How can it happen?
Fitting is an iterative process for refining values. The fitting methods
start with your initial parameter values, calculate your model function,
then make small changes (like at the 1.e-7 level) to those values,
calculate your function again, and see if and by how much that improves
the fit. It is certainly possible to start with values that are so far
from reasonable that the small change makes no difference in the result.
At that point, the algorithm does not know which way to go. If you imagine
data with a Gaussian-like peak centered at x=70000 and with a width of 10
to 20 x-value units, and then try to fit that data with a Gaussian function
with initial values for center and width of 1 and 3, the fit will simply
fail. Those are just absurdly bad initial values.
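To make that concrete, here is a minimal sketch of that x=70000 scenario (not the original poster's code; the synthetic data and the GaussianModel are my own stand-ins):

```python
import numpy as np
from lmfit.models import GaussianModel

# synthetic data: a Gaussian-like peak centered near x=70000 with width ~15
rng = np.random.default_rng(0)
x = np.linspace(69900, 70100, 201)
y = 5.0 * np.exp(-(x - 70000.0)**2 / (2 * 15.0**2)) + rng.normal(0, 0.05, x.size)

model = GaussianModel()
# absurdly bad initial values: center=1, sigma=3 for a peak at x=70000
params = model.make_params(amplitude=1.0, center=1.0, sigma=3.0)
result = model.fit(y, params, x=x)

print(result.success)       # can still be True: the algorithm finished without error
print(result.nfev)          # only a handful of function evaluations
print(result.fit_report())  # parameters left at initial values, no uncertainties
```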
In fitting, the initial values always matter. By "always", I do not mean
99.9999% of the time. I mean 100% of the time. Initial values are never
optional; there are no reasonable default values. They always depend on
the details of the model or objective function *and* on the data.
So the short answer is that your guess of mu=0.3 is just a bad guess. If
you had plotted your data and your cdf_lognormal() function with your x
values and mu=0.3, sigma=0.3, you would have seen that. Plotting your data
and doing interactive data exploration and analysis is always a good idea.
If you had printed out and read the fit report, it would have told you that
the fit made only 3 function evaluations, that uncertainties could not be
estimated, and that both mu and sigma were at their initial values. Reading
the fit report, especially when something seems wrong, is always a good
idea.
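In code, those two checks look roughly like this (continuing the sketch above, so x, y, model, params, and result are the same hypothetical names):

```python
import matplotlib.pyplot as plt

# evaluate the model at the initial parameter values and compare to the data;
# a plot like this makes a hopeless starting guess obvious before any fitting
plt.plot(x, y, 'o', label='data')
plt.plot(x, model.eval(params, x=x), '-', label='model at initial values')
plt.legend()
plt.show()

# and always read the fit report, especially when something looks off
print(result.fit_report())
```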
The "success" attribute really encodes whether the fitting algorithm thinks
it finished without error. It doesn't really encode whether a fit is
"good". For that, you need to look at many of the values in the fit
report, including things like whether uncertainties could be calculated
and whether there are warnings about parameters being at initial values or
stuck at boundaries. The fit statistics like chi-square can also help
decide whether a fit is "good", but the library does not know how you have
scaled your data. That is, it cannot tell from the value of chi-square
alone whether a fit is good.
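If you want a programmatic check that goes beyond result.success, something along these lines is a reasonable start (again continuing the sketch above; the exact thresholds are up to you):

```python
# result.success only says the algorithm finished without error;
# these extra checks help catch the "stuck at the initial values" case
looks_ok = (
    result.success
    and result.errorbars                       # uncertainties were estimated
    and result.nfev > 2 * len(result.params)   # more than a couple of probes per parameter
)
print(looks_ok, result.chisqr, result.redchi)
```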
2. What to do with this? Imagine having a for loop with different x
and y values and just checking whether the fit spits out something (is
successful). How do I know that it actually did the work and didn't get
stuck like this?
Well, give better initial values. Make representative plots of data and
model. Checking `result.errorbars` will tell you whether uncertainties
were calculated. If that is False, then something probably went wrong.
I strongly recommend reading the fit report.
If you are doing this in a loop, then you probably need to ensure that the
data is somehow "correct" or "as expected", and you might need to adjust
the starting values based on the data itself. It might be that your "x"
arrays are pretty stable, so a single value would work... until it didn't.
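A sketch of what that can look like in a loop, with starting values derived from each dataset rather than hard-coded (the GaussianModel and the synthetic datasets here are stand-ins for whatever model and data you actually have):

```python
import numpy as np
from lmfit.models import GaussianModel

# three synthetic peaks at different centers, standing in for "different x and y values"
rng = np.random.default_rng(1)
datasets = []
for center in (20.0, 45.0, 70.0):
    x = np.linspace(0, 100, 201)
    y = np.exp(-(x - center)**2 / (2 * 4.0**2)) + rng.normal(0, 0.02, x.size)
    datasets.append((x, y))

model = GaussianModel()
for x, y in datasets:
    # starting values estimated from the data itself, not fixed constants
    params = model.make_params(
        amplitude=y.max() * (x.max() - x.min()) / 10.0,
        center=x[np.argmax(y)],
        sigma=(x.max() - x.min()) / 10.0,
    )
    result = model.fit(y, params, x=x)
    if not (result.success and result.errorbars):
        print("suspect fit -- read the report:")
        print(result.fit_report())
```

For lmfit's built-in models, `model.guess(y, x=x)` will make this kind of data-driven estimate for you; a custom Model has no guess method unless you write one.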
3. The last two fits show another issue -- they are different, quite
substantially. I guess that the way around this would be to refit a few
times, but that is already happening within the Model module, no? So what
do you recommend as best practice?
Yeah, when initial values are very far off, it can sometimes be a
challenge to get stable results. And to be clear, both mu=0.4 and mu=0.5
are terrible guesses. Honestly, I think it is kind of impressive that
they got close.
You have x values that range from 10 to 80, and mu is used as
`erfc(-(np.log(x) - mu)/scale)`. Your log(x) ranges from 2.3 to 4.3. Even
mu=1 is a bad guess, but there will be a small effect at x=10, so a good
fit can be found. And, yes, I do mean that "1" is a "not great" initial
value and 0.3 is very bad, because you are using log(x) and your data
starts at x=10. Context like that always matters. Using something like
`mu=log(x).mean()` or `mu=log(x.mean())` (i.e., something in the middle of
your data range) would probably be reasonable and reliable.
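For instance (this cdf_lognormal is my reconstruction of the erfc form quoted above, not the poster's exact code, and the data here is synthetic):

```python
import numpy as np
from scipy.special import erfc
from lmfit import Model

def cdf_lognormal(x, mu, sigma):
    # lognormal CDF written with erfc, following the expression quoted above
    return 0.5 * erfc(-(np.log(x) - mu) / (sigma * np.sqrt(2)))

x = np.linspace(10, 80, 71)                      # x range from 10 to 80, as above
y = cdf_lognormal(x, mu=3.5, sigma=0.4)          # synthetic "data" for the sketch

model = Model(cdf_lognormal)
params = model.make_params(mu=np.log(x).mean(),  # middle of log(x), about 3.7 here
                           sigma=0.5)
result = model.fit(y, params, x=x)
print(result.fit_report())
```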
One more note -- I would report this as an issue because I think this
should not happen, but I was quite puzzled by the message one gets when
trying to submit an issue. What is the correct approach in this case?
We try to use GitHub Issues for problems that may require changes to the
code. We don't normally consider bad fits to be Issues. Well, a bad fit
might be evidence of an issue, but it is not an issue by itself: it is
always possible to find starting values that make a fit fail - guessing
all values to be -np.Inf usually works, if that is the goal ;).
--Matt