pwlf with unknown line segments #88

GvdDool · 2021-06-30T21:26:34Z

I am trying to run the Bayesian optimization example and want to understand the comments in your function def my_obj(x):
- define some penalty parameter l
- you'll have to arbitrarily pick this
- it depends upon the noise in your data --> how do you check this, and what are acceptable levels?
- and the value of your sum of the square of residuals --> how do I find/obtain this number?

Could you give some ranges and explain in more detail how the penalty parameter is affecting the results?

Your assistance would be most appreciated and a great help in understanding how the function works

cjekel · 2021-07-01T16:05:34Z

Penalty parameters generally range from 1e-1 to 1e-6, and yes it's super arbitrary.

If you are looking at automatically performing these fits in a more robust manner, check out this post #17 (comment) where I look for a variance ratio. You probably need at least 20 data points for that variance ratio to work. I think this is a very novel way to automatically fit these models (and I really need to write a paper on this).

So the Bayesian optimization is trying to minimize the sum of square of residuals (mypwlf.ssr) while penalizing the model complexity (number of line segments). As the number of line segments goes to infinity, the sum of square of residuals goes to zero. Also as the number of line segments goes to infinity, the penalty on model complexity also should go to infinity. It's a dance with the devil.

I would just try lambdas = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6] and see which one gives you the best visual fit. Do this for a couple cases in your data set, and then just fix the penalty parameter to that value.
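To make the trade-off concrete, here is a minimal sketch of a penalized objective of the form SSR + lambda * (number of segments), scanned over the suggested lambdas. It is not the pwlf example itself: to stay self-contained it uses plain NumPy with equally spaced breakpoints (an assumption; pwlf optimizes the breakpoint locations, and there the SSR would come from mypwlf.ssr after a fit). The helper names and the synthetic data are made up for illustration.

```python
import numpy as np

def ssr_equal_breaks(x, y, n_segments):
    """SSR of a continuous piecewise linear least-squares fit.

    Assumption: breakpoints are equally spaced, unlike pwlf,
    which searches for the best breakpoint locations.
    """
    breaks = np.linspace(x.min(), x.max(), n_segments + 1)
    # Design matrix: intercept, initial slope, one hinge per interior breakpoint.
    cols = [np.ones_like(x), x - breaks[0]]
    for b in breaks[1:-1]:
        cols.append(np.clip(x - b, 0.0, None))
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def penalized_objective(x, y, n_segments, lam):
    """Sum of squared residuals plus a linear penalty on model complexity."""
    return ssr_equal_breaks(x, y, n_segments) + lam * n_segments

# Synthetic data (illustrative only): two true segments with a break at x = 5.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.where(x < 5.0, x, 10.0 - x) + rng.normal(scale=0.05, size=x.size)

candidates = range(1, 11)
best = {}
for lam in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:
    scores = [penalized_objective(x, y, n, lam) for n in candidates]
    best[lam] = candidates[int(np.argmin(scores))]
    print(f"lambda={lam:g}: best number of segments = {best[lam]}")
```

A larger lambda can only push the selected number of segments down (or leave it unchanged), which is why noisy data tends to need the larger end of the range. Because the penalty is added to the raw SSR, a sensible lambda also depends on the scale of y and the number of points.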

GvdDool · 2021-07-01T16:19:56Z

Thanks Charles,
Knowing the range helps, although it may not solve my problem. I managed to run the model with 1e-1, but I fear my data set is too noisy to benefit from smaller values. I have daily nighttime light intensities for one location and am trying to fit a piecewise linear function through the data, but the variance is very high.

The function with fixed lines (see below) runs fine up to 4 lines, but introducing more lines increases the run time exponentially, and setting the maximum number of elements to 20 takes 1.5 hours on my laptop in a Jupyter Notebook.

GvdDool · 2021-07-01T16:23:31Z

I used 12 line elements because this is the first point in the optimised graph. Using the suggested 19 doesn't make a visual difference, and the optimised values are very similar (if not identical).

cjekel · 2021-07-01T17:59:49Z

The variance is very high in your case, and you may benefit from trying this #17 (comment) but replace x and y with your own data. It should be biased to use very few line segments. (it should also run much faster than the Bayesian optimization routine).

GvdDool · 2021-07-01T18:27:06Z

Thanks Charles,
I will check the issue, and compare the results.

One other thing I am going to try is to smooth my data with a 7-day moving average, which should remove most of the noise in the data. I already tried this averaging to get the data stationary, and the 7-day period gives the best results (clear trend).
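For reference, a 7-day moving average can be sketched with NumPy as follows. The function name and the synthetic series are placeholders, not the actual nighttime light data.

```python
import numpy as np

def moving_average(values, window=7):
    """Centered-free simple moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(values, kernel, mode="valid")

# Synthetic daily series (illustrative only): constant level plus noise.
rng = np.random.default_rng(1)
daily = 10.0 + rng.normal(scale=3.0, size=60)
smoothed = moving_average(daily, window=7)
print(daily.std(), smoothed.std())
```

Note that mode="valid" drops window - 1 points, so the smoothed series is shorter than the input and the x values need to be trimmed to match before fitting.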

The reason I am trying your method is to have the piece-wise linear lines to check if there is a trend change after a known date. I can use the (known) date, but that won't prove that there is a trend change, that will (in my understanding) only show a different trend.

GvdDool · 2021-07-01T18:37:38Z

This is a view on the smoothed data:

The event date is at the beginning of August, but what I was expecting is not the decline before the event; it should have been much more abrupt (in theory), so there is something else happening before the event (likely the COVID-19 confinements are interfering with the NightTime Light intensities in the area)
Best,
Gijs

GvdDool · 2021-07-02T07:23:33Z

Hi Charles,
Quick update, your method #17 is giving some promising results. The method suggests 2 lines, but I think 3 segments are telling the story better.

cjekel · 2021-07-02T16:30:08Z

What was the F ratio for both cases?

    F = sigma_hat / sigma

Maybe it's better to pick the one that is closest to 1.0, since one is over and the other is under.
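Picking the candidate whose ratio is closest to 1.0 can be written as a one-line selection. This is only a sketch: the helper name is made up, and the F values below are hypothetical placeholders, not results from this thread.

```python
def closest_to_one(f_by_segments):
    """Return the number of segments whose F ratio is nearest to 1.0."""
    return min(f_by_segments, key=lambda n: abs(f_by_segments[n] - 1.0))

# Hypothetical F ratios keyed by number of line segments (placeholders).
f_ratios = {2: 1.30, 3: 0.95}
print(closest_to_one(f_ratios))
```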

trueParadise · 2022-04-18T03:40:22Z

Hi Charles,
Did you check my pull request? Please let me know, thanks.

cjekel mentioned this issue Apr 18, 2022

add AutoPiecewiseLinFit class #95

Open
