pwlf with unknown line segments #88

Open
GvdDool opened this issue Jun 30, 2021 · 9 comments

@GvdDool

GvdDool commented Jun 30, 2021

I am trying to run the BayesianOptimization and to understand your function def my_obj(x):
- define some penalty parameter l
- you'll have to arbitrarily pick this
- it depends upon the noise in your data --> how do you check this, and what are acceptable levels?
- and the value of your sum of the square of residuals --> how do I find/obtain this number?

Could you give some ranges and explain in more detail how the penalty parameter is affecting the results?

Your assistance would be most appreciated and a great help in understanding how the function works.

@cjekel
Owner

cjekel commented Jul 1, 2021

Penalty parameters generally range from 1e-1 to 1e-6, and yes it's super arbitrary.

If you are looking at automatically performing these fits in a more robust manner, check out this post #17 (comment) where I look for a variance ratio. You probably need at least 20 data points for that variance ratio to work. I think this is a very novel way to automatically fit these models (and I really need to write a paper on this).

So the Bayesian optimization is trying to minimize the sum of square of residuals (mypwlf.ssr) while penalizing the model complexity (number of line segments). As the number of line segments goes to infinity, the sum of square of residuals goes to zero. Also as the number of line segments goes to infinity, the penalty on model complexity also should go to infinity. It's a dance with the devil.

I would just try lambdas = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6] and see which one gives you the best visual fit. Do this for a couple cases in your data set, and then just fix the penalty parameter to that value.
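As a rough sketch of that sweep (not the code from the pwlf example): assume the objective is the sum of squared residuals plus the penalty times the number of line segments, and that the SSR for each segment count has already been computed. The `ssr_by_segments` numbers below are made up purely for illustration, and `penalized_objective` and `best_segments` are hypothetical helper names. With pwlf, each SSR value would come from `my_pwlf.ssr` after calling `my_pwlf.fit(n)`.

```python
# Illustrative SSR per number of line segments (made-up numbers, not real data).
# With pwlf these would come from: my_pwlf.fit(n) followed by my_pwlf.ssr
ssr_by_segments = {1: 0.9, 2: 0.2, 3: 0.05, 4: 0.042, 5: 0.038, 6: 0.0377}


def penalized_objective(ssr, n_segments, lam):
    """Assumed objective: fit error plus a linear penalty on model complexity."""
    return ssr + lam * n_segments


def best_segments(ssr_table, lam):
    """Return the segment count that minimizes the penalized objective."""
    return min(ssr_table, key=lambda n: penalized_objective(ssr_table[n], n, lam))


lambdas = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
for lam in lambdas:
    print(f"lambda={lam:g} -> {best_segments(ssr_by_segments, lam)} segments")
```

A larger penalty favours fewer segments, while a smaller one lets the SSR term dominate. Because the penalty is added to the raw SSR, a sensible range for it depends on the scale of the data.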

@GvdDool
Author

GvdDool commented Jul 1, 2021

Thanks Charles,
Knowing the range helps, not that it will help my problem. I managed to run the model with 1e-1, but I fear my set is too noisy to benefit from smaller values. I have daily nighttime light intensities for one location and am trying to fit the piecewise linear function through the data, but the variance is very high.

The function with fixed lines (see below) runs fine up to 4 lines, but introducing more lines increases the run time exponentially, and setting the maximum number of elements to 20 takes 1.5 hr on my laptop in a Jupyter Notebook.
image

@GvdDool
Author

GvdDool commented Jul 1, 2021

I used 12 line elements because this is the first point in the optimised graph. Using the suggested 19 doesn't make a visual difference, and the optimising values are very similar (if not identical).
image

@cjekel
Owner

cjekel commented Jul 1, 2021

The variance is very high in your case, and you may benefit from trying this #17 (comment), but replace x and y with your own data. It should be biased to use very few line segments (and it should also run much faster than the Bayesian optimization routine).

@GvdDool
Author

GvdDool commented Jul 1, 2021

Thanks Charles,
I will check the issue, and compare the results.

One other thing I am going to try is to smooth my data with a 7-day moving average; this will remove most of the noise in the data. I tried this averaging already, to get the data stationary, and the 7-day period gives the best results (clear trend).
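A minimal sketch of a 7-day trailing moving average in plain Python, assuming the daily intensities are a list of numbers with no missing days. `moving_average` is a hypothetical helper name; the smoothing code actually used is not shown in this thread.

```python
def moving_average(values, window=7):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Sum of the first window, then slide it one day at a time.
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Toy input: a one-day jump from 0 to 7 becomes a ramp after smoothing.
step = [0] * 7 + [7] * 7
print(moving_average(step))
```

A trailing window spreads a one-day jump over seven output points, so an abrupt change will look more gradual in the smoothed series.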

The reason I am trying your method is to have the piece-wise linear lines to check if there is a trend change after a known date. I can use the (known) date, but that won't prove that there is a trend change, that will (in my understanding) only show a different trend.

@GvdDool
Author

GvdDool commented Jul 1, 2021

This is a view on the smoothed data:
image

The event date is at the beginning of August, but what I was expecting is not the decline before the event; it should have been much more abrupt (in theory), so there is something else happening before the event (likely the COVID-19 confinements are interfering with the NightTime Light intensities in the area)
Best,
Gijs

@GvdDool
Author

GvdDool commented Jul 2, 2021

Hi Charles,
Quick update, your method #17 is giving some promising results. The method suggests 2 lines, but I think 3 segments are telling the story better.
image

@cjekel
Owner

cjekel commented Jul 2, 2021

What was the F ratio for both cases?

    F = sigma_hat / sigma

Maybe it's better to pick the one that is closest to 1.0, since one is over and the other is under.
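That selection rule could be sketched as follows, assuming the F ratio has already been computed for each candidate number of segments (how F is computed is described in the linked #17 comment and is not reproduced here). `closest_to_one` is a hypothetical helper, and the F values are made up for illustration; they are not the values from this thread.

```python
def closest_to_one(f_ratios):
    """Return the segment count whose F ratio is nearest to 1.0."""
    return min(f_ratios, key=lambda n: abs(f_ratios[n] - 1.0))


# Made-up F ratios keyed by number of line segments (NOT real results).
f_ratios = {2: 1.08, 3: 0.95}
print(closest_to_one(f_ratios))
```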

@trueParadise

Hi Charles,
Did you check my pull request? Please let me know, thanks.

Screen Shot 2022-04-08 at 2 16 26 PM
