Docs: Penalty term explainer #275

Open
thorbjornwolf opened this issue Nov 8, 2022 · 2 comments

@thorbjornwolf

thorbjornwolf commented Nov 8, 2022

Super neat library! The API feels very well-designed 🤩

Reading the documentation, I found a couple of things missing.
One of them is a central description of what `pen` is, together with a general strategy for setting it (or at least getting its order of magnitude right), or an explanation of why no such strategy exists.

Perhaps something like this (modified from #271):

The penalty value is a positive float that controls how many changes you want (higher values yield fewer change points). Finding a good value really depends on your situation, but as a rule of thumb, `pen` can be initialized to around [rule of thumb] and tweaked up or down from there.
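To make the quoted description concrete, here is a minimal sketch of what `pen` trades off. It does not use ruptures: `segment_l2` is a hypothetical from-scratch implementation of exact penalized segmentation (optimal partitioning, the same problem Pelt solves faster) with a squared-error cost, and the test signal is synthetic. A change point is only kept if it lowers the total cost by more than `pen`, which is why larger values yield fewer change points.

```python
import numpy as np


def segment_l2(signal, pen):
    """Exact penalized segmentation (optimal partitioning) with an l2 cost.

    Minimizes: sum of segment costs + pen * (number of change points).
    Returns segment end indices; the last one equals len(signal).
    Hypothetical helper for illustration, not part of ruptures.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    n = signal.shape[0]
    # Prefix sums make each segment cost cheap to evaluate.
    csum = np.vstack([np.zeros((1, signal.shape[1])), np.cumsum(signal, axis=0)])
    csum2 = np.concatenate([[0.0], np.cumsum((signal ** 2).sum(axis=1))])

    def cost(start, end):
        # Sum of squared deviations from the mean of signal[start:end].
        seg_sum = csum[end] - csum[start]
        return csum2[end] - csum2[start] - (seg_sum ** 2).sum() / (end - start)

    best = np.full(n + 1, np.inf)
    best[0] = -pen  # the first segment opens no change point, so it pays no penalty
    last = np.zeros(n + 1, dtype=int)
    for end in range(1, n + 1):
        for start in range(end):
            value = best[start] + cost(start, end) + pen
            if value < best[end]:
                best[end], last[end] = value, start
    # Walk back through the stored segment starts.
    bkps, end = [], n
    while end > 0:
        bkps.append(end)
        end = last[end]
    return sorted(bkps)


rng = np.random.default_rng(0)
# Synthetic signal: one mean shift at index 100, unit-variance noise.
signal = np.concatenate([np.zeros(100), 5 * np.ones(100)]) + rng.normal(size=200)
for pen in [0.01, 100.0, 1e6]:
    print(pen, len(segment_l2(signal, pen)) - 1, "change points")
```

Sweeping `pen` over a log-spaced grid like this, and looking for a range where the number of detected change points stays stable, is one common heuristic for finding the right order of magnitude; it is not a guarantee.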

Existing work

In the docs for Binseg and its sibling models, there's this magic incantation:

```python
my_bkps = algo.predict(pen=np.log(n) * dim * sigma**2)
```
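One way to read this formula (an interpretation, not something the docs state): with the `l2` cost, a segment's cost is its sum of squared residuals (SSR). For Gaussian noise with known standard deviation `sigma`, -2 times the log-likelihood equals SSR / sigma**2 plus a constant, and BIC charges log(n) per free parameter. Each change point in a `dim`-dimensional signal adds `dim` new mean parameters, so multiplying back by sigma**2 gives a per-change penalty of log(n) * dim * sigma**2 in cost units. The sketch below computes that value; `estimate_sigma` and `bic_penalty` are hypothetical helpers, and estimating `sigma` from the median absolute deviation of first differences is an assumption, not something taken from ruptures.

```python
import numpy as np


def estimate_sigma(signal):
    """Robust noise std estimate from first differences (assumed approach)."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    diffs = np.diff(signal, axis=0)
    # For Gaussian noise, diffs have std sigma * sqrt(2), and the median absolute
    # deviation divided by 0.6745 estimates a Gaussian std. The few large diffs
    # at change points barely move the median.
    mad = np.median(np.abs(diffs - np.median(diffs, axis=0)), axis=0)
    sigma_per_dim = mad / (0.6745 * np.sqrt(2))
    # Pool dimensions by averaging their variances.
    return float(np.sqrt(np.mean(sigma_per_dim ** 2)))


def bic_penalty(signal, sigma=None):
    """BIC-style penalty for the l2 cost: log(n) * dim * sigma**2."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    n, dim = signal.shape
    if sigma is None:
        sigma = estimate_sigma(signal)
    return float(np.log(n) * dim * sigma ** 2)


rng = np.random.default_rng(1)
# Synthetic signal: 1000 samples, one mean shift, noise std 2.
signal = np.concatenate([np.zeros(500), 10 * np.ones(500)]) + rng.normal(scale=2.0, size=1000)
print(estimate_sigma(signal), bic_penalty(signal))
```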

Was it produced with some rule of thumb?
In the Advanced usage section, the kernel article sets it twice, to values two orders of magnitude (OOM) apart:

```python
penalty_value = 100  # beta
penalty_value = 1  # beta
```
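The thread doesn't say why the article uses these two values. One general reason penalties can differ by orders of magnitude is that `pen` is expressed in the units of the cost function. The sketch below uses hypothetical helpers `l2_cost` and `rbf_cost` (the standard kernel cost with a Gaussian kernel, `gamma` fixed at 1 for illustration; not ruptures' own classes) to show that multiplying a signal by 10 multiplies the l2 cost by 100, while the Gaussian-kernel cost stays below the number of samples whatever the scale.

```python
import numpy as np


def l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    segment = np.asarray(segment, dtype=float)
    return float(((segment - segment.mean(axis=0)) ** 2).sum())


def rbf_cost(segment, gamma=1.0):
    """Kernel cost with a Gaussian kernel k(x, y) = exp(-gamma * ||x - y||**2).

    cost = sum_i k(x_i, x_i) - (1 / n) * sum_{i, j} k(x_i, x_j).
    Since k(x, x) = 1 and k >= 0, the cost lies in [0, n - 1] for n samples,
    regardless of how large the signal values are.
    """
    segment = np.asarray(segment, dtype=float)
    if segment.ndim == 1:
        segment = segment[:, None]
    sq_dists = ((segment[:, None, :] - segment[None, :, :]) ** 2).sum(axis=-1)
    gram = np.exp(-gamma * sq_dists)
    return float(np.trace(gram) - gram.sum() / len(segment))


rng = np.random.default_rng(2)
segment = rng.normal(size=100)
for scale in (1.0, 10.0):
    print(scale, l2_cost(scale * segment), rbf_cost(scale * segment))
```

So a penalty tuned for one cost function or one signal scale generally needs rescaling before it is reused with another.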

The suggestion in #271 to read an article is fine; what I'm missing is a paragraph or two somewhere visible. The penalty term seems important enough to warrant it.

@deepcharles
Owner

Very good point. I'll keep the issue open to remind me to add it to the docs (hopefully shortly).

@tg12

tg12 commented Oct 19, 2023

I found this to be the rule of thumb:

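```python
import numpy as np
p, time_series_len = 1, 500  # assumed example values: number of parameters, number of samples
penalty_method_dict = {'SIC': p * np.log(time_series_len),
                       'BIC': p * np.log(time_series_len),
                       'AIC': p * 2,
                       'Hannan-Quinn': 2 * p * np.log(np.log(time_series_len))}
```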
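These criteria are on the -2 log-likelihood scale, with `p` presumably the number of parameters each change point adds. To use one as `pen` with the `l2` cost, which is a raw sum of squares, it would need to be multiplied by sigma**2, assuming Gaussian noise of known standard deviation `sigma` and `p = dim` for a mean shift (both assumptions). With BIC this gives exactly the `np.log(n) * dim * sigma**2` formula quoted in the first comment. `penalty_for_l2` is a hypothetical helper, not part of ruptures.

```python
import numpy as np


def penalty_for_l2(method, n_samples, dim, sigma):
    """Convert a per-parameter information criterion into an l2-cost penalty.

    Assumes Gaussian noise with known std `sigma` and that each change point
    adds `dim` mean parameters (p = dim). The l2 cost equals
    sigma**2 * (-2 log-likelihood) up to a constant, hence the sigma**2 factor.
    """
    p = dim
    per_change = {
        "SIC": p * np.log(n_samples),
        "BIC": p * np.log(n_samples),
        "AIC": p * 2,
        "Hannan-Quinn": 2 * p * np.log(np.log(n_samples)),
    }[method]
    return float(sigma ** 2 * per_change)


for method in ("SIC", "BIC", "AIC", "Hannan-Quinn"):
    print(method, penalty_for_l2(method, n_samples=1000, dim=1, sigma=2.0))
```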

@deepcharles Amazing work on the original lib! Great work.
