
Add option for getCyclopsProfileLogLikelihood #75

Closed
azimov opened this issue Jun 14, 2024 · 6 comments


azimov commented Jun 14, 2024

In long-running jobs we are finding that Cyclops getCyclopsProfileLogLikelihood runs for many hours and often produces the warning:

WARN Cyclops getCyclopsProfileLogLikelihood Coefficient drift detected. Resetting Cyclops object and recomputing all likelihood values computed so far.

It seems that this will repeat an almost identical process of failing to find the minimum from an initial starting position exactly 10 times. In the tasks we're executing we don't really have the luxury of tweaking other parameters to allow this to converge. Instead, it would be good to make this limit configurable so it can be set lower (e.g. something like Cyclops.LogLikelihood.MaxRetry).

@msuchard (Member):

@azimov -- how's this (fa3433d)?

@schuemie (Member):

Just to be clear: these aren't retries; it is recomputing the likelihood at prior points, but not re-doing the positioning of those points. In other words, it will eventually get there, and the result is not guaranteed to be non-informative just because we detected coefficient drift.

In general, in OHDSI we tend to mostly value getting the 'right' answer, while the amount of compute it takes to get there is of secondary concern. @azimov seems to suggest that there is a point where we don't want to spend more compute to get the right answer, and that this is that point. I don't share that opinion.

Don't get me wrong: I think computing profiles for SCCS takes way too much time, and I would like a more efficient solution. But until we have that, I don't think the solution is: don't compute profiles when doing so takes more than some arbitrary threshold.


schuemie commented Jul 1, 2024

@msuchard: Could you change the code you committed so the default value is maxRetries = Inf? That way, at least the code will by default behave the way it did before. It also means I don't have to change CohortMethod and SelfControlledCaseSeries now to set maxRetries = Inf.

If we are going to add this argument, could we also give it a more accurate name, like maxCorrections?


msuchard commented Jul 1, 2024

hmmm ... i do not believe i changed the default behavior, but my apologies if i did:

7cda607 (old line 919)

am going to avoid making any more changes until @schuemie and @azimov come to an agreement on best behavior.

@schuemie (Member):

Ah, I had forgotten I already set a maxResets inside the function. I'll push a change where the new function argument is renamed to maxResets. I think @azimov has agreed to let me make the final call.
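With the argument renamed to maxResets, a caller who wants the old never-give-up behavior could pass Inf explicitly. The sketch below is purely illustrative: apart from maxResets, the argument names and values shown are assumptions, not the actual Cyclops signature, which should be checked against the package documentation.

```r
library(Cyclops)

# Illustrative sketch only -- all arguments except maxResets are hypothetical.
# cyclopsFit is assumed to be a previously fitted Cyclops model object, and
# "exposureOfInterest" a covariate name in that model.
ll <- getCyclopsProfileLogLikelihood(
  object = cyclopsFit,
  parm = "exposureOfInterest",
  bounds = c(log(0.1), log(10)),  # hypothetical profiling bounds on the log scale
  maxResets = Inf                 # Inf: keep resetting on coefficient drift, as before
)
```

Setting maxResets = Inf preserves the pre-existing behavior discussed above, while a finite value caps how many times the object is reset and the likelihood values recomputed.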


azimov commented Jul 16, 2024

Yes @msuchard -- I defer to Martijn on this. It seems a single edge case caused my issue, and we should probably work on improving precision to resolve that.
