
Significant compute consumption on main application thread #39

Open
zspencer opened this issue Jul 27, 2022 · 5 comments

@zspencer
Contributor

Prior to using LabTech, our Dyno Load on Heroku was a gentle 0.8~1.5 for our 1m load average.

After installing LabTech and running an experiment that accumulated 2.5M observations over the course of a week, our 1m avg dyno load increased to 6~8!

On the one hand, this may mean that we have reached ROFLSCALE and would be better served by using the scientist gem directly.

On the other hand, it seems possible that pulling the expensive computations onto the application's background job queue would be enough to raise the ceiling at which folks need to expand their infrastructure footprint.
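
Very roughly, the kind of thing I have in mind (this is only a sketch, not how LabTech works today; the job, model, and query-object names are all made up): serve the request from the existing code path as usual, and run the whole experiment off the web thread.

```ruby
# Hypothetical sketch: the controller responds using the existing code path,
# then enqueues this job, which runs the control, candidate, and comparison
# off the web thread. Widget, LegacyWidgetQuery, and NewWidgetQuery are
# placeholders; assumes a default Scientist experiment class is configured.
class ExperimentComparisonJob < ApplicationJob
  include Scientist

  queue_as :low_priority

  def perform(widget_id)
    widget = Widget.find(widget_id)

    science "expensive-read-op" do |e|
      e.use { LegacyWidgetQuery.call(widget) } # control, re-run off the request
      e.try { NewWidgetQuery.call(widget) }    # candidate
    end
  end
end

# In the controller:
#   @widget_data = LegacyWidgetQuery.call(widget)    # serve the response
#   ExperimentComparisonJob.perform_later(widget.id) # compare later
```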

It's unlikely I'll take the time to identify and make the adjustment, but maybe @bhaibel will take a stab if we decide to stay on LabTech.

@geeksam
Contributor

geeksam commented Jul 27, 2022

Interesting! Are you running your experiments at 100%? And is there any chance you've got some APM data showing where the time is going? (Feel free to email if private.)

@zspencer
Contributor Author

No APM traces that we can share, unfortunately, as our APM doesn't get that detailed. We were capturing at 100%, which was definitely part of the problem.

@geeksam
Contributor

geeksam commented Jul 29, 2022

I'm not quite sure I understand what you mean by "pulling the complex computations onto the application's Background Job queue". All the examples I've seen for Scientist show the comparison and capture being done inline during the request/response cycle.

Are you thinking about capturing only the results inline, persisting them to the DB with LabTech, and adding a background job to perform the comparisons? Or are you using LabTech in a background job already?
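
To make sure we're talking about the same thing, the inline pattern I mean is roughly the one from the Scientist README, where both code paths and the comparison run during the request (the class and query names here are placeholders):

```ruby
# Inline usage: control, candidate, and comparison all run during the
# request/response cycle. LegacyWidgetQuery and NewWidgetQuery are placeholders.
class WidgetsController < ApplicationController
  include Scientist

  def show
    @widget_data = science "expensive-read-op" do |e|
      e.use { LegacyWidgetQuery.call(params[:id]) } # control: value returned to the caller
      e.try { NewWidgetQuery.call(params[:id]) }    # candidate: run and recorded, result discarded
    end
  end
end
```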

@geeksam
Contributor

geeksam commented Jul 29, 2022

Copy/pasting @bhaibel's tweet on the topic for posterity:

I'm about 80% sure that what's happening is that I miswrote our comparison function, guaranteeing 100% mismatches. 100% mismatches on a common read op + 100% experiment-triggered --> way more writes than usual --> app blocks on DB connection ALL THE TIME.
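
For anyone following along, here's the kind of comparator bug that would produce that failure mode (entirely hypothetical -- I haven't seen the actual code). Inside a `science` block like the one above:

```ruby
science "expensive-read-op" do |e|
  e.use { LegacyWidgetQuery.call(id) } # control
  e.try { NewWidgetQuery.call(id) }    # candidate

  # Buggy: `equal?` checks object identity, and two freshly built result
  # sets are never the same object, so every run is recorded as a mismatch --
  # which turns a common read operation into a write on every request.
  e.compare { |control, candidate| control.equal?(candidate) }

  # A value-based comparison avoids that, e.g.:
  #   e.compare { |control, candidate| control.map(&:id) == candidate.map(&:id) }
end
```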

I'd be happy to schedule a pairing session if that's helpful. Otherwise, please let me know what you find!

@zspencer
Contributor Author

zspencer commented Aug 1, 2022

> Are you thinking about capturing only the results inline, persisting them to the DB with LabTech, and adding a background job to perform the comparisons? Or are you using LabTech in a background job already?

I'm thinking something like that may make the most sense, even though it means we're technically running the control twice; at least the expensive behavior isn't on the main thread of the web request.

But it could also be a horrible idea, so we'll see where, if anywhere, we wind up going with it. Disabling the experiment and decreasing the sample rate were sufficient to prevent request queuing; and it's unlikely that a sample size of 2M+ is ... uhh... that useful for us, so we may wind up doing something like setting a default sample rate or a "disable after you hit 10k comparisons" or something similar.
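
For the sample-rate / cutoff idea, something along these lines seems plausible against plain Scientist (I haven't checked how this maps onto LabTech's experiment class, and the result-count lookup is made up):

```ruby
# Sketch of a sampled, self-disabling experiment via Scientist's enabled? hook.
# ExperimentResult and its query are hypothetical stand-ins for however the
# recorded comparisons are actually counted.
class SampledExperiment
  include Scientist::Experiment

  SAMPLE_RATE     = 0.01    # run for roughly 1% of calls
  MAX_COMPARISONS = 10_000  # stop collecting once we have enough data

  attr_reader :name

  def initialize(name)
    @name = name
  end

  def enabled?
    return false if comparison_count >= MAX_COMPARISONS
    rand < SAMPLE_RATE
  end

  def publish(result)
    # hand the result off to whatever persists it (LabTech, logs, etc.)
  end

  private

  def comparison_count
    # Hypothetical: however you count recorded results (a DB count,
    # a cached counter, etc.)
    ExperimentResult.where(experiment_name: name).count
  end
end
```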
