
Significant compute consumption on main application thread #39

Open
zspencer opened this issue Jul 27, 2022 · 5 comments

@zspencer
Contributor

Prior to using LabTech, our Dyno Load on Heroku was a gentle 0.8~1.5 for our 1m load average.

After installing LabTech and running an experiment that accumulated 2.5M observations over the course of a week, our 1m avg dyno load increased to 6~8!

On the one hand, this may mean that we have reached ROFLSCALE and would be better served by using the scientist gem directly.

On the other hand, it seems possible that pulling the expensive computations onto the application's background job queue would be enough to raise the ceiling at which folks need to expand their infrastructure footprint.
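
Very roughly, the kind of thing I have in mind (this is only a sketch, not how LabTech works today; the job, model, and query-object names are all made up): serve the request from the existing code path as usual, and run the whole experiment off the web thread.

```ruby
# Hypothetical sketch: the controller responds using the existing code path,
# then enqueues this job, which runs the control, candidate, and comparison
# off the web thread. Widget, LegacyWidgetQuery, and NewWidgetQuery are
# placeholders; assumes a default Scientist experiment class is configured.
class ExperimentComparisonJob < ApplicationJob
  include Scientist

  queue_as :low_priority

  def perform(widget_id)
    widget = Widget.find(widget_id)

    science "expensive-read-op" do |e|
      e.use { LegacyWidgetQuery.call(widget) } # control, re-run off the request
      e.try { NewWidgetQuery.call(widget) }    # candidate
    end
  end
end

# In the controller:
#   @widget_data = LegacyWidgetQuery.call(widget)    # serve the response
#   ExperimentComparisonJob.perform_later(widget.id) # compare later
```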

It's unlikely I'll take the time to identify and make the adjustment, but maybe @bhaibel will take a stab if we decide to stay on LabTech.

@geeksam
Contributor

geeksam commented Jul 27, 2022

Interesting! Are you running your experiments at 100%? And is there any chance you've got some APM data showing where the time is going? (Feel free to email if private.)

@zspencer
Contributor Author

No APM traces that we can share, unfortunately, as our APM doesn't get that detailed. We were capturing at 100%, which was definitely part of the problem.

@geeksam
Contributor

geeksam commented Jul 29, 2022

I'm not quite sure I understand what you mean by "pulling the complex computations onto the application's Background Job queue". All the examples I've seen for Scientist show the comparison and capture being done inline during the request/response cycle.

Are you thinking about capturing only the results inline, persisting them to the DB with LabTech, and adding a background job to perform the comparisons? Or are you using LabTech in a background job already?
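
To make sure we're talking about the same thing, the inline pattern I mean is roughly the one from the Scientist README, where both code paths and the comparison run during the request (the class and query names here are placeholders):

```ruby
# Inline usage: control, candidate, and comparison all run during the
# request/response cycle. LegacyWidgetQuery and NewWidgetQuery are placeholders.
class WidgetsController < ApplicationController
  include Scientist

  def show
    @widget_data = science "expensive-read-op" do |e|
      e.use { LegacyWidgetQuery.call(params[:id]) } # control: value returned to the caller
      e.try { NewWidgetQuery.call(params[:id]) }    # candidate: run and recorded, result discarded
    end
  end
end
```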

@geeksam
Contributor

geeksam commented Jul 29, 2022

Copy/pasting @bhaibel's tweet on the topic for posterity:

I'm about 80% sure that what's happening is that I miswrote our comparison function, guaranteeing 100% mismatches. 100% mismatches on a common read op + 100% experiment-triggered --> way more writes than usual --> app blocks on DB connection ALL THE TIME.
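
For anyone following along, here's the kind of comparator bug that would produce that failure mode (entirely hypothetical -- I haven't seen the actual code). Inside a `science` block like the one above:

```ruby
science "expensive-read-op" do |e|
  e.use { LegacyWidgetQuery.call(id) } # control
  e.try { NewWidgetQuery.call(id) }    # candidate

  # Buggy: `equal?` checks object identity, and two freshly built result
  # sets are never the same object, so every run is recorded as a mismatch --
  # which turns a common read operation into a write on every request.
  e.compare { |control, candidate| control.equal?(candidate) }

  # A value-based comparison avoids that, e.g.:
  #   e.compare { |control, candidate| control.map(&:id) == candidate.map(&:id) }
end
```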

I'd be happy to schedule a pairing session if that's helpful. Otherwise, please let me know what you find!

@zspencer
Contributor Author

zspencer commented Aug 1, 2022

> Are you thinking about capturing only the results inline, persisting them to the DB with LabTech, and adding a background job to perform the comparisons? Or are you using LabTech in a background job already?

I'm thinking something like that may make the most sense, even though it means we're technically running the control twice; at least the expensive behavior isn't on the main thread of the web request.

But it could also be a horrible idea, so we'll see where, if anywhere, we wind up going with it. Disabling the experiment and decreasing the sample rate were sufficient to prevent request queuing; and it's unlikely that a sample size of 2M+ is ... uhh... that useful for us, so we may wind up doing something like setting a default sample rate or a "disable after you hit 10k comparisons" or something similar.
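
For the sample-rate / cutoff idea, something along these lines seems plausible against plain Scientist (I haven't checked how this maps onto LabTech's experiment class, and the result-count lookup is made up):

```ruby
# Sketch of a sampled, self-disabling experiment via Scientist's enabled? hook.
# ExperimentResult and its query are hypothetical stand-ins for however the
# recorded comparisons are actually counted.
class SampledExperiment
  include Scientist::Experiment

  SAMPLE_RATE     = 0.01    # run for roughly 1% of calls
  MAX_COMPARISONS = 10_000  # stop collecting once we have enough data

  attr_reader :name

  def initialize(name)
    @name = name
  end

  def enabled?
    return false if comparison_count >= MAX_COMPARISONS
    rand < SAMPLE_RATE
  end

  def publish(result)
    # hand the result off to whatever persists it (LabTech, logs, etc.)
  end

  private

  def comparison_count
    # Hypothetical: however you count recorded results (a DB count,
    # a cached counter, etc.)
    ExperimentResult.where(experiment_name: name).count
  end
end
```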
