We have some services that consume a limited external resource. There is a metric on how much of that resource is remaining. I would like to be able to write a scaling policy that both scales on service CPU and prevents scaling up beyond the remaining capacity. However, my understanding is that the scale-up decision is an "or" of all checks, using the maximum value any check suggests.
I think the only way to accomplish this currently is to set the max count. That is fragile, since any change to the external resource's capacity also has to be reflected in the service's max count, and it gets more complicated when multiple services all consume the same resource.
Is there some way to accomplish this that I'm missing?
jaltavilla changed the title from "Feature Request: Scale up using an and of checks" to "Question: Scale up only if two checks are true" on Dec 1, 2023.
Since you have a metric for your resource, you may be able to adjust the query instead. For example, with Prometheus you could use the clamp_max function to limit the value of another query result:
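Something along these lines, as a rough sketch (service_cpu_scaling_value and external_resource_remaining are placeholder metric names, assuming the two are expressed in comparable units and the remaining-capacity query returns a single series):

```
# Scale on the CPU-driven value, but never let the check suggest more
# than the external resource has remaining.
clamp_max(
  service_cpu_scaling_value,
  scalar(external_resource_remaining)
)
```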
We are using the target strategy, and sadly I couldn't quite figure out how to make it work there. I suspect it would be easier with the threshold strategy. My understanding of the target strategy is that current_resource_metric_value can't return an arbitrary limit: it needs to stay in the range [0, target] while the resource is exhausted, so that it never suggests scaling up but still scales down only when actual_query_you_want_to_scale indicates it could. Similarly, it has to stay in the range [0, some number larger than target] while the resource isn't exhausted.
We already clamp_max our query to 200 to limit scaling up on spikes. So we need a query that stays within [0, target] or [0, 200] depending on whether the resource is exhausted. That reduces to needing an expression that returns 0 or 1: query = clamp_max(actual_query_you_want_to_scale, target * one_if_exhausted + 200 * one_if_not_exhausted).
But I couldn't figure out how to build an expression that returned 0 or 1 with the Datadog functions available, so I ended up auditing our services and fixing their max counts.
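For reference, if this metric lived in Prometheus rather than Datadog, a bool comparison would yield exactly 0 or 1, and the whole thing might be sketched roughly like this (assuming a target of 100, the 200 cap mentioned above, and a placeholder metric name external_resource_remaining):

```
# (external_resource_remaining > bool 0) is 1 while capacity remains, 0 once exhausted.
# The clamp ceiling is then 200 while capacity remains and 100 (the target) once exhausted,
# so the check can never suggest scaling up past the point of exhaustion.
clamp_max(
  actual_query_you_want_to_scale,
  100 + 100 * scalar(external_resource_remaining > bool 0)
)
```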