How to identify the most expensive queries? #8881
-
Hi team, I want to analyse the query stats that the query-frontend logs, for example:
Is there anywhere in these stats where I can find the complexity of "finding the IDs that match", as in how hard it was to find the matching IDs out of the total IDs, before even fetching the postings and data points of the matching series? Thanks
-
This isn't something exposed by either the store-gateway or the ingester today. The store-gateway exports an aggregated metric with the amount of time it takes to resolve the matchers to a set of series IDs (…).
There are some optimizations: if the regular expression has a literal prefix, the lookup doesn't match against all possible values, only the ones that share that prefix. In those cases the cost can be close to O(log N). I'm not sure what the best way to expose this data would be. Simply recording the time spent on regular expression matching might not make much sense if we don't have the full time spent in the storage layer (the combined time spent in each ingester and store-gateway).
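A minimal sketch of why a literal prefix brings the lookup close to O(log N). This is not Mimir's actual implementation; it only assumes the label values are kept sorted, so binary search locates the first candidate and only values sharing the prefix are scanned:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// findWithPrefix returns the label values starting with prefix.
// Binary search over the sorted slice finds the start of the range
// in O(log N); only actual matches are scanned afterwards, instead
// of running a regexp against every value.
func findWithPrefix(sorted []string, prefix string) []string {
	lo := sort.SearchStrings(sorted, prefix)
	hi := lo
	for hi < len(sorted) && strings.HasPrefix(sorted[hi], prefix) {
		hi++
	}
	return sorted[lo:hi]
}

func main() {
	values := []string{"api-1", "db-1", "db-2", "web-1"} // must be sorted
	fmt.Println(findWithPrefix(values, "db"))            // [db-1 db-2]
}
```

Without a usable prefix, the matcher has no choice but to evaluate the regexp against all N values, which is where the cost discussed above comes from.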
This one in particular would be very welcome. Generally, regular expressions with unbounded wildcards (ones with `*` or `+`) that appear before the end of the pattern can be very expensive.

Another optimization that we have is prefix matching: `db.*` is translated to `strings.HasPrefix(value, "db")`, which avoids running a regular expression.

One way to verify the impact of regular expressions is to inspect CPU profiles during these queries.
Perhaps worth noting that it may not always be the complexity of the regular expression, but the number of values the label has. Matching `pod!~"db.*"` against 1M strings …