
Query Frontend: Job weights #4076

Draft · wants to merge 19 commits into main
Conversation

joe-elliott
Member

What this PR does:
The query frontend treats all jobs as the same size when it farms them out to the queriers. This can cause querier instability b/c some jobs actually require quite a bit more resources to execute. By assigning weights to jobs we can reduce the amount of work each querier is asked to do at once (a rough sketch follows the list below), which will hopefully:

  1. reduce querier OOMs/timeouts/retries
  2. reduce querier latency
  3. increase total throughput
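
A rough sketch of the idea, using hypothetical names rather than Tempo's actual types:

package jobsketch

// job is illustrative only: it carries a weight that reflects how
// expensive the job is expected to be for a querier.
type job struct {
    name   string
    weight int
}

// weightFor assigns a rough cost per job type; the exact values are the
// part that still needs balancing (see the TODO section).
func weightFor(jobType string) int {
    switch jobType {
    case "trace_by_id":
        return 2 // trace-by-ID jobs were noticeably heavier in testing
    default:
        return 1
    }
}

With something like this, a batch's capacity is measured in accumulated weight rather than job count, which is what the queue change discussed below builds on.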

Other changes

  • Removed the roundtripper httpgrpc bridge and pushed the concept of pipeline.Request all the way down into the cortex frontend code. This can be a nice perf improvement b/c translating http -> httpgrpc is costly and we are pushing it to the last moment. Currently for some queries we are translating thousands of jobs and then throwing them away.
  • Removed redundant parseQuery and createFetchSpansRequest to consolidate on the Compile function in pkg/traceql
  • Check for context error before going through retry logic in retryWare. This causes retry metrics to be more accurate in the event of many cancelled jobs (a rough sketch of this check follows this list).
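
The retryWare change boils down to the check sketched below; this is a generic retry loop with assumed names, not the actual middleware code:

package retrysketch

import "context"

// doWithRetries is an illustrative sketch, not Tempo's actual retryWare:
// it checks the caller's context before each attempt so a cancelled job
// returns immediately instead of being counted as another retry, which
// keeps retry metrics closer to real failures.
func doWithRetries(ctx context.Context, maxRetries int, attempt func() error) error {
    var err error
    for i := 0; i <= maxRetries; i++ {
        if ctxErr := ctx.Err(); ctxErr != nil {
            return ctxErr
        }
        if err = attempt(); err == nil {
            return nil
        }
    }
    return err
}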

TODO

  • Fix existing tests
  • Add tests for two bits of functionality marked PRTODO
  • Balance weights. Potentially make them configurable.

Testing so far

  • Setting the trace by ID weight to 2 showed considerable performance improvement over main
  • The search weights seemed overly tuned. They reduced batch sizes considerably, causing an overall lower query latency. We should ease up on these weights.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

}
totalWeight += weight

if totalWeight >= requestedCount {
Contributor


I think this makes sense. I suppose what we're saying here is that we request of this batch a certain high water mark of work that we're willing to take, and the weight increases the notion of complexity for a single item above this threshold. Implicitly here I suppose is that weight and requestedCount are of the same unit of measure.

Member Author


Implicitly here I suppose is that weight and requestedCount are of the same unit of measure.

yes! currently all jobs fill a single "slot" in the batch. the "weight" is basically just making it fill more slots.
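
In other words, something like this simplified, hypothetical sketch of slot filling:

package batchsketch

// job is a stand-in for whatever the queue actually batches; only the
// weight matters for this sketch.
type job struct {
    weight int
}

// buildBatch appends jobs until the accumulated weight reaches the
// requested count, so a job with weight 2 consumes two "slots" while a
// weight-1 job consumes one.
func buildBatch(jobs []job, requestedCount int) []job {
    batch := make([]job, 0, requestedCount)
    totalWeight := 0
    for _, j := range jobs {
        batch = append(batch, j)
        totalWeight += j.weight
        if totalWeight >= requestedCount {
            break
        }
    }
    return batch
}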

}
}

if conditions > 4 { // yay, magic!
Contributor


A fine starting point. I was wondering if each condition is weight++, and maybe each regex is weight+2 or some such. It means for the queue logic that if any condition is present, we'll never consume the entire requested batch. 🤔
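
Something along those lines could look like the following hedged sketch (not the PR's actual weighting code; names and increments are assumptions):

package weightsketch

// weightForSearch bumps the weight once per condition and twice more per
// regex condition, rather than using a single magic threshold; the exact
// increments are illustrative and would need the tuning called out in
// the TODOs.
func weightForSearch(conditions, regexConditions int) int {
    weight := 1
    weight += conditions
    weight += 2 * regexConditions
    return weight
}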

if query.Has("query") {
    traceQLQuery = query.Get("query")
}
if traceQLQuery != "" {
Member Author


nit: for ease of reading I prefer:

if traceQLQuery == "" {
   req.SetWeight(TraceQLSearchWeight)
   return
}

...

this reduces nesting of the code below and very clearly communicates the logic taken when the query is not found
