WIP: [Rhythm] Block-builder consumption loop #4480

Draft
wants to merge 4 commits into base: main-rhythm
Conversation

mdisibio (Contributor)

What this PR does:
This is an alternate block-builder consumption loop that I think has benefits and would like to get feedback on.

The current loop can be thought of as top-down: it calculates the total time range that the block-builder is lagging, splits it into smaller sections (e.g. 5 minutes), then consumes/flushes/commits each section.

This new loop is bottom-up: while there is more data, start at the last commit and consume/flush/commit another chunk of data (e.g. 5 minutes).

The benefits are:

  • Less state - each consume/flush/commit cycle is independent; it doesn't require knowledge of the overall state of the queue or side effects from previous loops.
  • Fewer Kafka APIs are involved (e.g. no CalculateGroupLag).
  • Doesn't require custom commit metadata.
  • I think the core loop is simpler and will make it easier to iterate. For example, there is a TODO to round-robin when the block-builder is assigned to multiple partitions and they are all lagging; I think this would be more complex to try in the current design.

The drawbacks are:

  • Mainly around how we want to measure "lag". The current metric is a number of messages, but I think finding a way to report a length of time (i.e. "the block-builder is 15 minutes behind") is more useful. However, you have to read a record to determine that, so this needs more work.

TODO

  • This is a draft and needs more finishing touches before it can be merged. What I want feedback on is the overall loop structure. Logs/cleanup/finishing touches would be added before any merge.
  • Disregard the test failures; they occur because the "fake" Kafka is not mimicking real behavior. I updated the first test, TestBlockbuilder_lookbackOnNoCommit, to show an example of how it can be fixed. Specifically, it needs to run with consumer-group support, and remove:
k.ControlKey(kmsg.OffsetCommit.Int16(), func(kmsg.Request) (kmsg.Response, error, bool) {
	kafkaCommits.Add(1)
	return nil, nil, false
})

because this means the commits are not actually committed.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@javiermolinar (Contributor) left a comment


Looks good!

case <-time.After(waitTime):
	err := b.consume(ctx)
	if err != nil {
		b.logger.Log("msg", "consumeCycle failed", "err", err)

Suggested change
b.logger.Log("msg", "consumeCycle failed", "err", err)
level.Error(b.logger).Log("msg", "consumeCycle failed", "err", err)

return false, err
}

lastCommit, ok := commits.Lookup(topic, partition)
Suggested change
lastCommit, ok := commits.Lookup(topic, partition)
lastCommit, exists := commits.Lookup(topic, partition)
if exists && lastCommit.At >= 0 {
	startOffset = startOffset.At(lastCommit.At)
} else {
	startOffset = kgo.NewOffset().AtStart()
}

https://pkg.go.dev/github.com/twmb/franz-go/pkg/[email protected]#OffsetResponses.Lookup

}

err := b.pushTraces(rec.Key, rec.Value, writer)
if err != nil {

What will happen if something is wrong with the WAL? I guess it will enter an infinite retry loop.

2 participants