Kafka module: query each broker all the partitions it is a leader for #16556
Conversation
jenkins, test this again please
jenkins, test this again please
jenkins, test this again please (Jenkins couldn't trigger jobs again...)
metricbeat/module/kafka/broker.go
Outdated
for _, topic := range topics {
	for _, partition := range topic.Partitions {
		broker, err := b.client.Leader(topic.Name, partition.ID)
The leader id is available in partition.Leader. We could use this id for the grouping of partitions per broker. If we continue using b.client.Leader() we have to remember to Close() the returned broker. Maybe we can use b.client.Brokers() to look up the broker by id.
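A minimal sketch of that idea, assuming sarama's client API (names like brokersByID are illustrative, not code from this PR):

```go
// Resolve each broker once by id via client.Brokers(), instead of calling
// client.Leader() per partition (sketch only).
brokersByID := make(map[int32]*sarama.Broker)
for _, broker := range b.client.Brokers() {
	brokersByID[broker.ID()] = broker
}
// Partitions grouped by partition.Leader can then be looked up with
// brokersByID[partition.Leader] instead of b.client.Leader(topic.Name, partition.ID).
```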
Using the method b.client.Leader(topic, partition) will always return the most up-to-date leader (there might be a metadata update in the background, right?). The method b.client.Brokers() returns brokers without opening connections to them. To establish a connection, I would need a configuration structure: broker.Open(conf *Config). Leader() handles it on its own.
Regarding Close(), I think you're right.
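To illustrate the trade-off, a sketch assuming sarama's Broker.Open and Client.Config behave as described above (not code from this PR):

```go
// client.Brokers() returns broker handles without opening connections,
// so a *sarama.Config is needed to connect each one:
for _, broker := range client.Brokers() {
	if err := broker.Open(client.Config()); err != nil && err != sarama.ErrAlreadyConnected {
		return err
	}
}

// client.Leader() manages the connection on its own, but, as noted above,
// the returned broker should be closed when we are done with it:
leader, err := client.Leader(topicName, partitionID)
if err != nil {
	return err
}
defer leader.Close()
```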
There can still be a problem with this approach: here we are calling Leader for each topic and partition, which may open too many connections to brokers, and we may be leaking connections because we only keep track of one connection per broker in leaderBrokers.

> Using the method b.client.Leader(topic, partition) will always return the most up-to-date leader (there might be a metadata update in the background, right?).

This is right, but between this moment and the moment we make the offsets request there can still be some metadata change. If we want to solve this for good (not sure if it is worth it), we would need to handle leadership errors (the second option in #13380).
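For reference, handling a leadership error could look roughly like this (a sketch only; errLeadershipChanged is a hypothetical sentinel, and the retry itself would still live in the caller):

```go
var errLeadershipChanged = errors.New("partition leadership changed, retry with refreshed metadata")

// checkLeadership refreshes metadata when the queried broker is no longer the
// leader, so the caller can retry against the new leader (sketch only).
func checkLeadership(client sarama.Client, resp *sarama.OffsetResponse, topic string, partition int32) error {
	block := resp.GetBlock(topic, partition)
	if block != nil && block.Err == sarama.ErrNotLeaderForPartition {
		if err := client.RefreshMetadata(topic); err != nil {
			return err
		}
		return errLeadershipChanged
	}
	return nil
}
```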
Answered below.
metricbeat/module/kafka/broker.go
Outdated
if _, ok := leaderTopicPartition[broker.ID()]; !ok {
	leaderTopicPartition[broker.ID()] = map[string]int32{}
}
leaderTopicPartition[broker.ID()][topic.Name] = partition.ID
Is it safe to assume that there cannot be two partitions for the same topic in the same broker? (Probably, but I am not sure about that)
There might be a case in which the number of Kafka brokers is lower than the number of partitions of the same topic, so I would rather keep this map as is.
As discussed offline, we may need to list multiple partitions for the same topic in the same broker.
Fixed.
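The fix groups a list of (topic, partition) pairs per leader id, as the later diff in this PR shows. Roughly (field names here are illustrative):

```go
type topicPartition struct {
	topic     string
	partition int32
}

// groupByLeader collects all (topic, partition) pairs per leader broker id, so
// several partitions of the same topic can map to the same broker (sketch only).
func groupByLeader(topics []*sarama.TopicMetadata) map[int32][]topicPartition {
	leaderTopicPartition := make(map[int32][]topicPartition)
	for _, topic := range topics {
		for _, partition := range topic.Partitions {
			leaderTopicPartition[partition.Leader] = append(
				leaderTopicPartition[partition.Leader],
				topicPartition{topic: topic.Name, partition: partition.ID},
			)
		}
	}
	return leaderTopicPartition
}
```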
metricbeat/module/kafka/broker.go
Outdated
}

block := resp.GetBlock(topic, partition)
if len(block.Offsets) == 0 {
Should we also check block.Err in any case?
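Something along these lines, assuming the block's Err field is a sarama.KError (a sketch, not the PR's actual code):

```go
block := resp.GetBlock(topic, partition)
if block == nil {
	return fmt.Errorf("no offsets block for %v:%v in response", topic, partition)
}
if block.Err != sarama.ErrNoError {
	// Surface broker-side errors instead of silently treating them as "no offsets".
	return fmt.Errorf("failed to query offsets for %v:%v: %v", topic, partition, block.Err)
}
if len(block.Offsets) == 0 {
	return nil // nothing to report for this partition
}
```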
	continue
} else if newestPartitionOffsets.Err != nil {
	msg := fmt.Errorf("failed to query kafka partition (%v:%v) newest offsets: %v",
		topic.Name, partition.ID, newestPartitionOffsets.Err)
Nit. Extract this common logic for the oldest and newest offsets into a method?
Extracted.
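For example, a helper along these lines (only a sketch; the PR's actual method and the PartitionOffsets fields may differ):

```go
// checkPartitionOffsets turns a broker-side error for one partition into a
// descriptive error, shared by the oldest and newest offset paths (sketch only).
func checkPartitionOffsets(topic string, partition int32, kind string, offsets PartitionOffsets) error {
	if offsets.Err != nil {
		return fmt.Errorf("failed to query kafka partition (%v:%v) %v offsets: %v",
			topic, partition, kind, offsets.Err)
	}
	return nil
}
```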
jenkins, test this again please
metricbeat/module/kafka/broker.go
Outdated
	req.AddBlock(topic, partition, time, 1)
}

resp, err := leaderBrokers[leader].GetAvailableOffsets(req)
Nit, and probably an unneeded optimization 😄: we could parallelize requests per leader.
Fixed.
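A sketch of how the per-leader requests could run concurrently, reusing the queryBrokerForPartitionOffsets method added later in this PR (the channel/WaitGroup plumbing here is illustrative and needs a sync import):

```go
var wg sync.WaitGroup
results := make(chan map[string]map[int32]PartitionOffsets, len(leaderTopicPartition))

for brokerID, topicPartitions := range leaderTopicPartition {
	wg.Add(1)
	go func(brokerID int32, topicPartitions []topicPartition) {
		defer wg.Done()
		// One offsets request per leader, executed concurrently.
		results <- b.queryBrokerForPartitionOffsets(brokerID, topicPartitions, time)
	}(brokerID, topicPartitions)
}

wg.Wait()
close(results)

for offsets := range results {
	_ = offsets // merge the per-broker results here
}
```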
If you consider this solution incomplete or vulnerable, please don't hesitate to close this PR. Here I focused on the first option, but maybe the other one might be better.
for _, partition := range topic.Partitions {
	if _, ok := leaderTopicPartition[partition.Leader]; !ok {
		leaderTopicPartition[partition.Leader] = []topicPartition{}
	}
Nit. This initialization is not needed; append will initialize it if needed.
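A minimal, self-contained example of that behavior (not code from this PR):

```go
package main

import "fmt"

func main() {
	// append works on the nil slice returned for a missing map key,
	// so no explicit initialization of byLeader[1] is needed.
	byLeader := make(map[int32][]string)
	byLeader[1] = append(byLeader[1], "topic-a/0")
	fmt.Println(byLeader) // map[1:[topic-a/0]]
}
```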
resp, err := broker.GetAvailableOffsets(req)
if err != nil {
	err = fmt.Errorf("get available offsets failed by leader (ID: %d): %v", brokerID, err)
Nit. We could use errors.Wrap to add context to these errors and the others added in this PR.
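For instance, with github.com/pkg/errors (a sketch of the suggestion, not the PR's final code):

```go
resp, err := broker.GetAvailableOffsets(req)
if err != nil {
	// errors.Wrapf keeps the original error and prepends the context message.
	return nil, errors.Wrapf(err, "get available offsets failed by leader (ID: %d)", brokerID)
}
```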
}

func (b *Broker) queryBrokerForPartitionOffsets(brokerID int32, topicPartitions []topicPartition, time int64) map[string]map[int32]PartitionOffsets {
	req := new(sarama.OffsetRequest)
Something I am missing here is that in the previous implementation we were making a request per replica (the request done in b.PartitionOffset() was also modified with req.SetReplicaID()). Are we missing the offsets of replicas now? 🤔
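For context, the per-replica request described above would look roughly like this (a sketch only, assuming sarama's OffsetRequest.SetReplicaID as mentioned in this thread):

```go
req := new(sarama.OffsetRequest)
// Ask for the offsets as a specific replica; -1 queries as a regular
// client, i.e. against the leader (see the reply below).
req.SetReplicaID(replicaID)
req.AddBlock(topic, partition, sarama.OffsetNewest, 1)
resp, err := broker.GetAvailableOffsets(req)
```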
It seems that this may work (replicaID = -1 means the leader), but I believe this is getting too complex.
We had an offline discussion about this issue. It's definitely too complex compared to the gain we may have here. Also, there is still a problem with stale metadata, which apparently can be solved only by retries. Resolving.
This PR addresses the issue reported in #13380.
Briefly - for the partition metricset:
Meta-issue: #14852