Every technology has that key concept that people struggle to understand. With databases, is which join clause to use for fetching data from multiple tables. Containers are tricky when you have to pick a storage type given some persistence requirements. With Apache Kafka, the struggle is in understanding partittions. Partitions are at the heart that everything Kafka does. They govern aspects like parallelism, storage, and durability. They also affect how much load Kafka can handle. This repository contains the code and instructions for you to dig a little deeper into partitions in Kafka to fully understand how it works and how it affects clusters.
This repository was also used as the backup material for the session The Right Number of Kafka Partitions presented on the Devnexus conference in 2023.
Alternatively, you can read this detailed blog post that explains everything you need to fully understand partitions in Kafka, and the effect they have on the cluster and clients.
For this exercise, you will execute some load testing to figure out the maximum capacity that one Kafka partition can handle in terms of events per second. You will execute a load test to measure the write throughput, and then another load test to measure the read throughput. Once you have this, you will apply your findings in the formula NUM_PARTITIONS = MAX(T/W, T/R)
, where:
T
= your desired number of events per second
W
= maximum write throughput of one partition
R
= maximum read throughput of one partition
- Start two Kafka brokers.
docker compose up -d
💡 By default, the Docker Compose file has only the brokers broker-1
and broker-2
uncommented. Make sure the brokers broker-3
and broker-4
are commented out before proceeding.
- Create a topic with only one partition.
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic load-test --partitions 1 --replication-factor 1
- Execute a load test to measure write throughput.
kafka-producer-perf-test.sh --producer.config config.properties --throughput 100000 --num-records 1000000 --record-size 1024 --topic load-test
💡 In this example, you will be writing 1M
events into the topic, each one with a payload size of 1K
. The idea is to measure a desired write throughput of 100K
events per second.
- Execute a load test to measure read throughput.
kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --messages 1000000 --topic load-test
💡 In this example, you will be reading 1M
events from the topic.
For this exercise, you will create a Kafka topic with 8
partitions and write some data on it. You will do this using a cluster with only 2
brokers. Then, you will start two additional brokers, forming a cluster with 4
brokers. You will investigate if the partitions have been automatically moved to the new brokers, and if not, you will move them manually to keep the dataset workload evenly spread.
- Start two Kafka brokers.
docker compose up -d
💡 By default, the Docker Compose file has only the brokers broker-1
and broker-2
uncommented. Make sure the brokers broker-3
and broker-4
are commented out before proceeding.
- Create a topic with only eight partitions.
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test --partitions 8 --replication-factor 1
- Write some random data into the topic.
kafka-producer-perf-test.sh --producer.config config.properties --throughput 1000 --num-records 10000 --record-size 1024 --topic test
-
Edit the Docker Compose file to uncomment the brokers
broker-3
andbroker-4
. -
Start the two new brokers.
docker compose up -d
- Describe the topic details to check the partition assignment.
kafka-topics.sh --bootstrap-server localhost:9092 --topic test --describe
💡 You should see an output like this:
Topic: test TopicId: J-hfYNmHTpqUHGwG41-zJQ PartitionCount: 8 ReplicationFactor: 1 Configs:
Topic: test Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: test Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 2 Leader: 2 Replicas: 2 Isr: 2
Topic: test Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: test Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 7 Leader: 2 Replicas: 2 Isr: 2
💡 As you can see in the leader's configuration, all eight partitions are still assigned to the brokers broker-1
and broker-2
even though the cluster now has four brokers.
- Generate a new partition assignment suggestion.
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --broker-list 1,2,3,4 --topics-to-move-json-file partitions.json --generate
-
Copy the proposed partition reassignment into the file
suggestion.json
. -
Execute the proposed partition reassignment file.
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file suggestion.json --execute
- Describe the topic details to check the partition assignment.
kafka-topics.sh --bootstrap-server localhost:9092 --topic test --describe
💡 You should see an output like this:
Topic: test TopicId: J-hfYNmHTpqUHGwG41-zJQ PartitionCount: 8 ReplicationFactor: 1 Configs:
Topic: test Partition: 0 Leader: 3 Replicas: 3 Isr: 3
Topic: test Partition: 1 Leader: 4 Replicas: 4 Isr: 4
Topic: test Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 3 Leader: 2 Replicas: 2 Isr: 2
Topic: test Partition: 4 Leader: 3 Replicas: 3 Isr: 3
Topic: test Partition: 5 Leader: 4 Replicas: 4 Isr: 4
Topic: test Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 7 Leader: 2 Replicas: 2 Isr: 2
💡 As you can see in the leader's configuration, all eight partitions are now evenly assigned to all brokers.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.