To run the demo, you will need a local kafka cluster, with a broker listening on localhost:9092
. A docker-compose config file is provided, to run a single broker kafka cluster + single node zookeeper.
To start it:
$ docker-compose -f kafka-docker/docker-compose.yml up -d
And to shut it down:
docker-compose -f kafka-docker/docker-compose.yml down
The ports are 2181
for ZooKeeper and 9092
for Kafka
if zookeeper already installed then run sudo service zookeeper stop
Obiously you can use your own local kafka cluster to run the demo, but make sure the version is >= 0.10
(as is needed for the new spark streaming integration), and the advertised hostname is 127.0.0.1
, or just do it the docker-way because it's cooler ;)
To send records to text topic (the one the demo is subsribed to), you can use kafka-console-producer.sh in the kafka broker docker instance.
$ docker exec -i kafkadocker_kafka_1 /opt/kafka/bin/kafka-console-producer.sh \
--broker-list 127.0.0.1:9092 --topic text < input/shakespeare.txt
Once the kafka cluster + zookeeper is up and running, you can start the demo with your IDE of choice importing the maven pom.xml
, or you can build it from shell with maven and launch the generated jar (it's a shaded fat jar including all the dependencies)
$ mvn clean package
$ java -jar target/spark-streaming-kafka-0-10-demo.jar
You can access the Spark UI
with a web broswer at http://localhost:4040
.
Probably the most interesting tab is the Streaming
one, but feel free to look at the Stages
scheduled and other info around.
You can tune how fast your streaming app ingests data with spark.streaming.kafka.maxRatePerPartition
property defined in the SparkConf
object. If you are using the docker approach to create a local kafka cluster, take into consideration that there are 4 partitions in the text
topic.
Also, take a look at auto.offset.reset
and enable.auto.commit
, to tune if commiting offsets automatically to kafka, or where to start on the topic when starting for the first time.
If you found this talk/demo interesting, and love new challenges involving Big Data, I would really like to meet you!
If you are interested, contact me on twitter @iamsathyaraj, or via email [email protected]