Running PySpark Streaming

Prerequisites

Make sure your Kafka and Spark services are up and running by following the Docker setup README. You must create the network and volume exactly as described there, so verify that both exist:

docker volume ls # should list hadoop-distributed-file-system
docker network ls # should list kafka-spark-network 
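
If you prefer a scripted check, here is a minimal Python sketch. It assumes the `docker` CLI is on your PATH. The helper names `docker_names` and `missing_prerequisites` are hypothetical and not part of this repository. It asks Docker for bare resource names and reports whichever of the two required resources is absent.

```python
import subprocess

# Resource names taken from the `docker volume ls` / `docker network ls` checks above.
REQUIRED_VOLUME = "hadoop-distributed-file-system"
REQUIRED_NETWORK = "kafka-spark-network"


def docker_names(kind, run=subprocess.run):
    """Return the set of names printed by `docker <kind> ls`.

    `kind` is "volume" or "network". `--format "{{.Name}}"` makes Docker
    print one bare name per line instead of a table. `run` is injectable
    so the check can be exercised without a Docker daemon.
    """
    result = run(
        ["docker", kind, "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_prerequisites(run=subprocess.run):
    """List the required Docker resources that are absent (empty list means all present)."""
    missing = []
    if REQUIRED_VOLUME not in docker_names("volume", run):
        missing.append(f"volume {REQUIRED_VOLUME}")
    if REQUIRED_NETWORK not in docker_names("network", run):
        missing.append(f"network {REQUIRED_NETWORK}")
    return missing
```

Calling `missing_prerequisites()` returns an empty list when both the volume and the network exist. Otherwise it names the ones you still need to create via the Docker setup README.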

Running Producer and Consumer

# Run producer
python3 producer.py

# Run consumer with default settings
python3 consumer.py
# Run consumer for a specific topic
python3 consumer.py --topic <topic-name>
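
The `--topic` flag behaves like a standard command-line option. Below is a minimal sketch of how a consumer script might parse it with `argparse`. It assumes a hypothetical default topic name (`example-topic`) because this README does not state the real default used by consumer.py.

```python
import argparse

# Hypothetical placeholder: the actual default topic in consumer.py is not
# documented here, so check the script before relying on this value.
DEFAULT_TOPIC = "example-topic"


def parse_args(argv=None):
    """Parse consumer options; `--topic` overrides the default topic."""
    parser = argparse.ArgumentParser(description="Kafka consumer")
    parser.add_argument(
        "--topic",
        default=DEFAULT_TOPIC,
        help="Kafka topic to subscribe to (default: %(default)s)",
    )
    # With argv=None, argparse reads sys.argv[1:], as when run from the shell.
    return parser.parse_args(argv)
```

With this pattern, `python3 consumer.py` falls back to the default topic, and `python3 consumer.py --topic <topic-name>` replaces it.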

Running Streaming Script

The spark-submit.sh script makes sure the necessary jars are installed before it runs streaming.py:

./spark-submit.sh streaming.py 
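
This README does not show what spark-submit.sh does internally. One common way to supply Kafka connector jars is the `--packages` option of `spark-submit`, which resolves Maven coordinates and puts the jars on the driver and executor classpaths. The sketch below builds such a command. It assumes the required jar is Spark's `spark-sql-kafka-0-10` connector for Scala 2.12, and the `spark_submit_command` helper is hypothetical. Match the Scala suffix and version to the Spark build in your container.

```python
# Maven coordinates of Spark's Kafka source for Structured Streaming.
# Assumption: Scala 2.12 build; the version is supplied by the caller.
KAFKA_PACKAGE = "org.apache.spark:spark-sql-kafka-0-10_2.12:{spark_version}"


def spark_submit_command(script, spark_version):
    """Build a spark-submit argv that fetches the Kafka connector jar.

    `--packages` makes spark-submit download the listed Maven coordinates
    (and their dependencies) and add them to the classpath before `script`
    starts.
    """
    package = KAFKA_PACKAGE.format(spark_version=spark_version)
    return ["spark-submit", "--packages", package, script]
```

For example, `spark_submit_command("streaming.py", "3.3.1")` (version chosen purely for illustration) yields `["spark-submit", "--packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1", "streaming.py"]`, which can be passed to `subprocess.run`.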

Additional Resources