A Twitter streaming application that continuously pulls tweets related to the WorldCup2022/Qatar World Cup topic from the Twitter API, publishes them to a Kafka topic, consumes that topic with Spark, applies transformations, and pushes the resulting data to a Snowflake sink database.
The project builds on my earlier work on Structured Streaming pipelines using Spark + Kafka.
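As a rough sketch of what the Scala side does, the snippet below reads raw tweets from Kafka and appends each micro-batch to Snowflake via the spark-snowflake connector. The topic name (worldcup-tweets), table name (TWEETS), and all connection values are placeholders assumed for illustration, not the repo's actual configuration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StreamHandlerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WorldCupTweetStream")
      .getOrCreate()

    // Snowflake connection options -- every value here is a placeholder
    val sfOptions = Map(
      "sfURL"       -> "<account>.snowflakecomputing.com",
      "sfUser"      -> "<user>",
      "sfPassword"  -> "<password>",
      "sfDatabase"  -> "<database>",
      "sfSchema"    -> "<schema>",
      "sfWarehouse" -> "<warehouse>"
    )

    // Read the raw tweets from Kafka ("worldcup-tweets" is a hypothetical topic name)
    val tweets = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "worldcup-tweets")
      .load()
      .selectExpr("CAST(value AS STRING) AS tweet")

    // Append each micro-batch to Snowflake using the connector's batch writer
    val writeToSnowflake: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("net.snowflake.spark.snowflake")
        .options(sfOptions)
        .option("dbtable", "TWEETS") // hypothetical table name
        .mode("append")
        .save()

    tweets.writeStream
      .foreachBatch(writeToSnowflake)
      .option("checkpointLocation", "/tmp/worldcup-checkpoint")
      .start()
      .awaitTermination()
  }
}
```

Any transformations on the tweet text would sit between the Kafka read and the Snowflake write.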
- Install sbt per the guidelines provided here
- Clone this repo into your workspace
- Add your Twitter Developer account credentials to the twitter.cfg file (a sketch of its typical layout follows these steps)
- Add your Snowflake user credentials to the StreamHandler.scala file (the sfOptions map in the sketch above shows the usual connection fields)
- Start Kafka with the command `sudo docker-compose up -d`
- In your terminal, run the command `sbt` to open the sbt shell
- Inside the sbt shell, type `compile` to build the project
- Next, type the command `run`
- Finally, run the `app.py` script (a consolidated command sequence appears below)
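The twitter.cfg file is typically an INI-style config; the exact section and key names in this repo may differ, so treat the layout below as a hypothetical sketch:

```
[twitter]
consumer_key = <your consumer key>
consumer_secret = <your consumer secret>
access_token = <your access token>
access_token_secret = <your access token secret>
```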
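Putting the run steps together, a typical session looks like the following (the Spark consumer in one terminal, the tweet producer in another; this assumes app.py is invoked with plain `python`):

```
# Terminal 1: bring up Kafka, then start the Spark stream
sudo docker-compose up -d
sbt            # opens the sbt shell
compile        # inside the sbt shell
run            # inside the sbt shell

# Terminal 2: start the tweet producer
python app.py
```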
Head over to your Snowflake table and confirm that data is being added to your database.