A Twitter streaming application that continuously pulls tweets related to the WorldCup2022/Qatar World Cup topic from the Twitter API, publishes them to a Kafka topic, consumes that topic with Spark, applies transformations, and pushes the resulting data to a Snowflake sink database.
The project builds on my earlier work on Structured Streaming pipelines using Spark + Kafka.
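As a rough sketch of what the Scala side does, the snippet below reads raw tweets from Kafka and appends each micro-batch to Snowflake via the spark-snowflake connector. The topic name (worldcup-tweets), table name (TWEETS), and all connection values are placeholders assumed for illustration, not the repo's actual configuration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StreamHandlerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WorldCupTweetStream")
      .getOrCreate()

    // Snowflake connection options -- every value here is a placeholder
    val sfOptions = Map(
      "sfURL"       -> "<account>.snowflakecomputing.com",
      "sfUser"      -> "<user>",
      "sfPassword"  -> "<password>",
      "sfDatabase"  -> "<database>",
      "sfSchema"    -> "<schema>",
      "sfWarehouse" -> "<warehouse>"
    )

    // Read the raw tweets from Kafka ("worldcup-tweets" is a hypothetical topic name)
    val tweets = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "worldcup-tweets")
      .load()
      .selectExpr("CAST(value AS STRING) AS tweet")

    // Append each micro-batch to Snowflake using the connector's batch writer
    val writeToSnowflake: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("net.snowflake.spark.snowflake")
        .options(sfOptions)
        .option("dbtable", "TWEETS") // hypothetical table name
        .mode("append")
        .save()

    tweets.writeStream
      .foreachBatch(writeToSnowflake)
      .option("checkpointLocation", "/tmp/worldcup-checkpoint")
      .start()
      .awaitTermination()
  }
}
```

Any transformations on the tweet text would sit between the Kafka read and the Snowflake write.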
- Install sbt per the guidelines provided here
- Clone this repo into your workspace
- Add your Twitter Developer account credentials to the twitter.cfg file (a sketch of its typical layout follows these steps)
- Add your Snowflake user credentials to the StreamHandler.scala file (the sfOptions map in the sketch above shows the usual connection fields)
- Start Kafka with the command `sudo docker-compose up -d`
- In your terminal, run the command `sbt` to open the sbt shell
- Inside the sbt shell, type `compile` to build the project
- Next, type the command `run`
- Finally, run the `app.py` script (a consolidated command sequence appears below)
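The twitter.cfg file is typically an INI-style config; the exact section and key names in this repo may differ, so treat the layout below as a hypothetical sketch:

```
[twitter]
consumer_key = <your consumer key>
consumer_secret = <your consumer secret>
access_token = <your access token>
access_token_secret = <your access token secret>
```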
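Putting the run steps together, a typical session looks like the following (the Spark consumer in one terminal, the tweet producer in another; this assumes app.py is invoked with plain `python`):

```
# Terminal 1: bring up Kafka, then start the Spark stream
sudo docker-compose up -d
sbt            # opens the sbt shell
compile        # inside the sbt shell
run            # inside the sbt shell

# Terminal 2: start the tweet producer
python app.py
```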
Head over to your Snowflake table and confirm that data is being added to your database.