This notebook teaches PySpark from beginner to advanced level.
The dataset is a .json file of timestamped events that records the following user actions on the Sparkify digital music service:
- Playing a song
- Logging in
- Listening to an advertisement
- Downgrading a subscription
- Cancelling a subscription
The dataset is available in three sizes:
- mini_sparkify_event_data.json: the smallest instance of the dataset (125 MB)
- medium-sparkify-event-data.json: a medium-sized instance of the dataset (237 MB)
- sparkify_event_data.json: the full dataset (12 GB)
Download links for the datasets:
- Small subset of the Sparkify data (125 MB): https://drive.google.com/open?id=1FwuyO5apNwy8q6BpG-_EIqqN-ED0X9tx
- Medium-sized subset of the Sparkify data (237 MB): https://drive.google.com/open?id=17Lys6v7LOcAWFMHXwXwslUWj04LoNWp1
- Full Sparkify dataset on AWS (12 GB): s3n://dsnd-sparkify/sparkify_event_data.json
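
Once one of the files above is downloaded, a first look at it might go as in the sketch below. It assumes the file stores one JSON object per line (the layout `spark.read.json` expects by default) and that it sits in the working directory. The helper names `peek_records` and `load_events` and the application name are illustrative choices, not part of the dataset or of PySpark.

```python
import json
from itertools import islice


def peek_records(path, n=3):
    """Parse the first n lines of a line-delimited JSON file into dicts.

    Blank lines among those first n lines are skipped, so fewer than n
    records may be returned.
    """
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in islice(fh, n) if line.strip()]


def load_events(path, app_name="sparkify-intro"):
    """Load the event log into a Spark DataFrame.

    app_name is a hypothetical label chosen for this sketch.
    """
    # Imported lazily so peek_records can be used where PySpark is not installed.
    from pyspark.sql import SparkSession

    # Reuse an active session if one exists, otherwise start a new one.
    spark = SparkSession.builder.appName(app_name).getOrCreate()
    # By default spark.read.json reads one JSON object per line.
    return spark.read.json(path)


# Example usage (assumes the mini file has been downloaded locally):
# print(peek_records("mini_sparkify_event_data.json", n=1))
# df = load_events("mini_sparkify_event_data.json")
# df.printSchema()
```

Starting with the 125 MB file keeps this step fast on a laptop; the same `load_events` call should work for the larger files, although the 12 GB dataset on S3 typically needs a cluster with S3 access configured.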