This repository includes implementations of the following:
- Spark DataFrames
  - Spark DataFrame Basics
  - Spark DataFrame Operations
  - Groupby and Aggregate Functions
  - Missing Data
  - Dates and Timestamps
- Linear Regression
  - Linear Regression with PySpark Example (Car Data)
- Logistic Regression
  - Logistic Regression Example
- Tree Methods
  - Decision Tree and Random Forest Example
- Clustering
  - Clustering Example - Iris Dataset
- Recommender System
  - Recommender Systems and Collaborative Filtering
- Natural Language Processing
  - Introduction to NLP and Naive Bayes Model with examples
  - NLP pipelines
- Spark Streaming
  - Introduction to Spark Streaming