Big Data Analytics with Python

The material in this repository was presented at a training workshop in Dakar, Senegal, from July 1 to July 5, 2019. The training was organized by the African Institute for Mathematical Sciences (AIMS).

Course Outline and Goals

The goal of the course is to introduce participants to using Python for data science tasks such as data ingestion, data analysis and machine learning, with a focus on processing large-scale datasets. Unlike regular online courses, which tend to rely on toy problems, this course uses real-life datasets and case studies to challenge participants with real-world data science problems. The structure of the course is as follows:

  • Day 1: Introduction to Python
  • Day 2: Python for Data Science
  • Day 3: Big Data Processing with PySpark
  • Day 4: Machine Learning in Python
  • Day 5: Big Data and ML Case Studies

Repository Setup

The materials are organised into folders by day. All the code lives in the src folder. Because the PowerPoint files are large, they are not included in the repository; you can find up-to-date PowerPoint slides here. Some datasets aren't included in the repository either. All the code uses Python 3.
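Since all the code assumes Python 3, it can help to confirm which interpreter you are running before working through the materials. The snippet below is a minimal sketch of such a check; the function name `is_python3` is hypothetical and is not part of this repository.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version info has major version 3.

    `is_python3` is an illustrative helper, not something shipped in src.
    """
    return version_info[0] == 3


if __name__ == "__main__":
    if is_python3():
        print("Python 3 detected: OK")
    else:
        print("Please switch to a Python 3 interpreter")
```

Running the file with the interpreter you plan to use for the course is enough to see which message is printed.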

Pre-course Training Materials

In the Big Data Analytics with Python course, we will use the Python programming language to interact with data. To ensure that participants gain the most out of the course, we require that you have basic skills in Python. To this end, I have suggested course materials which you should complete in preparation for the course.

Introduction to Python

Below are two links to free Python courses. You only need to do one of them, but you can do both if you wish. Each is free and will take less than 5 hours of your time. Once you finish the course(s), you will have the prerequisite Python knowledge to gain the most from the 5-day course.

  1. Free Udemy Python Course
  2. Another Free Udemy Python Course

GitHub

We will use GitHub for tracking our code and submitting exercises, so it's important that you familiarise yourself with it. Refer to the links below for GitHub training materials.

  1. GitHub tutorial on YouTube
  2. GitHub tutorial