Doing Data Science at MEST

The main goal of this course is to introduce students to data science techniques that allow them to build production-level data products that solve problems. In practice, this means we will aim to incorporate and deploy our data products into a web or mobile application for users to interact with. I hope that by the end of this course, students will be able to apply core data science principles to build data-driven products and organizations.

This is an introductory Data Science course that aims to introduce students to a breadth of concepts in Data Science. It introduces the OSEMN (Obtain, Scrub, Explore, Model and iNterpret) process of data science, with a view to developing skills that foster data-driven thinking and products. This course is intended for anyone interested in learning more about the data science process and applying it to their everyday lives, projects or organisations.
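To make the five OSEMN steps concrete, here is a minimal sketch that walks a tiny dataset through each of them using only the Python standard library. The dataset (hours studied against quiz score), the function names and the choice of a straight-line least-squares fit are all assumptions made for illustration; they are not taken from the course material.

```python
import csv
import io
import statistics

# Hypothetical data: hours studied vs. quiz score, with one incomplete row.
RAW_CSV = """hours,score
1,52
2,58
3,
4,71
5,75
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source (an inline string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values and convert strings to floats."""
    return [
        (float(row["hours"]), float(row["score"]))
        for row in rows
        if row["hours"] and row["score"]
    ]


def explore(data):
    """Explore: summarise the cleaned data."""
    scores = [score for _, score in data]
    return {"n": len(data), "mean_score": statistics.mean(scores)}


def model(data):
    """Model: fit score = slope * hours + intercept by least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def interpret(slope, intercept):
    """iNterpret: turn the fitted numbers into a plain-language statement."""
    return (
        f"In this toy sample, each extra hour goes with about {slope:.1f} "
        f"more points, starting from roughly {intercept:.1f}."
    )


data = scrub(obtain(RAW_CSV))
print(explore(data))
print(interpret(*model(data)))
```

In a real project each step would usually be much larger (for example, obtaining data from an API and modelling with scikit-learn), but the order of the steps stays the same.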

Session Facilitator

David Selassie Opoku

Important Resources

Google+ MEST Data Science Community
Mailing list: data[at]meltwater.org
Session times: Fridays 1 - 3pm
Help Sessions: Mondays, Tuesdays & Thursdays, 6 - 7pm

Session Outline [Work in Progress]

These sessions will be made up of two main components:

Book discussion sessions

The book discussion will normally take place during the first half hour, where we will discuss the data science book we are reading at the moment. This is to help build your qualitative understanding and knowledge of the questions and developments in the space.

Quantitative sessions

This will involve building the quantitative and technical skillset needed to do Data Science. It will span Python tutorials, statistics, Machine Learning and Visualisation, amongst others. We will be using a combination of well-prepared courses, books and material provided by leading experts and organisations in the field.

Outline

Milestone 1: Core Skills

Outcome 1: Python installation
- Install Essential libraries
Outcome 2: Intro to Python
- Learn Python with Google
Outcome 3: Intro to Data Science
- Udacity Intro to Data Science Course
- Personal Project: Work through the data science process in a personal project of your choice. Extra points for MEST-related projects.
Outcome 4: Intro to Statistics [SKIPPED]
- Udacity Intro to Statistics
- Personal Project: Answer a question of interest using statistics and data of your choice. Extra points for MEST-related projects.
Outcome 5: Team Formation & Specialisation
- Choose one area of Specialisation
- Choose teams for final team project. Must have at least one member from each specialisation.

Milestone 2: Specialisations

Data Visualisation
- Data Visualisation and D3.js
Data Science Infrastructure
Machine Learning
Statistics
- Intro to Descriptive Statistics
- Intro to Inferential Statistics

Milestone 3: Team Data Product

Model Building and Validation

Prerequisites

Students interested in taking this course should be comfortable with, or eager to try, asking questions, experimenting with new ideas and building products. Basic familiarity with a Unix/Linux command line is recommended.

Data Science Toolbox

Mastering the Data Science process requires a set of basic tools to process your data, test your hypotheses and extract meaningful insights. For this course, we will show examples primarily using Python and its libraries, and sometimes R, JavaScript or Java for specific topics. Students who have suggestions or preferences for other tools should feel free to use them and share with the community. The ball is in your court. See below for the tool set for this course:

Tools	Python	R	JavaScript
IDE	IPython	RStudio	X
Data Processing	Pandas, Scipy, Numpy	Core libraries	X
Machine Learning & Data Mining	scikit-learn	Several R libraries	X
Data Graphics	Matplotlib, ggplot2	ggplot2	D3
Interactive Visualization	Plotly	Shiny	D3, Dimple.js
Big Data	Hadoop, Spark, Storm	Read this	X
Web Development	web2py, flask	X	MeteorJS
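Before the first quantitative session, it can help to check which of the Python tools in the table above are already installed. The sketch below is one possible way to do that with the standard library; the `TOOLBOX` mapping and the `missing_tools` helper are hypothetical names introduced here, and the import names (for example `sklearn` for scikit-learn) are the usual ones rather than anything specified by the course.

```python
import importlib.util

# Display name -> import name for the Python column of the table.
# Assumption: these are the usual import names for each library.
TOOLBOX = {
    "Pandas": "pandas",
    "Scipy": "scipy",
    "Numpy": "numpy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "flask": "flask",
}


def missing_tools(toolbox):
    """Return display names of libraries that cannot be found for import."""
    return [
        name
        for name, module in toolbox.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_tools(TOOLBOX)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All listed Python tools are installed.")
```

Anything reported as missing can then be installed with your package manager of choice before the session.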

Hands-on Work

Reading List

February-March: Doing Data Science

April: The Field Guide to Data Science
April: Thinking With Data
May: Data Driven: Creating a Data Culture
May-June: Agile Data Science

Projects

Personal Data Project
Team Data Project

Resources

