Doing Data Science at MEST
The main goal of this course is to introduce students to data science techniques that allow them to build production-level data products that solve problems. In practice, this means we will aim to incorporate and deploy our data products into a web or mobile application for users to interact with. I hope that by the end of this course, students will be able to apply core data science principles to build data-driven products and organizations.
This is an introductory Data Science course that aims to introduce students to a breadth of concepts in Data Science. It introduces the OSEMN (Obtain, Scrub, Explore, Model and iNterpret) process of data science, with the goal of developing skills that foster data-driven thinking and products. This course is intended for anyone interested in learning more about the data science process and applying it to their everyday lives, projects or organisations.
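As a rough illustration of the OSEMN steps named above, here is a minimal sketch using only the Python standard library. The inline dataset, column names, and function names are hypothetical and exist only for this example; a real project would more likely obtain data from a file or API and use the libraries covered later in the course.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real source (file, API, scrape).
RAW_CSV = """hours_studied,score
1,52
2,55
,60
4,68
5,71
"""


def obtain(text):
    """Obtain: read rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values and convert the rest to floats."""
    clean = []
    for row in rows:
        if row["hours_studied"] and row["score"]:
            clean.append((float(row["hours_studied"]), float(row["score"])))
    return clean


def explore(pairs):
    """Explore: compute simple summary statistics."""
    hours, scores = zip(*pairs)
    return {
        "n": len(pairs),
        "mean_hours": statistics.mean(hours),
        "mean_score": statistics.mean(scores),
    }


def model(pairs):
    """Model: fit a least-squares line, score = slope * hours + intercept."""
    hours, scores = zip(*pairs)
    mean_h, mean_s = statistics.mean(hours), statistics.mean(scores)
    sxy = sum((h - mean_h) * (s - mean_s) for h, s in pairs)
    sxx = sum((h - mean_h) ** 2 for h in hours)
    slope = sxy / sxx
    return slope, mean_s - slope * mean_h


def interpret(slope):
    """iNterpret: turn the fitted slope into a plain-language statement."""
    return f"Each extra hour is associated with about {slope:.1f} more points."


pairs = scrub(obtain(RAW_CSV))
print(explore(pairs))
slope, intercept = model(pairs)
print(interpret(slope))
```

Each function maps to one letter of OSEMN, so any step can be swapped out (for example, replacing `obtain` with a database query) without touching the others.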
- Google+ MEST Data Science Community
- Mailing list: data[at]meltwater.org
- Session times: Fridays 1 - 3pm
- Help Sessions: Mondays, Tuesdays & Thursdays, 6 - 7pm
These sessions will be made up of two main components:
- Book discussion sessions
  The reading session will normally take place during the first half hour, where we will discuss the data science book we are reading at the moment. This is to help build your qualitative understanding and knowledge of the questions and developments in the space.
- Quantitative sessions
  This will involve building the quantitative and technical skill set needed to do Data Science. It will span Python tutorials, statistics, Machine Learning and Visualisation, among other topics. We will be using a combination of well-prepared courses, books and material provided by leading experts and organisations in the field.
- Outcome 1: Python installation
  - Install essential libraries
- Outcome 2: Intro to Python
- Outcome 3: Intro to Data Science
  - Udacity Intro to Data Science Course
  - Personal Project: Working through data science process in a personal project of your choice. Extra points for MEST-related projects.
- Outcome 4: Intro to Statistics [SKIPPED]
  - Udacity Intro to Statistics
  - Personal Project: Answer a question of interest using statistics and data of your choice. Extra points for MEST-related projects.
- Outcome 5: Team Formation & Specialisation
  - Choose one area of Specialisation
  - Choose teams for final team project. Must have at least one member from each specialisation.
- Data Visualisation
  - Data Visualisation and D3.js
- Data Science Infrastructure
- Machine Learning
- Statistics
Students interested in taking this course should be comfortable with, or eager to start, asking questions, experimenting with new ideas and building products. Basic familiarity with a Unix/Linux command line is recommended.
Mastering the Data Science process requires a set of basic tools to process your data, test your hypotheses and extract meaningful insights. For this course, we will show examples primarily using Python and its libraries, and sometimes R, JavaScript or Java for specific topics. Students who have suggestions or preferences for other tools should feel free to use them and share with the community. The ball is in your court. See below for the tool set for this course:
Tools | Python | R | JavaScript |
---|---|---|---|
IDE | IPython | RStudio | X |
Data Processing | Pandas, Scipy, Numpy | Core libraries | X |
Machine Learning & Data Mining | scikit-learn | Several R libraries | X |
Data Graphics | Matplotlib, ggplot2 | ggplot2 | D3 |
Interactive Visualization | Plotly | Shiny | D3, Dimple.js |
Big Data | Hadoop, Spark, Storm | Read this | X |
Web Development | web2py, flask | X | MeteorJS |
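To see which of the Python tools from the table are already installed, a short check along these lines may help. This is only a sketch: the mapping from display names to import names is an assumption made for this example (for instance, scikit-learn is imported as `sklearn`), and the list covers only a subset of the Python column.

```python
import importlib.util

# Display name -> import name (assumed import names for this illustration).
LIBRARIES = {
    "IPython": "IPython",
    "Pandas": "pandas",
    "Scipy": "scipy",
    "Numpy": "numpy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "flask": "flask",
}


def check_libraries(libraries):
    """Return a dict mapping each display name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in libraries.items()
    }


# Print a simple installed/missing report without importing anything heavy.
for name, installed in check_libraries(LIBRARIES).items():
    print(f"{name:15} {'installed' if installed else 'MISSING'}")
```

Anything reported as missing can then be installed with your preferred package manager before the first quantitative session.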
- February-March: Doing Data Science
- April: The Field Guide to Data Science
- April: Thinking With Data
- May: Data Driven: Creating a Data Culture
- May-June: Agile Data Science
- Personal Data Project
- Team Data Project
- General
- Statistics and Machine Learning
- Data Infrastructure
- Visualisation
- A Taxonomy of Data Science
- NY Times Bits post on Looking to the Future of Data Science
- Hanna Wallach's Medium post on Big Data, Machine Learning, and the Social Sciences
- DataRobot's post on A Primer on Deep Learning
- Data Science Has Been Using Rebel Statistics for a Long Time
- Hilary Mason's Devs Love Bacon: Everything you need to know about Machine Learning in 30 minutes or less
- Jeremy Howard's TED talk on The wonderful and terrifying implications of computers that can learn
- The Human Data Exchange
- Open Data for Africa
- Ghana Open Data Initiative
- Africa Open Data
- Hilary Mason's Bundle of Research-Quality Data Sets
- Quandl
- Enigma
- Reddit Open Data
- Datamob
- Twitter API
- UCI Machine Learning Repository
- The Open-Source Data Science Masters
- Nathan Yau's FlowingData
- Slender Means Will It Python posts
- United Nations Global Pulse
- Fast ML
- Data Science Central