Lesson Design

This page documents the design process and the motivation behind this lesson material.

Lesson Title: An Introduction to Deep Learning

Target audience

The main audience of this Carpentry lesson is PhD students who have little to no experience with deep learning. We do, however, expect them to know the basics of statistics and machine learning.

Learner Profiles

Ann from Meteorology

Ann has collected 2-3 GB of image data from several autonomous microscopes on balloon expeditions into the atmosphere as part of her PhD programme. Each image has a timestamp that can be related to the balloon's altitude at that moment and to the weather conditions at the time. The images are unstructured, and she would like to detect from them whether the balloon traversed a cloud. She has tried standard image processing methods, but the image artifacts she needs to discriminate are quite diverse. Ann has used machine learning on tabular data before and would like to use Deep Learning for the images at hand. She has seen collaborators in another lab do this and would like to pick up the skill.

Barbara from Material Science

Barbara has just started her PostDoc in Material Science. Her new group has stored a large number of scanning electron microscope images showing several metals exposed to a plasma. The team has also highlighted solid deposits in these images, yielding 20,000 annotated images. Barbara has done some image analysis before and therefore suspects that Deep Learning may help her with this task. She has seen her labmates use ML algorithms for this and is motivated to finally understand these approaches.

Dan from Life Sciences

Dan produced a large population of bacteria that were subjected to genetic alterations resulting in 10 different phenotypes. The phenotypes can be distinguished by color, shape and movement speed under a fluorescence microscope. Dan has little experience with image processing techniques for segmenting these objects, but he has used GUI-based tools such as Fiji. He has recorded 50-60 movies of 30 minutes each; 10 of these movies show only a single phenotype. Dan doesn't consider himself a strong coder, but he needs to identify the phenotype of the bacteria in the dataset. He is interested in learning whether Deep Learning can help.

Eric from Pediatrics

Eric ran a large series of clinical trials in his hospital to improve pediatric drugs for treating a common (non-lethal) virus. He obtained a table that lists, for each patient, the progression of the treatment, the drug dose given, whether the patient was in the placebo group, and so on. Because the table has more than 100,000 rows, Eric is certain that he can use ML to cluster the entries of one column in which data collection was inconsistent. Eric has written code here and there when necessary, but never saw the need to learn programming properly; a cheatsheet is the core of his coding knowledge. His supervisor therefore encouraged him to take a course on ML because, in his words, "this is the tech of these days!"

Notes

They probably have overhyped expectations of deep learning.
They don’t know whether it’s the right tool for their situation.
They have no idea what it takes to actually do deep learning.
They want to quickly gain some useful skills for their own data.

Required Pre-Knowledge

Python – Previous programming experience in Python is required (refer to the Python Data Carpentry lesson)
Pandas – Knowledge of the Pandas Python package
Basic Machine Learning Knowledge – Data cleaning, train & test split, overfitting & underfitting, metrics (accuracy, recall, etc.)
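As a quick refresher on the level of machine learning knowledge assumed here, the sketch below shows a train/test split and the accuracy and recall metrics in plain Python. The helpers `train_test_split`, `accuracy` and `recall` are hypothetical, hand-written functions for illustration only; learners would more likely use the equivalents in `sklearn.model_selection` and `sklearn.metrics`.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Of all truly positive samples, the fraction that was predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


if __name__ == "__main__":
    train, test = train_test_split(list(range(10)))
    print(len(train), len(test))  # 8 2

    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1]
    print(accuracy(y_true, y_pred))  # 0.6 (3 of 5 correct)
    print(recall(y_true, y_pred))    # 2 of the 3 positives were found
```

The metrics are always computed on the held-out test split, never on the data the model was trained on; otherwise overfitting goes unnoticed.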

Learning objectives

Overview

After following this lesson, learners will be able to:

Prepare input data for use in deep learning

Design and train a Deep Neural Network

Troubleshoot the learning process

Measure the performance of the network

Visualize data and results

Re-use existing network architectures with and without pre-trained weights


The following gives more detail on each learning objective, based on Bloom's Taxonomy. For hints on how to use this approach, see lesson 15 of the instructor training.

Prepare input data for use in deep learning

This includes cleaning data, filling missing values, normalizing, and transforming categorical columns into dummy encoding.
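The four preparation steps can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: records are plain dicts, `None` marks a missing value, missing numbers are filled with the column mean, and normalizing means min-max scaling to [0, 1]. The function `prepare` and the toy records (loosely modeled on Eric's clinical table) are hypothetical. With pandas, which the lesson lists as a prerequisite, the same steps map roughly onto `DataFrame.dropna`, `DataFrame.fillna` and `pandas.get_dummies`.

```python
from statistics import mean


def prepare(rows, numeric_cols, categorical_cols):
    """Hypothetical helper: turn dict records into purely numeric feature dicts."""
    all_cols = numeric_cols + categorical_cols
    # 1. Cleaning: drop records in which every feature is missing (None).
    rows = [r for r in rows if any(r.get(c) is not None for c in all_cols)]

    # 2. Filling missing values: use the column mean for numeric columns
    #    (assumes each numeric column has at least one observed value).
    means = {c: mean(r[c] for r in rows if r.get(c) is not None)
             for c in numeric_cols}
    features = [{c: (r[c] if r.get(c) is not None else means[c])
                 for c in numeric_cols} for r in rows]

    # 3. Normalizing: min-max scale each numeric column to [0, 1].
    for c in numeric_cols:
        lo = min(f[c] for f in features)
        hi = max(f[c] for f in features)
        for f in features:
            f[c] = (f[c] - lo) / (hi - lo) if hi > lo else 0.0

    # 4. Dummy encoding: one 0/1 column per observed category value.
    for c in categorical_cols:
        categories = sorted({r[c] for r in rows if r.get(c) is not None})
        for f, r in zip(features, rows):
            for cat in categories:
                f[f"{c}_{cat}"] = 1 if r.get(c) == cat else 0
    return features


if __name__ == "__main__":
    records = [  # hypothetical toy data
        {"dose": 10.0, "age": 4, "group": "placebo"},
        {"dose": None, "age": 6, "group": "drug"},
        {"dose": 30.0, "age": None, "group": "drug"},
        {"dose": None, "age": None, "group": None},  # dropped in step 1
    ]
    for row in prepare(records, ["dose", "age"], ["group"]):
        print(row)
```

In a real project the means and min/max values should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.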