- Description
- Learning Outcomes
- Course Contacts
- Pre-Course Work
- Design
- Schedule
- Prerequisites
- Expectations
- Policies
- Folder Structure
- Key Texts
- Acknowledgements
The course was created by the University of Toronto's Data Sciences Institute. It is designed for those who have a degree in something other than Computer Science/Statistics and are looking to enhance their data science skills for their career.
The first half of the course will focus on the essentials of coding in Python and ethical considerations of using algorithms. You will learn how to design functions, repeat code using loops, store data in lists, test and debug your code, and manipulate data using various data analysis and visualization tools such as `numpy`, `pandas`, `matplotlib`, `seaborn`, and `plotly`. You will have discussions about the Tuskegee experiment, its long-term effects, and the trustworthiness of AI applications in disparate social systems.
The second half of the course will develop the professional skills necessary to be a data scientist with a focus on machine learning. You will go through an industry overview, explore the job interview process, including potential technical questions, and receive additional resources.
After successfully completing the course, the students will:
- Understand various Python data types and their role in coding. This includes being able to differentiate and evaluate expressions using numeric types (integers and floating-point numbers), Booleans, strings, and lists. This will be assessed in Assignment 1.
- Be able to reduce the duplication of code by following the Function Design Recipe and create functions in Python. This will be assessed in Assignment 1.
- Be able to use `numpy` and `pandas` to analyze a dataset; more specifically, be able to use these libraries to manipulate numerical and tabular data in Python. This will be assessed in Assignments 1 and 2.
- Know how to interact with databases via Python. This includes using visualization libraries like `matplotlib`, `seaborn`, and `plotly`. This will be assessed in Assignment 2.
- Know how to debug and test Python code. Students will learn to troubleshoot errors and to select test cases to check for correctness, reliability, and robustness of code. This will be assessed in Assignments 1 and 2.
- Understand the ethical issues with software and be aware of case studies in which software failure resulted in catastrophe.
- Be able to answer job interview questions with confidence.
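As a rough illustration of the function design and testing outcomes above, the sketch below follows the general shape of the Function Design Recipe (examples, header, description, body, test). The function name `average_rating` and its behaviour for an empty list are hypothetical choices made for this example, not part of the course assignments.

```python
import doctest


def average_rating(ratings: list) -> float:
    """Return the mean of the numbers in ratings, or 0.0 if ratings is empty.

    The examples below double as test cases that doctest can run.

    >>> average_rating([4, 5, 3])
    4.0
    >>> average_rating([])
    0.0
    """
    # Guard against division by zero (an assumed design choice for this sketch).
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)


# Run the docstring examples as tests; doctest prints nothing when all pass.
doctest.testmod()
```

Writing the examples before the body is one way to choose test cases early, including an edge case such as the empty list.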
- Instructor: Kaylie Lau (she/her). Emails to the instructor can be sent to [email protected].
- TA: Salaar Liaqat (he/him). Emails to the teaching assistant can be sent to [email protected].
Prior to the first class please:
- Create a Google account that can use Google Colab:
- Go to https://colab.research.google.com/. In the upper left corner, click File, then New Notebook
- Enter `!python --version` in the code cell, then press Ctrl+Enter to run the cell and confirm that your Python version is 3.6 or above.
- If you are having issues with the set-up, the TA will be available to help with this Monday 28 November from 5pm-6pm.
- Complete the pre-course survey: https://forms.gle/rcVCTfZasarXAGQg9
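The version check in the setup steps above can also be done from Python itself. This is a minimal sketch using the standard `sys` module; the helper name `python_is_supported` and the constant `MINIMUM_VERSION` are made up for this example.

```python
import sys

# Minimum version named in the setup instructions (major, minor).
MINIMUM_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Show the full version string and whether it meets the requirement.
print(sys.version)
print("Supported:", python_is_supported())
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.6".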
The course runs synchronously over Zoom. It consists of three classes a week for three weeks, or nine classes total. Classes are 6 PM - 8 PM EST on Mondays and Thursdays, and 9 AM - 12 PM EST on Saturdays. Being mindful of online fatigue, there will be one or two breaks during each class where students are encouraged to stretch, grab a drink and snacks, or ask any additional questions.
Tutorial sessions with a TA will also be offered over Zoom. These will take place from 5 PM - 6 PM EST on Mondays and Thursdays, and 8:30 AM - 9 AM EST and 12 PM - 12:30 PM EST on Saturdays.
Schedule is tentative and may be modified as needed. Learners will be notified of schedule changes.
- Day 1 (Monday 28 November, 6pm-8pm): Getting Started I (Introduction; Python fundamentals)
- Day 2 (Thursday 1 December, 6pm-8pm): Getting Started II (Python fundamentals)
- Day 3 (Saturday 3 December, 9am-noon): Dealing with Reality (Control flow using conditionals and loops; Lists, tuples, sets, and dictionaries)
- Day 4 (Monday 5 December, 6pm-8pm): In/Out (Modules; Working with files; Object-oriented programming)
- Day 5 (Thursday 8 December, 6pm-8pm): Doing More with Data I (`numpy`)
- Day 6 (Saturday 10 December, 9am-noon): Doing More with Data II (`pandas`)
- Day 7 (Monday 12 December, 6pm-8pm): Visualizing Data (`matplotlib`; `seaborn`; `plotly`)
- Day 8 (Thursday 15 December, 6pm-8pm): Professional skills: Industry case study - Hareem Naveed
- Day 9 (Saturday 17 December, 9am-noon): Review and Ethics
Learners are expected to know how to operate a computer and are also expected to be familiar with the parts of a data table or spreadsheet, summary statistics, and basic data visualizations. No prior programming knowledge is required.
The course is a live-coding class. Learners are expected to follow along with the coding in their own Python notebooks. Learners should be active participants while coding and are encouraged to ask questions throughout. Although slides will be available, they should be referenced before or after class, as class will be dedicated to coding with the instructor.
- Learners must have an internet connection and a computer to participate in online activities
- Learners must have a Google account that can use Google Colab
- Accessibility: We want to provide an accessible learning environment for all. If there is something we can do to make this course more accessible to you, please let us know.
- Course communications: Communications take place over email. Please include "DSI-Python" or similar in the subject line, e.g. "DSI-Python: pandas question"
- Camera: Keeping your camera on is optional.
- Microphone: Please keep microphones muted unless you need to speak. Please indicate your name before speaking as some Zoom configurations make it hard to tell who is talking!
- Assessment: There will be homework, which is not graded but highly recommended, and three graded assignments.
- 01-slides: Course slides as interactive Google Colab notebooks (.ipynb files)
- 02-html-slides: Course slides as HTML files that can be downloaded and viewed in a web browser
- 03-pdf-slides: Course slides as PDF files
- 04-homework: Optional homework to practice concepts covered in class
- 05-assignments: Graded assignments
- 06-html-assignments: Assignments as HTML files
- 07-pdf-assignments: Assignments as PDF files
- 08-live-code: Notebooks from class live coding sessions
- data: Datasets used in the course
- README: This file!
- LICENSE: Copyright information for these materials
- .gitignore: Files to exclude from this folder, specified by the instructor
- 00 Hello Python
- 01 Getting Started: Python Fundamentals
- 02 Dealing with Reality: Control Flow and Iterables
- 03 In/Out: Modules, Files, OOP
- 04a Doing More with Data: `numpy`
- 04b Doing More with Data: `pandas`
- 05 Visualizing Data
- 06 Ethics
- Gries, Campbell, and Montojo, 2017, Practical Programming: An Introduction to Computer Science Using Python 3.6.
- Adhikari, DeNero, and Wagner, Computational and Inferential Thinking: The Foundations of Data Science.
Course materials were originally developed by Asel Kushkeyeva under the supervision of Rohan Alexander, University of Toronto. Materials have been modified by A. Mahfouz and Kaylie Lau for 2022.