- Instructor: Erin Shellman, [email protected]
- Teaching Assistant: Bryan Mayer, [email protected]
- Course Location: Puget Sound Plaza, room 406
- Course Time: Mondays 6:00 - 9:00 PM
- Dates: April 4, 2016 through June 13, 2016
Welcome to data mining! This is an applied course meant to teach you practical tools for data mining and knowledge discovery. The course is composed of three units: prediction of continuous outcomes, classification, and unsupervised learning. The goal is to give you experience across a breadth of applications and to prepare you for the job of an analyst, data scientist, or any role that calls for data mining. If you already have experience with R or data mining, the projects include additional readings and techniques to challenge you and elevate your skills.
Grading is based on classroom participation, completion of homework and projects, and attendance. Students must attend at least 80% of the lectures to receive a passing grade.
There are three projects, one in each topic area. For each project, you will receive a business problem and a corresponding data set. You're free to use any methods you like, so long as you support your choices. You will write a brief report of your analyses, give feedback to your classmates, and receive feedback from them. You will have time in class to ask questions and work on your projects.
When you turn in your project reports, you will receive the reports of three of your classmates. During the following week, read their reports and provide thoughts and feedback. Please write at least a paragraph discussing parts of the analyses you liked and disliked. While you're reading, try to put yourself into the mind of the business stakeholder and ask if your requests were adequately met. Are you confident in the conclusions drawn? Were the figures and supporting evidence compelling? Remember to maintain a tone of mutual respect and read the section on policies and values for more information.
Assignment | Date |
---|---|
Project 1 | May 9 |
Project 1 Critiques | May 16 |
Project 2 | May 23 |
Project 2 Critiques | May 30 |
No class! | May 30 |
Project 3 | June 13 |
Last class! | June 13 |
There is no required textbook for this course. Everything you need to succeed is available in the course repository.
A large component of this course is in-class discussion and providing critical feedback on the analyses of your peers. It is imperative that all students are thoughtful when providing written feedback and participating in class. This means using respectful language in discussions and writings, but also being respectful of our limited class time by arriving prepared and engaged.
Everyone is required to do original work for all projects. You're free to openly discuss the projects and your approaches, just like you would in a professional setting, but reports should be your own.
Students with disabilities requiring additional services can find resources at the UW Disability Resources for Students page.
The lecture notes are available here.
Week | Date | Topic | Dataset |
---|---|---|---|
1 | April 4 | Introduction to data mining and programming with R | Capital Bikeshare: bikeshare_2015.tsv |
2 | April 11 | Linear regression | Capital Bikeshare |
3 | April 18 | Linear regression extensions | Capital Bikeshare |
4 | April 25 | Flex-time, Logistic regression | Twitter user data: bot_or_not.tsv |
5 | May 2 | Classification trees | Twitter user data |
6 | May 9 | Classification | Twitter user data |
7 | May 16 | Association rules | colleges.tsv |
8 | May 23 | Clustering | |
9 | June 6 | Sharing your work | None! |
10 | June 13 | Guest panel | None! |
We'll be using the statistical programming language R for this course. In addition, I highly recommend that you use RStudio, a powerful integrated development environment (IDE) for R. If you plan to use your own laptop in class, please install R and RStudio on it before the first day of class. The computers in the classroom will have everything you need installed.