Quantitative Methods in the Social Sciences (QMSS)
Graduate School of Arts and Sciences (GSAS)
Columbia University
Course: QMSS G4063 Spring 2015
Lecture: MW 1:10pm-2:25pm at 313 Fayerweather
Office Hours: W 2:30pm-3:30pm at IAB 270C
Elliot Cohen, Ph.D.
Lecturer in the Department of Statistics
Columbia University
This course offers a rigorous introduction to data visualization from theory to implementation. Drawing on a combination of lectures, readings, discussions and coding, we will translate the timeless concepts of Minard, Playfair, Tufte and Wilkinson to new and diverse fields of study. Students will receive a coding crash-course in R, JavaScript, CSS, HTML and D3. The goal is not to become computer scientists, but to build the requisite foundation for modern implementation of exploratory and explanatory data visualizations. Students will have the opportunity to work in small teams to create interactive data visualizations worthy of their portfolios. The final deliverable will be a research-driven data visualization with accompanying prose in the form of a conference paper submission. A working knowledge of R from at least one previous class is highly recommended.
- Quizzes (30%) Quizzes are essential for assessing student learning and pedagogical efficacy. There will be 4 quizzes in total; students may pick their best three to count towards their final grade.
- Homework (30%)
Students will complete at least three assignments and submit them to the course repository as pull requests. All submissions must follow a standard naming convention:
Year-Month-Day-YourName-AssignmentName.FileExtension
Assignments will serve as progress indicators on key concepts, methods and techniques.
- Semester Project (30%) Students will have the opportunity to work in small teams to create data visualizations worthy of their portfolios. The final deliverable will be a well-articulated, research-driven data visualization and accompanying prose in the form of a conference paper submission. Students will have considerable leeway in choosing a project topic and finding an appropriate conference or forum for submission.
- Class Participation (10%)
- Attendance
- Being awake, attentive and respectful
- Being helpful to peers and the class as a whole
- Contributing to group work and peer code reviews
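As a small illustration of the homework naming convention above (Year-Month-Day-YourName-AssignmentName.FileExtension), the following shell sketch builds a file name from today's date. The name JaneDoe, the assignment label hw1 and the Rmd extension are hypothetical placeholders, and the zero-padded YYYY-MM-DD date format is an assumption, since the convention does not spell one out.

```shell
# Hypothetical values; substitute your own name, assignment and file extension.
your_name="JaneDoe"
assignment="hw1"
extension="Rmd"

# date +%Y-%m-%d prints today's date as a zero-padded year-month-day string.
filename="$(date +%Y-%m-%d)-${your_name}-${assignment}.${extension}"
echo "$filename"
```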
- Joint Committee on Standards for Graphic Presentation. 1915. American Statistical Association, 14 (112): 790-797.
- Edward R. Tufte. 2001. The visual display of quantitative information. Cheshire, Conn.: Graphics Press.
- Leland Wilkinson. 2005. The grammar of graphics. New York: Springer.
- Hadley Wickham. 2009. ggplot2: elegant graphics for data analysis.
- Norman Matloff. 2011. The art of R programming. San Francisco: No Starch Press.
- Scott Murray. 2013. Interactive data visualization for the web. Sebastopol, CA: O'Reilly Media.
... and resources to help you get there
- Tufte's Rules. Above all else, show the data.
- Grammar of Graphics. Wilkinson's theory and Wickham's implementation.
- Meet your computer
- command line
- text editors
- file paths
- Working with data in R
- basic training
- data analysis with plyr
- data visualization with ggplot
- scripting, debugging and writing functions
- reproducible research and dynamic output with Rmarkdown
- communicating & sharing your results in the browser
- github.io
- HTML, CSS, JavaScript
- interactive visualization with D3
- Version control and collaboration with github
- Read about RMarkdown
- Install git
- Create a github account if you don't already have one
- Fork the class repo. Your assignments will be submitted as pull requests!
git clone https://github.com/YOUR-NAME/data-viz.git
cd data-viz
git remote add upstream https://github.com/ecohen4/data-viz.git
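Once the clone and the upstream remote are in place, it can help to confirm that both remotes are registered before doing any work. The sketch below is one way to check. To stay runnable offline it uses `git init` in a temporary directory as a stand-in for the real `git clone`, which is an assumption of this example and not part of the course setup. YOUR-NAME remains a placeholder for your own github user name.

```shell
set -e
# Stand-in for 'git clone': an empty repository in a temporary directory.
cd "$(mktemp -d)"
git init -q data-viz
cd data-viz

# origin is your fork (created automatically by a real clone); upstream is the class repo.
git remote add origin https://github.com/YOUR-NAME/data-viz.git
git remote add upstream https://github.com/ecohen4/data-viz.git

# List every remote with its fetch and push URLs.
git remote -v
```

In a real clone only the `git remote add upstream ...` and `git remote -v` lines are needed, since `origin` already points at your fork.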
Your assignments will be submitted as pull requests to the class repository on github! Suppose you saved changes on your own gh-pages branch and would like to submit a 'clean' pull request with only your files and the commits you want. This is pretty easy.
git fetch upstream #download the class repo's branches so that upstream/master exists locally
git checkout upstream/master #you will be on a 'detached HEAD'
git checkout -b hw1 #create and switch to a new branch called 'hw1'
git checkout <branch> -- <folder/filename> #pluck a folder/file from another branch but stay on the current branch (in this case 'hw1').
git add <folder/filename>
git commit -m "add only the right files on new clean branch"
git push -u origin hw1 #push commits to a new branch called hw1 on your fork.
Your new hw1 branch now has a copy of the folder/file(s) you plucked from elsewhere. You are still on the hw1 branch, so you can continue to work on the files and commit and push further changes as often as you like.
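The pluck-a-file approach described above can be tried safely in a throwaway repository before touching your fork. The sketch below is only an illustration under stated assumptions: the repository, the file names (hw1/plot.R, notes.txt) and the user identity are hypothetical, and a local base commit stands in for upstream/master so that nothing needs network access.

```shell
set -e
# Throwaway repository standing in for your fork (all names here are hypothetical).
cd "$(mktemp -d)"
git init -q demo
cd demo
git config user.name "Student"
git config user.email "student@example.com"

# One commit standing in for the class repo's state (upstream/master in the text).
echo "class readme" > README.md
git add README.md
git commit -q -m "initial class content"
base="$(git rev-parse HEAD)"

# Work on gh-pages: the homework file plus something that should stay out of the pull request.
git checkout -q -b gh-pages
mkdir hw1
echo "plot code" > hw1/plot.R
echo "scratch" > notes.txt
git add hw1/plot.R notes.txt
git commit -q -m "homework plus unrelated notes"

# Start a clean branch from the base commit and pluck only the homework file.
git checkout -q -b hw1 "$base"
git checkout gh-pages -- hw1/plot.R   # copies the file from gh-pages and stages it
git commit -q -m "add only the right files on new clean branch"

# Show which files are tracked on the clean branch.
git ls-files
```

Because `git checkout <branch> -- <path>` stages the plucked file as well as copying it, the commit can follow directly. The separate `git add` in the steps above is harmless but not strictly required.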
Rebasing rewrites the history of a branch: each commit is replayed as a new commit on top of a new starting point. This is probably the most common way of making a clean pull request.
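git checkout gh-pages #start from the branch that holds your work
git checkout -b hw1 #create and switch to a new branch called 'hw1'
git fetch upstream #update your local copy of the class repo's branches
git rebase -i origin/gh-pages #interactively replay hw1's commits on top of origin/gh-pages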
At this point you’re given a list, where you can pick, squash, or remove commits from your branch. Remember, a branch is just a collection of commits. If, for example, you only want to include the last few commits, simply delete all the others and allow rebase to continue. You should now have a branch that contains only the commits you want.
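To see what deleting a line from that list does, the following sketch runs an interactive rebase in a throwaway repository and drops the oldest commit. It is an illustration under assumptions, not the course's prescribed workflow: the repository, file names and commit messages are hypothetical, and the `GIT_SEQUENCE_EDITOR` environment variable is set to a `sed` command that deletes the first line of the list, standing in for the edit you would normally make by hand in your text editor.

```shell
set -e
# Throwaway repository (all names here are hypothetical).
cd "$(mktemp -d)"
git init -q demo
cd demo
git config user.name "Student"
git config user.email "student@example.com"

# Base commit that the branch will be rebased onto.
echo "class readme" > README.md
git add README.md
git commit -q -m "initial class content"
base="$(git rev-parse HEAD)"

# Three commits on hw1; the first one should not end up in the pull request.
git checkout -q -b hw1
echo "scratch" > notes.txt
git add notes.txt
git commit -q -m "scratch notes"
echo "plot code" > plot.R
git add plot.R
git commit -q -m "add plot"
echo "more plot code" >> plot.R
git commit -q -am "polish plot"

# The list shows the oldest commit first, so deleting line 1 drops "scratch notes".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1d'" git rebase -i "$base"

# Show the commits that remain on hw1 after the rebase.
git log --oneline "$base"..HEAD
```

In day-to-day use you would leave `GIT_SEQUENCE_EDITOR` unset and delete or reorder the lines yourself when the editor opens.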