Note: This repository has been archived

This course was developed under a previous phase of the Yale Digital Humanities Lab. Now a part of Yale Library’s Computational Methods and Data department, the Lab no longer includes this course in its scope of work. As such, it will receive no further updates.

YData: Humanities Data Mining

This repository contains materials for YData: Humanities Data Mining (S&DS 176 / S&DS 576), taught in the Spring of 2022 at Yale University. For more information on the course, please consult the preliminary syllabus or the course materials below (please note that some materials require a Yale NetID for access).

You can also view the syllabus from the first year the class was taught, Spring 2021.

Week One: Introduction to Data Mining

In our first week of class, we will discuss some of the ways researchers from the humanities and beyond have used data mining, and we will take our first steps with the Python programming language.

Tuesday Slides
Thursday Lab Notebook
Readings

Michael Witmore, “Text: A Massively Addressable Object”
Ted Underwood, “Seven ways humanists are using computers to understand text”

Week Two: Collecting Data from APIs

In our second week of class, we will take a deeper dive into data: what it is, how it's created, and how we can find and use it. In particular, we'll explore Application Programming Interfaces (APIs), little machines that give us data to analyze!
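As a rough illustration of the request-and-parse cycle behind most web APIs, here is a minimal sketch using only Python's standard library. The endpoint `https://api.example.org/search`, its parameters, and the response fields are all hypothetical, and the JSON is a canned sample so the snippet runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real service).
BASE_URL = "https://api.example.org/search"

# A canned response standing in for what such an API might return.
SAMPLE_RESPONSE = (
    '{"results": [{"title": "Moby-Dick", "year": 1851},'
    ' {"title": "Emma", "year": 1815}]}'
)


def build_url(base, **params):
    """Attach query parameters to a base URL."""
    return f"{base}?{urlencode(params)}"


def parse_titles(raw_json):
    """Decode a JSON string and pull out each record's title."""
    data = json.loads(raw_json)
    return [record["title"] for record in data["results"]]


url = build_url(BASE_URL, q="novel", limit=2)
titles = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

With a live API, the text passed to `parse_titles` would come from an HTTP request to `url` rather than from the sample string.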

Tuesday Slides
Readings

Christof Schöch, “Big? Smart? Clean? Messy? Data in the Humanities”
Johanna Drucker, “Why Distant Reading Isn’t” (VPN or on-campus network needed)

Week Three: Data Visualization

Tuesday Slides
Thursday Lab Tutorial
Readings

Catherine D’Ignazio and Lauren Klein, “Feminist Data Visualization”

In our third week, we will consider strategies and best practices for visualizing data that take into account what kind of data we have, who we have in mind as our audience, what story we're aiming to tell, and where we think the visualization will circulate. For Thursday's lab, please download Tableau Public.

No problem set is assigned this week. Work on Project Review 1: Text Mining instead; you can find the prompt in Canvas under "Assignments."

Week Four: Text Analysis: Named Entity Recognition

In our fourth week, we will begin turning our attention to text analysis in more detail. In particular, we will experiment with an approach called named entity recognition, which can help us extract entities (names, locations, organizations) from text.
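NER libraries typically rely on trained statistical models. The sketch below substitutes a far simpler dictionary (gazetteer) lookup, only to show the shape of the task: text goes in, labeled entities come out. The gazetteer entries, labels, and example sentence are invented for illustration.

```python
import re

# A tiny hand-made gazetteer; entries and labels are illustrative assumptions.
GAZETTEER = {
    "Martha Ballard": "PERSON",
    "Yale University": "ORG",
    "New Haven": "LOC",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (entity, label, start_offset) for each gazetteer match, in text order."""
    found = []
    for name, label in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.group(), label, match.start()))
    return sorted(found, key=lambda item: item[2])


sentence = "Martha Ballard never visited Yale University in New Haven."
entities = find_entities(sentence)
for entity, label, start in entities:
    print(start, entity, label)
```

A lookup like this only finds names it already knows. Trained models exist precisely to recognize entities that appear in no list.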

Tuesday Slides

Thursday Notebook

Readings

Richard Jean So, “All Models are Wrong”
Jean-Baptiste Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books”

Week Five: Clustering and Classification

In our fifth week, we will explore supervised and unsupervised methods for classifying and clustering data using Python. We will consider when such approaches could be helpful, as well as what the limitations are and what kind of data we need to have.
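To make the supervised side concrete, here is a minimal sketch of a nearest-centroid classifier over word counts. This technique was picked for brevity and is not necessarily the classifier used in the course notebook. The training sentences and the labels `voyage` and `society` are invented.

```python
import math
from collections import Counter


def vectorize(text, vocab):
    """Turn a text into a list of word counts, one per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def centroid(vectors):
    """Average a list of equal-length vectors, column by column."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy labeled examples (invented for illustration).
train = [
    ("the whale and the sea", "voyage"),
    ("the ship sailed the sea", "voyage"),
    ("the ball and the dance", "society"),
    ("a dance at the ball", "society"),
]
vocab = sorted({word for text, _ in train for word in text.lower().split()})

# Group training vectors by label, then average each group.
by_label = {}
for text, label in train:
    by_label.setdefault(label, []).append(vectorize(text, vocab))
centroids = {label: centroid(vectors) for label, vectors in by_label.items()}


def classify(text):
    """Assign the label whose centroid lies closest to the text's vector."""
    vector = vectorize(text, vocab)
    return min(centroids, key=lambda label: distance(vector, centroids[label]))


print(classify("a ship on the sea"))
print(classify("she loved the dance"))
```

The labels are what make this supervised. A clustering method would have to discover the two groups from the vectors alone.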

Tuesday Slides

Thursday Notebook

Thursday Problem Set

Readings

Patrick Juola, “How a Computer Program Helped Show J.K. Rowling write A Cuckoo’s Calling” [sic]
Franco Moretti, “The Slaughterhouse of Literature” (VPN or on-campus network needed)

Week Six: Review and Topic Modeling

In our sixth week, we will review several of the programming topics we have covered so far in the semester, and we'll explore a few new topics that will prove useful as we continue our data science work in the coming weeks. We will learn about topic modeling by looking at case studies and experimenting with model parameters. The particular approach we'll be using is called non-negative matrix factorization (NMF), which, like the classifier we trained in week five, starts with a term-document matrix.
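The idea can be sketched in pure Python under some loud assumptions: the four toy documents are invented, the number of topics `k=2` is arbitrary, and the factorization uses the classic multiplicative update rules rather than an optimized library routine. The term-document matrix `V` (terms as rows, documents as columns) is approximated by the product of two non-negative matrices, `W` (terms by topics) and `H` (topics by documents).

```python
import random
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "flax spun wove flax",
    "birth delivered birth daughter",
    "wove flax spun",
    "delivered daughter birth",
]
vocab = sorted({word for doc in docs for word in doc.split()})

# Term-document matrix V: one row per term, one column per document.
V = [[Counter(doc.split())[term] for doc in docs] for term in vocab]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def error(V, W, H):
    """Squared reconstruction error between V and W @ H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, k, steps=200, seed=0, eps=1e-9):
    """Factor V into non-negative W and H with multiplicative updates."""
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(k)] for _ in V]
    H = [[rng.random() for _ in V[0]] for _ in range(k)]
    for _ in range(steps):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


W, H = nmf(V, k=2)
for topic in range(2):
    weights = [row[topic] for row in W]
    ranked = sorted(zip(vocab, weights), key=lambda pair: -pair[1])
    print(f"topic {topic}:", [term for term, _ in ranked[:3]])
```

Each column of `W` can be read as a topic (a weighting over terms), and each column of `H` says how strongly each topic is present in a document. Changing `k` or `seed` is a quick way to see how sensitive the topics are to model parameters.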

Tuesday Slides

Thursday Notebook

Thursday Problem Set

Readings

Underwood, Ted. "Topic Modeling Made Just Simple Enough"
Blevins, Cameron. "Topic Modeling Martha Ballard's Diary"

Week Seven: Text & Image Analysis: Neural Networks

In our seventh week, we will begin our transition from text mining to image mining techniques by way of neural networks. On Thursday, we will focus on word embeddings, a technique for identifying words that appear in similar contexts.
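The intuition that similar words share contexts can be shown without a neural network. The sketch below builds simple co-occurrence count vectors, a stand-in for learned embeddings, and compares them with cosine similarity. The corpus and the window size are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = [
    "the king ruled the land",
    "the queen ruled the land",
    "the dog chased the ball",
    "the cat chased the ball",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(corpus)
print("king ~ queen:", cosine(vectors["king"], vectors["queen"]))
print("king ~ dog:  ", cosine(vectors["king"], vectors["dog"]))
```

Because "king" and "queen" appear in identical contexts here, their vectors point in the same direction. "king" and "dog" share only the very common word "the", so they score lower. Learned embeddings compress this same kind of signal into dense vectors.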

Tuesday Slides

Readings

Gideon Lewis-Kraus, “The Great A.I. Awakening”
Jonathan Fitzgerald, “Word Embeddings are the New Topic Models”
Optional: Ryan Heuser, “Word Vectors in the Eighteenth Century”
Optional: Ben Schmidt, “Vector Space Models for the Digital Humanities”

Thursday Notebook

Thursday Problem Set

Week Eight: Computer Vision: Color & Art

In our eighth week, we will start looking more closely at image mining, with an overview of projects, techniques, and data considerations. For hands-on practice, we will experiment with color extraction.
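One simple way to approach color extraction is to bucket similar pixel values together and count the buckets. This is a minimal sketch under stated assumptions: the pixel list is hand-made rather than read from an image file, and the bucket width `step=64` is an arbitrary choice.

```python
from collections import Counter


def quantize(pixel, step=64):
    """Snap each RGB channel down to the nearest multiple of `step`."""
    return tuple((channel // step) * step for channel in pixel)


def dominant_colors(pixels, n=2, step=64):
    """Return the `n` most frequent quantized colors with their pixel counts."""
    counts = Counter(quantize(pixel, step) for pixel in pixels)
    return counts.most_common(n)


# Hand-made pixels standing in for image data: mostly reds, some blues, one gray.
pixels = (
    [(250, 10, 10)] * 5
    + [(240, 20, 5)] * 3
    + [(10, 10, 200)] * 4
    + [(128, 128, 128)]
)

palette = dominant_colors(pixels)
print(palette)
```

In practice the pixel list would come from an image library. A smaller `step` keeps finer color distinctions at the cost of a more fragmented palette.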

Tuesday Slides

Tuesday Readings:

The True Colors of America’s Political Spectrum Are Gray and Green

Thursday Notebooks and links:

Week Nine: Spring Recess

Week 10: Visual Similarity

In our tenth week, we will be discussing techniques for measuring and identifying image similarity. In particular, we will focus on Convolutional Neural Networks as our approach.
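The building block of a convolutional network is a small kernel slid across an image to produce a feature map. The sketch below shows that single operation in pure Python. The kernel is hand-written for illustration, whereas a trained network learns its kernels from data, and the tiny grayscale image is invented.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A tiny grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is large only where the image changes from dark to bright. Stacking many such learned filters yields the feature vectors that image-similarity methods compare.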

Tuesday Readings:

The visual digital turn: Using neural networks to study historical images

Tuesday Slides

Tuesday In-Class Links:

Lyrics Text Comparison
Image Similarity Ordering
Neural Neighbors (Meserve-Kunhardt Collection)

Thursday Notebooks:

Week 11: Image, Video and Music Analysis

In our eleventh week, we will look at methods for video (or moving image) analysis and consider when, why, and how we might go about it. As a capstone to our image analysis module, we will use the Distant Viewing Toolkit. We'll also explore classifying sound files according to musical genre with guest lecturer Nicole Cosme (Yale Music Department).

Tuesday Slides
Tuesday Notebook

Distant Viewing Toolkit (DVT) Demo

Tuesday Readings

Distant viewing: analyzing large visual corpora
Optional: Unraveling the JPEG

Week 12: Open Lab 1, Finding and Preparing Data

In our twelfth week, we will begin our open lab sessions, which are designed to pull together the material from the course while giving you time to work on your final projects. For our first open lab, we will discuss pathways for finding and preparing data.

Week 13: Open Labs 2 & 3, Analyzing and Visualizing Data

In our thirteenth week, we will continue bringing the course material together by discussing how to identify an appropriate method based on your research question and available data. We will also review strategies for visualizing and sharing results.

Week 14: Presentations

In our fourteenth week, we will conclude with class presentations to showcase everyone's work. Thank you for an incredible semester!

Acknowledgements

The course materials are published under a CC BY 3.0 US license. This course, Humanities Data Mining, was created in 2021 by Dr. Catherine DeRose and Dr. Douglas Duhaime.
