This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

Materials for YData Course "Humanities Data Mining"

YaleDHLab/humanities-data-mining

Note: This repository has been archived

This course was developed under a previous phase of the Yale Digital Humanities Lab. Now a part of Yale Library’s Computational Methods and Data department, the Lab no longer includes this course in its scope of work. As such, it will receive no further updates.

YData: Humanities Data Mining

This repository contains materials for YData: Humanities Data Mining (S&DS 176 / S&DS 576), taught in the Spring of 2022 at Yale University. For more information on the course, please consult the preliminary syllabus or the course materials below (please note that some materials require a Yale NetID for access).

You can also view the syllabus from the first year the class was taught, Spring 2021.

Week One: Introduction to Data Mining

In our first week of class, we will discuss some of the ways researchers from the humanities and beyond have used data mining, and we will take our first steps with the Python programming language.

Tuesday Slides
Thursday Lab Notebook
Readings

Week Two: Collecting Data from APIs

In our second week of class, we will take a deeper dive into data: what it is, how it's created, and how we can find and use it. In particular, we'll explore Application Programming Interfaces (APIs), little machines that give us data to analyze!

Tuesday Slides
Readings
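
As a rough illustration of what working with an API involves, the sketch below builds a request URL and pulls fields out of a JSON response. The endpoint `https://api.example.org/v1/items`, the `q` and `limit` parameters, and the `results`/`title` field names are all hypothetical; they are not taken from the course materials, and a real API will document its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/v1/items"


def build_url(base_url, **params):
    """Attach query parameters (search term, page size, ...) to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url):
    """Request a URL and decode its JSON body (not called here: needs network)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload):
    """Collect the 'title' field from each record in an assumed 'results' list."""
    return [record["title"] for record in payload.get("results", [])]


# A made-up response body standing in for what such an API might return.
sample_body = '{"results": [{"title": "Letter, 1862"}, {"title": "Diary, 1870"}]}'

url = build_url(BASE_URL, q="letters", limit=2)
titles = extract_titles(json.loads(sample_body))
print(url)
print(titles)
```

With a real service, `fetch_json(url)` would replace the hard-coded `sample_body`; the parsing step stays the same.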

Week Three: Data Visualization

In our third week, we will consider strategies and best practices for visualizing data that take into account what kind of data we have, who we have in mind as our audience, what story we're aiming to tell, and where we think the visualization will circulate. For Thursday's lab, please download Tableau Public.

Tuesday Slides
Thursday Lab Tutorial
Readings

No problem set is assigned this week. Instead, work on Project Review 1: Text Mining. You can find the prompt in Canvas under "Assignments."

Week Four: Text Analysis: Named Entity Recognition

In our fourth week, we will begin turning our attention to text analysis in more detail. In particular, we will experiment with an approach called named entity recognition, which can help us extract entities (names, locations, organizations) from text.

Tuesday Slides

Thursday Notebook

Readings

Week Five: Clustering and Classification

In our fifth week, we will explore supervised methods for classifying data and unsupervised methods for clustering data using Python. We will consider when such approaches could be helpful, as well as what the limitations are and what kind of data we need to have.

Tuesday Slides

Thursday Notebook

Thursday Problem Set

Readings

Week Six: Review and Topic Modeling

In our sixth week, we will review several of the programming topics we have covered so far in the semester, and we'll explore a few new topics that will prove useful as we continue our data science work in the coming weeks. We will learn about topic modeling by looking at case studies and experimenting with model parameters. The particular approach we'll be using is called non-negative matrix factorization (NMF), which, like the classifier we trained in week five, starts with a term-document matrix.

Tuesday Slides

Thursday Notebook

Thursday Problem Set

Readings
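
To make the idea of a term-document matrix concrete, here is a minimal sketch that counts how often each term appears in each document. The three-sentence corpus and the whitespace tokenizer are invented for illustration and are not part of the course notebooks.

```python
from collections import Counter

# Toy corpus invented for illustration.
documents = [
    "the whale swam in the sea",
    "the ship sailed the sea",
    "the garden was in bloom",
]


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# Vocabulary: every distinct term in the corpus, in alphabetical order.
vocabulary = sorted({token for doc in documents for token in tokenize(doc)})


def term_document_matrix(docs, vocab):
    """Return one row per term and one column per document, holding raw counts."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    return [[doc_counts[term] for doc_counts in counts] for term in vocab]


matrix = term_document_matrix(documents, vocabulary)
for term, row in zip(vocabulary, matrix):
    print(f"{term:>8} {row}")
```

NMF would then approximate a matrix like this one as the product of two smaller non-negative matrices, one linking terms to topics and one linking topics to documents. Libraries such as scikit-learn provide an NMF implementation, so the factorization itself rarely needs to be written by hand.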

Week Seven: Text & Image Analysis: Neural Networks

In our seventh week, we will begin our transition from text mining to image mining techniques by way of neural networks. On Thursday, we will focus on word embeddings, a technique for identifying words that appear in similar contexts.

Tuesday Slides

Readings

Thursday Notebook

Thursday Problem Set
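
The intuition that words appearing in similar contexts should look alike can be sketched without a neural network. The example below is a simplification: it uses raw co-occurrence counts rather than learned embeddings, and the sentences and the `window` size are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy sentences invented for illustration.
sentences = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the ball",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors[word].update(neighbors)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = context_vectors(sentences)
king_queen = cosine(vectors["king"], vectors["queen"])
king_dog = cosine(vectors["king"], vectors["dog"])
print(f"king ~ queen: {king_queen:.2f}")
print(f"king ~ dog:   {king_dog:.2f}")
```

In this toy corpus "king" and "queen" share exactly the same neighbors, so they score higher than "king" and "dog". Trained word embeddings capture the same kind of signal in dense, lower-dimensional vectors.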

Week Eight: Computer Vision: Color & Art

In our eighth week, we will start looking more closely at image mining, with an overview of projects, techniques, and data considerations. For hands-on practice, we will experiment with color extraction.
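
One simple way to think about color extraction is to group similar pixel colors into buckets and count which buckets are most common. The sketch below assumes the image has already been loaded as a list of RGB tuples; the pixel values and the bucket size `step=64` are invented for illustration and are not the method used in the course notebooks.

```python
from collections import Counter

# A handful of RGB pixels standing in for a real image.
pixels = [
    (250, 10, 10), (245, 20, 5), (240, 15, 12),  # three reddish pixels
    (10, 10, 240), (20, 5, 250),                 # two bluish pixels
    (250, 250, 250),                             # one near-white pixel
]


def quantize(pixel, step=64):
    """Snap each channel to the midpoint of its bucket so near-identical colors merge."""
    return tuple((channel // step) * step + step // 2 for channel in pixel)


def dominant_colors(pixels, n=2, step=64):
    """Return the n most common quantized colors with their pixel counts."""
    counts = Counter(quantize(pixel, step) for pixel in pixels)
    return counts.most_common(n)


palette = dominant_colors(pixels)
for color, count in palette:
    print(color, count)
```

A smaller `step` keeps more distinct colors, while a larger one merges more shades together. Clustering approaches such as k-means are a common alternative to fixed buckets.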

Tuesday Slides

Tuesday Readings:

Thursday Notebooks and links:

Week Nine: Spring Recess

Week Ten: Visual Similarity

In our tenth week, we will be discussing techniques for measuring and identifying image similarity. In particular, we will focus on Convolutional Neural Networks as our approach.
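
The building block of a Convolutional Neural Network is a small filter (kernel) that slides across an image and responds to a local pattern. The sketch below applies a single hand-written vertical-edge kernel to tiny grayscale grids and compares the resulting feature vectors with cosine similarity. It is only an illustration of the idea: a real CNN learns many kernels from data and stacks several layers, and the images and kernel here are invented.

```python
import math


def convolve2d(image, kernel):
    """Slide the kernel over every fully overlapping position and sum the products.

    This is the cross-correlation form that CNN libraries call convolution.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def features(image, kernel):
    """Flatten the feature map, keeping only positive responses (a ReLU step)."""
    return [max(0, value) for row in convolve2d(image, kernel) for value in row]


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-written kernel that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [[-1, 1], [-1, 1]]

# Tiny 4x4 grayscale images invented for illustration.
left_edge = [[0, 1, 1, 1] for _ in range(4)]
left_edge_bright = [[0, 2, 2, 2] for _ in range(4)]  # same edge, higher contrast
right_edge = [[0, 0, 0, 1] for _ in range(4)]        # edge in a different place

same_shape = cosine(features(left_edge, VERTICAL_EDGE),
                    features(left_edge_bright, VERTICAL_EDGE))
different_shape = cosine(features(left_edge, VERTICAL_EDGE),
                         features(right_edge, VERTICAL_EDGE))
print(f"same edge, different contrast: {same_shape:.2f}")
print(f"edge in a different position:  {different_shape:.2f}")
```

The two images with an edge in the same place score as highly similar even though their brightness differs, while the image with the edge elsewhere does not. Comparing feature vectors rather than raw pixels is the core idea behind CNN-based visual similarity.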

Tuesday Readings:

Tuesday Slides

Tuesday In-Class Links:

Thursday Notebooks:

Week Eleven: Image, Video, and Music Analysis

In our eleventh week, we will look at methods for video (or moving image) analysis and consider when, why, and how we might go about it. As a capstone to our image analysis module, we will use the Distant Viewing Toolkit. We'll also explore classifying sound files according to musical genre with guest lecturer Nicole Cosme (Yale Music Department).

Tuesday Slides
Tuesday Notebook

Tuesday Readings

Week Twelve: Open Lab 1, Finding and Preparing Data

In our twelfth week, we will begin our open lab sessions, which are designed to pull together the material from the course while giving you time to work on your final projects. For our first open lab, we will discuss pathways for finding and preparing data.

Week Thirteen: Open Labs 2 & 3, Analyzing and Visualizing Data

In our thirteenth week, we will continue bringing the course material together by discussing how to identify an appropriate method based on your research question and available data. We will also review strategies for visualizing and sharing results.

Week Fourteen: Presentations

In our fourteenth week, we will conclude with class presentations to showcase everyone's work. Thank you for an incredible semester!

Acknowledgements

The course materials are published under a CC BY 3.0 US license. This course, Humanities Data Mining, was created in 2021 by Dr. Catherine DeRose and Dr. Douglas Duhaime.
