This repository provides code, data, and training models to reproduce the SSO@S pipeline outlined in Hwang and Naik's (2023) paper, "Systematic Social Observation at Scale: Using Crowdsourcing and Computer Vision to Measure Visible Neighborhood Conditions." It accompanies the data and statistical code for replicating the tables and figures in the manuscript, provided here.
Preferred Citation:
Hwang, J. and Naik, N. (2023). Unrestricted data and statistical code accompanying Hwang, J. and N. Naik. 2023. "Systematic Social Observation at Scale: Using Crowdsourcing and Computer Vision to Measure Visible Neighborhood Conditions". Stanford Digital Repository. Available at https://purl.stanford.edu/xy095yh6422. https://doi.org/10.25740/xy095yh6422.
Download Conda by following the online installation guide:
https://conda.io/projects/conda/en/stable/user-guide/install/download.html
Set up a Conda virtual environment with all needed dependencies using the following commands:
conda env create -f environment.yml
(On an M2 Mac, use the following command instead:
conda env create -f environment_mac.yml
)
conda activate gsv_trash
- constants.py: Specifies global constants such as the trash TrueSkill thresholds and hyperparameters. It also holds settings used to train the ResNet classifier, such as where to store the CSV file, where to find images, and where to output CSV files.
- discretize_trueskill.py: Creates a CSV of true labels from the raw TrueSkill scores and the input thresholds.
- extract_vectors.py: Uses the ResNet model to produce feature embeddings of images.
- build_image_directory.py: Given a directory of images and a CSV of images and their labels, splits the images and copies them into new folders based on their true labels for ResNet training.
- trainer.py: Defines the Trainer class used for training, checkpointing, evaluating, logging, and computing metrics during ResNet classifier training.
- train.py: Instantiates the trainer and data loaders and starts the ResNet training process.
- image2vec.py: Defines a class that converts images to vector embeddings, using the trained ResNet classifier, for training and testing SVMs. It is used to create a CSV with the columns image name, ResNet prediction, embedding, and label for use in svm_classifier.py.
- model.py: Defines the ResNet backbone classifier model.
- svm_classifier.py: Given a CSV of image feature vectors and their true labels, trains an SVC (or an SVR if specified).
- test_model.py: Suite of methods to help with error analysis and model testing.
- util.py: Provides miscellaneous helper functions.
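The thresholding that discretize_trueskill.py is described as performing can be pictured as a simple binning step. The sketch below is illustrative only and is not the repository's implementation: the cut points in THRESHOLDS, the column names `score` and `label`, and the convention that a score equal to a threshold falls into the upper bin are all assumptions. The real thresholds are defined in constants.py.

```python
import csv
import io
from bisect import bisect_right

# Hypothetical cut points; the actual values are set in constants.py.
THRESHOLDS = [20.0, 30.0]


def discretize(score, thresholds=THRESHOLDS):
    """Return the index of the bin that `score` falls into.

    Scores below thresholds[0] map to 0, scores in
    [thresholds[0], thresholds[1]) map to 1, and so on.
    `thresholds` must be sorted in ascending order.
    """
    return bisect_right(thresholds, score)


def discretize_csv(in_file, out_file, thresholds=THRESHOLDS):
    """Read (image_name, score) rows and write (image_name, label) rows."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=["image_name", "label"])
    writer.writeheader()
    for row in reader:
        label = discretize(float(row["score"]), thresholds)
        writer.writerow({"image_name": row["image_name"], "label": label})


if __name__ == "__main__":
    # In-memory stand-in for trueskill_csv (assumed column names).
    raw = io.StringIO("image_name,score\na.jpg,12.5\nb.jpg,25.0\nc.jpg,31.2\n")
    out = io.StringIO()
    discretize_csv(raw, out)
    print(out.getvalue())
```

With n thresholds this yields n + 1 ordinal classes, which is the form a classifier trained on label folders would expect.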
Inputs: image_dir (a directory with all images), trueskill_csv (a CSV that contains image_name and the associated score)
- Run discretize_trueskill.py on trueskill_csv to produce a CSV containing image_name and the true label.
- Run build_image_directory.py with image_dir and the discretize_trueskill.py output to create the labeled image directories used for training.
- Run train.py to train a ResNet classifier on the labeled images from the previous step.
(To read the evaluation metrics during training, use the following command: tensorboard --logdir <LOG_DIR>)
- Run extract_vectors.py to extract training and test image vectors with the trained classifier.
- Run svm_classifier.py to train and test an SVM model on the extracted feature vectors.
- Run Trash_analysis.ipynb to analyze the final trained classifier model.
Outputs: ResNet classifier/feature extractor, feature extractions of the images, SVM classifier trained on the feature extractions, TensorBoard logs
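The directory-building step in the pipeline above (build_image_directory.py) produces one folder per label. A minimal sketch of that layout follows, under stated assumptions: the function signature, the `image_name` and `label` column names, and the choice to skip missing images are hypothetical, and any train/validation split the real script performs is omitted.

```python
import csv
import shutil
from pathlib import Path


def build_image_directory(image_dir, labels_csv, out_dir):
    """Copy each labeled image into out_dir/<label>/ and return the copy count.

    Images listed in the CSV but missing from image_dir are skipped.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    copied = 0
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            src = image_dir / row["image_name"]
            if not src.is_file():
                continue  # assumption: silently skip images that are not on disk
            dest = out_dir / str(row["label"])
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest / src.name)
            copied += 1
    return copied
```

A one-folder-per-class layout like this is what common image-folder data loaders expect, which is presumably why the labels are materialized as directories before training.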
Contact Jackelyn Hwang at [email protected]
This work is licensed under a Creative Commons Attribution 4.0 International License.