Rate My Professor Gender Classifier

This project explores whether the wording of comments on ratemyprofessor.com (RMP), with pronouns removed, can be used to predict a professor's gender. The classification algorithms used are Naive Bayes, the Rocchio algorithm, and K-Nearest Neighbors.
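As a rough illustration of the pronoun-removal step mentioned above, the following minimal sketch drops gendered pronouns from a comment before classification. The pronoun list and the `remove_pronouns` helper are assumptions made for this example; the list actually used by the project is not given in this README.

```python
import re

# Hypothetical pronoun list for illustration only; the project's real list may differ.
GENDERED_PRONOUNS = {"he", "him", "his", "himself", "she", "her", "hers", "herself"}


def remove_pronouns(comment):
    """Lowercase a comment, split it into alphabetic tokens, and drop gendered pronouns."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in GENDERED_PRONOUNS]


print(remove_pronouns("She explains her slides clearly."))
```

Removing these tokens keeps the classifiers from simply keying on "he" or "she", so any signal they find has to come from the rest of the wording.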

This project consists of the following programs and data files:

Programs for acquiring data, processing, and classification

  • a web crawler that crawls RMP pages for 21 universities and writes the results to a single file
  • a text parser that converts the raw data file into data files of comments for individual professors
  • a text processor that tokenizes, removes stopwords, and stems the files in allData and produces preprocessedData
  • a program that predicts the gender of professors using Naive Bayes. It uses the 'leave one out' strategy: each professor is held out in turn, and the model is trained on the remaining preprocessed data files
  • a program that predicts the gender of professors using Rocchio. It uses the same 'leave one out' strategy, training on the remaining preprocessed data files
  • a program used to extract top adjectives used by students to describe male and female professors
  • a program that preprocesses comment crawler data in the new format, which includes regional CS professors
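The 'leave one out' evaluation described in the list above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code: the function names, the toy tokens (`a1`, `b1`, ...), and the choice of add-one smoothing for Naive Bayes and raw term-frequency centroids with cosine similarity for Rocchio are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Multinomial Naive Bayes with add-one smoothing."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def predict(tokens):
        best_label, best_score = None, -math.inf
        for label, n_docs in label_counts.items():
            total = sum(word_counts[label].values())
            score = math.log(n_docs / len(docs))  # log prior
            for t in tokens:
                if t in vocab:  # ignore words never seen in training
                    score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def train_rocchio(docs):
    """docs: list of (tokens, label). Nearest class centroid by cosine similarity."""
    sums, counts = defaultdict(Counter), Counter()
    for tokens, label in docs:
        sums[label].update(tokens)
        counts[label] += 1
    centroids = {
        label: {t: c / counts[label] for t, c in vec.items()}
        for label, vec in sums.items()
    }

    def cosine(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(tokens):
        vec = Counter(tokens)
        return max(centroids, key=lambda label: cosine(vec, centroids[label]))

    return predict


def leave_one_out_accuracy(docs, train):
    """Hold out each document in turn, train on the rest, and score the held-out one."""
    correct = 0
    for i, (tokens, label) in enumerate(docs):
        predict = train(docs[:i] + docs[i + 1:])
        if predict(tokens) == label:
            correct += 1
    return correct / len(docs)


# Made-up placeholder documents: one token list per professor, not real RMP data.
DOCS = [
    (["a1", "a2", "a3"], "female"),
    (["a2", "a1", "a4"], "female"),
    (["a3", "a4", "a1"], "female"),
    (["b1", "b2", "b3"], "male"),
    (["b2", "b1", "b4"], "male"),
    (["b3", "b4", "b1"], "male"),
]

print("Naive Bayes:", leave_one_out_accuracy(DOCS, train_naive_bayes))
print("Rocchio:", leave_one_out_accuracy(DOCS, train_rocchio))
```

Because both trainers return a `predict` function, the same `leave_one_out_accuracy` loop can evaluate either classifier. On real data each entry in `DOCS` would be the preprocessed token list for one professor.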

Data and Output Files

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

A computer with Python 3.7 and the following packages installed:

  • pip
  • nltk
  • selenium
  • BeautifulSoup
  • Python 3 Virtual Environment (optional)

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments