
Given reviews of a professor (after removal of gender-related pronouns), uses the Naive Bayes algorithm to classify the professor's gender.

hrohil/rmfGenderClassifier

Rate My Professor Gender Classifier

This project explores whether the wording of student comments on ratemyprofessor.com (RMP), with pronouns removed, can be used to predict a professor's gender. The classification algorithms used are Naive Bayes, the Rocchio algorithm, and K-Nearest Neighbors.
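As a rough illustration of the Naive Bayes approach, the sketch below trains a multinomial model with add-one smoothing on lists of already-preprocessed tokens. It is not the project's code: the function names `train_naive_bayes` and `predict`, the labels "M" and "F", and the toy tokens are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on a list of (tokens, label) pairs (hypothetical input format)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(label_counts.values())
    log_priors = {lab: math.log(c / n_docs) for lab, c in label_counts.items()}
    totals = {lab: sum(word_counts[lab].values()) for lab in label_counts}
    return log_priors, word_counts, totals, vocab


def predict(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    log_priors, word_counts, totals, vocab = model
    best_label, best_score = None, float("-inf")
    for label, log_prior in log_priors.items():
        score = log_prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = word_counts[label][tok] + 1
            denominator = totals[label] + len(vocab)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy tokens invented for illustration only; not data from the project.
training = [
    (["clear", "lectur", "hard", "exam"], "M"),
    (["hard", "exam", "curv"], "M"),
    (["fair", "grade", "quiz"], "F"),
    (["fair", "quiz", "lectur", "grade"], "F"),
]
model = train_naive_bayes(training)
print(predict(model, ["fair", "quiz"]))
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once a professor has hundreds of comment tokens.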

This project consists of the following programs and data files:

Programs for acquiring data, processing, and classification

  • a web crawler that crawls RMP pages for 21 universities and writes the results to a single file
  • a text parser that converts the raw data file into data files of comments for individual professors
  • a text processor that tokenizes, removes stopwords from, and stems the files in allData and produces preprocessedData
  • a program that predicts the gender of professors using Naive Bayes. It uses the 'leave one out' strategy: each professor is held out in turn and the model is trained on the remaining preprocessed data files
  • a program that predicts the gender of professors using Rocchio. It uses the 'leave one out' strategy: each professor is held out in turn and the model is trained on the remaining preprocessed data files
  • a program that extracts the top adjectives used by students to describe male and female professors
  • a program that preprocesses comment crawler data in the new format, to include regional CS professors
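The 'leave one out' strategy used by the classification programs can be sketched as follows, here paired with a simple Rocchio-style (nearest centroid) classifier over normalized term-frequency vectors. This is a minimal sketch under assumptions, not the repository's implementation: the helper names, the cosine-similarity choice, the "M"/"F" labels, and the toy tokens are all hypothetical.

```python
import math
from collections import Counter


def to_unit_vector(tokens):
    """Term-frequency vector scaled to unit length."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(vectors):
    """Average a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_predict(train, tokens):
    """Assign the label whose class centroid is most similar to the query."""
    by_label = {}
    for toks, label in train:
        by_label.setdefault(label, []).append(to_unit_vector(toks))
    centroids = {lab: centroid(vecs) for lab, vecs in by_label.items()}
    query = to_unit_vector(tokens)
    return max(centroids, key=lambda lab: cosine(query, centroids[lab]))


def leave_one_out_accuracy(docs):
    """Hold out each document once and train on all the others."""
    correct = 0
    for i, (tokens, label) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]
        if rocchio_predict(train, tokens) == label:
            correct += 1
    return correct / len(docs)


# Toy tokens invented for illustration only; not data from the project.
docs = [
    (["clear", "lectur", "hard", "exam"], "M"),
    (["hard", "exam", "curv"], "M"),
    (["hard", "exam", "lectur"], "M"),
    (["fair", "grade", "quiz"], "F"),
    (["fair", "quiz", "lectur", "grade"], "F"),
    (["fair", "quiz", "project"], "F"),
]
print(leave_one_out_accuracy(docs))
```

Because every professor is used for testing exactly once and for training in every other round, this evaluation makes full use of a small data set, at the cost of retraining once per professor.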

Data and Output Files

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

A computer with Python 3.7 and the following packages installed:

  • pip
  • nltk
  • selenium
  • BeautifulSoup
  • Python 3 Virtual Environment (optional)

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments
