CS Conference Gender Analysis

Overview

This repository attempts to analyze the gender of first authors of papers at various conferences. Several caveats apply. Inferring gender from a name is never exact, and the accuracy of this method has not been tested, so any results should be considered suspect. Short of manually labelling the gender of each author (itself a difficult and error-prone task), several approaches could improve accuracy. For example, fetching the country of the author's affiliation could yield a more accurate prediction.

Dependencies

We use the genderComputer library for gender inference. It is included as a Git submodule, so run git submodule update --init to fetch the submodules in this repository. Dependencies are managed with Pipenv, which must be installed first. To install the remaining dependencies, run pipenv install.

Running

The downloaded files can be analyzed by running the following command:

pipenv run python analyze_genders.py

This prints CSV output with the inferred counts of first authors by gender. The cs-paper-gender-analysis.ipynb notebook can be used for further analysis.
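As a rough illustration of what "counts of first authors by gender" in CSV form might look like, here is a minimal sketch. It is not the code in analyze_genders.py: the function name counts_to_csv, the (conference, year, gender) record shape, the column names, and the gender labels are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def counts_to_csv(records):
    """Tally (conference, year, gender) tuples and return the counts as CSV text.

    The record shape and column names are hypothetical; the real script
    may organise its output differently.
    """
    counts = Counter(records)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["conference", "year", "gender", "count"])
    # Sort so the output order is stable between runs.
    for (conference, year, gender), n in sorted(counts.items()):
        writer.writerow([conference, year, gender, n])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("CONF", "19", "female"),
        ("CONF", "19", "male"),
        ("CONF", "19", "male"),
    ]
    print(counts_to_csv(sample), end="")
```

Keeping the tally separate from the name-inference step makes it easy to swap in a different inference method without touching the reporting code.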

Adding a new conference

To add a new conference, edit fetch-papers.sh to retrieve the new JSON data files. Name the files CONF-xx.json, where CONF is the name of the conference and xx is the year. The link to each JSON file can be obtained from the table of contents for the proceedings in DBLP by selecting the JSON export link. Since data from DBLP is CC0 and can be freely shared, any new data files should be committed to this repository.
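The CONF-xx.json naming convention can be checked mechanically before committing a new file. The sketch below is illustrative only: parse_data_filename is a hypothetical helper, not part of this repository, and it assumes the year is written as digits and that the last hyphen separates the conference name from the year.

```python
import re

# CONF-xx.json: everything before the last hyphen is the conference name,
# the digits after it are the year (assumed to be numeric).
_DATA_FILE = re.compile(r"^(?P<conf>.+)-(?P<year>\d+)\.json$")


def parse_data_filename(filename):
    """Split a data file name into (conference, year) strings.

    Raises ValueError if the name does not follow CONF-xx.json.
    """
    match = _DATA_FILE.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match CONF-xx.json")
    return match.group("conf"), match.group("year")


if __name__ == "__main__":
    # Example file name, chosen for illustration.
    print(parse_data_filename("SIGMOD-19.json"))
```

The year is returned as a string because the convention does not say which century a two-digit value refers to.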

Fetching data from Scopus

To fetch data from Scopus, you will need an API key. Set this key in the .env file as SCOPUS_API_KEY. Data from Scopus can then be fetched by running fetch-scopus.sh. This fetches all data on DB conferences from Scopus for papers where a DOI is available from DBLP and saves it to scopus.json. Note that jq must be installed to process the JSON from DBLP.
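If you want to confirm the key is visible before running a long fetch, a small check like the one below may help. It is a sketch, not how fetch-scopus.sh itself reads the key: get_scopus_api_key is a hypothetical helper, and it assumes the variable has already been loaded into the environment (Pipenv normally loads .env when commands are run through pipenv run or pipenv shell).

```python
import os


def get_scopus_api_key(env=None):
    """Return the Scopus API key from the environment, or fail with a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("SCOPUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SCOPUS_API_KEY is not set; add it to the .env file")
    return key


if __name__ == "__main__":
    try:
        get_scopus_api_key()
        print("SCOPUS_API_KEY is set")
    except RuntimeError as err:
        print(err)
```

Run it with pipenv run python so that the .env file is picked up.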

dataunitylab/paper-gender-analysis

Folders and files

Latest commit

History

Repository files navigation

CS Conference Gender Analysis

Overview

Dependencies

Running

Adding a new conference

Fetching data from Scopus

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages