college-emails

Colleges send so many email advertisements that they clutter the inbox. This project analyzes that spam and looks at some interesting trends in the college-related emails I've received over the past year.

`/client`

A Svelte frontend that displays the statistics, hosted on Netlify.

`/scripts`

A number of Node.js scripts that parse emails and collect college data. They are designed to run through GitHub Actions, but can also be run locally.

`/scripts/index.js`

Runs all scripts, including downloading emails and generating statistics.

`/scripts/utils`

A set of utilities used to download and parse the data found in data.

Run Locally

To create the same type of visualization locally for your own emails, follow these steps.

Setup

Clone this repository (`git clone https://github.com/louismeunier/college-emails.git`).
Delete `client/src/data.json`, `client/src/dates.json`, and `client/src/updated.json`.
From the directory containing the repo, run `cd client && yarn && cd ../scripts && yarn` to install dependencies.
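
The deletion step can also be scripted. The following is a minimal sketch, not part of the repository: the function name `removeGeneratedData` is hypothetical, and it assumes the three file names listed above live directly in the directory you pass in (for example `client/src`).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// File names taken from the Setup step; adjust if your checkout differs.
const GENERATED_FILES = ["data.json", "dates.json", "updated.json"];

// Removes the bundled data files from srcDir (e.g. "client/src").
// Returns the names of the files that were present and removed.
function removeGeneratedData(srcDir) {
  const removed = [];
  for (const name of GENERATED_FILES) {
    const file = path.join(srcDir, name);
    if (fs.existsSync(file)) {
      fs.rmSync(file);
      removed.push(name);
    }
  }
  return removed;
}

// Example (run from the repository root):
//   removeGeneratedData(path.join("client", "src"));
```

Files that are already missing are skipped, so the helper is safe to run more than once.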

Authentication

To access your emails, you'll need to authenticate with the Gmail API.
Follow these steps to create the project.
Enable the Gmail API with the scope `https://www.googleapis.com/auth/gmail.readonly`.
IMPORTANT: Make sure you add your email address as a tester for your application. Otherwise, because your project is unverified, authentication will not work.
Download your credentials as JSON, and save the file to `scripts/credentials.json`.
Run `node scripts`. If your setup was done correctly, it should prompt you to visit a URL and authenticate. This should save a file named `scripts/token.json`.
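
To see which stage of the authentication flow you are at, you can check which of the two files exist. This is a hedged sketch rather than code from the repository: `authState` and its return values are hypothetical names, and it only inspects the file paths mentioned above without contacting the Gmail API.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Reports the authentication stage based on which files exist in scriptsDir.
//   "missing-credentials": credentials.json has not been downloaded yet
//   "needs-authorization": credentials exist, but token.json was never saved
//   "ready":               both files exist
function authState(scriptsDir) {
  const hasCredentials = fs.existsSync(path.join(scriptsDir, "credentials.json"));
  const hasToken = fs.existsSync(path.join(scriptsDir, "token.json"));
  if (!hasCredentials) return "missing-credentials";
  return hasToken ? "ready" : "needs-authorization";
}

// Example (run from the repository root):
//   console.log(authState("scripts"));
```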

Generating the data

Run `node scripts` a second time. It should now run the full program and regularly print progress output to the screen.
Note: the scripts can take quite a while to run, around 1.5 minutes per 1,000 emails (so roughly 15 minutes for 10,000 emails).
When the run completes, the scripts should print some tables of output, as well as some statistics on how well the run went.
The run should also create 3 new files: `client/src/dev_data.json`, `client/src/dev_dates.json`, and `client/src/dev_updated.json`. Remove the `dev_` prefix from each file name.
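
Renaming the generated files by hand is easy to get wrong, so here is a small sketch that strips the `dev_` prefix from every JSON file in a directory. It is an illustration, not part of the repository: `promoteDevFiles` is a hypothetical name, and it assumes the only `dev_*.json` files in the directory are the ones the scripts just produced.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Renames every "dev_<name>.json" in srcDir to "<name>.json".
// Note: renameSync generally replaces an existing file with the same name.
// Returns the new file names, sorted alphabetically.
function promoteDevFiles(srcDir) {
  const renamed = [];
  for (const name of fs.readdirSync(srcDir)) {
    if (name.startsWith("dev_") && name.endsWith(".json")) {
      const target = name.slice("dev_".length);
      fs.renameSync(path.join(srcDir, name), path.join(srcDir, target));
      renamed.push(target);
    }
  }
  return renamed.sort();
}

// Example (run from the repository root):
//   promoteDevFiles(path.join("client", "src"));
```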

Creating the visuals

Run `cd client && yarn dev`.

Data Credits

The dataset used containing college websites, names, locations, etc. was found here.

Note

Because emails are linked to their respective colleges by the sender's domain name, some emails cannot be matched to a college and are therefore excluded from the final statistics. These account for only ~2% of all the emails parsed per run.
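
To illustrate why some emails go unmatched, here is a rough sketch of domain-based matching. It is not the repository's actual implementation: the function names, the `Map` of domains to college names, and the subdomain fallback are all assumptions made for this example, and `example.edu` is a placeholder.

```javascript
// Extracts the lower-cased domain from a From header such as
// "Admissions <hello@mail.example.edu>" or a bare "hello@example.edu".
function senderDomain(fromHeader) {
  const angle = /<([^<>]+)>/.exec(fromHeader);
  const address = angle ? angle[1] : fromHeader;
  const at = address.lastIndexOf("@");
  if (at === -1) return null;
  const domain = address.slice(at + 1).trim().toLowerCase();
  return domain || null;
}

// Looks up the sender's domain in collegesByDomain (a Map of domain -> name).
// Assumption: if the full domain is unknown, try its parent domains, so
// mail.example.edu falls back to example.edu. The bare TLD is never tried.
function matchCollege(fromHeader, collegesByDomain) {
  const domain = senderDomain(fromHeader);
  if (!domain) return null;
  const labels = domain.split(".");
  for (let i = 0; i + 1 < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (collegesByDomain.has(candidate)) return collegesByDomain.get(candidate);
  }
  return null; // unmatched emails would be left out of the statistics
}

// Example with placeholder data:
//   const colleges = new Map([["example.edu", "Example University"]]);
//   matchCollege("Admissions <hello@mail.example.edu>", colleges);
```

An email sent through a third-party mailing service, whose domain does not belong to any college in the dataset, returns `null` under this scheme, which is the kind of case the ~2% figure describes.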
