K-Means Clustering with Python and Olympian Data

This repository contains a Jupyter notebook for a workshop on k-means clustering, using the "120 years of Olympic history: athletes and results" dataset available on Kaggle.
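For readers new to k-means, the core loop (Lloyd's algorithm) alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, until the assignments stop changing. The pure-Python sketch below illustrates that loop on a few hypothetical (height, weight) pairs. It is an illustration only, not the notebook's code; the notebook may well use a library implementation such as scikit-learn's KMeans instead. The function names kmeans, squared_distance and mean_point are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical sketch of Lloyd's algorithm (assignment + update steps)."""
    rng = random.Random(seed)
    # Initialise the centroids with k points chosen at random (without replacement).
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: (height in cm, weight in kg), not taken from the dataset.
points = [(160, 50), (162, 52), (158, 49), (190, 95), (192, 98), (188, 92)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this well-separated toy data the three lighter and three heavier points end up in different clusters. Real data is rarely this clean, which is why library implementations typically add smarter initialisation and several restarts.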

Installation/Set-up

1. Install Miniconda (or the full Anaconda distribution) for Python 3.7, and allow the installer to prepend the install location to your PATH.
   Then run source ~/.bash_profile (or open a new terminal) so that bash can find the conda binary.
2. Clone this repository.
3. Create a new conda environment from the environment.yml file: conda env create -f environment.yml
4. Activate the environment: source activate myenv (on conda 4.4 and later, conda activate myenv also works). If the environment name set in environment.yml differs from myenv, use that name instead.
5. To test that everything works, run jupyter notebook and open localhost:8888/ in your browser. You should see the Jupyter dashboard listing the files in this repository.
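As an additional check once the environment is active, a short script like the one below can confirm that the main libraries are importable before you open a notebook. This is a sketch: the package list is an assumption (environment.yml is the authoritative source for the dependencies), and check_environment is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed package list; adjust it to match the dependencies in environment.yml.
# Note that scikit-learn is imported as "sklearn" and Jupyter Notebook as "notebook".
EXPECTED_PACKAGES = ("numpy", "pandas", "sklearn", "matplotlib", "notebook")


def check_environment(packages=EXPECTED_PACKAGES):
    """Map each package name to True if it can be found for import, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:12} {'OK' if found else 'MISSING'}")
```

Using importlib.util.find_spec rather than a plain import means a missing package is reported as MISSING instead of stopping the script at the first failure.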

There are two versions of this notebook:

olympic_kmeans_follow_along.ipynb lets you follow along, filling in the code as you go.
olympic_kmeans.ipynb is the complete notebook, with answers if you get stuck.

Click on the notebook you wish to run.

Each notebook is made up of cells. When working with cells, you are in one of two modes:

Edit Mode (green border), for editing a cell's contents. Select a cell and press Enter to switch to Edit Mode.

Command Mode (blue border), for running and managing cells. Press Escape while in Edit Mode to return to Command Mode.

To run the selected cell, either click the "Run" button in the toolbar at the top or press Shift+Enter (which works in either mode).
