This repo serves as a template for all the resources you need to deploy a Pangeo-style JupyterHub. The template focuses on hackweek hubs, which should be easy to spin up or tear down and come with most of the settings that short-term users will want and need.
This project is heavily inspired by Yuvi Panda's TESS Prototype Deployment repo and directly utilizes his open-source Hubploy Template.
The two primary tools that are used for this deployment are Terraform and Hubploy.
- Terraform is used to define the cloud infrastructure as code and make deploying the Kubernetes cluster easy.
- Hubploy is used to deploy the JupyterHub software onto the cluster and enables continuous integration (CI) through GitHub Actions.
Note: This repo spins up infrastructure on AWS. If you want it on another cloud provider, it is advised that you become familiar with Hubploy, Hubploy-Template, and Terraform and build a new template yourself. In the future, this repo may expand to include deployments on other cloud providers.
You'll need the following tools installed:

- Terraform
  - If you are on MacOS, you can install it with `brew install terraform`
- kubectl
  - If you are on MacOS, you can install it with `brew install kubectl`
- awscli (a macOS install command is sketched after this list)
- Helm (likewise sketched after this list)
- A Docker Environment
- The Hackweek Template
  - Get the template repo locally by forking the repo to your own workspace / organization and then cloning your fork.
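On macOS, the remaining command-line tools can likely be installed with Homebrew as well; the formula names below are assumptions based on current Homebrew packages, so adjust for your own package manager if needed:

```
brew install awscli
brew install helm
```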
This module builds off of the `terraform-deploy` repo for the infrastructure management. You can find that repo here; we are currently using the `hackweek-template-infrastructure` branch.

It is recommended to fork the `terraform-deploy` repo and host it wherever your fork of this repo is. You can then change `.gitmodules` to point to the new location of the submodule.
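For example, one way to repoint the submodule at your fork (assuming the submodule is registered as `cloud-infrastructure`, as in this template, and using a placeholder URL) is:

```
# Point the submodule at your fork of terraform-deploy (placeholder URL)
git config -f .gitmodules submodule.cloud-infrastructure.url git@github.com:<your-org>/terraform-deploy.git
# Propagate the new URL into your local git configuration
git submodule sync
```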
Get the submodule into the `cloud-infrastructure` folder by running

```
git submodule init
git submodule update
```

This will bring in the infrastructure repo at a specific commit. You can work with it as a normal git repo by running

```
cd cloud-infrastructure
git checkout master
```
Before running any Terraform commands, you need to be authenticated to the awscli. The `cloud-infrastructure/aws-creds/` directory has all the permissions needed for Terraform to run. You can choose to generate a user or role to ensure minimum permission levels. Both of these options are present in `iam.tf`. If you choose to use one of these, uncomment the relevant lines and run `terraform init`, then `terraform apply`. You can then configure the credentials as needed with `aws configure`.
The Terraform deployment needs several variables set before it can start. You can copy the file `cloud-infrastructure/aws/your-cluster.tfvars.template` into a file named `<your-cluster>.tfvars` and modify the placeholders there as appropriate. If you want extra users to be mapped to the Kubernetes masters, add entries to `map_users` (it is a list of maps; a sketch follows below).
In the `cloud-infrastructure/aws/` directory, run `terraform init` to download the required Terraform plugins.

Then, run `terraform apply -var-file=<path/to/your-cluster.tfvars>`, look through the resources Terraform plans to spin up, and then type `yes`.
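If you want to review the planned resources without applying anything yet, you can also run:

```
terraform plan -var-file=<path/to/your-cluster.tfvars>
```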
The cluster creation process occasionally errors out; re-running the previous command will usually succeed. The infrastructure can take 15-20 minutes to create, so feel free to open another terminal and continue with most of the next section while you wait.
When the infrastructure is created, run

```
aws eks update-kubeconfig --region=<your-region> --name=<your-cluster>
```

with values from `your-cluster.tfvars` to allow `kubectl` to access the cluster.
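For example, with a hypothetical region of `us-west-2` and a cluster named `hackweek-cluster` in your tfvars file, this would be:

```
aws eks update-kubeconfig --region=us-west-2 --name=hackweek-cluster
```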
Set up a local Python environment and install the Python dependencies for the deployment tooling:

```
python3 -m venv .
source bin/activate
python3 -m pip install -r requirements.txt
```
Each directory inside `deployments/` represents an installation of JupyterHub. The default is called `hackweek-hub` in this repo; you are recommended to change it to something more specific. You need to `git commit` the change as well:

```
git mv deployments/hackweek-hub deployments/<your-hub-name>
git commit
```
You need to find all things marked TODO and fill them in. In particular, `hubploy.yaml` needs information about where your docker registry & kubernetes cluster are, as well as paths to access keys. `secrets/prod.yaml` and `secrets/staging.yaml` require secure random keys you can generate and fill in.
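One way (among others) to generate a suitable random value for each of those keys is:

```
openssl rand -hex 32
```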
- Make sure the appropriate docker credential helper is installed, so hubploy can push to the registry you need.
  - For AWS, you need `docker-ecr-credential-helper`.
- Make sure you are in your repo's root directory, so hubploy can find the directory structure it expects.
- Build and push the image to the registry:

  ```
  hubploy build <hub-name> --push --check-registry
  ```

  This should check if the user image for your hub needs to be rebuilt, and if so, it'll build and push it.
Note: This step will fail unless your infrastructure is built and you have run the `aws eks update-kubeconfig` command above.
Each hub will always have two versions: a staging hub that isn't used by actual users, and a production hub that is. These two should be kept as similar as possible, so you can test stuff on the staging hub without fear that it is going to crash & burn when deployed to production.
To deploy to the staging hub:

```
hubploy deploy <hub-name> hub staging
```

This could take a few minutes, but should eventually return successfully. You can then find the public IP of your hub with:

```
kubectl -n <hub-name>-staging get svc proxy-public
```

If you access that, you should be able to get in with any username & password. It might take a minute to become accessible.

The defaults provision each user their own EBS / Persistent Disk, so this can get expensive quickly :) Watch out!
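To see which user volumes currently exist (and are therefore being billed), you can list the PersistentVolumeClaims in the hub's namespace:

```
kubectl -n <hub-name>-staging get pvc
```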
You can now customize your hub in two major ways:

- Customize the hub image. repo2docker is used to build the image, so you can put any of the supported configuration files under `deployments/<hub-image>/image`. You must make a git commit after modifying this for `hubploy build <hub-name> --push --check-registry` to work, since it uses the commit hash as the image tag.
- Customize hub configuration with various YAML files.
  - `hub/values.yaml` is common to all hubs that exist in this repo (multiple hubs can live under `deployments/`).
  - `deployments/<hub-name>/config/common.yaml` is where most of the config specific to each hub should go. Examples include memory / cpu limits, home directory definitions, etc. (a small sketch follows this list).
  - `deployments/<hub-name>/config/staging.yaml` and `deployments/<hub-name>/config/prod.yaml` are files specific to the staging & prod versions of the hub. These should be as minimal as possible. Ideally, only DNS entries and IP addresses should be here.
  - `deployments/<hub-name>/secrets/staging.yaml` and `deployments/<hub-name>/secrets/prod.yaml` should contain information that mustn't be public. This would be proxy / hub secret tokens, any authentication tokens you have, etc. These files must be protected by something like `git-crypt` or `sops`. **THIS REPO TEMPLATE DOES NOT HAVE THIS PROTECTION SET UP YET**
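As a minimal sketch of the kind of per-hub settings that can go in `common.yaml`, assuming the hub chart nests Zero to JupyterHub configuration under a top-level `jupyterhub:` key (check `hub/values.yaml` for the actual structure in your copy):

```
# deployments/<hub-name>/config/common.yaml (illustrative values only)
jupyterhub:
  singleuser:
    cpu:
      limit: 2
      guarantee: 0.5
    memory:
      limit: 4G
      guarantee: 2G
```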
You can customize the staging hub, deploy it with `hubploy deploy <hub-name> hub staging`, and iterate until you like how it behaves.

You can then do a production deployment with `hubploy deploy <hub-name> hub prod`, and test it out!
`git-crypt` is used to keep encrypted secrets in the git repository.

- Install git-crypt. You can get it from brew or your package manager.
- In your repo, initialize it: `git crypt init`
- The `.gitattributes` file has the configuration of files it will encrypt.
- Make a copy of the encryption key. This will be used to decrypt the secrets. You will need to share it with your CD provider (CircleCI, GitHub Actions) and anyone else who will be using the repo for the same hub as you. `git crypt export-key key` puts the key in a file called `key`.
- Get a base64 copy of your key: `cat key | base64`
- Put it as a secret named `GIT_CRYPT_KEY` in GitHub Secrets.
- Make sure you change `hackweek-hub` to your deployment's name in the workflows under `.github/workflows/`.
- Push to the staging branch and check GitHub Actions to see if the action completes.
- If the staging action succeeds, make a PR from staging to prod and merge the PR. This should also trigger an action; make sure the action completes.
Note: Always make a PR from staging to prod. Never push directly to prod. We want to keep the staging and prod branches as close to each other as possible, and this is the only long-term guaranteed way to do that.
Hopefully you had a good hackweek! Now you can remove the hub and cloud infrastructure so you stop paying for them.
You will need the helm release's name and namespace. These are procedurally generated, usually both in the format `<your-hub>-hub-<branch>`. For example, with a deployment named `hackweek-hub` on my `staging` branch, the helm release and namespace are both named `hackweek-hub-staging`.
Uninstalling the hub from the cluster is done with

```
helm delete <release-name> -n <namespace-name>
```

For my example, this would be

```
helm delete hackweek-hub-staging -n hackweek-hub-staging
```
Before destroying the infrastructure, make sure that the result of `kubectl get svc -A` shows no entries with an `EXTERNAL-IP`.

If this is the case, you can run

```
terraform destroy -var-file=<path/to/your-cluster.tfvars>
```
Note: This also takes 15-20 minutes and may error out as it tries to destroy a couple of Kubernetes resources. Try re-running the command.
You may now delete the GitHub Repo you set up or forked, and you are done!