PipalHub - JupyterHub setup for Pipal Academy
PipalHub is a JupyterHub setup optimized for Pipal Academy's remote workshops.
It contains a JupyterHub server providing one Jupyter instance for each participant, along with a set of scripts that summarize changes and export notebooks as HTML so that the instructor can quickly glance through the participants' notebooks.
There are easy-install scripts that can be used to set up a node with all the dependencies needed to run the server.
After setting up the node, see the section on adding users and packages for next steps.
A single script, `create-node.py`, can be used from your local machine to create a new node on DigitalOcean and get PipalHub up and running on it.
Description:
This script will create a DigitalOcean droplet with the given size and name, assign a DNS entry to it, and set it up with the `setup-node.sh` script.
Some defaults are hardcoded as constants at the beginning of this file. These can be changed as needed.
Prerequisites:
- The `DIGITALOCEAN_TOKEN` environment variable, to create a node and set a DNS entry on it.
- One of your SSH keys should be saved on DigitalOcean. This is needed to run the setup script on a new node over SSH.
- DNS for the base domain (default: `pipal.in`) should be set up on DigitalOcean.
Usage:
$ git clone https://github.com/pipalacademy/pipalhub
$ cd pipalhub
$ DIGITALOCEAN_TOKEN="token_goes_here" python3 create-node.py --name alpha --size small --hostname alpha-lab.pipal.in
- `--name` can be a string: this will be the name of your droplet.
- `--size` can be one of small, medium, large. vCPUs / memory for each size are configured in the `SIZES` dict defined in create-node.py (a sketch is shown after this list).
- `--hostname` is the subdomain that will be assigned to this node. For example, if `BASE_DOMAIN` is configured as `pipal.in` in create-node.py, the node will become accessible at `{hostname}.pipal.in`.
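To give an idea of the shape of that mapping, here is a hypothetical sketch of the `SIZES` dict; the size names match the flag values above, but the droplet slugs are illustrative assumptions and the real values live in create-node.py.
# Illustrative sketch only: maps the --size flag to DigitalOcean droplet slugs.
# The actual mapping is the SIZES constant defined in create-node.py.
SIZES = {
    "small": "s-2vcpu-4gb",    # example slug: 2 vCPUs / 4 GB
    "medium": "s-4vcpu-8gb",   # example slug: 4 vCPUs / 8 GB
    "large": "s-8vcpu-16gb",   # example slug: 8 vCPUs / 16 GB
}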
If you have a node ready, you can use `install-docker.sh` and `setup-node.sh` to complete the manual installation.
Please note that `create-node.py` executes exactly this over SSH, so you don't have to do it separately.
`install-docker.sh` should be run as the user that will run docker for deployment. Besides installing docker on the system, the script will also run the post-install steps as the current user.
If you don't have a non-root user, you can create one with `create-non-root-user.sh`.
# in case you need to create a new non-root user
$ ./create-non-root-user.sh pipal
$ su pipal
# install docker and refresh group assignments
$ ./install-docker.sh
Enter password:
...
$ newgrp docker
$ ... setup node script
Description:
`setup-node.sh` can be run as a sudoer; it will set up everything needed to start running JupyterHub.
It assumes a fresh Ubuntu install, but it also works idempotently: if a step fails due to some external reason, you can run this script again after fixing it.
Implementation details:
It does these things:
- install nginx
- install docker
- clone this (pipalacademy/pipalhub) repository
- symlink this directory to `/var/www/pipalhub` (its `tmp/` subdirectory will be served by nginx)
- save the correct nginx configuration (the sample one, corrected for the hostname) to `etc/nginx/conf.d/lab.conf`
- symlink this to `/etc/nginx/conf.d`
- install certbot with the nginx plugin
- use certbot to create an SSL certificate (does not set up renewal)
- docker compose up
- reload nginx
Prerequisites:
- An Ubuntu 22.04 server with root access.
- Working directory should be `$HOME`
Usage:
$ setup-node.sh hostname.pipal.in
TODO: This functionality can be added to the dashboard service
To add users to the JupyterHub server:
- SSH into the machine
- Append usernames in the format `username:password` to `~/pipalhub/etc/jupyterhub/users.txt`. There should be one user on each line. Refer to `users.txt.sample` for an example.
- Restart the containers with `docker compose restart` (from the `~/pipalhub` directory)
dev@home:~$ ssh pipal@hostname.pipal.in # ssh into the host machine
$ cd pipalhub
$ echo 'bob:bobs_password' >> etc/jupyterhub/users.txt
$ docker compose restart
Adding packages is similar, except this time you need to edit the `etc/jupyterhub/requirements.txt` file and repeat the same steps.
This is the directory structure, after ignoring some directories / files that aren't relevant:
├── Readme.md
├── create-node.py
├── docker-compose.yml
├── etc
│ ├── jupyter
│ │ ├── jupyter_notebook_config.py
│ │ └── lab
│ │ └── docmanager.jupyterlab-settings
│ ├── jupyterhub
│ │ ├── jupyterhub_config.py
│ │ └── users.txt.sample
│ └── nginx
│ ├── conf.d
│ │ └── lab.conf.sample
│ └── default.conf
├── home
│ └── Readme.txt
├── services
│ └── dashboard_service
│ ├── dashboard_service.py
│ ├── javascripts
│ │ └── poll.js
│ ├── launch.sh
│ ├── requirements.txt
│ └── scripts
│ ├── build.py
│ ├── build.sh
│ └── ipytail.py
├── setup-node.sh
└── tmp
└── Readme.txt
There is a single docker container that contains JupyterHub and the student servers. It is set up with docker compose. A web server (an nginx configuration is provided) should be set up on the host to expose this container over a domain.
The container also contains a Dashboard service: a Flask app that runs some action when a notebook is saved. For now, this is the build script that updates the notebook summaries whenever a student saves. These scripts live in the `scripts/` directory.
The `javascripts/` directory has a polling script that uses the endpoint exposed by dashboard-service to let a developer perform some action on the frontend when a particular event (such as the save of a notebook) is logged. For example, this can be used to notify the user on the frontend that an update to the summary page is available.
Configuration files are kept in `etc/`. Of these, `jupyterhub/` and `jupyter/` are used inside the docker container and `nginx/` is used on the host.
The jupyterhub container is configured using docker compose. `docker-compose.yml` lists several volume mounts and an exposed port.
The volume mounts are either for sharing configuration with the container or for ease of visibility for the trainer.
Related issue: #6
This is a JupyterHub-managed service, i.e. the related process is started and stopped by JupyterHub. We only need to configure it in `jupyterhub_config.py` with the command that needs to run.
Currently this is a Flask app that is started on port 10101, using environment variables configured in `jupyterhub_config.py`. JupyterHub will also create a reverse proxy endpoint on its server for this service, so the dashboard service will be accessible at https://hostname.pipal.in/services/dashboard and we won't need a separate nginx configuration for it.
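As an illustration, here is a minimal sketch of what such a service entry could look like in `jupyterhub_config.py`; the command path and environment variable are assumptions, not the actual configuration.
# Hypothetical sketch of registering the dashboard as a JupyterHub-managed service.
# The real entry lives in etc/jupyterhub/jupyterhub_config.py and may differ.
c.JupyterHub.services = [
    {
        "name": "dashboard",              # proxied at /services/dashboard
        "url": "http://127.0.0.1:10101",  # where the Flask app listens
        "command": ["bash", "services/dashboard_service/launch.sh"],  # assumed path
        "environment": {"PORT": "10101"},  # assumed way of passing the port
    },
]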
The `launch.sh` script for this service installs the dependencies it needs (flask, pydantic) with pip before starting it. This may change in the future; a possible solution would be to have a single `requirements.txt` for the instance that comes with defaults but can be changed by the trainer.
`/events` supports `GET` and `POST` methods. `GET /events` can also be combined with filters as query params.
These are example requests/responses:
POST /events
{
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511Z"
}
--- response:
201 CREATED
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
}
Note that if the client doesn't send a timestamp, the server won't raise an error but rather default to using the current timestamp.
GET /events
--- response:
200 OK
[
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
},
{
"id": 2,
"type": "test-event",
"user": "bob",
"filename": "module1-day1.ipynb",
"path": "/home/bob/module1-day1.ipynb",
"timestamp": "2022-11-15T16:00:00.511000+00:00"
}
]
Listing can also have filters. Filtering can be done on any field in the returned JSON. Example:
GET /events?user=alice
--- response:
[
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
}
]
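For quick experimentation, here is a hedged example client using the `requests` library; the base URL is an assumption built from the hostname and service path above, and a real deployment may additionally require a JupyterHub API token.
import requests

# Assumed base URL: the dashboard service as proxied by JupyterHub.
BASE_URL = "https://hostname.pipal.in/services/dashboard"

# Record a save event; the server fills in the timestamp if it is omitted.
event = {
    "type": "test-event",
    "user": "alice",
    "filename": "module1-day1.ipynb",
    "path": "/home/alice/module1-day1.ipynb",
}
created = requests.post(f"{BASE_URL}/events", json=event)
print(created.status_code, created.json())

# List events for a single user via a query-param filter.
alice_events = requests.get(f"{BASE_URL}/events", params={"user": "alice"})
print(alice_events.json())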
There is a bunch of configuration in `etc/` that is needed for JupyterHub and JupyterLab to function as we want.
Important configurations set in `etc/jupyterhub/jupyterhub_config.py` (a minimal sketch follows this list):
- `c.Spawner.default_url = '/lab'`: This sets the default page for a student server to the JupyterLab interface. (The alternative is the classic notebook, which can be selected by setting this to `''` instead.)
- `c.Spawner.pre_spawn_hook = bootstrap_user_env`: Some configuration can be set once and applied to all users, but sometimes this isn't possible and we need to set configuration explicitly on a per-user basis. The pre-spawn hook currently does that before spawning the student server, by copying some default configuration into the per-user configuration directory.
- `c.JupyterHub.services = ...`: This lists JupyterHub services. We currently have one service called dashboard, as described above.
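A minimal sketch of how these settings could appear together in `jupyterhub_config.py`; this is illustrative only, not a verbatim copy of the file.
# Illustrative sketch; see etc/jupyterhub/jupyterhub_config.py for the real configuration.

def bootstrap_user_env(spawner):
    # Copy default per-user configuration into the user's environment
    # before their server starts (sketched in the per-user configuration section below).
    ...

c.Spawner.default_url = '/lab'                 # open JupyterLab by default ('' for classic notebook)
c.Spawner.pre_spawn_hook = bootstrap_user_env  # run per-user setup before each spawn
c.JupyterHub.services = [
    # the dashboard service entry, as sketched in the dashboard section above
    {"name": "dashboard", "url": "http://127.0.0.1:10101",
     "command": ["bash", "services/dashboard_service/launch.sh"]},
]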
Note that this configuration may move to `jupyter_server_config.py` in the future, as is done in later versions of Jupyter.
This configuration file is for the spawned Jupyter notebooks. It is shared by all users.
At the moment, its most important responsibility is to send a "save-notebook" event to the dashboard service whenever someone saves a notebook. It does this by setting a `post_save_hook`, a function that runs in the active user's environment when a save is made.
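A hedged sketch of what such a hook could look like in `jupyter_notebook_config.py`; the dashboard URL and the use of the `requests` library are assumptions about the mechanism, not necessarily how pipalhub implements it.
# Illustrative sketch; the actual hook lives in etc/jupyter/jupyter_notebook_config.py.
import os
from datetime import datetime, timezone

import requests

# Assumed internal address of the dashboard service inside the container.
DASHBOARD_EVENTS_URL = "http://127.0.0.1:10101/events"

def post_save_hook(model, os_path, contents_manager, **kwargs):
    """Report notebook saves to the dashboard service."""
    if model.get("type") != "notebook":
        return
    requests.post(DASHBOARD_EVENTS_URL, json={
        "type": "save-notebook",
        "user": os.environ.get("JUPYTERHUB_USER", "unknown"),
        "filename": os.path.basename(os_path),
        "path": os_path,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

c.FileContentsManager.post_save_hook = post_save_hook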
This should house configuration that needs to be explicitly copied into per-user environments. Currently there is only one such configuration, which makes the docmanager plugin default to a lower autosave interval.
This is being done by the `pre_spawn_hook` in `etc/jupyterhub/jupyterhub_config.py` at the moment, but should be moved elsewhere in the future, ideally so that it can run whenever a new system user is created rather than before each spawn.
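For illustration, a hypothetical sketch of such a pre-spawn hook; the source and destination paths are assumptions about how the copy might be done, and the actual hook in `jupyterhub_config.py` may differ.
# Hypothetical sketch of a pre-spawn hook that copies default per-user configuration.
# The paths below are assumptions, not the actual ones used by pipalhub.
import os
import shutil

DEFAULT_SETTINGS = "/srv/pipalhub/etc/jupyter/lab/docmanager.jupyterlab-settings"  # assumed source

def bootstrap_user_env(spawner):
    user = spawner.user.name
    # Assumed destination: the user's JupyterLab settings directory for the docmanager plugin.
    dest_dir = f"/home/{user}/.jupyter/lab/user-settings/@jupyterlab/docmanager-extension"
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy(DEFAULT_SETTINGS, os.path.join(dest_dir, "plugin.jupyterlab-settings"))

# Wired up in jupyterhub_config.py:
c.Spawner.pre_spawn_hook = bootstrap_user_env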
Clone the repo:
$ git clone https://github.com/pipalacademy/pipalhub.git
$ cd pipalhub
Setup users:
$ cp etc/jupyterhub/users.txt.sample etc/jupyterhub/users.txt
Edit `etc/jupyterhub/users.txt` to add more users.
Setup nginx:
$ cd etc/nginx/conf.d
$ cp lab.conf.sample lab.conf
$ sudo ln -s "$(pwd)/lab.conf" /etc/nginx/conf.d/lab.conf
$ cd -
For production, copy `lab-ssl.conf.sample` to `lab.conf` and edit the file to set the hostname and SSL certificates.
Start the lab:
$ docker-compose up -d
Reload nginx:
$ sudo systemctl reload nginx