... | ... | @@ -59,8 +59,7 @@ Task: |
|
|
#### How to install a multi-user JupyterHub that scales
|
|
|
|
|
|
##### 1. [The Littlest JupyterHub](https://tljh.jupyter.org/en/latest/)
|
|
|
|
|
|
The simplest way to install JupyterHub is the superb script/package [The Littlest JupyterHub](https://tljh.jupyter.org/en/latest/). While it supports multiple users, it runs on a single server and does not scale well, so alas it did not fit our use case.
|
|
|
|
|
|
##### 2. [Jupyter Enterprise Gateway](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/)
|
|
|
We next tried a manually installed JupyterHub together with [Jupyter Enterprise Gateway](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/).
|
... | ... | @@ -77,11 +76,10 @@ However also this setup did not fly for us in the end. I think we failed mostly |
|
|
|
|
|
In hindsight this (JupyterHub on Kubernetes) is probably the most flexible solution and would have solved some of the "bugs"/problems we discovered, but at the time we thought Kubernetes was overkill.
|
|
|
|
|
|
##### 4. [SwarmSpawner](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/) (the solution we used in the end)
|
|
|
While [Jupyter Enterprise Gateway](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/) pointed us to containerization, [Zero to JupyterHub with Kubernetes](https://zero-to-jupyterhub.readthedocs.io/en/latest/) made us more cluster savvy and made us look at a simpler cluster solution that is built into Docker itself: [docker swarm](https://docs.docker.com/engine/swarm/), used together with [DockerSpawner](https://github.com/jupyterhub/dockerspawner).
|
|
|
|
|
|
Compared to Kubernetes, Docker Swarm is a breeze to install and needs basically no maintenance.
|
|
|
Installation
|
|
|
1. Install docker
|
|
|
2. On a manager, run `docker swarm init --advertise-addr IP-OF-SWARM-FACING-INTERFACE`
|
|
|
3. On each worker, run `docker swarm join --token TOKEN-FROM-INIT-COMMAND IP-OF-SWARM-FACING-INTERFACE:2377` (the address here is the manager's swarm-facing interface from step 2)
|
... | ... | @@ -91,8 +89,31 @@ Voila your cluster is up and work, for upgrading just upgrade you docker install |
|
|
|
|
|
Similarly, [SwarmSpawner (code)](https://github.com/jupyterhub/dockerspawner) is an extension of [DockerSpawner](https://github.com/jupyterhub/dockerspawner) (same repo). We could therefore start by getting DockerSpawner to spawn our custom notebooks first (yaay) and then later move on to [SwarmSpawner (doc)](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/spawner-types.html#swarmspawner).
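
To make the two-step path concrete, here is a minimal sketch of what the relevant part of `jupyterhub_config.py` could look like. The image and network names are placeholders, not the ones we used. Because SwarmSpawner subclasses DockerSpawner, settings written against `c.DockerSpawner` should carry over when the spawner class is switched.

```python
# jupyterhub_config.py (fragment). `c` is the config object that JupyterHub
# provides when it loads this file; it is not defined when run standalone.

# Step 1: plain DockerSpawner, notebooks start on the hub's own Docker daemon.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Step 2 (later): switch to the swarm-aware subclass instead.
# c.JupyterHub.spawner_class = "dockerspawner.SwarmSpawner"

# Placeholder names (assumptions for illustration only).
c.DockerSpawner.image = "example/custom-notebook:latest"
c.DockerSpawner.network_name = "jupyterhub-network"
```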
|
|
|
|
|
|
#### How to install a multi-user JupyterHub that scales (continued)
|
|
|
Now that we had decided on an installation method that we kind of understood and were happy with, we needed to configure it so that it worked for both GPU and non-GPU work (most work is non-GPU, and we have far more non-GPU machines and use cases).
|
|
|
In general JupyterHub is fairly simple to configure, as it has only one config file (which in reality is a Python file, so you can also run code in it).
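
Since the config file is ordinary Python, small helpers can live right in it. The sketch below is only an illustration: the `NOTEBOOK_IMAGE` variable name is made up, and the default image is just a commonly used public one, not necessarily what we ran.

```python
import os

# Default image (assumption for illustration).
DEFAULT_IMAGE = "jupyter/base-notebook:latest"


def notebook_image(environ=os.environ) -> str:
    """Return the notebook image, overridable via NOTEBOOK_IMAGE (hypothetical name)."""
    return environ.get("NOTEBOOK_IMAGE", DEFAULT_IMAGE)


# Inside jupyterhub_config.py, where JupyterHub provides `c`, this could be used as:
# c.DockerSpawner.image = notebook_image()
```

The function takes the environment as a parameter so that it can be tried out without touching the real process environment.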
|
|
|
|
|
|
|
|
|
##### What happens where?
|
|
|
|
|
|
I think this is the question we asked ourselves most often.
|
|
|
|
|
|
In overview the system looks like this:
|
|
|
```mermaid
graph TD;
Jupyterhub-->SwarmSpawner;
SwarmSpawner-->docker.py;
docker.py-->DockerSwarmAPI;
DockerSwarmAPI-->DockerDaemon-manager;
DockerDaemon-manager-->DockerDaemon-worker;
DockerDaemon-worker-->Notebook-container;
```
|
|
|
|
|
|
As can be seen, there are many layers (I have omitted the step where the swarm selects the worker), and where a particular piece of code is executed is somewhat difficult to decipher.
|
|
|
Some of the code that you specify in the configuration file runs at the top, i.e. inside the JupyterHub container; some (e.g. which command to run) happens on the worker node; and some happens inside the notebook container. Then there is the whole interaction between docker.py and the Docker Swarm API (which differs from the docker command-line client), and the capabilities also differ between running a container locally and using the swarm API. Phew.
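
One way to keep the layers apart is to note, next to each piece of config code, where it runs and who consumes its result. The sketch below is an assumption-laden illustration, not our actual config: the `gpu` node label is hypothetical and would have to be added to the GPU workers first (for example with `docker node update --label-add gpu=true NODE`).

```python
# Hypothetical node label carried by GPU workers.
GPU_CONSTRAINT = "node.labels.gpu == true"


def placement_spec(want_gpu: bool) -> dict:
    """Build the placement part of a swarm service spec.

    This function itself runs in the JupyterHub process. The constraints it
    returns are only evaluated later, by the swarm manager, when it selects
    a worker for the notebook container.
    """
    if want_gpu:
        return {"constraints": [GPU_CONSTRAINT]}
    return {"constraints": []}


# Inside jupyterhub_config.py, where JupyterHub provides `c`, this could be used as:
# c.SwarmSpawner.extra_placement_spec = placement_spec(want_gpu=False)
```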
|
|
|
|
|
|
Well anyway, after we deciphered most of that we got notebooks up and running, but everything a user saved was lost when they shut down their notebook!
|
|
|
Not good, but let's get back to that and first talk about user authorization.
|
|
|
|
|
|
## Centralized login and GPU access restrictions
|
|
|
We chose between multiple options, but since we have a GitLab server (which authorizes the students via the central IT system, i.e. LDAP) we reused that as an OAuth provider.
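
As a rough sketch, pointing JupyterHub at a self-hosted GitLab through the `oauthenticator` package could look like the fragment below. All URLs and environment variable names are placeholders, and the client id and secret come from an application registered in GitLab.

```python
# jupyterhub_config.py (fragment). `c` is the config object that JupyterHub
# provides when it loads this file; it is not defined when run standalone.
import os

c.JupyterHub.authenticator_class = "oauthenticator.gitlab.GitLabOAuthenticator"

# Placeholder URLs (assumptions for illustration only).
c.GitLabOAuthenticator.gitlab_url = "https://gitlab.example.org"
c.GitLabOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"

# Credentials of the OAuth application registered in GitLab, read from
# hypothetical environment variables so they stay out of the config file.
c.GitLabOAuthenticator.client_id = os.environ["GITLAB_CLIENT_ID"]
c.GitLabOAuthenticator.client_secret = os.environ["GITLAB_CLIENT_SECRET"]
```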
|
... | ... | |