|
As can be seen, there are many layers (I have omitted the step where the swarm selects the worker), and it is somewhat difficult to decipher where a particular piece of code is executed.
|
|
|
|
Some of the code that you specify in the configuration file runs at the top, i.e. inside the jupyterhub container, some (e.g. what command to run) runs on the worker node, and some runs inside the notebook container. Then there is the whole interaction between docker.py and the Docker Swarm API (which differs from the docker command-line client), as well as the difference in capabilities between running a container locally and using the swarm API. Phew.
|
|
|
|
|
|
|
|
Well, anyway, after we had deciphered where most of the code ran we got notebooks up and running, but all content that a user saved was lost when they shut down their notebooks!
|
|
|
|
Not good, but let's get back to that later and first talk about user authorization.
|
|
|
|
|
|
|
|
## Centralized login and GPU access restrictions
|
|
|
|
We chose between multiple options, but since we have a GitLab server (which authorizes the students via the central IT system, i.e. LDAP) we reused it as an OAuth provider.
|
|
|
|
|
|
|
|
Luckily (well, we would not have chosen this path if it had not been), an OAuthenticator is implemented for JupyterHub, and there is even a GitLab-specific version.
|
|
|
|
|
|
Now, for the OAuth dance to work we need Internet access and incoming connections (for the callback), but we do not want everyone to access the JupyterHub container directly, so we came up with some [firewall rules](https://git.cs.kau.se/jonakarl/jupyterhub/-/blob/master/host_files/etc/rc.local) that only allow traffic belonging to connections first initiated from the hub (or any of the workers).
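
As a rough illustration of the idea (a minimal sketch, not the actual rules in `rc.local`; the worker subnet and hub port below are placeholders), the rules boil down to something like:

```shell
# Accept loopback traffic and traffic from the swarm workers (placeholder subnet)
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -s 10.0.0.0/24 -j ACCEPT
# Accept packets that belong to connections that have already been established
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Drop direct access to the hub's web port from everywhere else (placeholder port)
iptables -A INPUT -p tcp --dport 8000 -j DROP
```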
|
|
|
|
Other than that, it was mostly a matter of configuring GitLab to act as an [OAuth service provider](https://docs.gitlab.com/ee/integration/oauth_provider.html) and generating the needed secrets and keys for that to work.
|
|
The only caveat is that it will use the GitLab username as the JupyterHub username, and this might differ from what is used in "upstream" authorization sources like LDAP.
|
|
|
|
|
|
|
|
To differentiate between people who should be able to use a GPU and those who should not, we "abused" GitLab's group feature, so that if a user is part of a specific GitLab group he/she/it is given different options.
|
|
|
|
|
|
However, for this to work we first needed to create the projects, and we then used a bot user (with minimal access) to generate a read-only PAT to access these groups.
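
For reference, this kind of read-only PAT lets you query group membership via the GitLab API (a sketch only: the group name, server URL and token variable are placeholders, not our actual setup):

```shell
# List everyone in the (placeholder) "gpu-users" group using the bot user's read-only PAT
curl --silent --header "PRIVATE-TOKEN: $GITLAB_BOT_PAT" \
  "https://gitlab.example.com/api/v4/groups/gpu-users/members/all"
```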
|
|
|
|
|
|
## Persistent storage
|
|
|
|
Our users would be very annoyed if their files were lost on every restart of their notebook; therefore we needed some place to permanently store the files/folders.
|
|
|
|
|
|
|
|
|
|
Furthermore, since the notebooks run on arbitrary nodes in the swarm, we could not map a local folder directly. However, once again we were lucky, as Docker volumes can mount NFS shares.
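
Conceptually, each user home ends up as an NFS-backed volume roughly like the one below (a sketch: the server address, export path and volume name are placeholders, and in our setup the volume is normally created for us when the notebook is spawned):

```shell
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=10.0.0.1,rw \
  --opt device=:/srv/nfs/homes/alice \
  jupyterhub-user-alice
```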
|
|
|
|
|
|
|
|
|
|
|
|
So we set up the manager node (where we run JupyterHub) to act as an NFS server (not strictly needed, but it made the design simpler). That way we can create the needed home folders for the users on demand when they log in for the first time. The end goal is to use a separate NFS server (and mount it via a volume in the JupyterHub container).
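
On the manager the NFS export itself is nothing exotic; a minimal sketch (the export path and worker subnet are placeholders) looks like:

```shell
# Export the home folder tree to the worker nodes and reload the export table
echo "/srv/nfs/homes 10.0.0.0/24(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -ra
```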
|
|
|
|
|
|
**All great?**
|
|
|
|
|
|
|
|
|
|
Well, the NFS part adds an extra dependency on each client (the nfs-client package needs to be installed), and this, combined with the fact that Docker volumes are persistent even if something goes wrong, led to all sorts of problems.
|
|
|
|
If something is wrong when the volume is created (a missing NFS client, a network error, basically anything), the creation of the notebook will appear to work, but the container exits immediately as it cannot find the folders it expects. The swarm detects that the container exited and tries to restart it (probably on the same node), and since the volumes are persistent the container will continue to fail for all eternity, even after the underlying problem has been fixed, until the broken volume is deleted manually.
|
|
It also gives no easy-to-understand error message, so our first line of defense when something is not working is to delete all volumes and try again (usually that works fine).
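
In practice that clean-up is just standard Docker commands, run on the node where the notebook keeps failing (the volume name below is a placeholder):

```shell
# Remove one specific user volume and let it be recreated on the next spawn
docker volume rm jupyterhub-user-alice
# Or, more bluntly, remove every volume not currently in use on the node
docker volume prune -f
```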
|
|
|
|
|
|
But once everything is configured correctly, the solution works fine.
|
|
|
|
|
|
|
|
## Networking, notebook images and other small tidbits
|
|
|
|
*Network*
|
|
|
|
For the notebooks to connect to the hub we use an overlay network that we called jupyterhub_network. In theory this only needs to be created on the manager, and it will be dynamically allocated on a worker node when a container is started there. That is the theory, and in the simplest case it works fine. However, when we started to add further restrictions, such as requiring a GPU, the containers could not be spawned anymore. After much investigation we found that the swarm resource manager checks whether the network (jupyterhub_network) exists on the node and (as far as we can tell) creates it if it does not, but only if no other requirements have been specified. So if we specify that we require GPU support (more on that later), the resource manager fails to resolve the situation: even if GPU resources are available, the network does not exist on the node, and scheduling fails.
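
Creating the overlay network on the manager is a one-liner (a sketch of the standard Docker command, not necessarily our exact invocation):

```shell
docker network create --driver overlay --attachable jupyterhub_network
```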
|
|
|
|
*Notebook Images*
|
|
|
|
Our notebook image is rather large (as we have a lot of stuff in it), so it takes quite some time to download it to a node. JupyterHub does not really show what is happening on the network, so for the user it just looks like it takes forever to start the notebook (until it times out). Furthermore, the notebook image must also exist on the manager node. We never start any notebook on the manager, but the SwarmSpawner always checks for the existence of the image (on the manager) before it starts a notebook. Here we also discovered that it is very wise *NOT* to use the `:latest` tag on an image, but instead to use another scheme (we chose the date when the image is pushed to the repo, see the sketch after the list below). Using `:latest` fails for two reasons:
|
|
|
|
1. Swarm services work on images and tags, so even if you tell a service to update, it will only pull a new image if the image name or tag has changed (i.e. it does not check the repo for updates to the same `image:tag`).
|
|
|
|
2. It makes it harder to see what version you are running.
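
A date-based tagging scheme can be as simple as the following (the registry and image names below are placeholders):

```shell
# Tag and push the notebook image with the date it is published instead of :latest
docker build -t registry.example.com/jupyter/notebook:2020-10-01 .
docker push registry.example.com/jupyter/notebook:2020-10-01
```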
|
|
|
|
|
|
|
|
*Missing network and notebook image solution*
|
|
|
|
To solve the network issue we manually create a service that runs on every node and only requests the jupyterhub_network; we call this service "network-keeper", see this [file](https://git.cs.kau.se/jonakarl/jupyterhub/-/blob/master/start.sh). This way the network will always be present on all nodes in the swarm. That service also depends on the latest notebook image, so that all nodes are pre-populated with the latest image.
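
In essence it is a global service pinned to the network and the newest notebook image; a sketch of the idea (the real command lives in start.sh, and the image name, tag and idle command are assumptions on our part):

```shell
docker service create \
  --name network-keeper \
  --mode global \
  --network jupyterhub_network \
  registry.example.com/jupyter/notebook:2020-10-01 \
  sleep infinity
```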
|
|
|
|
|
|
|
|
*Environment variables in docker-compose*
|
|
|
|
To deploy JupyterHub we use a [docker-compose.yml](https://git.cs.kau.se/jonakarl/jupyterhub/-/blob/master/docker-compose.yml); the problem is that we do not want to write credentials in that file (and by mistake check them into the repo).
|
|
|
|
While docker-compose supports environment files, docker stack deploy does not, although they share the same config file format. However, with some small [shell tricks](https://git.cs.kau.se/jonakarl/jupyterhub/-/blob/master/start.sh) we export a .env file and use it for the docker-compose file, thereby mimicking what a proper docker-compose setup supports.
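
One way to do it (a sketch of the general trick; our actual version is in start.sh, and the stack name below is a placeholder) is to export the .env variables into the shell and let docker-compose resolve them before handing the result to docker stack deploy:

```shell
set -a           # automatically export every variable assigned below
. ./.env         # load the credentials kept outside the repo
set +a
# docker-compose config substitutes the ${VARIABLES}; stack deploy reads the resolved file from stdin
docker-compose config | docker stack deploy -c - jupyterhub
```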
|
|
|
|
|
|
|
|
|
|
## GPU
|
|
|
|
Oh, how much information there is on the Internet about setting up Docker with NVIDIA and getting GPU support in containers.
|
|
|
|
|
|
While some of that information is old or incorrect, most of it is correct; but basically all of it comes without any explanation of why you should set the parameters the way you do.
|
|
|
|
|
|
|
|
**What to do on each GPU worker**
|
|
|
|
1. Find out what the latest NVIDIA driver is (currently 450) (https://www.nvidia.com/Download/driverResults.aspx/163238/en-us) and install it:
|
|
|
|
```shell
add-apt-repository ppa:graphics-drivers
apt update
apt purge nvidia-*
apt install nvidia-driver-450
reboot
```
|
|
|
|
2. Install docker from (https://docs.docker.com/engine/install/ubuntu/)
|
|
|
|
```shell
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
```
|
|
|
|
- Join the swarm cluster (with `docker swarm join` and the join token from the manager).
|
|
|
|
3. Install Docker GPU support from https://github.com/NVIDIA/nvidia-docker
|
|
|
|
|
|
|
|
```shell
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```
|
|
|
|
|
|
|
|
4. Add "legacy support"
|
|
|
|
|
|
|
|
```shell
apt-get install nvidia-docker2
```
|
|
|
|
|
|
|
|
5. Find the GPU UUID(s) with `nvidia-smi -a` (https://gist.github.com/tomlankhorst/33da3c4b9edbde5c83fc1244f010815c)
|
|
|
|
|
|
|
|
* Take the UUID of the GPU(s) and modify `/etc/docker/daemon.json` so it looks like this:
|
|
|
|
|
|
|
|
```
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "node-generic-resources": [
        "GPU=GPU-73455a8b-be0d-9fc2-62c5-f26827e0c02b",
        "GPU=GPU-8c2fffbc-8334-c950-1686-dbdbd6b53595"
    ]
}
```
|
|
|
|
|
|
|
|
6. add `swarm-resource = "DOCKER_RESOURCE_GPU"` to `/etc/nvidia-container-runtime/config.toml`
|
|
|
|
7. `systemctl restart docker`
|
|
|
|
8. In the JupyterHub configuration you need to define `NVIDIA_VISIBLE_DEVICES=None`
|
|
|
|
|
|
|
|
Done.
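
If you want to sanity-check the setup before wiring it into JupyterHub, something like the following should work (our own assumption, not part of any official guide; the CUDA image tag is a placeholder): ask the swarm to schedule a throw-away service that claims one generic GPU resource and runs nvidia-smi.

```shell
# Request one swarm-managed GPU and run nvidia-smi once inside the container
docker service create \
  --name gpu-smoke-test \
  --restart-condition none \
  --generic-resource "GPU=1" \
  --env NVIDIA_VISIBLE_DEVICES=None \
  nvidia/cuda:11.0-base nvidia-smi
docker service logs gpu-smoke-test
```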
|
|
|
|
|
|
|
|
**Why**
|
|
|
|
1. You need the drivers, duh.
|
|
|
|
2. You need Docker, duh.
|
|
|
|
3. You need Docker support for GPUs (maybe not needed if you only run via the swarm, but we install it anyway).
|
|
|
|
4. Since Docker Swarm(kit) does not yet support the GPU feature flag, you need the old nvidia-docker2 ("legacy") version, which uses a specific startup script to map the necessary devices etc. (both the new and the old can run fine together).
|
|
|
|
5. Since swarmkit (or was it docker.py?) does not (yet) support specifying which runtime to use, you need to specify that you will always use `nvidia-container-runtime`.
|
|
|
|
- The `nvidia-container-runtime` looks for a magic environment variable (`NVIDIA_VISIBLE_DEVICES`) to decide whether it should start a "normal" container, i.e. pass through to runc (the default runtime), or start an NVIDIA container with access to the GPU(s).
|
|
|
|
- You also need to add the GPU UUIDs in daemon.json, since this is how the swarm finds and allocates the resources.
|
|
|
|
6. This line tells `nvidia-container-runtime` which environment variable Docker Swarm uses to specify which dynamically allocated resource is in use.
|
|
|
|
- E.g. if GPU is changed to COW in `/etc/nvidia-container-runtime/config.toml`, then daemon.json also needs to be changed so that GPU= becomes COW= (and our JupyterHub config needs to be adapted accordingly).
|
|
|
|
7. Needed so Docker picks up the changes.
|
|
|
|
8. `NVIDIA_VISIBLE_DEVICES` needs to exist but must be set to None.
|
|
|
|
- If it does not exist, `nvidia-container-runtime` will just run runc (i.e. not map the GPUs at all).
|
|
|
|
- If it is not None, `DOCKER_RESOURCE_GPU` will not be used (and by default all GPUs will be visible).
|
|
|
|
|
|
|
|
**Caveat**
|
|
|
|
If you upgrade the drivers or the kernel, a reboot is necessary, as the drivers and the libraries need to match; otherwise Docker will not run.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|