This blog post tutorial shows how to set up a scalable, high-performance machine learning environment from the ingredients GPUs, Kubernetes, Dask and Jupyter. In the first article of our blog series we set up a Kubernetes cluster with access to GPUs. In this part we will add containerized applications to the cluster so that we can run data processing workloads on it. More precisely: we will prepare a notebook image with CUDA installed, which is required if we want to use GPU-based frameworks. Furthermore, the image will contain Dask, Rapids and Dask-Rapids. As soon as the image is ready, we will deploy JupyterHub, which spawns said notebook image as a container for each user.
We will use JupyterLab notebooks as an interactive environment to start our data processing algorithms. In other words, JupyterLab will act as a Dask client. As we want to provide an environment not only for one data scientist but for a group of users, we decided to install JupyterHub on our Kubernetes cluster. JupyterHub makes it possible to serve a pre-configured data science environment to a group of users.
Permissions for Dask-Clients
First, we have to take care of the permissions of our JupyterLab instances. When JupyterLab is used as a Dask client, it needs sufficient permissions to start new pods acting as Dask workers. As we decided to install JupyterHub, no extra configuration is required, since JupyterHub uses a Service Account with sufficient permissions by default. If you want to use Dask from a different environment, you will have to grant your client the permissions to create, delete, view etc. your Dask-worker pods via a Service Account.
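To make the requirement concrete, here is a minimal sketch of how JupyterLab will later act as a Dask client via the dask-kubernetes library (which we install further below). The image name and resource values are placeholders, not a prescription; the point is that scaling the cluster makes the client create and delete pods, which is exactly what the Service Account must allow:

```python
# Minimal sketch of a Dask client spawning worker pods from a notebook.
# The image name and resource values below are placeholder assumptions.
from dask.distributed import Client
from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(
    image="<your_registry>/<your_notebook_image>",  # e.g. the image we build below
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
)

cluster = KubeCluster(pod_spec)  # uses the pod-creation permissions discussed above
cluster.scale(3)                 # creates three Dask-worker pods in the cluster
client = Client(cluster)         # JupyterLab now acts as the Dask client
```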
Docker Image for Jupyter
JupyterHub is a multi-user version of JupyterLab. The hub creates a pod in the cluster for each user and pulls the notebook image that runs in that pod. There are official Jupyter images like the Minimal-Notebook or the Data-Science-Notebook that are ready to use. However, the Rapids library requires the CUDA Toolkit, so we cannot use these base images and simply add Rapids and Dask to them.
It seems to be a good idea to create a base image containing Jupyter and CUDA and to use it to build an image with Rapids and Dask on top. Since Rapids and Dask are still in development and new versions are released frequently, keeping Jupyter and CUDA in a separate base image will make our final image easier to maintain.
Fortunately, there are not only official notebook images but also official images from NVIDIA with CUDA. We can simply combine both images. We will use the base-notebook Dockerfile from here and the 10.2-base-ubuntu-18.04 CUDA 10.2 Dockerfile from here, and combine them into a single image. Keep in mind that for the base-notebook you need to have the following files together with your Dockerfile:
- fix-permissions
- jupyter_notebook_config.py
- start.sh
- start-notebook.sh
- start-singleuser.sh
All these files can be found in the base-notebook repository from the link above. The resulting Dockerfile is listed below:
```dockerfile
ARG ROOT_CONTAINER=ubuntu:bionic-20200311@sha256:e5dd9dbb37df5b731a6688fa49f4003359f6f126958c9c928f937bec69836320
ARG BASE_CONTAINER=$ROOT_CONTAINER
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

ARG NB_USER="jovyan"
ARG NB_UID="1000"
ARG NB_GID="100"

USER root

# Install all OS dependencies for notebook server that starts but lacks all
# features (e.g., download as all possible file formats)
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
 && apt-get install -yq --no-install-recommends \
    wget \
    bzip2 \
    ca-certificates \
    sudo \
    locales \
    fonts-liberation \
    run-one \
 && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
    locale-gen

# Configure environment
ENV CONDA_DIR=/opt/conda \
    SHELL=/bin/bash \
    NB_USER=$NB_USER \
    NB_UID=$NB_UID \
    NB_GID=$NB_GID \
    LC_ALL=en_US.UTF-8 \
    LANG=en_US.UTF-8 \
    LANGUAGE=en_US.UTF-8
ENV PATH=$CONDA_DIR/bin:$PATH \
    HOME=/home/$NB_USER

# Copy a script that we will use to correct permissions after running certain commands
COPY fix-permissions /usr/local/bin/fix-permissions
RUN chmod a+rx /usr/local/bin/fix-permissions

# Enable prompt color in the skeleton .bashrc before creating the default NB_USER
RUN sed -i 's/^#force_color_prompt=yes/force_color_prompt=yes/' /etc/skel/.bashrc

# Create NB_USER with name jovyan, UID=1000, in the 'users' group
# and make sure these dirs are writable by the `users` group.
RUN echo "auth requisite pam_deny.so" >> /etc/pam.d/su && \
    sed -i.bak -e 's/^%admin/#%admin/' /etc/sudoers && \
    sed -i.bak -e 's/^%sudo/#%sudo/' /etc/sudoers && \
    useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \
    mkdir -p $CONDA_DIR && \
    chown $NB_USER:$NB_GID $CONDA_DIR && \
    chmod g+w /etc/passwd && \
    fix-permissions $HOME && \
    fix-permissions $CONDA_DIR

USER $NB_UID
WORKDIR $HOME
ARG PYTHON_VERSION=default

# Setup work directory for backward-compatibility
RUN mkdir /home/$NB_USER/work && \
    fix-permissions /home/$NB_USER

ENV MINICONDA_VERSION=4.6.14 \
    CONDA_VERSION=4.7.10

RUN cd /tmp && \
    wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh && \
    echo "718259965f234088d785cad1fbd7de03 *Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh" | md5sum -c - && \
    /bin/bash Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p $CONDA_DIR && \
    rm Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh && \
    echo "conda ${CONDA_VERSION}" >> $CONDA_DIR/conda-meta/pinned && \
    $CONDA_DIR/bin/conda config --system --prepend channels conda-forge && \
    $CONDA_DIR/bin/conda config --system --set auto_update_conda false && \
    $CONDA_DIR/bin/conda config --system --set show_channel_urls true && \
    $CONDA_DIR/bin/conda install --quiet --yes conda && \
    $CONDA_DIR/bin/conda update --all --quiet --yes && \
    conda list python | grep '^python ' | tr -s ' ' | cut -d '.' -f 1,2 | sed 's/$/.*/' >> $CONDA_DIR/conda-meta/pinned && \
    conda clean --all -f -y && \
    rm -rf /home/$NB_USER/.cache/yarn && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# Install Tini
RUN conda install --quiet --yes 'tini=0.18.0' && \
    conda list tini | grep tini | tr -s ' ' | cut -d ' ' -f 1,2 >> $CONDA_DIR/conda-meta/pinned && \
    conda clean --all -f -y && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# Install Jupyter Notebook, Lab, and Hub
# Generate a notebook server config
# Cleanup temporary files
# Correct permissions
# Do all this in a single RUN command to avoid duplicating all of the
# files across image layers when the permissions change
RUN conda install --quiet --yes \
    'notebook=6.0.3' \
    'jupyterhub=1.1.0' \
    'jupyterlab=2.0.1' && \
    conda clean --all -f -y && \
    npm cache clean --force && \
    jupyter notebook --generate-config && \
    rm -rf $CONDA_DIR/share/jupyter/lab/staging && \
    rm -rf /home/$NB_USER/.cache/yarn && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

EXPOSE 8888

# Configure container startup
ENTRYPOINT ["tini", "-g", "--"]
CMD ["start-notebook.sh"]

# Copy local files as late as possible to avoid cache busting
COPY start.sh start-notebook.sh start-singleuser.sh /usr/local/bin/
COPY jupyter_notebook_config.py /etc/jupyter/

# Fix permissions on /etc/jupyter as root
USER root
RUN fix-permissions /etc/jupyter/

# Switch back to jovyan to avoid accidental container runs as root
USER $NB_UID

################## CUDA ##################
USER root

# FROM ubuntu:18.04
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
    echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
    apt-get purge --autoremove -y curl && \
    rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 10.2.89
ENV CUDA_PKG_VERSION 10-2=$CUDA_VERSION-1

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-$CUDA_PKG_VERSION \
    cuda-compat-10-2 && \
    ln -s cuda-10.2 /usr/local/cuda && \
    rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=10.2 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419"

USER $NB_UID
```
It is important to switch to the root user for the CUDA part and then to switch back to the normal user afterwards.
We have to build this image and push it to a repository of our choice. Then we have a base image with Jupyter and CUDA. To create the final image on top of it, we need to install the Rapids libraries (cuDF and cuML), Dask, Dask-cuDF and Dask-cuML. The non-Dask Rapids libraries are required as a prerequisite for their Dask counterparts. This can be done in just a few steps, and the Dockerfile looks like this:
```dockerfile
FROM <your_registry>

############################################### cuDF
RUN conda install -c rapidsai -c nvidia -c conda-forge \
    -c defaults cudf=0.13 cuml=0.13 python=3.7

############################################## DASK
RUN conda install --yes \
    -c conda-forge -c rapidsai -c nvidia -c defaults \
    python-blosc \
    cytoolz \
    dask==2.15.0 \
    lz4 \
    nomkl \
    numpy==1.18.1 \
    pandas==0.25.3 \
    tini==0.18.0 \
    zstd==1.4.3 \
    && conda clean -tipsy \
    && find /opt/conda/ -type f,l -name '*.a' -delete \
    && find /opt/conda/ -type f,l -name '*.pyc' -delete \
    && find /opt/conda/ -type f,l -name '*.js.map' -delete \
    && find /opt/conda/lib/python*/site-packages/bokeh/server/static -type f,l -name '*.js' -not -name '*.min.js' -delete \
    && rm -rf /opt/conda/pkgs

RUN python3 -m pip install pip --upgrade

COPY requirements.txt /home/files/requirements.txt
RUN pip install --default-timeout=300 -r /home/files/requirements.txt

#USER $NB_UID
```
The first conda install command installs cuDF and cuML. The second installs Dask together with a few required libraries like NumPy and Pandas. This part, in particular the pinned package list and the cleanup steps, was copied from the daskdev/dask:latest Dockerfile. We will discuss later why copying it was a good idea.
Finally, the last RUN command installs the libraries specified in requirements.txt (which needs to be accessible while building the image) via pip. These libraries are dask-kubernetes, dask_cuda, dask_cudf, dask_cuml and GCSFS (needed to read from Google Cloud Storage buckets).
Again, we build the image and push it to a repository.
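As an optional sanity check, you can start a container from the final image and import the installed stack. This is just a sketch; note that the Rapids imports need a visible GPU, so the check has to run on a GPU node, and the versions printed depend on your build:

```python
# Optional sanity check, run inside a container started from the final image.
# All imports should succeed if the image was built correctly; the Rapids
# libraries (cudf, cuml, dask_cudf) require a GPU to be visible.
import dask
import cudf
import cuml
import dask_cudf
import gcsfs

print("dask:", dask.__version__)
print("cudf:", cudf.__version__)
print("cuml:", cuml.__version__)
```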
Deploying JupyterHub
Now we are ready to deploy JupyterHub into our Kubernetes cluster. This link provides a lot of information about deploying it on Kubernetes; there you can find many details on how to customize and personalize your deployment. We will come straight to the point: create a file config.yaml according to your configuration preferences. Our config looks like this:
```yaml
proxy:
  secretToken: "<YOUR 32 BYTES SECURITY TOKEN>"
  # Do not assign a public IP
  service:
    type: NodePort

singleuser:
  defaultUrl: "/lab"
  # The service account we created for Jupyter
  serviceAccountName: jupyter-service-account
  # The final image we built
  image:
    name: <REGISTRY PATH HERE>
    tag: <TAG>
  storage:
    # Customize storage for the Jupyter client (default 10Gi)
    capacity: 20Gi
    # Mounts for the NVIDIA drivers
    extraVolumes:
      - name: nvidia-debug-tools
        hostPath:
          path: /home/kubernetes/bin/nvidia/bin
      - name: nvidia-libraries
        hostPath:
          path: /home/kubernetes/bin/nvidia/lib64
      # The NFS PVC
      - name: my-pvc-nfs
        persistentVolumeClaim:
          claimName: nfs
    extraVolumeMounts:
      # Mount the NVIDIA driver paths
      - name: nvidia-debug-tools
        mountPath: /usr/local/bin/nvidia
      - name: nvidia-libraries
        mountPath: /usr/local/nvidia/lib64
      # Mount the NFS
      - name: my-pvc-nfs
        mountPath: "/home/jovyan/mnt"
  # Create 2 profiles: notebook with or without a GPU
  profileList:
    - display_name: "GPU Server"
      description: "Spawns a notebook server with access to a GPU"
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
    - display_name: "CPU Server"
      description: "Spawns a notebook server without access to a GPU"

hub:
  extraConfig:
    # Use JupyterLab by default
    1_jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']

# Create a simple authentication
auth:
  type: dummy
  dummy:
    password: '<YOUR PASSWORD>'
  whitelist:
    users:
      - <USER>
```
To create your 32-byte security token, simply run:
```bash
openssl rand -hex 32
```
… in the terminal and paste the result into the secretToken field of your config. Then, specify your image and mount the volumes for accessing the bucket and the paths to the NVIDIA drivers (this might or might not be necessary, depending on your setup). You can create different profiles with different resource requests. In the example above, one profile with access to a GPU and one without are available. A simple password-based authentication is provided as well.
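Once the hub is deployed (see below) and you spawn a server with the "GPU Server" profile, it is worth verifying that the driver mounts actually work from within the notebook. A small sketch, assuming the nvidia-smi binary ends up under the mountPath /usr/local/bin/nvidia from the config above:

```python
# Sketch: check that the NVIDIA driver tools mounted via extraVolumeMounts
# are visible inside the notebook pod (GPU Server profile only).
import subprocess

result = subprocess.run(
    ["/usr/local/bin/nvidia/nvidia-smi"],  # path assumed from the mountPath above
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)
```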
Now we can add the JupyterHub Helm chart repository:
```bash
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
```
After a while, an “Update Complete. Happy Helming” message should appear. We are ready to deploy the Hub. From the directory with the config.yaml, run:
```bash
helm upgrade --install jupyterhub jupyterhub/jupyterhub --namespace kubeyard --version=0.8.2 --values config.yaml
```
You might want to add a --timeout flag with a higher value, like 1000, since the image is quite big and pulling it sometimes results in timeout errors. The deployment should create a hub and a proxy pod. As soon as both are running, we can port-forward the proxy to port 8000:
```bash
kubectl port-forward <PROXY-POD NAME> 8000 --namespace kubeyard
```
Outlook on Part 3
Finally, Jupyter is up and running and port-forwarding is enabled. Now we can access JupyterHub from the browser at http://localhost:8000, log in (if authentication is on) and see the workspace of our JupyterLab. In the next part of our series we will finally use the prepared infrastructure for data science and compare the efficiency of four different approaches, including the usage of multiple GPUs!