[Header image: a graphics card with two active cores for machine learning with Dask]

Data Processing Scaled Up and Out with Dask and RAPIDS: Installing a Data Science App as Dask Client (2/3)


This blog post tutorial shows how to set up a scalable, high-performance environment for machine learning using GPUs, a Kubernetes cluster, Dask and Jupyter. In the first article of our blog series, we set up a Kubernetes cluster with access to GPUs. In this part, we will add containerized applications to the cluster so that we can run data processing workloads on it. More precisely, we will prepare a notebook image with CUDA installed, which is required if we want to use GPU-based frameworks. Furthermore, the image should contain Dask, RAPIDS and the Dask-RAPIDS integration. As soon as the image is ready, we will deploy JupyterHub, which spawns this notebook image as a container for each user.

We will use JupyterLab notebooks as an interactive environment to start our data processing algorithms. In other words, JupyterLab will act as a Dask client. As we want to provide an environment not only for one data scientist but for a group of users, we decided to install JupyterHub on our Kubernetes cluster. JupyterHub makes it possible to serve a pre-configured data science environment to a group of users.

Permissions for Dask Clients

First, we have to take care of the permissions of our JupyterLab instances. When used as a Dask client, a JupyterLab instance needs sufficient permissions to start new pods that act as Dask workers. Since we decided to install JupyterHub, no extra configuration is required: JupyterHub uses a Service Account with sufficient permissions by default. If you want to use Dask from a different environment, you have to grant your client permissions to create, delete, list etc. its Dask worker pods via a Service Account, along the lines of the sketch below.
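As a rough illustration (not part of the original setup; all names and the namespace are placeholders), the corresponding RBAC objects could look like this:

```yaml
# Sketch: Service Account plus Role/RoleBinding that allow a Dask client
# to manage its worker pods. Names and namespace are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-client
  namespace: dask
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-worker-manager
  namespace: dask
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-client-worker-manager
  namespace: dask
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dask-worker-manager
subjects:
  - kind: ServiceAccount
    name: dask-client
    namespace: dask
```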

Docker Image for Jupyter

JupyterHub is a multi-tenant version of JupyterLab: the hub creates a pod in the cluster for each user and pulls the notebook image that runs in that pod. There are official, ready-to-use Jupyter images such as the Minimal-Notebook or the Data-Science-Notebook. However, the RAPIDS library requires the CUDA Toolkit, so we cannot simply take one of these base images and add RAPIDS and Dask to it.

Instead, it is a good idea to create a base image that contains Jupyter and CUDA and use it to build an image with RAPIDS and Dask on top. Since RAPIDS and Dask are under active development and new versions are released frequently, keeping Jupyter and CUDA in a separate base image makes our final image easier to maintain.

Fortunately, there are not only official notebook images but also official NVIDIA images with CUDA, and we can simply combine the two. We will use the base-notebook Dockerfile from here and the CUDA 10.2 Dockerfile (10.2-base-ubuntu18.04) from here, and combine both of them into a single image. Keep in mind that for the base-notebook you need the following files next to your Dockerfile:

  1.  fix-permissions
  2.  jupyter_notebook_config.py
  3.  start.sh
  4.  start-notebook.sh
  5.  start-singleuser.sh

All of these files can be found in the base-notebook repository linked above. The resulting Dockerfile is listed below:
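A condensed sketch of this combination is shown here; for brevity it builds on the published jupyter/base-notebook image instead of repeating the full base-notebook Dockerfile, and the CUDA section follows the upstream 10.2-base-ubuntu18.04 Dockerfile, so exact package names and the repository key may need adjusting for newer releases:

```dockerfile
# Condensed sketch of the Jupyter + CUDA base image.
FROM jupyter/base-notebook:latest

# The CUDA installation needs root privileges.
USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
        gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get update && apt-get install -y --no-install-recommends \
        cuda-cudart-10-2 cuda-compat-10-2 && \
    ln -s cuda-10.2 /usr/local/cuda && \
    rm -rf /var/lib/apt/lists/*

ENV PATH=/usr/local/cuda/bin:${PATH} \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH} \
    NVIDIA_VISIBLE_DEVICES=all \
    NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Switch back to the unprivileged notebook user afterwards.
USER $NB_UID
WORKDIR $HOME
```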

It is important to switch to the root user for the CUDA part and then back to the normal notebook user afterwards.

We build this image and push it to a repository of our choice; this gives us a base image with Jupyter and CUDA. To create the final image on top of it, we need to install the RAPIDS libraries (cuDF and cuML), Dask, Dask-cuDF and Dask-cuML. The non-Dask RAPIDS libraries are required because their Dask counterparts build on them. This can easily be done in just a few steps, and the Dockerfile looks like this:
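The following is a rough sketch of this second Dockerfile; the base image name is a placeholder for the image pushed in the previous step, channels and versions may need adjusting, and the line numbers mentioned in the next paragraph refer to the original file rather than to this sketch:

```dockerfile
# Rough sketch: RAPIDS and Dask on top of the Jupyter + CUDA base image.
# Replace the FROM line with the base image pushed in the previous step.
FROM myregistry/jupyter-cuda-base:latest

# Install the RAPIDS libraries cuDF and cuML from the rapidsai conda channel.
RUN conda install -y -c rapidsai -c nvidia -c conda-forge \
        cudf cuml cudatoolkit=10.2 && \
    conda clean -afy

# Install Dask together with commonly needed libraries such as NumPy and Pandas.
RUN conda install -y -c conda-forge \
        dask distributed numpy pandas && \
    conda clean -afy

# The part copied from the daskdev/dask:latest Dockerfile would go here,
# e.g. its prepare.sh entrypoint handling (see the next paragraph).

# Install the libraries listed in requirements.txt via pip; the file has to be
# placed next to the Dockerfile so that it is available in the build context.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```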

In line 5, cuDF and cuML are installed. Line 10 installs Dask and a few required libraries such as NumPy and Pandas. This part, in particular lines 12 to 19, was copied from the daskdev/dask:latest Dockerfile. We will discuss later why copying it was a good idea.

Finally, in line 27, the libraries specified in requirements.txt (which needs to be accessible while building the image) are installed via pip. These libraries are dask-kubernetes, dask_cuda, dask_cudf, dask_cuml and gcsfs (needed to read from Google Cloud Storage buckets); a sketch of the file follows below.
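Based on this list, the requirements.txt could look roughly as follows (exact package names and pinned versions depend on the RAPIDS release you target):

```
dask-kubernetes
dask_cuda
dask_cudf
dask_cuml
gcsfs
```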

Again, we build the image and push it to a repository.

Deploying JupyterHub

Now we are ready to deploy JupyterHub into our Kubernetes cluster. This link provides a lot of information about deploying it on Kubernetes, including many details on how to customize and personalize your deployment. We will come straight to the point: create a file config.yaml according to your configuration preferences. My config looks like this:
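The following sketch shows what such a config.yaml can look like with the chart version used at the time of writing; the image name, ConfigMap name, driver path and password are placeholders, and newer chart versions configure authentication under hub.config instead of auth:

```yaml
proxy:
  secretToken: "<paste the 32-byte token generated below>"

singleuser:
  image:
    name: myregistry/jupyter-rapids-dask   # the notebook image built above
    tag: latest
  storage:
    extraVolumes:
      - name: gcs-config
        configMap:
          name: gcs-bucket-config          # ConfigMap with the bucket credentials
      - name: nvidia-driver
        hostPath:
          path: /usr/local/nvidia          # path to the NVIDIA drivers on the node
    extraVolumeMounts:
      - name: gcs-config
        mountPath: /etc/gcs
      - name: nvidia-driver
        mountPath: /usr/local/nvidia
  profileList:
    - display_name: "CPU only"
      description: "Notebook without GPU access"
      default: true
    - display_name: "GPU"
      description: "Notebook with access to one GPU"
      kubespawner_override:
        extra_resource_limits:
          "nvidia.com/gpu": "1"

auth:
  type: dummy
  dummy:
    password: "<shared password>"
```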

To create your 32-byte security token, simply run:
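```bash
# Generate a random 32-byte hex string, as recommended by the JupyterHub docs
openssl rand -hex 32
```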

… in the terminal and paste the result into line 2 of your config. Then specify your image, mount the ConfigMap for accessing the bucket and the path to the NVIDIA drivers (this might or might not be necessary in your setup). You can create different profiles with different resource requests; in the example above, one profile with access to a GPU and one without are available. Simple password-based authentication is configured as well.

Now we can add the JupyterHub Helm chart repository:
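```bash
# Register the JupyterHub Helm chart repository and refresh the local chart index
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
```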

After a while, an "Update Complete. Happy Helming" message should appear. We are ready to deploy the hub. From the directory containing config.yaml, run:
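```bash
# Release name and namespace are placeholders; pick your own.
helm upgrade --install jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --create-namespace \
  --values config.yaml
```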

You might want to add a --timeout flag with a higher value, such as 1000 seconds, since the image is quite big and pulling it sometimes results in timeout errors. The deployment should create a hub and a proxy pod. As soon as both are running, we can port-forward the proxy to port 8000:
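```bash
# Forward the chart's public proxy service to local port 8000
# (replace the namespace with the one used for your release)
kubectl port-forward --namespace jhub service/proxy-public 8000:80
```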

Outlook on Part 3

Finally, Jupyter is up and running and port-forwarding is enabled. Now we can access JupyterHub from the browser, log in (if authentication is enabled) and see the workspace of our JupyterLab. In the next part of our series, we will finally use the prepared infrastructure for data science and compare the efficiency of four different approaches, including the use of multiple GPUs!
