Requirements for using a PBJ Workbench
Learn about the prerequisites and preparation steps for setting up a PBJ Workbench.
PBJ Workbench setup: Python installation
PBJ Runtimes must have Python installed, even if the Runtime is designed to run another kernel in Cloudera AI, for example, an R kernel. The minimum supported Python version is 3.7. Python can be installed by using the package manager of the base image or can be compiled by the user.
The custom PBJ Runtime image must meet the following essential requirements:
- The actual Python binary, or a symlink file pointing to it, must be located at the following path in the custom PBJ Runtime image: /usr/local/bin/python3.
- The Python binary must be included in the PATH environment variable under the python name, ensuring that executing the python command in a terminal successfully launches Python.
- Executing python --version must report Python version 3.7 or higher.
- If the Runtime is configured to run a Python kernel in Cloudera AI, both the python and the /usr/local/bin/python3 commands must launch the same Python executable that is registered as a Jupyter kernel.
If the chosen method for installing Python does not place the Python binary under
/usr/local/bin/python3, or does not create the python
command, create the appropriate symlink files.
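The symlink step can be sketched as the following Dockerfile fragment. It is a minimal illustration, not a prescribed method: PYTHON_HOME and its value /opt/python are hypothetical and stand for wherever your installation method placed Python, and the fragment assumes /usr/local/bin is already on the PATH of the image.

```dockerfile
# Hypothetical install prefix; replace with the location used by your build.
ARG PYTHON_HOME=/opt/python

# Expose the interpreter at the required path and under the plain "python" name,
# then print both versions so a broken link fails the build early.
RUN ln -sf "${PYTHON_HOME}/bin/python3" /usr/local/bin/python3 && \
    ln -sf /usr/local/bin/python3 /usr/local/bin/python && \
    python --version && \
    /usr/local/bin/python3 --version
```

Because python is linked to /usr/local/bin/python3 rather than to a second interpreter, both commands resolve to the same binary.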
Installing Jupyter dependencies and registering your kernel
- Install Jupyter Kernel Gateway version 2.5.2 into the Docker image.
  You might need to modify this example command depending on the filename and path of the pip executable in the image.
  RUN pip3 install "jupyter-kernel-gateway==2.5.2"
- Take note of the path to the Jupyter executable file installed by the pip package manager. Add the command that runs Jupyter Kernel Gateway to the ML_RUNTIME_JUPYTER_KERNEL_GATEWAY_CMD environment variable in the Docker image:
  ENV ML_RUNTIME_JUPYTER_KERNEL_GATEWAY_CMD="/path/to/jupyter kernelgateway"
  When launching the Runtime in Cloudera AI, the correct IP address and port configuration for Jupyter Kernel Gateway is set automatically by Cloudera AI.
- Register the Jupyter kernel.
  Each instance of the PBJ Workbench communicates with the Jupyter kernel installed in the Runtime image by using the Jupyter protocol. Kernels are available for a wide variety of languages and versions. Install the kernel of your choice to the image by following its installation instructions. A kernel named python3 is registered by default when installing jupyter-kernel-gateway using the pip package manager. Installed Jupyter kernels can be listed by running the following command in a container created from the image:
  path/to/jupyter kernelspec list
- Define the name of your chosen kernel in the ML_RUNTIME_JUPYTER_KERNEL_NAME environment variable in the Docker image.
  For example, if the name of your kernel is python3, include the following in the Dockerfile:
  ENV ML_RUNTIME_JUPYTER_KERNEL_NAME=python3
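Taken together, the steps above might look like the following fragment for a Python kernel. This is a sketch under stated assumptions: it assumes that pip3 is on the PATH and that it installs the jupyter executable into /usr/local/bin, which you should confirm for your own image before reusing the paths.

```dockerfile
# Assumption: pip3 is on the PATH and places console scripts in /usr/local/bin.
RUN pip3 install "jupyter-kernel-gateway==2.5.2"

# Command used to start Jupyter Kernel Gateway; the path is an assumption.
ENV ML_RUNTIME_JUPYTER_KERNEL_GATEWAY_CMD="/usr/local/bin/jupyter kernelgateway"

# Name of the kernel to run; python3 is registered by the pip install above.
ENV ML_RUNTIME_JUPYTER_KERNEL_NAME=python3

# Build-time check: list the registered kernels in the build log.
RUN /usr/local/bin/jupyter kernelspec list
```

Listing the kernels during the build makes a missing or misnamed kernel visible before the image is registered in Cloudera AI.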
Adding the cdsw user
The user code executes in the image under the user and group identified as 8536:8536.
Associate these IDs with the cdsw name in the image by adding the following
command to the Dockerfile:
RUN groupadd --gid 8536 cdsw && \
useradd -c "CDSW User" --uid 8536 -g cdsw -m -s /bin/bash cdsw
Configuring permissions to enable writing Cloudera user settings
All code within the runtime container, including initial setup, executes under the
cdsw user. The initial setup includes linking client files for Cloudera Data Services on premises to their standard paths. To enable this
process, ensure that the following paths, along with their subfolders, have write permissions
for the user ID 8536:
- /etc
- /bin
- /usr/share/java
- /opt
- /usr
Additionally, set the permissions of the following directories, along with all their
subdirectories, to 777:
- /etc
- /etc/alternatives
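One possible way to apply these permissions in the Dockerfile is sketched below. It assumes that making user 8536 the owner of the directories is an acceptable way to grant write access, and it changes directories only, leaving regular files untouched. Treat it as an illustration and review the resulting permissions against your own security requirements.

```dockerfile
# Make user 8536 the owner of each listed path and all of its subdirectories.
# The trailing slash lets find descend into a path that is itself a symlink.
RUN for dir in /etc /bin /usr/share/java /opt /usr; do \
      mkdir -p "$dir" && \
      find "$dir/" -type d -exec chown 8536:8536 {} + ; \
    done

# Set mode 777 on /etc, /etc/alternatives, and all of their subdirectories.
RUN mkdir -p /etc/alternatives && \
    find /etc/ /etc/alternatives/ -type d -exec chmod 777 {} +
```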
Additional requirements
- The ML_RUNTIME_METADATA_VERSION environment variable and the corresponding Docker label must be set to the value 2.
- To use the PBJ Workbench editor, the ML_RUNTIME_EDITOR environment variable and the corresponding Docker label must be set to PBJ Workbench. If using a third-party editor, for example, JupyterLab or RStudio, set the ML_RUNTIME_EDITOR environment variable and the Docker label to the desired value.
- The base image must be Ubuntu.
- The Bash tool must be installed and must be configured as the default terminal used by the cdsw user.
- When the PBJ Runtime is running the R kernel, the kernel must be registered with the IRkernel package and the bracketed paste mode must be disabled for the bash tool.
- The executable that is registered as a Jupyter kernel must be on the PATH environment variable, must be found by the which command, and must be named after the programming language of the kernel. For example, the name of the executable must be:
  - python in case of a Python kernel
  - R in case of an R kernel
- When using a virtual or Conda environment and a Python kernel, Cloudera recommends configuring the PATH environment variable so that the default pip command corresponds to the Python executable registered as the Jupyter kernel.
- Cloudera AI mounts the project's filesystem under the path /home/cdsw and overwrites any files placed in that location within the Runtime image. Therefore, custom Runtime images must avoid installing any files or configurations under the home folder of the cdsw user.
- Once the Runtime image starts up in Cloudera AI, the kernel must be configured to install new packages to user site libraries under /home/cdsw. That way, newly installed packages persist in the project filesystem.
- The xz-utils package must be installed on the Runtime image.
- The following binaries must be accessible on the PATH variable: kinit, klist, ktutil, and sshd. The binaries are installed on Ubuntu as part of the following packages: krb5-user and ssh.
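Several of the requirements in this list can be covered with a short Dockerfile fragment such as the one below. It is a partial sketch, not a complete Runtime definition: the matching Docker labels are left out because their names are not given in this section, and the /etc/inputrc line is only one assumed way of disabling bracketed paste mode for bash.

```dockerfile
# Metadata version and editor. The corresponding Docker labels must also be
# set; they are omitted here because their names are not stated in this section.
ENV ML_RUNTIME_METADATA_VERSION=2 \
    ML_RUNTIME_EDITOR="PBJ Workbench"

# bash, xz-utils, and the Ubuntu packages that provide kinit, klist, ktutil,
# and sshd (krb5-user and ssh).
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      bash xz-utils krb5-user ssh && \
    rm -rf /var/lib/apt/lists/*

# R kernel only. Assumption: a system-wide readline setting is acceptable.
# bash reads /etc/inputrc when the user has no ~/.inputrc of their own.
RUN echo "set enable-bracketed-paste off" >> /etc/inputrc
```

The default shell of the cdsw user is not set here because the useradd command in the "Adding the cdsw user" section already passes -s /bin/bash.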
