Known Issues and Limitations in Cloudera Data Science Workbench 1.1.x

This topic lists the current known issues and limitations in Cloudera Data Science Workbench 1.1.x.

Installation

  • Airgapped Installations

    Airgapped installations will fail when the installer cannot pull the alpine 3.4 image into the airgapped environment.

    Fixed In: This issue has been fixed in Cloudera Data Science Workbench 1.1.1. If you cannot upgrade to Cloudera Data Science Workbench 1.1.1, use the following workaround.

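    Workaround: Download and save the alpine 3.4 image on a machine with internet access, copy the saved archive to the Cloudera Data Science Workbench master node, and then manually load it into Docker there.
    # Pull the missing image and save it to an archive (on a machine with internet access)
    docker pull alpine:3.4
    docker save -o /tmp/alpine3.4.docker alpine:3.4
    
    # Copy /tmp/alpine3.4.docker to the master node, then load the image into Docker there
    docker load -i /tmp/alpine3.4.docker
    cdsw init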
  • The cdsw init command will fail if localhost does not resolve to 127.0.0.1.
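
One way to check this condition before running cdsw init is to resolve localhost on the master node. The following is a minimal Python sketch, not part of the product. The function name resolves_to_loopback and its resolver parameter are hypothetical, introduced here only so that the lookup can be swapped out.

```python
import socket


def resolves_to_loopback(hostname="localhost", resolver=socket.gethostbyname):
    """Return True only if hostname resolves to the IPv4 loopback address 127.0.0.1."""
    try:
        return resolver(hostname) == "127.0.0.1"
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the name does not resolve
        return False


# Example: report the result for this host
print("localhost resolves to 127.0.0.1:", resolves_to_loopback())
```

If the check reports False, the name resolution configuration on the host (for example, /etc/hosts) is a reasonable first place to look.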

Upgrade

TSB-350: Permanent Fix for Data Loss Risk During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

TSB-346 was released in the time frame of CDSW 1.4.2 to fix this issue, but it turned out to be only a partial fix. With CDSW 1.4.3, we have fixed the issue permanently, and TSB-350 has been released to cover this fix. Note that the script that was provided with TSB-346 still ensures that data loss is prevented and must be used to shut down or restart all the affected CDSW releases listed below.

Affected Versions: Cloudera Data Science Workbench 1.0.x, 1.1.x, 1.2.x, 1.3.x, 1.4.0, 1.4.1, 1.4.2

Fixed Version: Cloudera Data Science Workbench 1.4.3 (and higher)

Cloudera Bug: DSE-5108

The complete text for TSB-350 is available in the Cloudera Security Bulletins: TSB-350: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart.

Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where the kubelet stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, CDSW projects that remain mounted will be deleted by the cleanup step.
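
To illustrate the condition described above, the following Python sketch lists NFS mounts that are still present under a given directory, based on text in the /proc/mounts format. It is an illustration only and does not replace the workaround script in this bulletin. The function name nfs_mounts_under is hypothetical, and the /var/lib/kubelet path in the example is an assumption (the Kubernetes default), because this document does not name the kubelet folder.

```python
def nfs_mounts_under(mounts_text, prefix):
    """Return mount points of NFS filesystems located at or below prefix.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    prefix = prefix.rstrip("/")
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        is_nfs = fs_type in ("nfs", "nfs4")
        is_inside = mount_point == prefix or mount_point.startswith(prefix + "/")
        if is_nfs and is_inside:
            found.append(mount_point)
    return found


# Example with made-up mount table contents; /var/lib/kubelet is an assumed path
sample = (
    "proc /proc proc rw 0 0\n"
    "10.0.0.1:/projects /var/lib/kubelet/pods/abc/volumes/projects nfs4 rw 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)
print(nfs_mounts_under(sample, "/var/lib/kubelet"))
```

A non-empty result would mean that an NFS volume is still mounted below the directory, which is the situation in which a cleanup of that directory could delete project files.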

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench versions:
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: If the NFS unmount fails during shutdown, data loss can occur. All CDSW project files might be deleted.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master node every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh - Available for download at: cdsw_protect_stop_restart.sh.

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.2 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"
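
The script above asks you to make sure the backup target has enough free space. As a rough pre-check, the following Python sketch compares the size of a source directory with the free space on a target filesystem. The function names and the 10% safety margin are hypothetical choices, and the result can differ slightly from du, which reports disk blocks rather than file lengths.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def has_enough_space(required_bytes, target, margin=1.1):
    """Return True if the filesystem holding target has at least required_bytes * margin free."""
    return shutil.disk_usage(target).free >= required_bytes * margin


# Example usage on the master node (the backup target below is a placeholder):
# needed = directory_size_bytes("/var/lib/cdsw/current/projects")
# print(has_enough_space(needed, "/path/to/backup/target"))
```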

Addressed in release/refresh/patch: This issue is fixed in Cloudera Data Science Workbench 1.4.2.

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Crashes and Hangs

  • High I/O utilization on the application block device can cause the application to stall or become unresponsive. Users should read and write data directly from HDFS rather than staging it in their project directories.

  • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to hang due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.

CDS 2.x Powered By Apache Spark

With Spark 2.2 Release 1, a PySpark application can run only once per active Workbench session

With Spark 2.2 Release 1, you can run a PySpark application only once per active Workbench session. Subsequent runs of the application will fail with a getDelegationToken error.

Affected Versions: Cloudera Distribution of Apache Spark 2.2 Release 1

Fixed In: Cloudera Distribution of Apache Spark 2.2 Release 2. If you cannot upgrade, use one of the following workarounds.

  • Use Cloudera Distribution of Apache Spark 2.1.

    or

  • Launch a new PySpark session every time you want to run the application.

Cloudera Bug: CDH-58475

Certain CDS releases require additional configuration to run PySpark on Cloudera Data Science Workbench versions 1.3.x (and lower)

Affected Versions: Due to a security fix in CDS, the versions of py4j that ship with the following versions of the two products no longer match:
  • Cloudera Data Science Workbench 1.3.x (and lower) includes py4j 0.10.4.
  • CDS 2.1 release 3, CDS 2.2 release 3, and CDS 2.3 release 3 include py4j 0.10.7.

This version mismatch results in PySpark session/job failures on Cloudera Data Science Workbench.

The following error messages are indicative of this issue:
TypeError: __init__() got an unexpected keyword argument 'auth_token'
TypeErrorTraceback (most recent call last)
in engine
----> 1 spark = SparkSession    .builder    .appName("PythonPi")    .getOrCreate()

Use one of the following workarounds to continue running PySpark jobs on Cloudera Data Science Workbench 1.3.x. Alternatively, upgrade to Cloudera Data Science Workbench 1.4.x.
  • Workaround 1: Use the PYTHONPATH environmental variable to use CDS's version of py4j
    This process requires site administrator privileges.
    1. Log in to Cloudera Data Science Workbench.
    2. Click Admin > Engines.
    3. Under the Environmental variables section, add a new variable:
      • Name: PYTHONPATH
      • Value: $SPARK_HOME/python/lib/py4j-0.10.7-src.zip
    4. Click Add.
  • Workaround 2: Install py4j 0.10.7 directly in project sessions

    If you do not want to make site-wide changes as described in the previous workaround, individual users can install py4j 0.10.7 directly in their project's sessions to continue running PySpark.

    For example, in a Python 3 session, run the following command in the workbench command prompt:
    !pip3 install py4j==0.10.7
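
After applying either workaround, you may want to confirm which py4j version a session actually picks up. The following Python sketch is one way to do that. The helper names are hypothetical, and treating any version at or above 0.10.7 as acceptable is an assumption, since the text above names only 0.10.4 and 0.10.7.

```python
REQUIRED_PY4J = "0.10.7"  # version shipped with the affected CDS releases, per the text above


def parse_version(text):
    """Turn a dotted version string such as '0.10.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def installed_py4j_version():
    """Return the py4j version visible to this session, or None if py4j cannot be imported."""
    try:
        from py4j.version import __version__
    except ImportError:
        return None
    return __version__


def is_new_enough(installed, required=REQUIRED_PY4J):
    """Assumption: any py4j at or above the required version avoids the mismatch."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(required)


# Example: report what the current session sees
version = installed_py4j_version()
print("py4j version:", version, "| new enough:", is_new_enough(version))
```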

Fixed Versions: Cloudera Data Science Workbench 1.4.x

Version 1.4 ships with engine 5 that includes a CDS-compatible version of py4j. Note that the upgrade itself will not automatically upgrade existing projects to engine 5. Once the upgrade to 1.4 is complete, project administrators/collaborators must manually configure their projects to use engine 5 as the base image (go to the project's Settings > Engine page).

Cloudera Bug: DSE-4046, DSE-4316

Security

TSB-328: Unauthenticated User Enumeration in Cloudera Data Science Workbench

Unauthenticated users can get a list of user accounts of Cloudera Data Science Workbench.

Affected Versions: Cloudera Data Science Workbench 1.4.0 (and lower)

Fixed Versions: Cloudera Data Science Workbench 1.4.2 (and higher)

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench (1.4.2 or higher).

For more details, see the Security Bulletins - TSB-328.

Other notable known issues include:

  • The container runtime and application data storage are not fully secure from untrusted users who have SSH access to the gateway nodes. Therefore, SSH access to the gateway nodes for untrusted users should be disabled for security and resource utilization reasons.

  • Self-signed certificates are not supported for TLS termination. For more details, see Enabling TLS/SSL - Limitations.

  • The TLS_KEY parameter is not password protected.

  • LDAP group search filters are currently not supported. To limit access to Cloudera Data Science Workbench to certain groups, use "memberOf" or the equivalent user attribute in LDAP User Filter.

    Fixed Version: Cloudera Data Science Workbench 1.4.0

  • PowerBroker-equipped Active Directory is not supported.

  • Using Kerberos plugin modules in krb5.conf is not supported.

  • Modifying the default_ccache_name parameter in krb5.conf does not work in Cloudera Data Science Workbench. Only the default path for this parameter, /tmp/krb5cc_${uid}, is supported.

  • When you upload a Kerberos keytab to authenticate yourself to the CDH cluster, Cloudera Data Science Workbench displays a fleeting error message ('cancelled') in the bottom right corner of the screen, even if authentication was successful. This error message can be ignored.

  • Cloudera Data Science Workbench does not support the use of a FreeIPA KDC.

GPU Support

  • Currently, Cloudera Data Science Workbench only supports CUDA-enabled NVIDIA GPU cards.

  • Cloudera Data Science Workbench does not support heterogeneous GPU hardware in a single deployment.

  • GPUs are not detected after a machine reboot. This is because certain NVIDIA modules do not load automatically after a reboot. To work around this issue, manually load the required modules before Cloudera Data Science Workbench services start.

    The following commands load the nvidia.ko module, create the /dev/nvidiactl device, and create the list of devices at /dev/nvidia0. They also create the /dev/nvidia-uvm and /dev/nvidia-uvm-tools devices, and assign execute privileges to /etc/rc.modules. Run these commands once on each machine that has GPU hardware.

    # Manually load the required NVIDIA modules
    # (tee runs under sudo, so the append to /etc/rc.modules is done with root privileges)
    sudo tee -a /etc/rc.modules > /dev/null <<EOMSG
    /usr/bin/nvidia-smi
    /usr/bin/nvidia-modprobe -u -c=0
    EOMSG
    
    # Set execute permission for /etc/rc.modules 
    sudo chmod +x /etc/rc.modules
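
To verify the result after a reboot, you can check that the device nodes named above exist. The following Python sketch does this. The function name missing_devices and its exists parameter are hypothetical, and the list covers only the devices mentioned in this section (a machine with several GPUs would have additional /dev/nvidiaN nodes).

```python
import os

# Device nodes named in the text above
EXPECTED_DEVICES = (
    "/dev/nvidiactl",
    "/dev/nvidia0",
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
)


def missing_devices(paths=EXPECTED_DEVICES, exists=os.path.exists):
    """Return the subset of paths that are not present on this machine."""
    return [path for path in paths if not exists(path)]


# Example: an empty list means all expected device nodes are present
print("Missing NVIDIA device nodes:", missing_devices())
```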

For more details on this feature, see Using GPUs for Cloudera Data Science Workbench Workloads.

Jobs API

  • Cloudera Data Science Workbench does not support changing your API key, or having multiple API keys.
  • Currently, you cannot create a job, stop a job, or get the status of a job using the Jobs API.

Engines

  • Autofs mounts are not supported with Cloudera Data Science Workbench.
  • Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.
  • When using conda to install Python packages, you must specify the Python version to match the Python versions shipped in the engine image (2.7.11 and 3.6.1). If not specified, the conda-installed Python version will not be used within a project. Pip (pip and pip3) does not face this issue.

  • With the Python 3 engine, matplotlib plots don't return the image of the plot, just the code. To work around this issue, insert %matplotlib inline at the start of your scripts. For example:
    %matplotlib inline
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])

    Fixed In: Cloudera Data Science Workbench 1.2.0.
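
For the conda item in the list above, the following Python sketch builds a conda install command that pins Python to the version of the interpreter running in the session. The function name conda_install_command is hypothetical, and numpy is only an example package.

```python
import sys


def conda_install_command(package, version_info=None):
    """Build a conda install command that pins Python to the given interpreter version."""
    if version_info is None:
        version_info = sys.version_info  # default: the interpreter running this code
    major, minor, micro = version_info[:3]
    python_pin = "python={}.{}.{}".format(major, minor, micro)
    return ["conda", "install", "--yes", python_pin, package]


# Example: print the command for the current session's Python version
print(" ".join(conda_install_command("numpy")))
```

In an engine that ships Python 3.6.1, the printed command would pin python=3.6.1, which matches the requirement described in the conda item.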

Usability

  • Custom /etc/hosts entries on Cloudera Data Science Workbench hosts do not propagate to sessions and jobs running in containers.

  • When hundreds of users are logged in and creating processes, the system's nproc and nofile limits may be reached. Use ulimit or other methods to increase the maximum number of processes and open files that a user can create on the system.

  • When rebooting, Cloudera Data Science Workbench nodes can take a significant amount of time (about 30 minutes) to become ready.

  • Long-running operations such as fork and clone can time out when projects are large or connections outlast the HTTP timeouts of reverse proxies.

  • The Scala kernel does not support autocomplete features in the editor.

  • Scala and R code can sometimes indent incorrectly in the workbench editor.
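
For the nproc and nofile item in the list above, the following Python sketch reads the limits that apply to the current process by using the standard resource module. The function names and the threshold in the example are hypothetical; appropriate values depend on your deployment.

```python
import resource


def current_limits():
    """Return the (soft, hard) nproc and nofile limits for the current process."""
    return {
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
    }


def is_below(soft_limit, threshold):
    """Return True if a finite soft limit is lower than threshold."""
    if soft_limit == resource.RLIM_INFINITY:
        return False  # an unlimited value is never too low
    return soft_limit < threshold


# Example: flag limits below a hypothetical threshold of 4096
for name, (soft, hard) in current_limits().items():
    print(name, "soft:", soft, "hard:", hard, "low:", is_below(soft, 4096))
```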