Known Issues and Limitations in Cloudera Data Science Workbench 1.2.x

This topic lists the current known issues and limitations in Cloudera Data Science Workbench 1.2.x.

Upgrades

Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where the kubelet stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, CDSW projects that remain mounted will be deleted by the cleanup step.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench versions:
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: If the NFS unmount fails during shutdown, data loss can occur. All CDSW project files might be deleted.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master node every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh (available for download; reproduced below):

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.2 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"

Addressed in release/refresh/patch: This issue is fixed in Cloudera Data Science Workbench 1.4.2.

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

In version 1.2.2, the cdsw status command fails to perform all the required checks

In Cloudera Data Science Workbench 1.2.2, the cdsw status command fails to perform certain required system checks and displays the following error message.
Sending detailed logs to [/tmp/cdsw_status_abc.log] ...
CDSW Version: [1.2.2....]
OK: Application running as root check
OK: Sysctl params check
Failed to run CDSW Nodes Check.
Failed to run CDSW system pods check.
Failed to run CDSW application pods check.
Failed to run CDSW services check.
Failed to run CDSW config maps check.
Failed to run CDSW secrets check.
...

Affected Versions: Cloudera Data Science Workbench 1.2.2.

Fixed In: Cloudera Data Science Workbench 1.3.0. If you cannot upgrade, use the following workaround.

Workaround: To work around this issue, install the following package on all Cloudera Data Science Workbench master and worker nodes.
pip install backports.ssl_match_hostname==3.5.0.1 

Cloudera Bug: DSE-3070

Environment setup incomplete after upgrade from version 1.2.0 to 1.2.1 on CSD-based deployments

After upgrading from Cloudera Data Science Workbench 1.2.0 to 1.2.1 on a CSD-based deployment, CLI commands might not work as expected due to missing binaries in the environment. Note that this issue does not affect fresh installs.

Fixed In: Cloudera Data Science Workbench 1.2.2. If you cannot upgrade, use the following workaround.

Workaround: Deactivate and reactivate the Cloudera Data Science Workbench parcel in Cloudera Manager. To do so, go to the Cloudera Manager Admin Console. In the top navigation bar, click Hosts > Parcels.

Locate the current active CDSW parcel and click Deactivate. Once deactivation is complete, click Activate.

Cloudera Bug: DSE-2928

Output from CLI commands is inconsistent with Cloudera Manager CDSW service configuration

Clusters that have migrated from a package-based to a CSD-based deployment might see output from CLI commands that is inconsistent with the CDSW service settings in Cloudera Manager.

Affected Versions: Cloudera Data Science Workbench 1.2.0, 1.2.1

Fixed In: Cloudera Data Science Workbench 1.2.2. If you cannot upgrade, use the following workaround.

Workaround: Delete the /etc/cdsw/config directory on all the Cloudera Data Science Workbench gateway hosts.
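A minimal sketch of applying this across hosts, assuming passwordless root SSH and illustrative hostnames (replace GATEWAY_HOSTS with your own); it defaults to a dry run that only records the commands it would run:

```shell
#!/bin/bash
# Sketch: remove the stale CLI configuration on every CDSW gateway host.
# GATEWAY_HOSTS and passwordless root SSH are illustrative assumptions.
# DRY_RUN=1 (the default) only records the plan in cleanup_plan.log.
DRY_RUN=${DRY_RUN:-1}
GATEWAY_HOSTS="cdsw-gw1.example.com cdsw-gw2.example.com"

: > cleanup_plan.log
for host in $GATEWAY_HOSTS; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh root@$host 'rm -rf /etc/cdsw/config'" >> cleanup_plan.log
  else
    ssh "root@$host" 'rm -rf /etc/cdsw/config'
  fi
done
cat cleanup_plan.log
```

Review the recorded plan, then rerun with DRY_RUN=0 to apply it.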

Cloudera Bug: DSE-3021

CDH Integration

Cloudera Data Science Workbench (1.4.x and lower) is not supported with Cloudera Manager 6.0.0 and CDH 6.0.0.

Cloudera Data Science Workbench will be supported with Cloudera Enterprise 6 in a future release.

CDH client configuration changes require a full Cloudera Data Science Workbench reset

Cloudera Data Science Workbench does not automatically detect configuration changes on the CDH cluster. Therefore, any changes made to CDH services, ranging from updates to service configuration properties to complete parcel upgrades, must be followed by a full reset of Cloudera Data Science Workbench.

Affected Versions: Cloudera Data Science Workbench 1.2.x, 1.3.x

Workaround: Run the following commands to ensure client configuration changes are picked up by Cloudera Data Science Workbench.

cdsw reset
cdsw init

Cloudera Bug: DSE-3350

Cloudera Manager Integration

  • Cloudera Data Science Workbench (1.4.x and lower) is not supported with Cloudera Manager 6.0.0 and CDH 6.0.0.

    Cloudera Data Science Workbench will be supported with Cloudera Enterprise 6 in a future release.

  • CSD distribution/activation fails on mixed-OS clusters when third-party parcels are running on operating systems that are not supported by Cloudera Data Science Workbench

    For example, adding a new CDSW gateway host to a RHEL 6 cluster running RHEL 6-compatible parcels will fail. This is because Cloudera Manager will not allow the RHEL 6 parcels to be distributed to the new host, which will likely be running a CDSW-compatible operating system such as RHEL 7.

    Affected Versions: Cloudera Data Science Workbench 1.2.x, 1.3.x

    Workaround: To add a new CDSW gateway host successfully, create copies of the 'incompatible' third-party parcel files and give them the corresponding RHEL 7 names so that Cloudera Manager allows them to be distributed to the new gateway host. Use the following sample instructions to do so:
    1. SSH to the Cloudera Manager Server host.
    2. Navigate to the directory that contains all the parcels. By default, this is /opt/cloudera/parcels.
      cd /opt/cloudera/parcels
    3. Make a copy of the incompatible third-party parcel with the new name. For example, if you have a RHEL 6 parcel that cannot be distributed on a RHEL 7 CDSW host:
      cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
    4. Repeat the previous step for the parcel's SHA file.
      cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel.sha <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
    5. Update the new files' owner and permissions to match those of existing parcels in the /opt/cloudera/parcels directory.
      chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
      chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
      chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
      chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
      
    You should now be able to add new gateway hosts for Cloudera Data Science Workbench to your cluster.

    Cloudera Bug: OPSAPS-42130, OPSAPS-31880
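The copy and rename steps above can be scripted. The sketch below uses a hypothetical parcel name and a scratch directory so it is safe to try anywhere; on a real cluster you would operate on /opt/cloudera/parcels and keep the chown/chmod steps from the instructions.

```shell
#!/bin/bash
set -e
# Scratch directory stands in for /opt/cloudera/parcels in this sketch.
demo=parcels_demo
mkdir -p "$demo"

# Stand-ins for a real RHEL 6 parcel and its SHA file (hypothetical name).
touch "$demo/MYPARCEL-1.0.cdh5.13.0.p0.123-el6.parcel"
touch "$demo/MYPARCEL-1.0.cdh5.13.0.p0.123-el6.parcel.sha"

# Copy each el6 file to the corresponding el7 name so Cloudera Manager
# will allow distribution to the new RHEL 7 gateway host.
for f in "$demo"/*-el6.parcel "$demo"/*-el6.parcel.sha; do
  cp "$f" "${f/-el6/-el7}"
done

ls "$demo"/*-el7*
```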

  • CDSW Service health status after a restart does not match the actual state of the application

    After a restart, the Cloudera Data Science Workbench service in Cloudera Manager will display Good health even though the Cloudera Data Science Workbench web application might need a few more minutes to get ready to serve requests.

  • On a CSD-based deployment, the cdsw status CLI command always reports: "Cloudera Data Science Workbench is not ready yet". This occurs even if the Cloudera Data Science Workbench service is up and running as expected. To see the correct status, use the Cloudera Manager UI.

    Fixed In: Cloudera Data Science Workbench 1.2.2.

    Cloudera Bug: DSE-2927

  • A file descriptor leak causes "Failed to get Kubernetes client configuration" errors in Cloudera Manager.

    Affected Versions: Cloudera Data Science Workbench 1.2.0, 1.2.1.

    Fixed In: Cloudera Data Science Workbench 1.2.2.

    Cloudera Bug: DSE-2910

CDS 2.x Powered By Apache Spark

With Spark 2.2 Release 1, a PySpark application can run only once per active Workbench session

With Spark 2.2 Release 1, you can run a PySpark application only once per active Workbench session. Subsequent runs of the application will fail with a getDelegationToken error.

Affected Versions: Cloudera Distribution of Apache Spark 2.2 Release 1

Fixed In: Cloudera Distribution of Apache Spark 2.2 Release 2. If you cannot upgrade, use one of the following workarounds.

  • Use Cloudera Distribution of Apache Spark 2.1.

    or

  • Launch a new PySpark session every time you want to run the application.

Cloudera Bug: CDH-58475

Certain CDS releases require additional configuration to run PySpark on Cloudera Data Science Workbench versions 1.3.x (and lower)

Affected Versions: A security fix in CDS introduced a mismatch between the versions of py4j that ship with the following versions of the two products:
  • Cloudera Data Science Workbench 1.3.x (and lower) includes py4j 0.10.4.
  • CDS 2.1 release 3, CDS 2.2 release 3, and CDS 2.3 release 3 include py4j 0.10.7.

This version mismatch results in PySpark session/job failures on Cloudera Data Science Workbench.

The following error messages are indicative of this issue:
TypeError: __init__() got an unexpected keyword argument 'auth_token'
TypeError                                 Traceback (most recent call last)
in engine
----> 1 spark = SparkSession \
                .builder \
                .appName("PythonPi") \
                .getOrCreate()
Use one of the following workarounds to continue running PySpark jobs on Cloudera Data Science Workbench 1.3.x. Alternatively, upgrade to Cloudera Data Science Workbench 1.4.x.
  • Workaround 1: Use the PYTHONPATH environmental variable to use CDS's version of py4j
    This process requires site administrator privileges.
    1. Log in to Cloudera Data Science Workbench.
    2. Click Admin > Engines.
    3. Under the Environmental variables section, add a new variable:
      • Name: PYTHONPATH
      • Value: $SPARK_HOME/python/lib/py4j-0.10.7-src.zip
    4. Click Add.
  • Workaround 2: Install py4j 0.10.7 directly in project sessions

    If you do not want to make site-wide changes as described in the previous workaround, individual users can install py4j 0.10.7 directly in their project's sessions to continue running PySpark.

    For example, in a Python 3 session, run the following command in the workbench command prompt:
    !pip3 install py4j==0.10.7
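With either workaround, you can confirm which py4j a Python 3 session will actually import. The stdlib-only check below (the filename py4j_location.txt is just for illustration) prints the module's location, or a not-found message if py4j is not installed:

```shell
# Print where the engine's Python 3 would import py4j from (stdlib only;
# py4j itself does not need to be installed for this check to run).
python3 -c '
import importlib.util
spec = importlib.util.find_spec("py4j")
print(spec.origin if spec and spec.origin else "py4j not found on sys.path")
' > py4j_location.txt
cat py4j_location.txt
```

If workaround 1 took effect, the printed path should point into $SPARK_HOME/python/lib/py4j-0.10.7-src.zip.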

Fixed Versions: Cloudera Data Science Workbench 1.4.x

Version 1.4 ships with engine 5 that includes a CDS-compatible version of py4j. Note that the upgrade itself will not automatically upgrade existing projects to engine 5. Once the upgrade to 1.4 is complete, project administrators/collaborators must manually configure their projects to use engine 5 as the base image (go to the project's Settings > Engine page).

Cloudera Bug: DSE-4046, DSE-4316

Crashes and Hangs

  • On a deployment with at least one worker node, running cdsw stop or stopping the CDSW service in Cloudera Manager results in the application hanging indefinitely.

    Fixed In: Cloudera Data Science Workbench 1.2.1. If you cannot upgrade to Cloudera Data Science Workbench 1.2.1, use the following workaround.

    Workaround: Stop the master node first, then stop the worker node(s).

    Cloudera Bug: DSE-2880

  • High I/O utilization on the application block device can cause the application to stall or become unresponsive. Users should read and write data directly from HDFS rather than staging it in their project directories.

  • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to hang due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.

GPU Support

Only CUDA-enabled NVIDIA GPU hardware is supported

Cloudera Data Science Workbench only supports CUDA-enabled NVIDIA GPU cards.

Heterogeneous GPU hardware is not supported

You must use the same GPU hardware across a single Cloudera Data Science Workbench deployment.

GPUs are not detected after a machine reboot

This issue occurs because certain NVIDIA modules do not load automatically after a reboot.

Workaround: Manually load the required modules before Cloudera Data Science Workbench services start, using the following commands. The commands load the nvidia.ko module, create the /dev/nvidiactl device, and create the list of devices at /dev/nvidia0. They also create the /dev/nvidia-uvm and /dev/nvidia-uvm-tools devices, and assign execute privileges to /etc/rc.modules. Run these commands once on every machine that has GPU hardware.

# Manually load the required NVIDIA modules at boot.
# Append with tee: a plain "sudo cat >> /etc/rc.modules" would not
# elevate the shell redirection and would fail for non-root users.
sudo tee -a /etc/rc.modules > /dev/null <<EOMSG
/usr/bin/nvidia-smi
/usr/bin/nvidia-modprobe -u -c=0
EOMSG

# Set execute permission for /etc/rc.modules 
sudo chmod +x /etc/rc.modules

Cloudera Bug: DSE-2847

NVIDIA driver directory mount is not set correctly

GPUs are not detected by Cloudera Data Science Workbench. This is because the mount for the NVIDIA driver directory is set incorrectly.

Affected Versions: Cloudera Data Science Workbench 1.2.1.

Fixed In: Cloudera Data Science Workbench 1.2.2. If you cannot upgrade, use the following workaround.

Workaround: To work around this issue, add the directory specified by NVIDIA_LIBRARY_PATH to the engine mounts in your site administrator settings, and set LD_LIBRARY_PATH in the project's environment.

For example, if NVIDIA_LIBRARY_PATH is set to /var/lib/nvidia-docker/volumes/nvidia_driver/381.22/, go to Admin > Engines and add the NVIDIA driver directory path to the Mounts section:
/var/lib/nvidia-docker/volumes/nvidia_driver/381.22/
Then go to your project's Settings > Engine page and add the LD_LIBRARY_PATH environmental variable to your project. Set it to:
/var/lib/nvidia-docker/volumes/nvidia_driver/381.22/bin:$LD_LIBRARY_PATH

Cloudera Bug: DSE-2957

CUDA engines cannot access GPU libraries in sessions and jobs

The value for the LD_LIBRARY_PATH environmental variable is not propagated to CUDA engines. As a result, CUDA engines will not be able to access the libraries required to enable GPU usage in sessions and jobs.

Affected Versions: Cloudera Data Science Workbench 1.2.0.

Fixed In: Cloudera Data Science Workbench 1.2.1. If you cannot upgrade to Cloudera Data Science Workbench 1.2.1, use the following workaround.

Workaround: As a workaround, set LD_LIBRARY_PATH in your project's environment by going to the project's Settings > Engine page:
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/cuda/lib:$LD_LIBRARY_PATH

Cloudera Bug: DSE-2828

Networking

  • Custom /etc/hosts entries on Cloudera Data Science Workbench hosts do not propagate to sessions and jobs running in containers.

    Cloudera Bug: DSE-2598

  • Initialization of Cloudera Data Science Workbench (cdsw init) will fail if localhost does not resolve to 127.0.0.1.

  • Cloudera Data Science Workbench does not support DNS servers running on 127.0.0.1:53, because within CDSW containers that address resolves to the container's own localhost. As a workaround, use either a non-loopback address or a remote DNS server.
  • Due to limits in libc, only two DNS servers are supported in /etc/resolv.conf. Kubernetes uses one additional entry for the cluster DNS.
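For example, a compliant /etc/resolv.conf on a CDSW host would list at most two nameservers (the addresses and search domain below are illustrative), leaving libc's remaining slot for the entry Kubernetes adds:

```
search example.com
nameserver 10.0.0.2
nameserver 10.0.0.3
```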

Security

SSH access to Cloudera Data Science Workbench nodes must be disabled

The container runtime and application data storage are not fully secure from untrusted users who have SSH access to the gateway nodes. Therefore, for security and resource-utilization reasons, SSH access to the gateway nodes should be disabled for untrusted users.

TSB-328: Unauthenticated User Enumeration in Cloudera Data Science Workbench

Unauthenticated users can obtain a list of Cloudera Data Science Workbench user accounts.

Affected Versions: Cloudera Data Science Workbench 1.4.0 (and lower)

Fixed Versions: Cloudera Data Science Workbench 1.4.2 (and higher)

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench (1.4.2 or higher).

For more details, see the Security Bulletins - TSB-328.

Remote Command Execution and Information Disclosure in Cloudera Data Science Workbench

A configuration issue in the Kubernetes distribution used by Cloudera Data Science Workbench can allow remote command execution and privilege escalation in CDSW. A separate information disclosure issue can expose the LDAP bind password to authenticated CDSW users when LDAP bind search is enabled.

Affected Versions: Cloudera Data Science Workbench 1.3.0 (and lower)

Fixed Versions: Cloudera Data Science Workbench 1.3.1 (and higher)

For more details, see the Security Bulletins - TSB-313.

TLS/SSL

  • Issues with 'cdsw status' and Cloudera Manager health checks

    With TLS/SSL enabled, the cdsw status command does not display the Cloudera Data Science Workbench is ready message.

    Similarly, Cloudera Manager health checks for the CDSW service will always report Bad health and display the following message: "Web is not yet up."

    Affected Versions and Workarounds:
    • Cloudera Data Science Workbench 1.2.0
    • Cloudera Data Science Workbench 1.2.1, 1.2.2 - This issue occurs on SLES 12 systems when you are using TLS/SSL certificates signed by your organization's internal Certificate Authority (CA). This is due to an incompatibility with the version of Python (v2.7.9) that ships with SLES 12. This issue will also occur on RHEL 7 systems if the system Python version is 2.7.5 (or higher).

    Fixed Versions: Cloudera Data Science Workbench 1.3.0

    Workarounds: Use one of the following methods to work around this issue:
    • Import your organization's root CA certificate into your machine’s trust store.
      To do so, copy the internal CA certificate in PEM format to the /etc/pki/ca-trust/source/anchors/ directory. If the certificate is in OpenSSL’s 'BEGIN TRUSTED CERTIFICATE' format, copy it to the /etc/pki/ca-trust/source directory. Then run:
      sudo update-ca-trust

      OR

    • Use TLS/SSL certificates signed by a well-known public CA.

    Cloudera Bug: DSE-2871, DSE-3090

  • Self-signed certificates where the Certificate Authority is not part of the user's trust store are not supported for TLS termination. For more details, see Enabling TLS/SSL - Limitations.

  • Cloudera Data Science Workbench does not support the use of encrypted private keys for TLS.

    Cloudera Bug: DSE-1708

LDAP

  • LDAP group search filters are currently not supported. To limit access to Cloudera Data Science Workbench to certain groups, use "memberOf" or the equivalent user attribute in LDAP User Filter.

    Fixed Version: Cloudera Data Science Workbench 1.4.0

    Cloudera Bug: DSE-1616

Kerberos

  • PowerBroker-equipped Active Directory is not supported.

    Cloudera Bug: DSE-1838

  • Using Kerberos plugin modules in krb5.conf is not supported.

  • Modifying the default_ccache_name parameter in krb5.conf does not work in Cloudera Data Science Workbench. Only the default path for this parameter, /tmp/krb5cc_${uid}, is supported.

  • When you upload a Kerberos keytab to authenticate yourself to the CDH cluster, Cloudera Data Science Workbench displays a fleeting error message ('cancelled') in the bottom right corner of the screen, even if authentication was successful. This error message can be ignored.

    Cloudera Bug: DSE-2344

  • Cloudera Data Science Workbench does not support the use of a FreeIPA KDC.

    Cloudera Bug: DSE-1482

Jobs API

  • Cloudera Data Science Workbench does not support changing your API key, or having multiple API keys.

  • Currently, you cannot create a job, stop a job, or get the status of a job using the Jobs API.

Engines

  • Packages installed within an extensible engine Dockerfile using pip or pip3 will not be usable by Cloudera Data Science Workbench.

    Fixed In: Cloudera Data Science Workbench 1.2.1.

    Cloudera Bug: DSE-1873

  • Cloudera Data Science Workbench only supports custom extended engines that are based on the Cloudera Data Science Workbench base image.

  • Autofs mounts are not supported with Cloudera Data Science Workbench.

    Cloudera Bug: DSE-2238

  • Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.

    Cloudera Bug: DSE-1521

  • When using Conda to install Python packages, you must specify a Python version that matches one of the Python versions shipped in the engine image (2.7.11 and 3.6.1). If you do not, the Conda-installed Python version will not be used within the project. Pip (pip and pip3) does not have this issue.

Usability

  • When hundreds of users are logged in and creating processes, the system's nproc and nofile limits may be reached. Use ulimit settings or other methods to raise the maximum number of processes and open files a user can create.

  • When rebooting, Cloudera Data Science Workbench nodes can take a significant amount of time (about 30 minutes) to become ready.

  • Long-running operations such as fork and clone can time out when projects are large or connections outlast the HTTP timeouts of reverse proxies.

  • The Scala kernel does not support autocomplete features in the editor.

  • Scala and R code can sometimes indent incorrectly in the workbench editor.

    Cloudera Bug: DSE-1218
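For the nproc and nofile limits mentioned in the first bullet of this section, one common approach on RHEL-family systems is to raise the per-user caps in /etc/security/limits.conf. The values below are illustrative and should be sized for your user count and workload:

```
# /etc/security/limits.conf (illustrative values)
*    soft    nproc     65536
*    hard    nproc     65536
*    soft    nofile    65536
*    hard    nofile    65536
```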