Known Issues and Limitations in Cloudera Data Science Workbench 1.3.x

Installation

CSD installations fail on Oracle Linux 7.3

CSD installations on Oracle Linux 7.3 fail because the prepare-node.sh script does not recognize Oracle Linux as a supported operating system.

Affected Version: Cloudera Data Science Workbench 1.3.0

Fixed Version: Cloudera Data Science Workbench 1.3.1

Workaround: Modify the /opt/cloudera/parcels/CDSW/scripts/prepare-node.sh script to include Oracle Linux in the list of supported operating systems.

Modify line 19 of prepare-node.sh from
if [ "${OS_TYPE}" == "centos" ] || [ "${OS_TYPE}" == "redhatenterpriseserver" ] 
to
if [ "${OS_TYPE}" == "centos" ] || [ "${OS_TYPE}" == "redhatenterpriseserver" ] || [ "${OS_TYPE}" == "oracleserver" ]

Cloudera Bug: DSE-3257
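To confirm how the patched condition treats a given host, you can reproduce the check outside the script. The following is a minimal sketch, not part of prepare-node.sh. The function names are hypothetical, and the sketch assumes that OS_TYPE is the lowercased output of lsb_release -si (for example, OracleServer becomes oracleserver). The script may actually derive OS_TYPE differently.

```shell
#!/bin/bash
# Hypothetical helper mirroring the patched condition on line 19 of
# prepare-node.sh: succeeds only for the OS identifiers it accepts.
is_supported_os() {
  local os_type="$1"
  [ "$os_type" == "centos" ] || [ "$os_type" == "redhatenterpriseserver" ] || [ "$os_type" == "oracleserver" ]
}

# ASSUMPTION: OS_TYPE is the lowercased distributor ID from lsb_release.
os_type_from_lsb() {
  lsb_release -si | tr '[:upper:]' '[:lower:]'
}
```

For example, running `is_supported_os "$(os_type_from_lsb)" && echo supported` on an Oracle Linux 7.3 host should print supported, provided the assumption above holds.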

Upgrades

TSB-350: Permanent Fix for Data Loss Risk During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

TSB-346 was released around the time of CDSW 1.4.2 to fix this issue, but it turned out to be only a partial fix. CDSW 1.4.3 fixes the issue permanently, and TSB-350 was released to announce this fix. Note that the script provided with TSB-346 still prevents data loss and must be used to shut down or restart all of the affected CDSW releases listed below.

Affected Versions: Cloudera Data Science Workbench 1.0.x, 1.1.x, 1.2.x, 1.3.x, 1.4.0, 1.4.1, 1.4.2

Fixed Version: Cloudera Data Science Workbench 1.4.3 (and higher)

Cloudera Bug: DSE-5108

The complete text for TSB-350 is available in the Cloudera Security Bulletins: TSB-350: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart.

TSB-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where the kubelet stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, CDSW projects that remain mounted will be deleted by the cleanup step.

Products affected: Cloudera Data Science Workbench

Releases affected: The following Cloudera Data Science Workbench versions:
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: If the NFS unmount fails during shutdown, data loss can occur. All CDSW project files might be deleted.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master node every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh - Available for download at: cdsw_protect_stop_restart.sh.

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.3 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"
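The script asks you to confirm that the backup target has enough free space. The following minimal sketch performs that check. It assumes GNU du and a df that supports the POSIX -P output format. has_enough_space is a hypothetical helper, not part of the TSB script, and the target folder must already exist.

```shell
#!/bin/bash
# Hypothetical pre-check (not part of cdsw_protect_stop_restart.sh):
# succeeds if the filesystem holding $2 has more free space than $1 uses.
has_enough_space() {
  local src="$1" target="$2"
  local needed_kb available_kb
  # Size of the source tree in KiB; "du -sk" prints "<size><TAB><path>".
  needed_kb=$(du -sk "$src" | cut -f1)
  # Free KiB on the target's filesystem; with -P, df prints one data line
  # and the "Available" value is its fourth column.
  available_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')
  [ "$needed_kb" -lt "$available_kb" ]
}
```

For example: `has_enough_space /var/lib/cdsw/current/projects /path/to/backup && echo "enough space"`. Here /path/to/backup is a placeholder for your own target folder.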

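Addressed in release/refresh/patch: This issue was addressed in Cloudera Data Science Workbench 1.4.2, but that turned out to be a partial fix. The permanent fix is in Cloudera Data Science Workbench 1.4.3 (see TSB-350).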

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Upgrading from Cloudera Data Science Workbench 1.1.x requires a change in the proxy configuration

If you are using a proxy server, you must ensure that traffic to the IP addresses of the web and Livelog services bypasses the proxy.

Depending on your deployment (CSD or RPM), append the following IP addresses to either the No Proxy property in the Cloudera Manager CDSW service, or to the NO_PROXY parameter in cdsw.conf.
100.77.0.129
100.77.0.130

These addresses have also been added to the installation instructions.
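Because the two addresses must be appended to your existing NO_PROXY value, you can script the edit. The following is a minimal sketch. add_cdsw_no_proxy is a hypothetical helper that only manipulates the comma-separated string. You still need to copy the result into the No Proxy property (CSD) or the NO_PROXY parameter in cdsw.conf (RPM) yourself.

```shell
#!/bin/bash
# Hypothetical helper: append the CDSW web and Livelog service IPs to a
# comma-separated NO_PROXY value, skipping any that are already listed.
add_cdsw_no_proxy() {
  local value="$1" ip
  for ip in 100.77.0.129 100.77.0.130; do
    case ",${value}," in
      *",${ip},"*) ;;                          # already present, keep as-is
      *) value="${value:+${value},}${ip}" ;;   # append, adding a comma if needed
    esac
  done
  printf '%s\n' "$value"
}
```

For example, `add_cdsw_no_proxy "127.0.0.1,localhost"` prints `127.0.0.1,localhost,100.77.0.129,100.77.0.130`. Running it again on that output leaves the value unchanged.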

Fixed Version: Cloudera Data Science Workbench 1.4.0

Cloudera Bug: DSE-2948

CDH Integration

Cloudera Data Science Workbench (1.4.x and lower) is not supported with Cloudera Manager 6.0.0 and CDH 6.0.0.

Cloudera Data Science Workbench will be supported with Cloudera Enterprise 6 in a future release.

CDH client configuration changes require a full Cloudera Data Science Workbench reset

Cloudera Data Science Workbench does not automatically detect configuration changes on the CDH cluster. Therefore, any changes made to CDH services, ranging from updates to service configuration properties to complete parcel upgrades, must be followed by a full reset of Cloudera Data Science Workbench.

Affected Versions: Cloudera Data Science Workbench 1.2.x, 1.3.x

Workaround: Depending on your deployment, use one of the following sets of steps to perform a full reset of Cloudera Data Science Workbench. Note that this reset does not impact your data in any way.
  • CSD Deployments - To reset Cloudera Data Science Workbench using Cloudera Manager:
    1. Log into the Cloudera Manager Admin Console.
    2. On the Cloudera Manager homepage, click the drop-down menu to the right of the CDSW service and select Restart. Confirm your choice on the next screen and wait for the action to complete.
    OR
  • RPM Deployments - Run the following commands on the Cloudera Data Science Workbench master node:

    cdsw reset
    cdsw init

Cloudera Manager Integration

  • Cloudera Data Science Workbench (1.4.x and lower) is not supported with Cloudera Manager 6.0.0 and CDH 6.0.0.

    Cloudera Data Science Workbench will be supported with Cloudera Enterprise 6 in a future release.

  • CSD distribution/activation fails on mixed-OS clusters when there are third-party parcels running on OSs that are not supported by Cloudera Data Science Workbench

    For example, adding a new CDSW gateway host to a RHEL 6 cluster running RHEL 6-compatible parcels will fail. Cloudera Manager will not distribute the RHEL 6 parcels to the new host, which will likely be running a CDSW-compatible operating system such as RHEL 7.

    Affected Versions: Cloudera Data Science Workbench 1.2.x, 1.3.x

    Workaround: To ensure adding a new CDSW gateway host is successful, you must create a copy of the 'incompatible' third-party parcel files and give them the corresponding RHEL 7 names so that Cloudera Manager allows them to be distributed on the new gateway host. Use the following sample instructions to do so:
    1. SSH to the Cloudera Manager Server host.
    2. Navigate to the directory that contains all the parcels. By default, this is /opt/cloudera/parcel-repo.
      cd /opt/cloudera/parcel-repo
    3. Make a copy of the incompatible third-party parcel with the new name. For example, if you have a RHEL 6 parcel that cannot be distributed on a RHEL 7 CDSW host:
      cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
    4. Repeat the previous step for the parcel's SHA file.
      cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel.sha <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
    5. Update the new files' owner and permissions to match those of existing parcels in the /opt/cloudera/parcel-repo directory.
      chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
      chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
      chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
      chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
      
    You should now be able to add new gateway hosts for Cloudera Data Science Workbench to your cluster.

    Cloudera Bug: OPSAPS-42130, OPSAPS-31880

  • CDSW Service health status after a restart does not match the actual state of the application

    After a restart, the Cloudera Data Science Workbench service in Cloudera Manager will display Good health even though the Cloudera Data Science Workbench web application might need a few more minutes to get ready to serve requests.

  • Cloudera Data Science Workbench diagnostics data might be missing from Cloudera Manager diagnostic bundles.

    This occurs because the default timeout for Cloudera Manager data collection is currently set to 3 minutes. However, in the case of Cloudera Data Science Workbench, collecting metrics and logs using the cdsw logs command can take longer than 3 minutes.

    Affected Versions: Cloudera Data Science Workbench 1.2.x, 1.3.x

    Workaround: Use the following steps to modify the default timeout for Cloudera Data Science Workbench data collection:
    1. Log in to the Cloudera Manager Admin Console.
    2. Go to the CDSW service.
    3. Click Configuration.
    4. Search for the Docker Daemon Diagnostics Collection Timeout property and set it to 5 minutes.
    5. Click Save Changes.

    Alternatively, you can generate a diagnostic bundle by running the cdsw logs command directly on the master node.

    Cloudera Bug: DSE-3160
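You can combine steps 3 to 5 of the mixed-OS parcel workaround above into one function. This is a minimal sketch. clone_parcel_el6_to_el7 is a hypothetical name. The parcel base name is whatever precedes -el6.parcel in your repository; the <PARCELNAME...> placeholder above is not a real parcel. The owner argument exists only so that you can try the function without the cloudera-scm account. Run it as a user who is allowed to change file ownership in the parcel repository.

```shell
#!/bin/bash
# Hypothetical helper automating steps 3-5 for a single parcel.
#   $1  parcel base name, i.e. everything before "-el6.parcel"
#   $2  parcel repository (default: /opt/cloudera/parcel-repo)
#   $3  owner:group for the copies (default: cloudera-scm:cloudera-scm)
clone_parcel_el6_to_el7() {
  local base="$1"
  local repo="${2:-/opt/cloudera/parcel-repo}"
  local owner="${3:-cloudera-scm:cloudera-scm}"
  local ext src dst
  for ext in parcel parcel.sha; do
    src="${repo}/${base}-el6.${ext}"
    dst="${repo}/${base}-el7.${ext}"
    # Copy under the el7 name, then match the owner and mode of existing parcels.
    cp "$src" "$dst" && chown "$owner" "$dst" && chmod 640 "$dst" || return 1
  done
}
```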

CDS Powered by Apache Spark

Certain CDS releases require additional configuration to run PySpark on Cloudera Data Science Workbench versions 1.3.x (and lower)

Affected Versions: Due to a security fix in CDS, there is now a mismatch between the versions of py4j that ship with the following versions of the two products:
  • Cloudera Data Science Workbench 1.3.x (and lower) include py4j 0.10.4, and,
  • CDS 2.1 release 3, CDS 2.2 release 3, and CDS 2.3 release 3 include py4j 0.10.7.

This version mismatch results in PySpark session/job failures on Cloudera Data Science Workbench.

The following error messages are indicative of this issue:

TypeError: __init__() got an unexpected keyword argument 'auth_token'
TypeError                                 Traceback (most recent call last)
in engine
----> 1 spark = SparkSession    .builder    .appName("PythonPi")    .getOrCreate()

Use one of the following workarounds to continue running PySpark jobs on Cloudera Data Science Workbench 1.3.x. Alternatively, upgrade to Cloudera Data Science Workbench 1.4.x.
  • Workaround 1: Use the PYTHONPATH environment variable to point to CDS's version of py4j
    This process requires site administrator privileges.
    1. Log in to Cloudera Data Science Workbench.
    2. Click Admin > Engines.
    3. Under the Environmental variables section, add a new variable:
      • Name: PYTHONPATH
      • Value: $SPARK_HOME/python/lib/py4j-0.10.7-src.zip
    4. Click Add.
  • Workaround 2: Install py4j 0.10.7 directly in project sessions

    If you do not want to make site-wide changes as described in the previous workaround, individual users can install py4j 0.10.7 directly in their project's sessions to continue running PySpark.

    For example, in a Python 3 session, run the following command in the workbench command prompt:
    !pip3 install py4j==0.10.7
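Before you apply Workaround 1, you can check which py4j source zip your CDS installation actually provides, so that the PYTHONPATH value points at a file that exists. The following is a minimal sketch, intended to run from a session terminal. spark_py4j_version is a hypothetical helper. It assumes that the zip follows the py4j-<version>-src.zip naming used in the PYTHONPATH value above.

```shell
#!/bin/bash
# Hypothetical check: print the version of the py4j source zip bundled
# under SPARK_HOME (or under the directory given as $1).
spark_py4j_version() {
  local spark_home="${1:-${SPARK_HOME:?SPARK_HOME is not set}}"
  local zip
  for zip in "$spark_home"/python/lib/py4j-*-src.zip; do
    [ -e "$zip" ] || return 1          # no match: the glob stayed literal
    zip="${zip##*/py4j-}"             # drop everything up to "py4j-"
    printf '%s\n' "${zip%-src.zip}"   # drop the "-src.zip" suffix
    return 0
  done
}
```

If the function prints 0.10.7, the value $SPARK_HOME/python/lib/py4j-0.10.7-src.zip from Workaround 1 refers to an existing file.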

Fixed Versions: Cloudera Data Science Workbench 1.4.x

Version 1.4 ships with engine 5 that includes a CDS-compatible version of py4j. Note that the upgrade itself will not automatically upgrade existing projects to engine 5. Once the upgrade to 1.4 is complete, project administrators/collaborators must manually configure their projects to use engine 5 as the base image (go to the project's Settings > Engine page).

Cloudera Bug: DSE-4046, DSE-4316

Spark lineage collection is not supported with Cloudera Data Science Workbench

Lineage collection is enabled by default in Spark 2.3. This feature does not work with Cloudera Data Science Workbench because the lineage log directory is not automatically mounted into CDSW engines when a session/job is started.

Affected Versions: CDS 2.3 release 2 (and higher) Powered By Apache Spark

With Spark 2.3 release 3, if Spark cannot find the lineage log directory, it will automatically disable lineage collection for that application. Spark jobs will continue to execute in Cloudera Data Science Workbench, but lineage information will not be collected.

With Spark 2.3 release 2, Spark jobs will fail in Cloudera Data Science Workbench. Either upgrade to Spark 2.3 release 3, which includes a partial fix (as described above), or use one of the following workarounds to disable Spark lineage:
  • Workaround 1: Disable Spark Lineage Per-Project in Cloudera Data Science Workbench

    To do this, set spark.lineage.enabled to false in a spark-defaults.conf file in your Cloudera Data Science Workbench project. This will need to be done individually for each project as required.

  • Workaround 2: Disable Spark Lineage for the Cluster

    1. Log in to Cloudera Manager and go to the Spark 2 service.
    2. Click Configuration.
    3. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection.
    4. Click Save Changes.
    5. Go back to the Cloudera Manager homepage and restart the CDSW service for this change to take effect.
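For Workaround 1, the per-project setting is a single line in spark-defaults.conf. The following minimal sketch adds or rewrites that line so that running it repeatedly has the same effect. disable_spark_lineage is a hypothetical helper that relies on GNU sed -i. It assumes the file lives in the project's root directory (the current directory by default); confirm this location for your setup.

```shell
#!/bin/bash
# Hypothetical helper: ensure spark-defaults.conf disables lineage collection.
# $1: path to the file (default: spark-defaults.conf in the current directory,
#     ASSUMED to be the project root).
disable_spark_lineage() {
  local conf="${1:-spark-defaults.conf}"
  touch "$conf" || return 1
  if grep -q '^[[:space:]]*spark\.lineage\.enabled' "$conf"; then
    # An existing setting is rewritten in place (GNU sed -i).
    sed -i 's/^[[:space:]]*spark\.lineage\.enabled.*/spark.lineage.enabled false/' "$conf"
  else
    # Otherwise the setting is appended.
    printf '%s\n' 'spark.lineage.enabled false' >> "$conf"
  fi
}
```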

Cloudera Bug: DSE-3720, CDH-67643

Crashes and Hangs

  • High I/O utilization on the application block device can cause the application to stall or become unresponsive. Users should read and write data directly from HDFS rather than staging it in their project directories.

  • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to hang due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.

GPU Support

Only CUDA-enabled NVIDIA GPU hardware is supported

Cloudera Data Science Workbench only supports CUDA-enabled NVIDIA GPU cards.

Heterogeneous GPU hardware is not supported

You must use the same GPU hardware across a single Cloudera Data Science Workbench deployment.

GPUs are not detected after a machine reboot

This issue occurs because certain NVIDIA modules do not load automatically after a reboot.

Workaround: Use the following steps to make sure the required modules are loaded before Cloudera Data Science Workbench services start. The commands add entries to /etc/rc.modules that load the nvidia.ko module, create the /dev/nvidiactl device and the GPU device nodes (starting with /dev/nvidia0), and create the /dev/nvidia-uvm and /dev/nvidia-uvm-tools devices. The commands also assign execute permission to /etc/rc.modules. Run these commands once on every machine that has GPU hardware.

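# Register the NVIDIA module-loading commands in /etc/rc.modules.
# 'sudo tee -a' is used because with 'sudo cat >> file' the redirection is
# performed by the calling, unprivileged shell and fails with "Permission denied".
sudo tee -a /etc/rc.modules > /dev/null <<'EOMSG'
/usr/bin/nvidia-smi
/usr/bin/nvidia-modprobe -u -c=0
EOMSG

# Set execute permission for /etc/rc.modules
sudo chmod +x /etc/rc.modules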

Cloudera Bug: DSE-2847
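After the next reboot, you can verify that the device nodes described above were created before you start Cloudera Data Science Workbench. The following is a minimal sketch. check_nvidia_devices is a hypothetical helper, and it checks only /dev/nvidia0, so extend the list if the host has more than one GPU.

```shell
#!/bin/bash
# Hypothetical post-reboot check: print each expected NVIDIA device node that
# is missing and return non-zero if any are. $1 overrides the device directory.
check_nvidia_devices() {
  local dev_dir="${1:-/dev}" name missing=0
  for name in nvidiactl nvidia0 nvidia-uvm nvidia-uvm-tools; do
    if [ ! -e "${dev_dir}/${name}" ]; then
      echo "missing: ${dev_dir}/${name}"
      missing=1
    fi
  done
  return "$missing"
}
```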

Networking

  • Custom /etc/hosts entries on Cloudera Data Science Workbench hosts do not propagate to sessions and jobs running in containers.

    Cloudera Bug: DSE-2598

  • Initialization of Cloudera Data Science Workbench (cdsw init) will fail if localhost does not resolve to 127.0.0.1.

  • Cloudera Data Science Workbench does not support DNS servers running on 127.0.0.1:53. This IP address resolves to the container localhost within Cloudera Data Science Workbench containers. As a workaround, use either a non-loopback address or a remote DNS server.

  • Due to limits in libc, which reads at most three nameserver entries, only two DNS servers are supported in /etc/resolv.conf. Kubernetes uses the third entry for the cluster DNS.
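You can check the two resolv.conf limitations above on each host before you run cdsw init. The following is a minimal sketch. check_resolv_conf is a hypothetical helper that uses GNU grep regex extensions. It flags only the documented 127.0.0.1 address, not other loopback addresses.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the resolv.conf limitations above.
# $1: resolv.conf path (default: /etc/resolv.conf).
check_resolv_conf() {
  local conf="${1:-/etc/resolv.conf}" count status=0
  # Count active nameserver lines; commented-out lines do not match.
  # "|| true" keeps the "0" output when grep finds no match (exit status 1).
  count=$(grep -c '^[[:space:]]*nameserver[[:space:]]' "$conf" || true)
  if [ "$count" -gt 2 ]; then
    echo "too many nameservers: $count (at most 2 supported)"
    status=1
  fi
  # A DNS server on 127.0.0.1 is not supported inside CDSW containers.
  if grep -q '^[[:space:]]*nameserver[[:space:]]\+127\.0\.0\.1\b' "$conf"; then
    echo "nameserver 127.0.0.1 is not supported"
    status=1
  fi
  return "$status"
}
```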

Security

TSB-328: Unauthenticated User Enumeration in Cloudera Data Science Workbench

Unauthenticated users can get a list of user accounts of Cloudera Data Science Workbench.

Affected Versions: Cloudera Data Science Workbench 1.4.0 (and lower)

Fixed Versions: Cloudera Data Science Workbench 1.4.2 (and higher)

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench (1.4.2 or higher).

For more details, see the Security Bulletins - TSB-328.

SSH access to Cloudera Data Science Workbench nodes must be disabled

The container runtime and application data storage are not fully secure from untrusted users who have SSH access to the gateway nodes. Therefore, SSH access to the gateway nodes should be disabled for untrusted users, for both security and resource utilization reasons.

Remote Command Execution and Information Disclosure in Cloudera Data Science Workbench

A configuration issue in Kubernetes used by Cloudera Data Science Workbench can allow remote command execution and privilege escalation in CDSW. A separate information permissions issue can cause the LDAP bind password to be exposed to authenticated CDSW users when LDAP bind search is enabled.

Affected Versions: Cloudera Data Science Workbench 1.3.0 (and lower)

Fixed Versions: Cloudera Data Science Workbench 1.3.1 (and higher)

For more details, see the Security Bulletins - TSB-313.

TLS/SSL

  • Self-signed certificates where the Certificate Authority is not part of the user's trust store are not supported for TLS termination. For more details, see Enabling TLS/SSL - Limitations.

  • Cloudera Data Science Workbench does not support the use of encrypted private keys for TLS.

    Cloudera Bug: DSE-1708

LDAP

  • Fixed Version: Cloudera Data Science Workbench 1.4.0

    Cloudera Bug: DSE-1616

Kerberos

  • PowerBroker-equipped Active Directory is not supported.

    Cloudera Bug: DSE-1838

  • Using Kerberos plugin modules in krb5.conf is not supported.

  • Modifying the default_ccache_name parameter in krb5.conf does not work in Cloudera Data Science Workbench. Only the default path for this parameter, /tmp/krb5cc_${uid}, is supported.

  • When you upload a Kerberos keytab to authenticate yourself to the CDH cluster, Cloudera Data Science Workbench might display a fleeting error message ('cancelled') in the bottom right corner of the screen, even if authentication was successful. This error message can be ignored.

    Cloudera Bug: DSE-2344

  • Cloudera Data Science Workbench does not support the use of a FreeIPA KDC.

    Cloudera Bug: DSE-1482
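You can detect two of the Kerberos limitations above from krb5.conf before users run into them. This is a heuristic sketch, not an official check. check_krb5_conf is hypothetical. It treats FILE:/tmp/krb5cc_%{uid} (the krb5.conf spelling of the default path) as the only acceptable default_ccache_name value, and it assumes that plugin modules are configured in a [plugins] section. It relies on GNU grep regex extensions.

```shell
#!/bin/bash
# Hypothetical heuristic check of krb5.conf. $1: path (default: /etc/krb5.conf).
check_krb5_conf() {
  local conf="${1:-/etc/krb5.conf}" status=0
  # Flag any default_ccache_name other than the default /tmp/krb5cc_%{uid}.
  # The second grep runs without -q so that it reads all of its input.
  if grep '^[[:space:]]*default_ccache_name[[:space:]]*=' "$conf" |
     grep -v '=[[:space:]]*\(FILE:\)\?/tmp/krb5cc_%{uid}[[:space:]]*$' > /dev/null; then
    echo "non-default default_ccache_name found"
    status=1
  fi
  # Flag a [plugins] section, where plugin modules would be configured.
  if grep -q '^[[:space:]]*\[plugins\]' "$conf"; then
    echo "[plugins] section found; plugin modules are not supported"
    status=1
  fi
  return "$status"
}
```

Hosts whose krb5.conf sets default_ccache_name = KEYRING:persistent:%{uid}, a setting that some RHEL 7 systems ship with, are reported by the first check.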

Jobs API

  • Cloudera Data Science Workbench does not support changing your API key, or having multiple API keys.

  • Currently, you cannot create a job, stop a job, or get the status of a job using the Jobs API.

Engines

  • Spawning remote workers fails in R when the env parameter is not set. For more details, see Spawning Workers.

    Cloudera Bug: DSE-3384

  • Cloudera Bug: DSE-1521

  • Autofs mounts are not supported with Cloudera Data Science Workbench.

    Cloudera Bug: DSE-2238

  • When using Conda to install Python packages, you must specify the Python version to match the Python versions shipped in the engine image (2.7.11 and 3.6.1). If not specified, the conda-installed Python version will not be used within a project. Pip (pip and pip3) does not face this issue.
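To avoid hard-coding 2.7.11 or 3.6.1, you can read the version from the engine's own interpreter and pin it in the conda command. The following is a minimal sketch. conda_pinned_install_cmd is a hypothetical helper that only prints the command, so you can review it before running it.

```shell
#!/bin/bash
# Hypothetical helper: print a conda install command pinned to the Python
# version of the given interpreter. $1: interpreter (default python3);
# remaining arguments: packages to install.
conda_pinned_install_cmd() {
  local py="${1:-python3}" version
  [ "$#" -gt 0 ] && shift
  # "--version" prints e.g. "Python 3.6.1"; Python 2 writes it to stderr.
  version=$("$py" --version 2>&1 | awk '{print $2}')
  echo "conda install -y python=${version}${*:+ $*}"
}
```

For example, in a Python 3 engine, `conda_pinned_install_cmd python3 numpy` should print `conda install -y python=3.6.1 numpy`, assuming the engine's python3 is the 3.6.1 build mentioned above.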

Custom Engine Images

  • Cloudera Data Science Workbench only supports custom extended engines that are based on the Cloudera Data Science Workbench base image.

  • Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.

  • Cloudera Data Science Workbench does not support creation of custom engines larger than 10 GB.

    Cloudera Bug: DSE-4420

Usability

  • In a scenario where hundreds of users are logged in and creating processes, the system's nproc and nofile limits may be reached. Use ulimit settings or other methods to increase the maximum number of processes and open files that a user can create on the system.

  • When rebooting, Cloudera Data Science Workbench nodes can take a significant amount of time (about 30 minutes) to become ready.

  • Long-running operations such as fork and clone can time out when projects are large or connections outlast the HTTP timeouts of reverse proxies.

  • The Scala kernel does not support autocomplete features in the editor.

  • Scala and R code can sometimes indent incorrectly in the workbench editor.

    Cloudera Bug: DSE-1218

  • Installation of the XML package fails in the R kernel.

    Cloudera Bug: DSE-2201
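To see whether the nproc and nofile limits mentioned under Usability are likely to be reached, you can compare the current soft limits with minimums of your choosing. The following is a minimal sketch. check_user_limits is hypothetical, and you choose the thresholds because none are given here. The function reports only the limits of the shell it runs in, so run it as the affected user.

```shell
#!/bin/bash
# Hypothetical check: compare the current soft limits on processes (nproc)
# and open files (nofile) with the minimums passed as $1 and $2.
check_user_limits() {
  local min_nproc="$1" min_nofile="$2" nproc nofile status=0
  nproc=$(ulimit -u)    # soft limit on user processes
  nofile=$(ulimit -n)   # soft limit on open file descriptors
  if [ "$nproc" != "unlimited" ] && [ "$nproc" -lt "$min_nproc" ]; then
    echo "nproc soft limit $nproc is below $min_nproc"
    status=1
  fi
  if [ "$nofile" != "unlimited" ] && [ "$nofile" -lt "$min_nofile" ]; then
    echo "nofile soft limit $nofile is below $min_nofile"
    status=1
  fi
  return "$status"
}
```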