Troubleshooting Installation Issues

Review installation logs

When debugging installation and "first run" issues with Cloudera Data Science Workbench, review the "role" logs, which the Cloudera Manager Agent writes when the service tries to start. These logs show any issues that occur on the host itself, before the Kubernetes and Docker systems even start. The logs are located at: /var/run/cloudera-scm-agent/process/[XXX-ROLE]/logs.

These logs are organized by role type (master, application, docker-daemon, worker), and their directories are prefixed with an incremental ID so that you can find the latest log. When viewing these logs, you can ignore any line that begins with a + symbol. For example:

grep -v '^+' /var/run/cloudera-scm-agent/process/67-CDSW_DOCKER/logs/stderr.log
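Because the incremental ID changes on every start, it can help to look up the newest directory for a role rather than typing the ID by hand. The following is a minimal sketch, assuming directory names follow the <ID>-<ROLE> pattern seen in the example path above; the function name latest_role_dir is hypothetical, and role suffixes other than CDSW_DOCKER are not confirmed by this document.

```shell
# Print the process directory with the highest numeric ID for a given role.
# $1 = process directory, $2 = role suffix (for example CDSW_DOCKER)
latest_role_dir() {
    best_id=-1
    best_dir=""
    for dir in "$1"/*-"$2"; do
        [ -d "$dir" ] || continue
        name=${dir##*/}          # strip the leading path
        id=${name%%-*}           # keep the part before the first dash
        case "$id" in
            ''|*[!0-9]*) continue ;;   # skip names without a numeric ID
        esac
        if [ "$id" -gt "$best_id" ]; then
            best_id=$id
            best_dir=$dir
        fi
    done
    [ -n "$best_dir" ] && printf '%s\n' "$best_dir"
}
```

For example: grep -v '^+' "$(latest_role_dir /var/run/cloudera-scm-agent/process CDSW_DOCKER)/logs/stderr.log"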

Stop any existing CDSW processes

If CDSW was not shut down properly through Cloudera Manager, old CDSW Unix processes can be left running and can prevent the application from starting. To ensure this is not the case:

Stop all CDSW roles from Cloudera Manager
Run the following command on all CDSW nodes to kill all CDSW, Docker, and Kubernetes processes, including any stale processes:

for i in $(ps -ef | grep -e cdsw -e docker -e kube | grep -v grep | awk '{print $2}'); do kill -9 "$i"; done

Start the CDSW roles from Cloudera Manager again
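Since kill -9 gives processes no chance to clean up, you may want to review what the pattern matches before running the loop. The sketch below is one way to do that: it reads ps -ef output and prints the PID and command of each matching line. The function name list_cdsw_candidates is hypothetical, and the sketch assumes the usual eight-column ps -ef layout with the command starting in column 8.

```shell
# Read `ps -ef` output on stdin and print "PID COMMAND" for matching lines.
# The bracketed first letters keep the filter from matching its own
# command line, which replaces the "grep -v grep" step.
list_cdsw_candidates() {
    awk 'NR > 1 && /[c]dsw|[d]ocker|[k]ube/ {
        cmd = ""
        for (i = 8; i <= NF; i++) cmd = cmd (i > 8 ? " " : "") $i
        print $2, cmd
    }'
}
```

For example: ps -ef | list_cdsw_candidates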

Illegal IP address

Starting with CDSW 1.10.5, when starting CDSW for the first time, the Master role may not start, and you might see errors similar to the following in the Master role logs:

Exception encountered: [illegal IP address string passed to inet_pton]
ERROR:: Unable validate IP address [b'10.10.xx.xx'].: 1
ERROR:: Unable to disable export for [b'10.17.xx.xx'].: 1

To resolve this, enter the IP address of the Master node in Cloudera Manager > CDSW > Configuration > Master Node IPv4 Address.

Stop running security software

As per Networking and Security Requirements, CDSW may have trouble starting when security tools are running on the master host, because these tools often prevent the system from creating the internal virtual network. To test this, ensure that all security tools have been stopped on the master node before starting CDSW. Once CDSW is started and running, these tools can be re-enabled.

Preexisting iptables rules not supported

WARNING: Cloudera Data Science Workbench requires iptables, but does not support preexisting iptables rules.

Kubernetes makes extensive use of iptables. However, it’s hard to know how pre-existing iptables rules will interact with the rules inserted by Kubernetes. Therefore, Cloudera recommends you run the following commands to clear all pre-existing rules before you proceed with the installation.

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X

The warning can be ignored after you clear the pre-existing rules or are sure that there are no pre-existing iptables rules.
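To confirm that no rules are left, you can inspect the output of iptables -S, which prints the rules of a table in command form. The following is a minimal sketch with a hypothetical function name (leftover_rules); it prints every line that is not an ACCEPT chain policy, so empty output suggests that the table is clean.

```shell
# Read `iptables -S` output on stdin and print anything other than
# "-P <CHAIN> ACCEPT" lines: rules, custom chains, or non-ACCEPT policies.
leftover_rules() {
    grep -Ev '^-P [A-Z]+ ACCEPT$' || true
}
```

For example: sudo iptables -S | leftover_rules. The -S option only lists the filter table by default, so repeat the check with -t nat and -t mangle.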

Remove the entry corresponding to /dev/xvdc from /etc/fstab

Cloudera Data Science Workbench installs a custom filesystem on its Application and Docker block devices. These filesystems are used to store user project files and Docker engine images respectively. Therefore, Cloudera Data Science Workbench requires complete access to the block devices. To avoid losing any existing data, make sure the block devices allocated to Cloudera Data Science Workbench are reserved only for the workbench. If /etc/fstab contains an entry for such a device (/dev/xvdc in this example), remove that entry.
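A quick way to see whether /etc/fstab still references a device is to filter the file by its first field. This is a sketch only: the function name fstab_entries_for is hypothetical, and it matches the device path literally, so entries written as UUID=... or LABEL=... would need to be checked separately.

```shell
# Print non-comment fstab lines whose first field equals the given device.
# $1 = device path, $2 = fstab file (defaults to /etc/fstab)
fstab_entries_for() {
    awk -v dev="$1" '$1 !~ /^#/ && $1 == dev' "${2:-/etc/fstab}"
}
```

For example: fstab_entries_for /dev/xvdc. No output means that no entry names the device by path.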

Linux sysctl kernel configuration errors

Kubernetes and Docker require non-standard kernel configuration. Depending on the existing state of your kernel, this might result in sysctl errors such as:

sysctl net.bridge.bridge-nf-call-iptables must be set to 1

This is because the settings in /etc/sysctl.conf conflict with the settings required by Cloudera Data Science Workbench. Cloudera cannot make a blanket recommendation on how to resolve such errors because they are specific to your deployment. Cluster administrators may choose to either remove or modify the conflicting value directly in /etc/sysctl.conf, remove the value from the conflicting configuration file, or even delete the module that is causing the conflict.

To start diagnosing the issue, run the following command to see the list of configuration files that are overwriting values in /etc/sysctl.conf.

SYSTEMD_LOG_LEVEL=debug /usr/lib/systemd/systemd-sysctl

You will see output similar to:

Parsing /usr/lib/sysctl.d/00-system.conf
Parsing /usr/lib/sysctl.d/50-default.conf
Parsing /etc/sysctl.d/99-sysctl.conf
Overwriting earlier assignment of net/bridge/bridge-nf-call-ip6tables in file '/etc/sysctl.d/99-sysctl.conf'.
Overwriting earlier assignment of net/bridge/bridge-nf-call-ip6tables in file '/etc/sysctl.d/99-sysctl.conf'.
Overwriting earlier assignment of net/bridge/bridge-nf-call-ip6tables in file '/etc/sysctl.d/99-sysctl.conf'.
Parsing /etc/sysctl.d/k8s.conf
Overwriting earlier assignment of net/bridge/bridge-nf-call-iptables in file '/etc/sysctl.d/k8s.conf'.
Parsing /etc/sysctl.conf
Overwriting earlier assignment of net/bridge/bridge-nf-call-ip6tables in file '/etc/sysctl.conf'.
Overwriting earlier assignment of net/bridge/bridge-nf-call-ip6tables in file '/etc/sysctl.conf'.
Setting 'net/ipv4/conf/all/promote_secondaries' to '1'
Setting 'net/ipv4/conf/default/promote_secondaries' to '1'
...

/etc/sysctl.d/k8s.conf is the configuration added by Cloudera Data Science Workbench. Administrators must make sure that no other file is overwriting values set by /etc/sysctl.d/k8s.conf.
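The debug output can be long, so it may help to reduce it to the files that are parsed after /etc/sysctl.d/k8s.conf and overwrite an earlier assignment. The sketch below does this with a hypothetical function (overwrites_after_k8s) and assumes the "Parsing" and "Overwriting earlier assignment of" line formats shown in the sample output above. It only lists candidates; you still need to check whether each key is one that k8s.conf sets.

```shell
# Read systemd-sysctl debug output on stdin and print "<file> <key>" for
# each overwrite reported in a file parsed after k8s.conf.
overwrites_after_k8s() {
    awk -v k8s="/etc/sysctl.d/k8s.conf" '
        /^Parsing / {
            current = $2
            if (current == k8s) after = 1
            next
        }
        /^Overwriting earlier assignment of / && after && current != k8s {
            if (!seen[current, $5]++) print current, $5
        }'
}
```

For example: SYSTEMD_LOG_LEVEL=debug /usr/lib/systemd/systemd-sysctl 2>&1 | overwrites_after_k8s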

CDH parcels not found at /opt/cloudera/parcels

There are two possible reasons for this warning:

If you are using a custom parcel directory, you can ignore the warning and proceed with the installation. Once Cloudera Data Science Workbench is running, set the path to the CDH parcel in the admin dashboard.
This warning can be an indication that you have not added gateway roles to the Cloudera Data Science Workbench hosts. In this case, do not ignore the warning. Exit the installer and go to Cloudera Manager to add gateway roles to the cluster.

CDSW Docker daemons fail to start

CDSW Docker daemons fail to start with the following error:

Error starting daemon: error initializing graphdriver: devmapper: Unable to take ownership of thin-pool (docker-thinpool) that already has used data blocks.

This issue occurs when the block devices you specified for the Docker Block Device field already have data on them. This is a safeguard to prevent block devices from being wiped inadvertently. Note that resolving this issue involves deleting data from the block devices.

To resolve this issue, perform the following steps:

Verify that it is okay to delete the data on the block device.
SSH to the Cloudera Data Science Workbench master host.

Run the following script:

/opt/cloudera/parcels/CDSW/scripts/teardown-docker.sh

In the Cloudera Manager Admin Console, select the Cloudera Data Science Workbench service.
On the Instances tab, select the Docker Daemons.
Click Actions for Selected (n) > Prepare Node.
Start the Cloudera Data Science Workbench service by clicking Actions > Start.

User Process Limit

During host validation, you may encounter the following warning message:

{WARN} Cloudera Data Science Workbench recommends that all users have a max-user-processes limit of at least 65536.

This message appears if the user process limit is under 65536. You can increase the user process limit for the current shell session with the following command:

ulimit -u 65536

Set this configuration on every Cloudera Data Science Workbench host. To make the limit persistent, you can also edit /etc/security/limits.conf.
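Before changing anything, you can compare the current limit with the recommended minimum. The helper below is a small sketch with a hypothetical name (limit_at_least); it accepts the value printed by ulimit, including the literal string "unlimited".

```shell
# Succeed when a limit value meets the required minimum.
# $1 = current value as printed by ulimit, $2 = required minimum
limit_at_least() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ]
}
```

For example, in bash: limit_at_least "$(ulimit -u)" 65536 || echo "max-user-processes limit is too low"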

Open Files Limit

During host validation, you may encounter the following warning message:

{WARN} Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.

This message appears if the open files limit is under 1048576. Note that on HDP clusters, the open file limit recommendation is 10000 at a minimum. Cloudera recommends a higher limit for clusters with Cloudera Data Science Workbench.

You can configure the file limit with the following command:

ulimit -n 1048576

Set this configuration on every Cloudera Data Science Workbench host. You can also edit /etc/security/limits.conf to configure the open files limit.
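For the /etc/security/limits.conf route, entries take the form <domain> <type> <item> <value>, where nproc is the process limit and nofile is the open files limit. The sketch below prints one possible set of entries for both limits discussed above. Using "*" as the domain is an assumption about your environment: wildcard entries are not applied to the root user, and files under /etc/security/limits.d may override these values. The function name cdsw_limits_conf is hypothetical.

```shell
# Print limits.conf entries that raise the process and open files limits
# for all non-root users. Review the output before adding it to
# /etc/security/limits.conf.
cdsw_limits_conf() {
    cat <<'EOF'
*    soft    nproc     65536
*    hard    nproc     65536
*    soft    nofile    1048576
*    hard    nofile    1048576
EOF
}
```

New limits apply to sessions opened after the change, so log in again before rerunning host validation.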

Disable SELinux

During installation, you may encounter the following message:

Please disable SELinux by setting SELINUX=disabled|permissive in /etc/selinux/config, then reboot or using setenforce 0 command

SELinux enforces additional control policies for what a user, process, or daemon can do. If SELinux is in enforcing mode, Cloudera Data Science Workbench may not have the proper permissions to run.

To resolve this issue, you must change the SELinux mode on every host by doing one of the following:

Edit the configuration file for SELinux and set it to disabled or permissive. Note that if you set SELinux to permissive mode, events such as access denials will be logged, but the denial will not be enforced. You can find the SELinux configuration file in the following location: /etc/selinux/config.
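Run the following command: setenforce 0. This command switches SELinux to permissive mode until the next reboot. It does not disable SELinux completely, and the change does not persist across reboots unless you also edit the configuration file.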
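To check which mode the configuration file will apply at the next boot, you can read the SELINUX= setting directly. This is a minimal sketch with a hypothetical function name (selinux_config_mode); it ignores comment lines and the separate SELINUXTYPE= setting.

```shell
# Print the SELINUX= value from an SELinux config file.
# $1 = config file (defaults to /etc/selinux/config)
selinux_config_mode() {
    sed -n 's/^[[:space:]]*SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}" | tail -n 1
}
```

The getenforce command reports the mode that is currently in effect, which can differ from the configured mode after setenforce 0 or before a reboot.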

DNS is not configured properly

During installation, you might encounter messages such as:

DNS doesn't resolve <CDSW_domain> to <CDSW_Master_IP_address>; DNS is not configured properly

or

DNS doesn't resolve <CDSW_Master_IP_address> to <CDSW_domain>; DNS is not configured properly

This indicates that the configured CDSW domain name does not resolve to the IP address of the Master host, or that the IP address does not resolve back to the domain name. You must enable DNS forward and reverse lookup for the CDSW domain and IP address to proceed.
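Both directions can be checked from the Master host with getent, which queries the system resolver. The sketch below uses two hypothetical functions, forward_ok and reverse_ok. It assumes that the resolver reflects your DNS setup; because getent also consults /etc/hosts, a passing result does not prove that DNS itself is configured correctly. Names are compared exactly, including case.

```shell
# Forward lookup: succeed if the domain resolves to the expected IPv4 address.
# $1 = domain, $2 = expected IP address
forward_ok() {
    getent ahostsv4 "$1" |
        awk -v ip="$2" '$1 == ip { found = 1 } END { exit !found }'
}

# Reverse lookup: succeed if the IP address resolves back to the domain.
# $1 = IP address, $2 = expected domain
reverse_ok() {
    getent hosts "$1" |
        awk -v name="$2" '{ for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                          END { exit !found }'
}
```

For example, with your own values in place of the placeholders: forward_ok <CDSW_domain> <CDSW_Master_IP_address> && reverse_ok <CDSW_Master_IP_address> <CDSW_domain>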