Troubleshooting Cloudera Director
This topic contains information on problems that can occur when you set up, configure, or use Cloudera Director, their causes, and their solutions.
Viewing Cloudera Director Logs
- Cloudera Director client
  - One shared log file per user account: $HOME/.cloudera-director/logs/application.log
- Cloudera Director server
  - One file for all clusters: /var/log/cloudera-director-server/application.log
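Logging is configured in the logback.xml file:
- Cloudera Director client: /etc/cloudera-director-client/logback.xml
- Cloudera Director server: /etc/cloudera-director-server/logback.xml
For more verbose logging, set the log level to DEBUG in logback.xml, for example:
<root level="DEBUG">
<logger name="com.cloudera.launchpad" level="DEBUG"/>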
The logback.xml file can be reconfigured in many other ways to adjust how logging is performed. See Logback configuration in the Logback project documentation to learn more. Note that major changes to the log format or contents can make it harder for Cloudera Support to help you if you need to send them logs during troubleshooting.
Configuring Tag-on-create for AWS GovCloud (US) and China (Beijing) Regions
In most AWS regions, Cloudera Director tags each instance at creation time to facilitate instance management. The GovCloud (US) and China (Beijing) regions do not support tagging instances on creation, so for instances in these regions, the tag is applied after the instance is created. This behavior is controlled by the following setting in the aws-plugin.conf file:
useTagOnCreate: false
The aws-plugin.conf file is located in /var/lib/cloudera-director-plugins/aws-provider-plugin_version/etc/ on your Cloudera Director EC2 instance.
Backing Up the H2 Embedded Database
The H2 embedded database is stored in the following file:
/var/lib/cloudera-director-server/state.h2.db
Back up the state.h2.db file to avoid losing environment and cluster data. To ensure that your backup copy can be restored, use the H2 backup tools instead of simply copying the file. For more information, see the H2 Tutorial.
Bootstrap fails in Azure when custom image has an attached data disk and dataDiskCount is not 0
Symptom
Bootstrap fails in Azure when a custom image with an attached data disk is used and dataDiskCount is not set to 0. The following error message is displayed: "Cannot specify user image overrides for a disk already defined in the specified image reference."
Cause
This error originates in Azure. It occurs because the Azure image already has a data disk attached, while a nonzero dataDiskCount value tells Cloudera Director to attach one or more additional disks. Azure rejects the conflicting request.
Solution
If you deploy a cluster in Azure with a custom image that has a data disk attached, you must set dataDiskCount to 0. You can use the Azure Portal to check whether your custom image has a data disk attached. Commenting out the dataDiskCount setting is not sufficient, because the value then defaults to 5 and bootstrap still fails. See Deploying Clusters with Custom Images.
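The default-value pitfall described above can be illustrated with a small pre-flight check. This is only a sketch: the dictionary representation of an instance template and the image_has_data_disk flag are hypothetical, and only the dataDiskCount name and its default of 5 come from the text.

```python
DEFAULT_DATA_DISK_COUNT = 5  # default stated in the text above


def effective_data_disk_count(template):
    """Return the dataDiskCount in effect, applying the default when the key is absent."""
    return int(template.get("dataDiskCount", DEFAULT_DATA_DISK_COUNT))


def custom_image_conflict(template, image_has_data_disk):
    """Return True when a custom image with an attached data disk is
    combined with a dataDiskCount other than 0 (the failing combination)."""
    return image_has_data_disk and effective_data_disk_count(template) != 0


# A template with the setting commented out still conflicts, because of the default.
print(custom_image_conflict({}, image_has_data_disk=True))
# Setting the value explicitly to 0 removes the conflict.
print(custom_image_conflict({"dataDiskCount": 0}, image_has_data_disk=True))
```

Whether the image actually has a data disk must still be confirmed in the Azure Portal; the flag here only stands in for that answer.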
Slow or Failed OS Updates in Some AWS Regions
Symptom
In AWS, Cloudera Director triggers operating system updates and performs software downloads on instances it allocates in your chosen region. Depending on the local network configuration, these updates and downloads might go slowly or fail.
Solutions
- Disable instance normalization. This stops Cloudera Director from performing its usual automated setup work on new instances. Replace that work with your own, either by building a custom AMI with the work already done or by using a bootstrap script. Normalization can be disabled using a configuration file.
- Create a preloaded AMI. Cloudera Director can avoid downloading Cloudera Manager and CDH software if it is already present in the expected locations on instances. This also speeds up deployment and cluster bootstrap, even when download speeds from Cloudera repositories are reasonable. See the documentation for more information.
- Mirror Cloudera repositories. Instead of preloading an AMI with Cloudera software, you can host the software on local mirrors and point Cloudera Director to them as alternative download locations. As with preloaded AMIs, this can speed up bootstrap and make your architecture less vulnerable to network problems. See the documentation for more information.
Cloudera Director Bootstrap Fails with DNS Error
Cloudera Director Bootstrap Fails with IAM Permissions Error
Symptom
ErrorInfo{code=PROVIDER_EXCEPTION, properties={message=User: arn:aws:sts::code:assumed-role/ClouderaDirector-Director-instance is not authorized to perform: iam:GetInstanceProfile on resource: instance profile test}
Solution
Configure the required IAM permissions. For the list of required permissions, see Creating AWS Identity and Access Management (IAM) Policies.
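When the error message names the denied action, as it does above, the action can be pulled out of the log and turned into a draft policy statement. The following is a minimal sketch, not the official list of permissions: the function names are hypothetical, and the "*" resource is a placeholder that should be narrowed according to the IAM policy documentation.

```python
import json
import re

# Matches the action named in an AWS "not authorized" message.
_DENIED = re.compile(r"is not authorized to perform: (\S+)")


def denied_actions(error_text):
    """Return the IAM actions named in the error text, in order, without duplicates."""
    seen = []
    for action in _DENIED.findall(error_text):
        if action not in seen:
            seen.append(action)
    return seen


def allow_statement(actions, resource="*"):
    """Build one IAM policy statement that allows the given actions.
    The default resource "*" is a placeholder, not a recommendation."""
    return {"Effect": "Allow", "Action": list(actions), "Resource": resource}


error = ("ErrorInfo{code=PROVIDER_EXCEPTION, properties={message=User: "
         "arn:aws:sts::code:assumed-role/ClouderaDirector-Director-instance "
         "is not authorized to perform: iam:GetInstanceProfile on resource: "
         "instance profile test}")
print(json.dumps(allow_statement(denied_actions(error)), indent=2))
```

A single denied action is often the first of several, so compare the result against the full list of required permissions rather than adding actions one at a time.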
Cloudera Manager API Call Fails
Cloudera Director Cannot Manage a Cluster That Was Kerberized Through Cloudera Manager
Symptom
Cloudera Director cannot manage a cluster after Cloudera Manager is used to enable Kerberos on the cluster.
Cause
Once a cluster is deployed through Cloudera Director, some changes to the cluster that are made using Cloudera Manager cause Cloudera Director to be out of sync and unable to manage the cluster. See Cloudera Director and Cloudera Manager Usage.
RDS Name Conflicts
New Cluster Fails to Start Because of Missing Roles
Cause
Cloudera Director does not validate that all required roles are assigned when provisioning a cluster. This can lead to failures during the initial run of a new cluster. For example, if the gateway instance group was removed but the Flume Agent and Kafka Broker roles were still assigned to that group, the cluster fails to start.
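Because this validation is not performed for you, a quick check before provisioning can catch the case in the example. The sketch below assumes a hypothetical in-memory representation (a list of instance group names and a mapping from role to group); it is not how Cloudera Director stores its configuration.

```python
def orphaned_roles(instance_groups, role_assignments):
    """Return {role: group} for every role assigned to a group that is not defined."""
    defined = set(instance_groups)
    return {role: group
            for role, group in role_assignments.items()
            if group not in defined}


# The gateway group was removed, but two roles still point at it.
groups = ["masters", "workers"]
roles = {"NAMENODE": "masters", "FLUME_AGENT": "gateway", "KAFKA_BROKER": "gateway"}
print(orphaned_roles(groups, roles))
```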
Cloudera Director Server Will Not Start with Unsupported Java Version
Symptom
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/cloudera/launchpad/Server : Unsupported major.minor version 51.0
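The version number in this error identifies the Java release the server classes were compiled for: class file version 51.0 corresponds to Java 7, so the JVM that raised the error is older than that. As a small illustration, the mapping can be computed from the message; the function name is hypothetical.

```python
import re


def required_java_version(error_text):
    """Return the Java release a class file needs, based on the
    'Unsupported major.minor version' message, or None if it is absent."""
    match = re.search(r"Unsupported major\.minor version (\d+)\.\d+", error_text)
    if match is None:
        return None
    major = int(match.group(1))
    # From class file version 49 (Java 5) onward, each Java release adds one.
    return major - 44 if major >= 49 else None


message = ('Exception in thread "main" java.lang.UnsupportedClassVersionError: '
           'com/cloudera/launchpad/Server : Unsupported major.minor version 51.0')
print(required_java_version(message))
```

Comparing the result with the output of java -version on the server host shows whether the installed JVM is too old.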
Error Occurs if Tags Contain Unquoted Special Characters
Symptom
com.typesafe.config.ConfigException$WrongType: ... <x> has type OBJECT rather than STRING
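The exception class shows that the configuration is parsed as HOCON (Typesafe Config). In HOCON, an unquoted key that contains a period is read as a path, so it becomes a nested object instead of a single string key, which is one likely way to get "type OBJECT rather than STRING". The sketch below mimics that behavior for illustration only; it is not the parser itself, and the tag name is made up.

```python
def expand_path_key(key, value):
    """Mimic how HOCON reads an unquoted key: periods split it into nested objects."""
    result = value
    for part in reversed(key.split(".")):
        result = {part: result}
    return result


def quote_key(key):
    """Wrap a key in double quotes so that it is read as one string key."""
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Unquoted, the tag name turns into an object named "owner" containing "team".
print(expand_path_key("owner.team", "analytics"))
# Quoted, it stays a single key.
print(quote_key("owner.team") + ": analytics")
```

Quoting tag names (and values) that contain periods, colons, or other special characters avoids this reinterpretation.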
DNS Issues
Symptom
[27/Mar/2017 20:26:16 +0000] 12596 Thread-13 https ERROR Failed to retrieve/store URL: http://ip-10-202-202-109.ec2.internal:7180/cmf/parcel/download/CDH-5.10.0-1.cdh5.10.0.p0.41-el7.parcel.torrent -> /opt/cloudera/parcel-cache/CDH-5.10.0-1.cdh5.10.0.p0.41-el7.parcel.torrent <urlopen error [Errno -2] Name or service not known>
Cause
- DNS Hostnames is not set to Yes in the Edit DNS Hostnames VPC configuration setting.
- The Amazon Virtual Private Cloud (VPC) is not set up for forward and reverse hostname resolution. Forward and reverse DNS resolution is a requirement for many components of the Cloudera EDH platform, including Cloudera Director.
Solution
In the AWS Management Console, go to VPC. In the VPC Dashboard, select your VPC and click Action. In the shortcut menu, click Edit DNS Hostnames and click Yes. If this does not fix the issue, continue with the instructions that follow to configure forward and reverse hostname resolution.
To check how a host resolves its own name, run the following command on that host:
python -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"
For more information on DNS and Amazon VPCs, see DHCP Options Sets in the Amazon VPC documentation.
- Log in to the AWS Management Console.
- Select VPC from the Services navigation list box.
- In the left pane, click Your VPCs. A list of currently configured VPCs is displayed.
- Select the VPC you are using and note the DHCP options set ID.
- In the left pane, click DHCP Option Sets. A list of currently configured DHCP Option Sets is displayed.
- Select the option set used by the VPC.
- Check for an entry similar to the following and make sure the domain-name is specified. For example:
domain-name = ec2.internal
domain-name-servers = AmazonProvidedDNS
- If it is not configured correctly, create a new DHCP option set for the specified region and assign it to the VPC. For information on how to specify the correct domain name, see the AWS Documentation.
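After changing the VPC settings, the forward and reverse resolution requirement can be checked from any cluster host. The following is a sketch using only the Python standard library; the function name and the injectable resolver arguments are illustrative choices that make the check easy to test without a network.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that hostname resolves to an IP address and that the
    address resolves back to the same hostname. Returns (ok, detail)."""
    try:
        ip = forward(hostname)
    except (socket.gaierror, socket.herror) as exc:
        return False, "forward lookup of %s failed: %s" % (hostname, exc)
    try:
        reverse_name = reverse(ip)[0]
    except (socket.gaierror, socket.herror) as exc:
        return False, "reverse lookup of %s failed: %s" % (ip, exc)
    # Compare case-insensitively and ignore a trailing dot.
    if reverse_name.lower().rstrip(".") != hostname.lower().rstrip("."):
        return False, "%s -> %s -> %s" % (hostname, ip, reverse_name)
    return True, "%s <-> %s" % (hostname, ip)
```

On a host, calling check_dns(socket.getfqdn()) reports whether the fully qualified name of that host round-trips; a failure in the forward lookup corresponds to the "Name or service not known" error shown in the symptom.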
Server Does Not Start
Symptom
The Cloudera Director server does not start or quickly exits with an Out of Memory exception.
Solution
Run Cloudera Director on an instance that has at least 1 GB of free memory. See Resource Requirements for more details on Cloudera Director hardware requirements.
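On a Linux instance, the memory requirement can be checked before starting the server by reading /proc/meminfo. This is a rough sketch: it treats the MemAvailable figure (falling back to MemFree on older kernels) as "free memory", which is an assumption, and the function names are hypothetical.

```python
def available_memory_kb(meminfo_text):
    """Return MemAvailable (or MemFree as a fallback) in kB from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values.get("MemAvailable", values.get("MemFree"))


def has_enough_memory(meminfo_text, required_kb=1024 * 1024):
    """Return True if at least required_kb (1 GB by default) is available."""
    available = available_memory_kb(meminfo_text)
    return available is not None and available >= required_kb


sample = "MemTotal:  2048000 kB\nMemFree:  300000 kB\nMemAvailable:  1500000 kB\n"
print(has_enough_memory(sample))
```

On a real host, pass the contents of /proc/meminfo instead of the sample text.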
Problem When Removing Hosts from a Cluster
Cause
You are trying to shrink the cluster below the HDFS replication factor. See How to Remove Instances from a Cluster (Note) for more information about replication factors.
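The constraint amounts to simple arithmetic: after the removal, at least as many DataNode hosts must remain as the replication factor. The sketch below assumes that every host being removed runs a DataNode and uses 3, the HDFS default, when no replication factor is given; the function name is hypothetical.

```python
def can_remove_hosts(datanode_count, hosts_to_remove, replication_factor=3):
    """Return True if enough DataNodes remain to hold every replica of a block."""
    remaining = datanode_count - hosts_to_remove
    return remaining >= replication_factor


print(can_remove_hosts(5, 2))  # 3 DataNodes remain
print(can_remove_hosts(5, 3))  # only 2 DataNodes would remain
```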
Problems Connecting to Cloudera Director Server
Cause
The security group or iptables configuration is blocking connections to the Cloudera Director server. For more information about configuring security groups, see Preparing Your AWS EC2 Resources. For commands to turn off iptables, see either Installing Cloudera Director Server and Client on the EC2 Instance or Installing Cloudera Director Server and Client on Google Compute Engine. Some operating systems have iptables turned on by default, and it must be turned off.
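A plain TCP connection test from the client machine helps tell a blocked port apart from other problems. The following sketch uses only the Python standard library; the function name is hypothetical, and the port to test is whatever your Cloudera Director server listens on (7189 is assumed in the usage note below).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("director.example.com", 7189) uses a placeholder hostname and an assumed port. If it returns False from a remote machine but True when run on the server itself against localhost, the security group or iptables rules are the likely cause.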