Known Issues and Workarounds in Cloudera Manager 5
The following sections describe the current known issues in Cloudera Manager 5.
Cloudera Manager 6.x issue with the service role Resume
If a restart of a selected service role on a node fails and the customer clicks the "Resume" button in Cloudera Manager, the service role is restarted on all of the nodes concurrently.
Products affected: Cloudera Manager
- Cloudera Manager 5.5 and later
- Cloudera Manager 6.0 through 6.3.3
- Cloudera Manager 7.1.x
Users affected: Users with admin role in Cloudera Manager can impact end users of the service.
Impact: In production clusters this can result in a cluster-wide service outage; this has already been observed for the YARN service and the HDFS service in a few clusters.
Severity: High
- Workaround: Instead of performing a restart, perform a stop and then a start of the affected services.
- The issue is fixed in Cloudera Manager 6.3.4, 7.2.1, and later releases.
Knowledge Article: For the latest update on this issue see the corresponding Knowledge article: Cloudera Customer Advisory: Cloudera Manager 6.x issue with service role Resume
ZooKeeper JMX did not support TLS when managed by Cloudera Manager
The ZooKeeper service optionally exposes a JMX port used for reporting and metrics. By default, Cloudera Manager enables this port, but prior to Cloudera Manager 6.1.0, it did not support mutual TLS authentication on this connection. While JMX has a password-based authentication mechanism that Cloudera Manager enables by default, weaknesses have been found in the authentication mechanism, and Oracle now advises JMX connections to enable mutual TLS authentication in addition to password-based authentication. A successful attack may leak data, cause denial of service, or even allow arbitrary code execution on the Java process that exposes a JMX port. Beginning in Cloudera Manager 6.1.0, it is possible to configure mutual TLS authentication on ZooKeeper’s JMX port.
Products affected: ZooKeeper
Releases affected: Cloudera Manager 6.1.0 and lower, Cloudera Manager 5.16 and lower
Users affected: All
Date/time of detection: June 7, 2018
Severity (Low/Medium/High): 9.8 High (CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
Impact: Remote code execution
CVE: CVE-2018-11744
Addressed in release/refresh/patch: Cloudera Manager 6.1.0
YARN Application Masters fail when a concurrent Deploy Client Configuration job is executed
YARN Application Masters fail because the container-executor.cfg file is missing when a YARN application is executed concurrently with a Deploy Client Configurations job. With the workaround below enabled, the container-executor hierarchy is moved to /var/lib/yarn-ce, which is not modified by the Deploy Client Configuration command.
Cloudera Bug: OPSAPS-24398
Workaround:
- On the Cloudera Manager server host, add the following line to the /etc/default/cloudera-scm-server file:
export CMF_FF_YARN_SAFE_CONTAINER_EXECUTOR_DIR=true
- Restart the Cloudera Manager Server:
service cloudera-scm-server restart
- Restart the NodeManager(s) so they can pick up the new configuration change:
- In the Cloudera Manager Admin console, go to the YARN service.
- Click the Instances tab.
- Select all hosts with the NodeManager Role Type.
- Click .
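For reference, a minimal shell sketch of the first two steps above, run as root on the Cloudera Manager Server host; the appended line and the restart command are taken from the steps, and the NodeManager restart still happens through the Cloudera Manager Admin Console:
# Enable the safe container-executor directory feature flag
echo 'export CMF_FF_YARN_SAFE_CONTAINER_EXECUTOR_DIR=true' >> /etc/default/cloudera-scm-server
# Restart the Cloudera Manager Server so the flag takes effect
service cloudera-scm-server restart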
Fixed Versions: Cloudera Manager/CDH 5.16.1, 5.16.0, 5.15.1, 5.14.4
Affected Versions: Cloudera Manager 5.15 and lower
Cloudera Manager Agent fails to deactivate alternatives for parcels
During agent startup, the Cloudera Manager agent log on a host contains many repeated error entries when the parcel named in the errors has a "Distributed" status on the Cloudera Manager Parcels page. The confirmed cause is that the alternatives configuration for the second (deactivated) parcel is not present, which produces the errors seen in the agent log file.
For more information, see this knowledge base article.
Workaround: Un-distribute or remove the affected parcel(s) from the host.
Cloudera bug: OPSAPS-52745
Fixed versions: Cloudera Manager 6.x
Affected versions: Cloudera Manager 5.x
The client configuration deployment fails after upgrade
After upgrade to Cloudera Manager 5.x, client configuration deployment fails with the following error: "/var/lib/alternatives/hadoop-conf empty!"
Workaround: Either update the /var/lib/alternatives/hadoop-conf file from other working hosts, or remove it and restart the CM agent. This re-generates the file. For more information, see this knowledge base article.
Cloudera bug: OPSAPS-38704
Fixed versions: Cloudera Manager 6.x
Affected versions: Cloudera Manager 5.x
Backup and Disaster Recovery replication with Isilon storage can fail
Backup and Disaster Recovery (BDR) jobs with Isilon storage can fail due to Cloudera Manager's host selection policy for BDR jobs or Cloudera Manager not accepting Isilon as a valid storage type for HDFS replication.
Affected Versions: Cloudera Manager 5.8.5, 5.9.3, 5.11.1, 5.12, 5.14.x, 5.15.1
Fixed Versions: Cloudera Manager 5.15.2
Backup and Disaster Recovery replication can fail if one of the destination KMSs fail
Even with KMS HA enabled on the destination cluster, BDR replication can still fail if one of the KMSs cannot be reached.
Workaround: Ensure all of the KMSs are online. Then, perform the replication again.
Affected Versions: Cloudera Manager 5.15.x, 5.16.x
Cloudera Manager installation fails on MariaDB 10.2.8 and later
When installing Cloudera Manager using MariaDB 10.2.8 or later, the Cloudera Manager web server doesn't come up and the install process ends with a failed status. The cloudera-scm-server.log includes the following SQL error:
2019-08-28 04:37:10,171 FATAL main:org.hsqldb.cmdline.SqlFile: SQL Error at 'UTF-8' line 57: "alter table ROLE_CONFIG_GROUPS drop column REVISION_ID" Key column 'REVISION_ID' doesn't exist in table
2019-08-28 04:37:10,171 FATAL main:org.hsqldb.cmdline.SqlFile: Rolling back SQL transaction.
2019-08-28 04:37:10,172 ERROR main:com.cloudera.enterprise.dbutil.SqlFileRunner: Exception while executing ddl scripts. com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'REVISION_ID' doesn't exist in table
Note that MariaDB 10.2.8 is provided by default in some operating systems, including SLES 12 SP4.
Workaround: Replace the default MariaDB 10.2.x version with MariaDB 10.0.x.
Affected Versions: MariaDB 10.2.8 and later
Cloudera Issue: OPSAPS-52340
Failures during Cloudera Manager installation or upgrade
Cloudera Manager agent installation or upgrade fails due to misconfigured or problematic third-party repositories that interfere with the process.
Cloudera Bug: OPSAPS-45576
Workaround: None
Affected versions: Cloudera Manager 5.14, 5.15
Fixed versions: Cloudera Manager 5.16.1
Upgrade to CDH 5.15.1 fails with CM 5.15.1 and OpenJDK
Cloudera Bug: OPSAPS-47620
Fixed Version: Cloudera Manager and CDH 5.16.1, 5.15.2
After upgrade to Cloudera Manager 6, Kafka brokers and MirrorMaker marked as having stale configurations
The configurations that will be marked as stale are BROKER_JAVA_OPTS of the Kafka Broker role and MM_JAVA_OPTS of the Kafka MirrorMaker role. The staleness involves the following JMX-related Java options:
- -Dcom.sun.management.jmxremote.host=127.0.0.1
- -Dcom.sun.management.jmxremote.local.only
Affected Versions: Cloudera Manager 5.15.0
Fixed Versions: Cloudera Manager 5.16.1, 5.15.2
Cloudera Bug: OPSAPS-47365
Parcel distribution fails and times out on Ubuntu 14.04 with Cloudera Express license
Parcel distribution to hosts from the Cloudera Manager server is known to fail and time out on Ubuntu 14.04 with a Cloudera Express license.
Workaround: None
Affected Versions: Cloudera Manager 5.15.0 and higher
Cloudera Issue: OPSAPS-44590
Flume service with Kafka over TLS marked as stale after upgrading Cloudera Manager from 5.15.0
After upgrading from Cloudera Manager 5.15.0, if you are using Flume with Kafka over TLS, your Flume service will be marked as stale. To resolve this issue, restart the Flume service after the upgrade.
Affected Versions: Cloudera Manager 5.15.0
Flume uses default values for Kafka TLS settings
Cloudera Manager 5.15 added support for automatically configuring TLS settings when connecting to a Kafka service. In 5.15.0, the Flume configuration generated by Cloudera Manager when Flume is configured to connect to a secure Kafka cluster contains the default truststore location and password. Kafka TLS configuration inheritance does not handle non-default truststore file and password, which causes the connection to fail when the default values have been changed.
Workaround: Edit the Flume configuration file in Cloudera Manager to override the default parameters and use empty values for the Kafka producer and consumer truststore settings by adding the following lines to the Flume configuration file:
For Kafka channels:
AGENT_NAME.channels.CHANNEL_NAME.kafka.producer.ssl.truststore.location=
AGENT_NAME.channels.CHANNEL_NAME.kafka.producer.ssl.truststore.password=
AGENT_NAME.channels.CHANNEL_NAME.kafka.consumer.ssl.truststore.location=
AGENT_NAME.channels.CHANNEL_NAME.kafka.consumer.ssl.truststore.password=
For Kafka sources:
AGENT_NAME.sources.SOURCE_NAME.kafka.consumer.ssl.truststore.location=
AGENT_NAME.sources.SOURCE_NAME.kafka.consumer.ssl.truststore.password=
For Kafka sinks:
AGENT_NAME.sinks.SINK_NAME.kafka.producer.ssl.truststore.location=
AGENT_NAME.sinks.SINK_NAME.kafka.producer.ssl.truststore.password=
AGENT_NAME, CHANNEL_NAME, SOURCE_NAME and SINK_NAME need to be replaced with the actual agent, channel, source and sink names respectively.
Affected Versions: Cloudera Manager 5.15.0
Fixed Versions: Cloudera Manager 5.15.1, 5.15.2, 5.16.1
After upgrading from Cloudera Manager 5.15.0, if you are using Flume with Kafka over TLS, your Flume service will be marked as stale. To resolve the problem, restart the Flume service after the upgrade.
Cloudera Issue: OPSAPS-45669
Upgrades from Cloudera Enterprise 5.15 to 6.0 Not Supported
You cannot upgrade from Cloudera Manager or CDH 5.15 to the upcoming release of Cloudera Manager and CDH 6.0.
Cloudera Manager read-only user can access sensitive cluster information
Due to a security vulnerability, a Cloudera Manager read-only user can access sensitive cluster information.
Products affected: Cloudera Manager
Releases affected:
- Cloudera Manager 5.12 and all prior releases
- Cloudera Manager 5.13.0, 5.13.1, 5.13.2, 5.13.3
- Cloudera Manager 5.14.0, 5.14.1, 5.14.2, 5.14.3
Users affected: All
Date of detection: April 18th, 2018
Detected by: Cloudera
Severity: High
Impact: Sensitive Information Disclosure
CVE: CVE-2018-10815
Immediate action required: Upgrade Cloudera Manager to a release with the fix.
Addressed in release/refresh/patch: Cloudera Manager 5.15.0 and higher, 5.14.4
Knowledge base: For the latest update on this issue, see the corresponding Knowledge article: TSB 2018-306: Cloudera Manager Information Disclosure
Accumulo health check fails on Cloudera Manager upgrade
After upgrading Cloudera Manager from 5.14.0 to 5.15.0, a health check failed with the message "ACCUMULO16-ACCUMULO16_MASTER-1 has undesired health: BAD"
Workaround:
Products affected: Cloudera Manager, Accumulo
Cloudera Bug: CDH-69677
Cloudera Manager should not omit LDAP GROUP_MAPPING passwords from NodeManager
After upgrading Cloudera Manager, clusters on CDH 5.5 or higher will be marked as stale if they had set a value for the Hadoop User Group Mapping LDAP TLS/SSL Truststore Password or Hadoop User Group Mapping LDAP Bind User Password configuration properties.
Workaround: Restart the stale services to pick up this fix. The configurations that will be marked stale are: hadoop.security.group.mapping.ldap.ssl.keystore.password, hadoop.security.group.mapping.ldap.bind.password.
Products affected: Cloudera Manager
Affected versions: Cloudera Manager 5.5 or higher
Fixed versions: Cloudera Manager 5.16.1
Cloudera Bug: OPSAPS-45440
Import step in Hive replication can fail in 5.15.0 due to a Hive concurrency issue
In Cloudera Manager 5.15.0, BDR uses multi-threaded import and export for Hive replication by default. Because of a known Hive concurrency issue, the import step may occasionally fail with a NullPointerException for either the add partition or create table operations.
Workaround: Most of the time, running the replication again will be successful. If you continue to run into this issue, you can disable multi-threaded import and export by setting the Number of concurrent HMS connections on the Advanced tab of the replication schedule to 0 or fewer.
Products affected: Cloudera Manager
Affected versions: Cloudera Manager 5.15.0 and higher
Cloudera Bug: CDH-67557
Hard Restart of Cloudera Manager Agents May Cause Subsequent Service Errors
If a “hard restart” or “hard stop” operation is performed on a Cloudera Manager Agent, the restarted agent will erroneously restart roles that existed prior to the restart and, subsequently, 60 days later, these roles may experience errors or be killed.
Products affected: Cloudera Manager
Releases affected: All versions of Cloudera Manager 5.x
Cloudera Bug: OPSAPS-43550, TSB-308
Knowledge base: For the latest update on this issue, see the corresponding Knowledge article: TSB 2018-308: Hard Restart of Cloudera Manager Agents May Cause Subsequent Service Errors
Editing Cloudera Manager configurations in Google Chrome may lead to unexpected configuration loss
Google Chrome version 66, released on 28 April 2018, introduced behavior changes that can trigger unintended modification of existing configuration values in Cloudera Manager. If a page contains a password field, the browser aggressively applies auto-complete form data to the first text field and first password form fields. These unintended changes to the fields could be accidentally submitted unless the user pays careful attention to the visual cues (auto-filled fields that are highlighted) or change summary view presented just before submitting the changes.
Products affected: Cloudera Manager
Releases affected: All releases of Cloudera Manager
For the latest update on this issue see the corresponding Knowledge article: TSB 2018-314: Editing Cloudera Manager configuration in Google Chrome may lead to unexpected configuration loss.
Impala Dynamic Resource Pools wrongly gives everyone access to root pool (and all child pools)
When you use ACLs for YARN queue management in the Cloudera Manager Admin Console and you do not specify any users or groups for the root pool, all users are allowed to access the pools.
Products affected: Cloudera Manager Admin Console: Dynamic Resource Pools
Releases affected: Cloudera Manager 5.12.1 or higher
Workaround: Specify a user or group for the root pool.
Cloudera Bug: OPSAPS-45046
Cross-site scripting vulnerability in Cloudera Manager
Several pages in the Cloudera Manager Admin console are vulnerable to a cross-site scripting attack.
Products affected: Cloudera Manager Admin Console
Releases affected:
- Cloudera Manager releases lower than 5.12
- Cloudera Manager 5.12.0, 5.12.1, 5.12.2
- Cloudera Manager 5.13.0, 5.13.1
- Cloudera Manager 5.14.0, 5.14.1
User roles affected: Cluster Administrator, Full Administrator
Date of detection: January 19, 2018
Detected by: Shafeeque Olassery Kunnikkal of Ingram Micro Asia Ltd, Security Consulting & Services
Severity (Low/Medium/High): High
Impact: A cross-site scripting vulnerability can be used by an attacker to perform malicious actions. One probable form of attack is to steal the credentials of a victim’s Cloudera Manager account.
CVE: CVE-2018-5798
Immediate action required: Upgrade to a release in which this issue has been fixed.
Addressed in release/refresh/patch:
- Cloudera Manager 5.13.2 and higher
- Cloudera Manager 5.14.2 and higher
Knowledge base: For the latest update on this issue, see the corresponding Knowledge article: XSS Scripting Vulnerability in Cloudera Manager
Backup and Disaster Recovery (BDR) HDFS and Hive Replications may fail when replicating from source clusters which use HDFS HA and are managed by Cloudera Manager 5.13.3 or 5.14.2 to destination clusters managed by Cloudera Manager 5.14.2
All HDFS and Hive replication schedules fail when all the following conditions are met:
- The destination cluster is managed by Cloudera Manager 5.14.2.
- The source cluster is managed by Cloudera Manager 5.13.3+ or 5.14.2+.
- The source cluster uses HDFS HA.
Customers may observe a "java.net.UnknownHostException" error in the logs, which causes all replication jobs to fail.
There is a workaround described below, which can have a performance impact on replication jobs (especially if the environment was previously running Cloudera Manager 5.13 or 5.14). The preferred solution is to upgrade to a later release. Upgrading is strongly recommended for any customer running HDFS or Hive replications and using Cloudera Manager 5.14.2, and is mandatory when any source cluster replicating data to the 5.14.2 cluster is managed by Cloudera Manager 5.13.3+ or 5.14.2+.
Products affected: Cloudera Manager Backup and Disaster Recovery
Releases affected: Cloudera Manager 5.14.2 (when used as the destination cluster of HDFS and/or Hive replication)
Users affected: Customers using HDFS or Hive Replication
Severity (Low/Medium/High): High
Root Cause and Impact:
In HDFS Replication, Cloudera Manager first runs a process on the source cluster that lists the files to be copied. Due to a bug, inconsistent arguments are issued to this process. This results in an attempt to copy an incorrect file that can’t be accessed, resulting in an exception: java.net.UnknownHostException.
This issue affects all HDFS & Hive replication schedules when the destination cluster is managed by Cloudera Manager 5.14.2 and the source cluster is managed by either Cloudera Manager 5.13.3+ or 5.14.2+.
Immediate action required: If you use BDR, do not upgrade a destination environment to Cloudera Manager 5.14.2. Upgrade to Cloudera Manager 5.14.3 or higher when it becomes available.
If you have already upgraded to Cloudera Manager 5.14.2, you can work around this bug by disabling the process that runs the file listing (copy listing) on the source cluster. This results in the file listing phase happening in the destination environment, which can be slower when there is significant network latency between environments. This feature became available in Cloudera Manager 5.13.0.
To disable the feature that runs the copy listing phase, set the feature flag to false with the Cloudera Manager API as follows:
API endpoint for setting feature flag (HTTP PUT call):
api/v18/cm/config
Body of the API call:
{"items":[{"name":"feature_flag_run_copylist_source","value":"false"}]}
Please note that if you are upgrading from Cloudera Manager 5.12 or earlier to Cloudera Manager 5.14.2, disabling this feature will not have any impact on runtime. If you upgraded from Cloudera Manager 5.13.0 or Cloudera Manager 5.14.x to 5.14.2, you will see some performance degradation in the file listing phase of HDFS & Hive Replication as the process will be running on the destination cluster. The impact is driven by the number of HDFS files and folders in the source cluster.
Addressed in release/refresh/patch: Cloudera Manager 5.14.3 and higher
Knowledge base: For the latest update on this issue, see the corresponding Knowledge article: TSB 2018-305: Backup and Disaster Recovery (BDR) HDFS and Hive Replications may fail after upgrading to Cloudera Manager 5.14.2
Hive Replications Can Fail Intermittently
Hive replication jobs can fail intermittently during the first step of replication, while exporting Hive metadata. When this happens, the following message displays in Cloudera Manager: The remote command failed with error message: Hive Replication Export Step failed. The likelihood of this error increases with the length of time it takes to export Hive metadata. If you have a very large Hive deployment (one containing many tables and partitions), or an environment in which the Hive metastore server is under-resourced, you are more likely to see this issue; however, many tables and partitions or an under-resourced Hive metastore server are not the cause of the issue. The problem is caused by a bug, and the only solution is to upgrade to a later release. Cloudera strongly recommends that anyone running Hive Replication on Cloudera Manager 5.13.0 or 5.13.1 upgrade.
Affected Products: Cloudera Manager
Affected Versions: Cloudera Manager 5.13.0, 5.13.1
Who is Affected: Anyone using Hive Replication
Severity (Low/Medium/High): High
Cause and Impact: A file is created during the Hive Export Phase, and is used by later phases, such as the Transfer/Import phases. Because of a bug, sometimes the file is overwritten by another process. Hive Replication can thus fail intermittently because of FileNotFoundExceptions. A retry may or may not resolve the issue. This issue can potentially affect some or all Hive replication schedules.
Workaround: Upgrade to Cloudera Manager 5.13.2, 5.14.1, or higher versions
Addressed in release/refresh/patch: Cloudera Manager 5.13.2, 5.14.1, or higher versions.
For the latest updates on this issue, see the following corresponding Knowledge Base article:
TSB 2018-276: Hive Replications can fail intermittently in Cloudera Manager 5.13.0, 5.13.1 versions
HDFS DataNode stale configuration
After upgrading to Cloudera Manager 5.10 or higher, the following configuration for HDFS DataNodes will be marked as stale: dfs.datanode.balance.max.concurrent.moves.
The staleness is caused by the following new feature: HDFS balancer can now be configured to specify which hosts are included and excluded or which hosts are used as sources for transferring replicas. Additional properties for tuning the performance of the balancer can now also be configured starting with CDH 5.10.0.
Workaround: You can safely ignore this warning and defer restarting.
Affected versions: Cloudera Manager 5.10.0 and higher
Cloudera Bug: OPSAPS-36642
YARN MapReduce Job History stale configuration
When upgrading to Cloudera Manager 5.10, YARN will be marked as having a stale configuration due to mapreduce.jobhistory.loadedjob.tasks.max. Unless you have changed this parameter and want the non-default value to take effect (it only takes effect in CDH 5.9 and higher), you can safely ignore this staleness and defer restarting YARN.
Cloudera bug: OPSAPS-32132
Logging issue slows down Backup and Disaster Recovery Hive and HDFS Replication jobs
When using Cloudera Manager for cross-cluster replication of HDFS and Hive, unnecessary warning logs are printed during the initial phase of replication (‘copy listing’). These logs are printed for each and every file that Cloudera Backup and Disaster Recovery (BDR) copies. This causes a very verbose log output which slows down this phase of the replication job and can affect overall replication times considerably. This happens only when both source and target clusters are running Cloudera Manager 5.14.0.
Workaround:
- In the Cloudera Manager Admin Console for the source cluster, go to the HDFS service and click
- Add the following parameter to the configuration snippet: HADOOP_ROOT_LOGGER=ERROR
- Save the Configuration.
Affected versions: Cloudera Manager 5.14.0
Fixed version: Cloudera Manager 5.15.0, Cloudera Manager 5.14.1
Cloudera bug: OPSAPS-44160
See TSB-289.
Cloudera Manager upgrade workflow incorrectly requires deploying some optional management roles
When a user upgrades to Cloudera Manager 5.14.0, after all the agents have been updated to 5.14.0, the user sees an Upgrade Wizard, and is asked to select where to place the Navigator Metadata Server, Navigator Audit Server, and the Reports Manager roles. The Continue button is disabled until user adds one of the roles.
Workaround:
- When prompted to add Navigator and Reports Manager roles, click the main Cloudera Manager Logo to exit the wizard.
- Note that this action skips over a review of the Java heap size of the Host Monitor and Service Monitor roles. (Normally, Cloudera Manager recommends a value if it finds the existing values to be too small for the number of nodes being managed). If configured properly, this is not a concern. For more information on resource requirements for Cloudera Manager, see: Cloudera Manager Hardware Requirements.
- To complete the upgrade, start (or restart) the Cloudera Management Service (not the Cloudera Manager server). From the Cloudera Manager Status page, click the drop-down list next to Cloudera Management Service and select Start or Restart.
- Add the Navigator Metadata Server role and click Continue.
- Configure a database for the Navigator Metadata Server and click Continue.
- Stop and delete the Navigator Metadata Server role immediately afterwards. See Deleting Role Instances.
Affected Versions: Cloudera Manager 5.14.0
Fixed version: Cloudera Manager 5.14.1, 5.15.0
Cloudera Bug: OPSAPS-44629
See TSB-290
ADLS Credentials in Log Files
When you use Cloudera Manager to configure the ADLS Connector service using the Less Secure option for the Credentials Protection Policy, it is possible for Hive audit logs to include Microsoft Azure credentials. If you are using Navigator Audit Server, these credentials may appear in audit reports. To mitigate this problem, make sure that access to Hive logs is appropriately controlled and that Navigator users with Auditing Viewer roles are cleared to have access to the Hive credentials.
Affected Versions: Cloudera Manager 5.14.x
Cloudera Bug: CDH-56241
Installing Cloudera Manager on SLES Using the Installer Does Not Work
Installing Cloudera Manager 5 using the installer does not work on SLES due to PostgreSQL dependency issues.
Workaround:
Perform a regular installation.
Affected Versions: Cloudera Manager 5
Fixed Versions: Cloudera Manager 5.15.0
Cloudera Bug: OPSAPS-44284, OPSAPS-44628
Backup and Disaster Recovery Hive replication fails intermittently
In rare situations, due to a race condition, Hive Replication jobs can fail intermittently during the first step of replication with the following error: The remote command failed with error message: Hive Replication Export Step failed.
Workaround: Run the replication again or upgrade to a version that includes the bug fix.
Affected versions: Cloudera Manager 5.13.x
Fixed in: Cloudera Manager 5.14.0, 5.13.1
Cloudera bug: OPSAPS-43676
Impala logs missing from diagnostic bundle
Impala and Kudu role logs can be missing from the diagnostic bundles if their log directories have broken symlinks.
Affected versions: Cloudera Manager 5.14.0, 5.13.x
Cloudera bug: OPSAPS-41194
Accumulo logs are not included in diagnostic bundles
Accumulo logs are missing from the diagnostic bundle when collected by "date range".
Affected versions: Cloudera Manager 5.14.0, 5.13.x
Cloudera bug: OPSAPS-43206
Backup and Disaster Recovery replication fails between two clusters when data is copied from one encryption zone to another when using CDH 5.12.0 or higher.
This issue occurs when the two clusters use HDFS high availability, and the nameservice names are the same.
Workaround: Use unique nameservice names for the HDFS clusters.
Affected versions: CDH 5.12.0 and higher
Fixed in: Cloudera Manager 5.15.0, 5.14.3, 5.13.3
Cloudera bug: OPSAPS-43406, OPSAPS-43556
Backup and Disaster Recovery Hive replication fails during the Export Hive metastore step when using encrypted zones
If the /user or /user/hdfs directory is encrypted, then Hive replication fails with the following message: User:hdfs not allowed to do 'DECRYPT_EEK' on some-key-name.
Workaround: Configure a directory to store transient data during Hive replication:
- Open the Cloudera Manager Admin Console.
- Click .
- Select the replication schedule that fails and click .
- On the Advanced tab, specify an unencrypted directory for the Export Path or Directory for metadata file field. The field name depends on your Cloudera Manager version.
Affected versions: Cloudera Manager 5.13, 5.12.x, 5.11.x, 5.10.x, 5.9.x, 5.8.x
Cloudera bug: OPSAPS-43108
Backup and Disaster Recovery replication fails if the previous run is aborted
BDR replication fails with "another command" error if the previous run is aborted during the second or third step.
Workaround: Restart the Cloudera Manager Server. To prevent this error from occurring, do not abort replication during the second or third step.
Affected versions: Cloudera Manager 5.13
Fixed in: Cloudera Manager 5.14.0, 5.13.1
Backup and Disaster Recovery (Hive) replication fails if the user directory is encrypted
BDR replication fails if the user directory of the user specified in the Run as Peer User or Run as username field in the replication schedule is encrypted due to a change in how replications are performed. If no Run as Peer User is configured, then replication fails if the hdfs user directory is encrypted. For example, Hive replication fails on clusters where the /user/testuser directory is encrypted and the Run as username field specifies the "testuser" user.
Affected versions: Cloudera Manager 5.13.0, 5.12.x
Fixed in: Cloudera Manager 5.14.0, 5.13.1
Workaround: Disable the copy listing process on the source cluster by setting the following feature flag to false through the Cloudera Manager API (HTTP PUT), as described for the copy listing issue earlier in this section:
- API endpoint: /cm/config
- Payload: {"items":[{"name":"feature_flag_run_copylist_source","value":"false"}]}
Cloudera bug: OPSAPS-42445
Upgrade to CDH 5.13 or higher Requires Pre-installation of Spark 2.1 or Spark 2.2
If your cluster has Spark 2.0 or Spark 2.1 installed and you want to upgrade to CDH 5.13 or higher, you must first upgrade to Spark 2.1 release 2 or later before upgrading CDH. To install these versions of Spark, do the following before running the CDH Upgrade Wizard:
- Install the Custom Service Descriptor (CSD) file. See
- Download, distribute, and activate the Parcel for the version of Spark that you are installing:
- Spark 2.1 release 2: The parcel name includes "cloudera2" in its name.
- Spark 2.2 release 1: The parcel name includes "cloudera1" in its name.
Affected versions: CDH 5.13.0 and higher
Cloudera bug: CDH-56775
Record Service no longer supported as of CDH 5.13; upgrade fails from some lower versions
When upgrading a cluster with Record Service installed from CDH 5.10.0 or lower to CDH 5.13 or higher, Record Service fails with a java.lang.NoClassDefFoundError. Record Service is no longer supported in CDH.
Affected CDH Versions: 5.10.0 and lower when upgrading to CDH 5.13.0 or higher.
Cloudera bug: CDH-59372
Sentry may require increased Java heap settings before upgrading CDH to 5.13
Before upgrading to CDH 5.13 or higher, you may need to increase the size of the Java heap for Sentry. A warning is displayed during the upgrade, but it is the user's responsibility to ensure this setting is adjusted properly before proceeding. See Performance Guidelines.
Affected versions: CDH 5.13 or higher
Cloudera bug: OPSAPS-42541
Renaming an empty directory on Amazon S3 when using DynamoDB with S3Guard may fail
Renaming an empty directory on Amazon S3 when a DynamoDB table is used for S3Guard may fail and result in an error.
Affected versions: Cloudera Manager 5.12.0, 5.11.2, 5.11.1, 5.11.0
Fixed in: Cloudera Manager 5.12.0
Cloudera bug: CDH-51869
Apache bug: HADOOP-14036
Change in Hue Load Balancer version causes Hue Load Balancer and Server to be marked with stale configuration
This staleness occurs when both of the following are true:
- The Enable TLS/SSL for Hue property is set to true.
- The Hue load balancer is enabled.
Workaround: If your cluster uses Apache httpd 2.4 as the Hue load balancer, restart the Hue service promptly. If your cluster uses an earlier version of httpd, there is no urgency to restart the Hue service. (Apache httpd 2.4 is installed automatically by some recent versions of Linux, or may have been explicitly installed.)
Cloudera bug: OPSAPS-40700, OPSAPS-41850
Hue Load Balancer SSL Handshake error
Fixed an issue with the SSL handshake in Hue when Apache httpd 2.4 or higher is used as the load balancer. The Hue load balancer previously set the ProxyPreserveHost directive to On, when it should have been set to Off. This causes problems making SSL connections when using Apache httpd 2.4 or higher. The error caused problems when verifying the CN, which older versions of Apache httpd did not encounter because they did not properly verify the CN.
When upgrading Cloudera Manager, the Hue load balancer may be marked as having a stale configuration. If you are experiencing issues connecting to Hue with SSL, restart the Hue service to update the configuration.
Cloudera bug: OPSAPS-40700
Apache MapReduce Jobs May Fail During Rolling Upgrade to CDH 5.11.0 or CDH 5.11.1
MapReduce tasks may fail with an error similar to the following:
2017-06-08 17:43:37,173 WARN [Socket Reader #1 for port 41187] org.apache.hadoop.ipc.Server: Unable to read call parameters for client 10.17.242.22 on connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE java.lang.ArrayIndexOutOfBoundsException: 23 at ...
This error could cause the task and the job to fail.
Workaround:
Avoid performing a rolling upgrade to CDH 5.11.0 or CDH 5.11.1 from CDH 5.10.x or lower. Instead, skip CDH 5.11.0 and CDH 5.11.1 if you are performing a rolling upgrade, and upgrade to CDH 5.12 or higher, or CDH 5.11.2 or higher when the release becomes available.
Cloudera bug: DOCS-2384, TSB-241
Rolling Upgrade with GPL Extras Parcel Causes Oozie to Fail
After performing a rolling upgrade where the GPL Extras parcel is upgraded, you must restart the Oozie service after completing the upgrade to let it pick up the latest client configurations. Otherwise, jobs newly submitted through Oozie may fail.
Affected versions: 5.14.0, 5.13.x, 5.12.x, 5.11.x
Cloudera bug: OPSAPS-41564
Maintenance State Minimal Block Replication staleness after upgrade
Upgrading to Cloudera Manager 5.12 or later may show Maintenance State Minimal Block Replication as a stale configuration under HDFS, suggesting a restart. It is safe to ignore this warning and delay restart.
Cloudera bug: OPSAPS-39102
Affected versions: Cloudera Manager 5.12 or higher
YARN ACL configuration property staleness after upgrade
After upgrading to Cloudera Manager 5.12 or higher, the following YARN configurations are marked as stale:
- ACL for viewing a job - mapreduce.job.acl-view-job
- ACL for modifying a job - mapreduce.job.acl-modify-job
- Enable MapReduce ACLs - mapreduce.cluster.acls.enabled
It is safe to defer the restart if you are not using YARN job view/modify ACLs.
Affected versions: Cloudera Manager 5.12 or higher
Cloudera bug: OPSAPS-33586
Adding a service to an existing cluster
When you try to add a service such as Impala, Solr, Key-Value Store, or Sentry to an existing cluster, a timeout occurs.
Workaround
Deactivate the following parcels before you try to add a service: KEYTRUSTEE, SQOOP_TERADATA_CONNECTOR and any custom parcels.
Fixed in Cloudera Manager versions 5.8.5, 5.9.2, 5.10.1, and 5.11.
Cloudera bug: OPSAPS-38304
Services die due to HDFS taking too long to start
On Cloudera Manager managed clusters where HDFS takes a long time to come up after a restart, some dependent services may fail to start.
Workaround:
You can either manually start cluster services while waiting between steps or configure HDFS clients to automatically retry.
- Start ZooKeeper and, if it's used, Kudu.
- Start HDFS and wait for the HDFS NameNode to finish starting.
- Run the following command: hdfs dfsadmin -safemode get
This command reports whether the NameNode is in safe mode; wait until it reports that safe mode is OFF before continuing (see the sketch after this list).
- Start the remaining services.
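For the waiting step in the manual-start sequence above, a brief sketch using the standard hdfs dfsadmin safe mode subcommands (run as a user with HDFS superuser privileges, such as hdfs):
# Block until the NameNode reports that safe mode is OFF
sudo -u hdfs hdfs dfsadmin -safemode wait
# Or poll the current state manually
sudo -u hdfs hdfs dfsadmin -safemode get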
- In the Cloudera Manager web UI, select the HDFS service.
- Select .
- In the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml field, add the following property: dfs.client.retry.policy.enabled.
- Set the value to true.
- Add a description.
- Save the changes.
- Select .
- Start the cluster.
Cloudera bug: CDH-54889
hostname parameter is not passed to Impala catalog role
IMPALA-5253 contained a security fix for clusters using Impala with TLS (SSL) security enabled. This fix was also made in several maintenance versions of CDH that require you to upgrade Cloudera Manager. If you upgrade to a CDH version with this fix without upgrading Cloudera Manager, Impala will not function when TLS is enabled for Impala. You should upgrade Cloudera Manager first if you want to move to a CDH version with the security fix.
This issue affects upgrades of Cloudera Manager and CDH to version 5.11.1.
There are two ways you can work around this issue:
- Upgrade to one of the following versions of Cloudera Manager before upgrading CDH:
- 5.13.0
- 5.12.1
- 5.11.2
- 5.10.2
- 5.9.3
- 5.8.5
- Before upgrading CDH, set the -hostname option to the fully-qualified domain name of the Catalog Server using the Catalog Server Command Line Argument Advanced Configuration Snippet (Safety Valve) configuration property:
-hostname=fully-qualified-domain-name of Impala Catalog Server
(To set this property, in Cloudera Manager, go to the Impala service, select the Configuration tab, and search for the property.)
The CDH maintenance releases containing the IMPALA-5253 fix are:
- 5.11.1
- 5.10.2
- 5.9.3
- 5.8.5
Cloudera bug: OPSAPS-41218
ZooKeeper Package Installation fails with Debian 7
ZooKeeper installation fails because Debian 7 installs its own version of the ZooKeeper package instead of the Cloudera version. As versions change, this may also affect additional Cloudera packages.
- For manual installations of ZooKeeper, run the following apt-get command instead of the documented command when installing ZooKeeper:
apt-get install zookeeper=3.4.5+cdh5.10.2+108-1.cdh5.10.2.p0.4~wheezy-cdh5.10.2
- To ensure the installation uses the current Cloudera version of all packages, create a file called cloudera.pref in the /etc/apt/preferences.d directory and add the following content before running the apt-get command to manually install CDH or using Cloudera Manager to install the packages:
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
This change gives Cloudera packages higher priority than Debian-supplied packages.
The change assumes your base packages have the priority set to '500'. You can check whether the priority is correctly set by running the apt-cache policy package_name command. The output should list Cloudera packages as highest priority. This step must be done before both manual and Cloudera Manager-assisted package installations.
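For example, to confirm the pinning with the ZooKeeper package (any Cloudera-provided package name works here):
# The Cloudera candidate version should be listed with the higher priority (501)
apt-cache policy zookeeper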
Cloudera bug: DOCS-2278
Automated Cloudera Manager installer fails on Ubuntu 16.04
Running the cloudera-manager-installer.bin installer file (as described in the documentation) fails on Ubuntu 16.04 LTS (Xenial).
Affected versions: Cloudera Manager 5.11.0
Fixed in: Cloudera Manager 5.11.1 and higher.
Cloudera bug: DOCS-2037
Reboot of HDFS nodes running RHEL 7 or CentOS 7 shortly after enabling High Availability or migrating NameNodes can lead to data loss or other issues
This issue affects Cloudera Manager 5.7.0 and 5.7.1 installed on RHEL 7.x or CentOS 7.x.
On RHEL 7.x and CentOS 7.x systems, certain configuration actions intended to be executed only once during enablement of HDFS HA or migration of HDFS roles might be erroneously re-executed if a system shutdown or reboot occurs within the next 24 hours of either operation. This can result in data loss and inconsistent state if the user has performed a NameNode format or JournalNode format, which are part of both the enablement of HDFS HA (adding a standby NameNode) and relocation (migration) of NameNode roles to other hosts. Edits to HDFS metadata stored in JournalNodes could be lost in these situations, requiring manual repair to recover recent changes to HDFS data. If you experience this issue, please contact Cloudera Support for assistance.
For the latest update on this issue see the corresponding Knowledge article: TSB 2017-161: Reboot of HDFS nodes running RHEL 7/CentOS 7 after enabling High Availability or migrating NameNodes can lead to data loss or other issues.
Resolution:
Upgrade to Cloudera Manager 5.7.2 or higher immediately, and perform a hard restart of Cloudera Manager agents.
Fixed in: Cloudera Manager 5.7.2 and higher.
The Limit Nonsecure Container Executor Users property has no effect in Cloudera 5.11
Workaround:
To set the yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users property to true or false, use the YARN NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.
Cloudera bug: OPSAPS-27702
Graceful shutdown of Kafka brokers does not work as expected
In CM 5.11.0, the new Graceful Shutdown Timeout configuration property does not work as expected. As a result, Kafka takes an additional 30 seconds (by default) to shut down, but will still only have 30 seconds to complete its controlled shutdown, before Cloudera Manager forcibly shuts down Kafka brokers regardless of the configured timeout.
Workaround:
Wait longer for the shutdown to occur, or reduce the Graceful Shutdown Timeout to a very low value, for example: 1. (You should restore the original value when you upgrade to a Cloudera Manager release with the fix for this issue, although it will not cause a critical problem if left at a low value.) There is no workaround that allows you to increase the real time that Kafka has to perform shutdown until this issue is fixed.
Fixed in Cloudera Manager 5.11.1.
Cloudera bug: OPSAPS-40106
Spark Gateway roles should be added to every host
If you are using Cloudera Manager 5.9.0, 5.9.1, or 5.10.0 and have hosts with the NodeManager role but without the Spark Gateway role, you must add the Spark Gateway role to all NodeManager hosts and redeploy the client configurations. If you do not use this workaround, Cloudera Manager fails to locate the topology.py file, which can cause task localization issues and failures for very large jobs.
Fixed in Cloudera Manager 5.10.1.
Cloudera bug: OPSAPS-39119
HBase configuration is supplied for Hive when HBase service is not selected
Cloudera Manager provides configuration for the hive-site.xml file even if the HBase Service setting in Hive is not selected. This can cause unnecessary errors when Hive-on-Spark attempts to connect to HBase.
Workaround:
- In Cloudera Manager, go to the Hive service.
- Click the Configuration tab.
- Search for the following Advanced Configuration Snippets:
- HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
- Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml
- Click the icon to add the following property and value to both Advanced Configuration Snippets:
Name: spark.yarn.security.tokens.hbase.enabled
Value: false
- Click Save Changes.
- Click the stale configuration icon to restart stale services, including any dependent services. Follow the on-screen prompts.
- Click .
- When you execute a query that uses an HBase-backed table, set this parameter back to true. You can do this by changing the configuration as described above (set each value to true) or by using a set command and then issuing the query. For example:
set spark.yarn.security.tokens.hbase.enabled=true;
SELECT * FROM HBASE_BACKED_TABLE ...
Fixed in Cloudera Manager 5.8.5 and 5.9.3.
Cloudera bug: OPSAPS-39021
Hive Replication fails when Impala is SSL enabled but Hadoop services are not
Support for Impala Replication with SSL enabled was added in Cloudera Manager 5.7.4, 5.8.2, 5.9.0 and higher.
Workaround:
- In Cloudera Manager, go to the HDFS service.
- Click the Configuration tab.
- Search for and enter the appropriate values for the following parameters:
- Cluster-Wide Default TLS/SSL Client Truststore Location
- Cluster-Wide Default TLS/SSL Client Truststore Password
Impala metadata replication is not supported, but will be supported in a later maintenance release.
This support also ensures that connection to Impala is successful when SSL is enabled even if Kerberos is not enabled.
Cloudera bug: OPSAPS-38700, OPSAPS-38720
Fixed in Cloudera Manager 5.11, 5.10.1, 5.9.2, 5.8.4, 5.7.6.
python-psycopg2 Dependency
Cloudera Manager 5.8 and higher has a new dependency on the package python-psycopg2. This package is not available in standard SLES 11 and SLES 12 repositories. You need to add this repository or install it manually to any machine that runs the Cloudera Manager Agent before you install or upgrade Cloudera Manager.
If the Cloudera Manager Server and Agent run on the same host, install the Cloudera Manager Server first and then add the python-psycopg2 repository or package. After adding the repository or package, install the Cloudera Manager Agent.
Workaround
Download the python-psycopg2 repository or package from the following URL by selecting the correct SLES version: https://software.opensuse.org/package/python-psycopg2.
Pauses in Cloudera Manager after adding peer
Pauses and slow performance of the Cloudera Manager Admin Console occur after creating a peer.
Workaround:
- On the Cloudera Manager server host, edit the /etc/default/cloudera-scm-server file.
- In the Java Options section, change the value of the -Xmx argument of the CMF_JAVA_OPTS property from -Xmx2G to -Xmx4G. For example:
export CMF_JAVA_OPTS="-Xmx4G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
- Restart the Cloudera Manager server:
sudo service cloudera-scm-server restart
Fixed in Cloudera Manager 5.11
Cloudera bug: OPSAPS-38868
Cannot select time period in custom charts in Cloudera Manager 5.9
The quick time selection (30m, 1h, and so on) on custom dashboards does not work in 5.9.x.
Fixed in Cloudera Manager 5.9.1 and higher.
Cloudera bug: OPSAPS-37190
Cloudera Manager 5.8.2 allows you to select nonexistent CDH 5.8.1 package installation
The Cloudera Manager 5.8.2 install/upgrade wizard allows you to select CDH 5.8.1 as a package installation, even though CDH 5.8.1 does not exist. The installation fails with an error message similar to the following:
[Errno 14] HTTPS Error 404 - Not Found
Workaround
Return to the package selection page, and select Latest Release of CDH5 component compatible with this version of Cloudera Manager or CDH 5.8.2.
Cloudera bug: OPSAPS-36860
Error when distributing parcels: No such torrent
Parcel distribution might fail with an error message similar to the following:
Error when distributing to <host>: No such torrent: <parcel_name>.torrent
Workaround
Remove the file /opt/cloudera/parcel-cache/<parcel_name>.torrent from the host.
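For example, on the affected host (substitute the torrent file named in the error message):
# Remove the stale torrent file, then retry the parcel distribution
rm /opt/cloudera/parcel-cache/<parcel_name>.torrent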
Cloudera bug: OPSAPS-37183
Hive Replication Metadata Transfer Step fails with Temporary AWS Credential Provider
Message: Hive Replication Metadata Transfer Step Failed - com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 76D1F6A02792908A, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: Xy3nAS4HSPKLA6hHKvpqReBud7M1Fhk7On0HttYGE0eKPHKwiFkTPQxEVU82OZq5d8omSrdbhcI=.
Cloudera bug: OPSAPS-37514
Hive table Views do not get restored from S3
When creating a Hive Replication schedule that copies Hive data from S3 and you select the Reference Data From Cloud option, Hive table Views are not restored correctly and result in a Null Pointer Exception when querying data from the view.
Fixed in Cloudera Manager 5.9.1 and higher.
Cloudera bug: OPSAPS-37549
ACLs are not replicated when restoring Hive data from S3
ACLs are never replicated if the Enable Access Control Lists option in the HDFS service configuration is not selected the first time a replication schedule that replicates from S3 to Hive runs. Enabling the option and re-running the restore operation does not restore the ACLs.
Cloudera bug: OPSAPS-37004
Snapshot diff is not working for Hive to S3 replication when data is deleted on source
If you have enabled snapshots on an HDFS folder and a Hive table uses an external file in that folder, and then you replicate that data to S3 and delete the file on the source cluster, the file is not deleted in subsequent replications to S3, even if the Delete Permanently option is selected.
Cloudera bug: OPSAPS-36910
Block agents from heartbeating to a Cloudera Manager with different UUID until agent restart
Workaround: Do one of the following:
- Restore the previous Cloudera Manager server guid.
- Remove the cm_guid file from each of the agents and then restart the agent.
Cloudera bug: OPSAPS-34847
Cloudera Manager setting the catalogd default JVM memory to 4 GB can cause out-of-memory errors on upgrade to Cloudera Manager 5.7 or higher
After upgrading to 5.7 or higher, you might see a reduced Java heap maximum on Impala Catalog Server due to a change in its default value. Upgrading from Cloudera Manager lower than 5.7 to Cloudera Manager 5.8.2 no longer causes any effective change in the Impala Catalog Server Java Heap size.
When upgrading from Cloudera Manager 5.7 or later to Cloudera Manager 5.8.2, if the Impala Catalog Server Java Heap Size is set at the default (4GB), it is automatically changed to either 1/4 of the physical RAM on that host, or 32GB, whichever is lower. This can result in a higher or a lower heap, which could cause additional resource contention or out of memory errors, respectively.
Cloudera bug: OPSAPS-34039
Cloudera Manager 5.7.4 installer does not show Key Trustee KMS
A fresh install of Cloudera Manager tries to install Key Trustee KMS 5.8.2 when trying to install the latest version. You must either choose 5.7.0 as the Key Trustee KMS version, or manually provide a link to the 5.7.4 bits.
Class Not Found Error when upgrading to Cloudera Manager 5.7.2
When you upgrade to version 5.7.2 of Cloudera Manager, the client configuration for all services is marked stale.
Workaround:
From the Cluster menu, select Deploy Client Configuration to redeploy the client configuration.
Kerberos setup fails on Debian 8.2
This issue is due to the following Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777579;msg=5;att=0.
- Log in to the host where the Cloudera Manager server is running.
- Edit the systemd/system/krb5-admin-server.service file and add /etc/krb5kdc to the ReadWriteDirectories section.
- Run the following commands:
systemctl daemon-reload
sudo service krb5-admin-server restart
- Generate the credentials.
Cloudera bug: OPSAPS-33713
Password in Cloudera Manager's db.properties file is not redacted
The db.properties file is managed by customers and is populated manually when the Cloudera Manager Server database is being set up for the first time. Since this occurs before the Cloudera Manager Server has even started, encrypting the contents of this file is a completely different challenge as compared to that of redacting configuration files.
Releases affected: 5.3 and higher
Cloudera bug: OPSAPS-24813
Cluster provisioning fails
In some cases, provisioning of a cluster may fail at the start of the process. This is mainly noticed on RHEL 6, especially when some hosts are reporting bad health.
Releases affected: 5.5.0-5.5.3, 5.6.0-5.6.1, 5.7.0
Releases containing the fix: 5.5.4, 5.7.1
For releases containing the fix, parcel activation and first run command now completes as expected, even when some hosts report bad health.
This issue is fixed in Cloudera Manager 5.5.4 and 5.7.1 and higher.
Cloudera bug: OPSAPS-33564
Cloudera Manager can run out of memory if a remote repository URL is unreachable
If one of the remote repository URLs specified on the Parcel Settings page becomes unreachable, Cloudera Manager may run out of memory.
Workaround:
- If the URL is incorrect, enter the correct URL.
- Deselect the Automatically Download New Parcels setting on the Parcel Settings page.
- Set the value of the Parcel Update Frequency on the Parcel Settings to a large interval such as several days.
Cloudera bug: OPSAPS-31732
Clients can run Hive on Spark jobs even if Hive dependency on Spark is not configured
In CDH 5.7 and higher, when both Hive and Spark on YARN are configured but Hive is not configured to depend on Spark on YARN, clients can set the execution engine to spark, and Hive on Spark jobs will still be executed, but in an unsupported mode. These jobs may not appear in the Spark History Server.
Workaround: Configure Hive to depend on Spark on YARN.
Cloudera bug: OPSAPS-32023
The YARN NodeManager connectivity health test does not work for CDH 5
The NodeManager connectivity is always GOOD (green) even if the ResourceManager considers the NodeManager to be LOST or DECOMMISSIONED.
Cloudera bug: OPSAPS-31251
Workaround: None.
HDFS HA clusters see NameNode failures when KDC connectivity is bad
When KDC connectivity is bad, the JVM takes 30 seconds before retrying or declaring failure to connect. Meanwhile, the JournalNode write timeout (which needs KDC authentication for the first write, or under troubled connectivity), is only 20 seconds.
Workaround: In krb5.conf, set the kdc_timeout parameter value to 3 seconds. In Cloudera Manager, perform the following steps:
- Go to .
- Add the kdc_timeout parameter to the Advanced Configuration Snippet (Safety Valve) for [libdefaults] section of krb5.conf property. This should give the JVM enough time to try connecting to a KDC before the JournalNode timeout.
The HDFS File browser in Cloudera Manager fails when HDFS federation is enabled
Workaround: Use the command-line hdfs dfs commands to directly manipulate HDFS files when federation is enabled. CDH supports HDFS federation.
Hive Metastore canary fails to drop database
The Hive Metastore canary fails to drop the database due to HIVE-11418.
Cloudera bug: OPSAPS-27632
Workaround:
- Go to the Hive service.
- Click the Configuration tab.
- Select .
- Select .
- Deselect the Hive Metastore Canary Health Test checkbox for the Hive Metastore Server Default Group.
- Click Save Changes to commit the changes.
Cloudera Manager upgrade fails due to incorrect Sqoop 2 path
- Workaround for Upgrading from Cloudera Manager 3 or 4 to Cloudera Manager 5.4.0 or 5.4.1
- Log in to your Sqoop 2 server host using SSH and move the Derby database files to the new location, usually from /var/lib/sqoop2/repository to /var/lib/sqoop2/repositoy.
- Start Sqoop2. If you found this problem while upgrading CDH, run the Sqoop 2 database upgrade command using the Actions drop-down menu for Sqoop 2.
- Workaround for Upgrading from Cloudera Manager 5.4.0 or 5.4.1 to Cloudera Manager 5.4.3
- Log in to your Sqoop 2 server host using SSH and move the Derby database files to the new location, usually from/var/lib/sqoop2/repositoy to /var/lib/sqoop2/repository.
- Start Sqoop2, or if you found this problem while upgrading CDH, run the Sqoop 2 database upgrade command using the Actions drop-down menu for Sqoop 2.
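A minimal shell sketch of the directory move for the second case above, assuming the typical default paths mentioned in the steps and that Sqoop 2 is stopped before the move:
# Move the Derby repository files to the correctly spelled path
mv /var/lib/sqoop2/repositoy /var/lib/sqoop2/repository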
NameNode incorrectly reports missing blocks during rolling upgrade
During a rolling upgrade to any of the CDH releases listed below, the NameNode may report missing blocks after rolling back multiple DataNodes. This is caused by a race condition with block reporting between the DataNode and the NameNode. No permanent data loss occurs, but data can be unavailable for up to six hours before the problem corrects itself.
Releases affected: CDH 5.0.6, 5.1.5, 5.2.5, 5.3.3, 5.4.1, 5.4.2.
Releases containing the fix: CDH 5.2.6, 5.3.4, 5.4.3
- To avoid the problem - Cloudera advises skipping the affected releases and installing a release containing the fix. For example, do not upgrade to CDH 5.4.2; upgrade to CDH 5.4.3 instead.
- If you have already completed an upgrade to an affected release, or are installing a new cluster - You can continue to run the release, or upgrade to a release that is not affected.
Cloudera bug: OPSAPS-27205
Using ext3 for server directories can easily hit the inode limit
Using the ext3 filesystem for the Cloudera Manager command storage directory may exceed the ext3 maximum of 32,000 subdirectories per directory.
Workaround: Either decrease the value of the Command Eviction Age property so that the directories are more aggressively cleaned up, or migrate to the ext4 filesystem.
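As a hedged diagnostic sketch, you can check how close the command storage directory is to the limit by counting its immediate subdirectories; <command_storage_directory> is a placeholder for the directory configured in Cloudera Manager:
# Count immediate subdirectories (ext3 allows roughly 32,000 per directory)
find <command_storage_directory> -maxdepth 1 -type d | wc -l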
Cloudera bug: OPSAPS-26951
Backup and disaster recovery replication does not set MapReduce Java options
Replication used for backup and disaster recovery relies on system-wide MapReduce memory options, and you cannot configure the options using the Advanced Configuration Snippet.
Cloudera bug: OPSAPS-25503
Kafka 1.2 CSD conflicts with CSD included in Cloudera Manager
If the Kafka CSD was installed in Cloudera Manager 5.3 or lower, the old version must be uninstalled; otherwise it will conflict with the version of the Kafka CSD bundled with Cloudera Manager 5.4.
Workaround:
- Determine the location of the CSD directory:
- Select .
- Click the Custom Service Descriptors category.
- Retrieve the directory from the Local Descriptor Repository Path property.
- Delete the Kafka CSD from the directory.
Cloudera bug: OPSAPS-25995
Recommission host does not deploy client configurations
The failure to deploy client configurations can result in client configuration pointing to the wrong locations, which can cause errors such as the NodeManager failing to start with "Failed to initialize container executor".
Workaround: Deploy client configurations first and then restart roles on the recommissioned host.
Cloudera bug: OPSAPS-25995
Hive on Spark is not supported in Cloudera Manager and CDH 5.4 and CDH 5.5
You can configure Hive on Spark, but it is not recommended for production clusters.
Cloudera bug: OPSAPS-25983
Upgrade wizard incorrectly upgrades the Sentry DB
There's no Sentry DB upgrade in 5.4, but the upgrade wizard says there is. Performing the upgrade command is not harmful, and taking the backup is also not harmful, but the steps are unnecessary.
Cloudera bug: OPSAPS-25405
Cloudera Manager does not correctly generate client configurations for services deployed using CSDs
CSDs that depend on Spark receive an incomplete Spark client configuration. Note that Cloudera Manager does not ship with any such CSDs by default. For example, HiveServer2 requires a Spark on YARN gateway on the same host for Hive on Spark to work, and you must redeploy the Spark client configuration whenever it changes so that HiveServer2 picks up the change.
Cloudera bug: OPSAPS-25695
Workaround: Use /etc/spark/conf for Spark configuration, and ensure there is a Spark on YARN gateway on that host.
Solr, Oozie and HttpFS fail when KMS and TLS/SSL are enabled using self-signed certificates
These services fail with errors similar to the following (example from Oozie):
org.apache.oozie.service.AuthorizationException: E0501: Could not perform authorization operation, sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Cloudera bug: OPSAPS-23392, CDH-23460, CDH-23189
Workaround: You must explicitly load the relevant truststores with the KMS certificate to allow these services to communicate with the KMS. To do so, edit the truststore location and password for Solr, Oozie and HttpFS (found under the HDFS service) as follows.
- Go to the Cloudera Manager Admin Console.
- Go to the Solr/Oozie/HDFS service.
- Click the Configuration tab.
- Search for "<service> TLS/SSL Certificate Trust Store File" and set this property to the location of truststore file.
- Search for "<service> TLS/SSL Certificate Trust Store Password" and set this property to the password of the truststore.
- Click Save Changes to commit the changes.
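If the KMS certificate has not yet been added to the truststore that these properties point to, it can be imported with the JDK keytool utility. The file paths, alias, and password below are placeholders, a minimal sketch only:
keytool -importcert -alias kms -file /path/to/kms-cert.pem -keystore /path/to/truststore.jks -storepass changeit   # import the self-signed KMS certificate
keytool -list -keystore /path/to/truststore.jks -storepass changeit                                                # confirm the kms alias is present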
Cloudera Manager 5.3.1 upgrade fails if Spark standalone and Kerberos are configured
CDH upgrade fails if Kerberos is enabled and Spark standalone is installed. Spark standalone does not work in a kerberized cluster.
Cloudera bug: OPSAPS-24983
Workaround: To upgrade, remove the Spark standalone service first and then proceed with upgrade.
Adding Key Trustee KMS 5.4 to Cloudera Manager 5.5 displays warning
Adding the Key Trustee KMS service to a CDH 5.4 cluster managed by Cloudera Manager 5.5 displays the following message, even if Key Trustee KMS is installed:
"The following selected services cannot be used due to missing components: keytrustee-keyprovider. Are you sure you wish to continue with them?"
Workaround: Verify that the Key Trustee KMS parcel or package is installed and click OK to continue adding the service.
KMS and Key Trustee ACLs do not work in Cloudera Manager
ACLs configured for the KMS (File) and KMS (Navigator Key Trustee) services do not work since these services do not receive the values for hadoop.security.group.mapping and related group mapping configuration properties.
Cloudera bug: OPSAPS-24483
Workaround:
- KMS (File): Add all configuration properties starting with hadoop.security.group.mapping from the NameNode core-site.xml to the KMS (File) property, Key Management Server Advanced Configuration Snippet (Safety Valve) for core-site.xml.
- KMS (Navigator Key Trustee): Add all configuration properties starting with hadoop.security.group.mapping from the NameNode core-site.xml to the KMS (Navigator Key Trustee) property, Key Management Server Proxy Advanced Configuration Snippet (Safety Valve) for core-site.xml.
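For illustration, if the NameNode uses the default shell-based group mapping, the safety valve contents would look similar to the following; copy the actual hadoop.security.group.mapping properties and values from your NameNode core-site.xml rather than this example:
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>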
Exporting and importing Hue database sometimes times out after 90 seconds
Running 'dump database' or 'load database' for Hue from Cloudera Manager returns "command aborted because of exception: Command timed-out after 90 seconds". The Hue database can be exported to JSON from within Cloudera Manager, but when the database is large, the export can time out after 90 seconds.
Cloudera bug: OPSAPS-24470
Workaround: Ignore the timeout. The command should eventually succeed even though Cloudera Manager reports that it timed out.
Changing the Key Trustee Server hostname requires editing keytrustee.conf
If you change the hostname of your active or passive Key Trustee Server, you must edit the keytrustee.conf file. This issue typically arises if you replace an active or passive server with a server having a different hostname. If the same hostname is used on the replacement server, there are no issues.
Cloudera bug: OPSAPS-24133
Workaround: Use the same hostname on the replacement server.
Hosts with Impala Llama roles must also have at least one YARN role
"Exception running /etc/hadoop/conf.cloudera.yarn/topology.py java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py"in the Llama role logs, and Impala queries may fail.
Cloudera bug: OPSAPS-23728
Workaround: Add a YARN gateway role to each Llama host that does not already have at least one YARN role (of any type).
The high availability wizard does not verify that there is a running ZooKeeper service
The wizard fails if either of the following is true:
- ZooKeeper is present but not running, and the HDFS dependency on ZooKeeper is not set.
- ZooKeeper is absent.
Cloudera bug: OPSAPS-23709
- Create and start a ZooKeeper service if one does not exist.
- Go to the HDFS service.
- Click the Configuration tab.
- Select
- Set the ZooKeeper Service property to the ZooKeeper service.
- Click Save Changes to commit the changes.
Cloudera Manager Installation Path A fails on RHEL 5.7 due to PostgreSQL conflict
On RHEL 5.7, cloudera-manager-installer.bin fails due to a PostgreSQL conflict if PostgreSQL 8.1 is already installed on your host.
Cloudera bug: OPSAPS-23392
Workaround: Remove PostgreSQL from the host and rerun cloudera-manager-installer.bin.
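For example, on a RHEL 5.7 host the conflicting packages can be identified and removed as follows; the package names vary depending on how PostgreSQL 8.1 was installed, so check with rpm first:
rpm -qa | grep -i postgres               # identify the installed PostgreSQL 8.1 packages
yum remove postgresql postgresql-server  # remove them, then rerun cloudera-manager-installer.bin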
Spurious warning on Accumulo 1.6 gateway hosts
When using the Accumulo shell on a host with only an Accumulo 1.6 Service gateway role, users will receive a warning about failing to create the directory /var/log/accumulo. The shell works normally otherwise.
Workaround: The warning is safe to ignore.
Cloudera bug: OPSAPS-21699
Accumulo 1.6 service log aggregation and search does not work
Cloudera Manager log aggregation and search features are incompatible with the log formatting needed by the Accumulo Monitor. Attempting to use either the "Log Search" diagnostics feature or the log file link off of an individual service role's summary page will result in empty search results.
Cloudera bug: OPSAPS-21675
Severity: High
Workaround: Operators can use the Accumulo Monitor to see recent severe log messages. They can see recent log messages below the WARNING level via a given role's process page and can inspect full logs on individual hosts by looking in /var/log/accumulo.
Cloudera Manager incorrectly sizes Accumulo Tablet Server max heap size after 1.4.4-cdh4.5.0 to 1.6.0-cdh4.6.0 upgrade
Because the upgrade path from Accumulo 1.4.4-cdh4.5.0 to 1.6.0-cdh4.6.0 involves having both services installed simultaneously, Cloudera Manager considers the worker hosts in the cluster to be oversubscribed on memory and attempts to downsize the maximum heap size allowed for 1.6.0-cdh4.6.0 Tablet Servers.
Cloudera bug: OPSAPS-21806
Severity: High
Workaround: Manually verify that the Accumulo 1.6.0-cdh4.6.0 Tablet Server max heap size is large enough for your needs. Cloudera recommends you set this value to the sum of 1.4.4-cdh4.5.0 Tablet Server and Logger heap sizes.
Accumulo installations using LZO do not indicate dependence on the GPL Extras parcel
Accumulo 1.6 installations that use LZO compression functionality do not indicate that LZO depends on the GPL Extras parcel. When Accumulo is configured to use LZO, Cloudera Manager has no way to track that the Accumulo service now relies on the GPL Extras parcel. This prevents Cloudera Manager from warning administrators before they remove the parcel while Accumulo still requires it for proper operation.
Cloudera bug: OPSAPS-21680
Workaround: Check your Accumulo 1.6 service for the configuration changes mentioned in the Cloudera documentation for using Accumulo with CDH prior to removing the GPL Extras parcel. If the parcel is mistakenly removed, reinstall it and restart the Accumulo 1.6 service.
Created pools are not preserved when Dynamic Resource Pools page is used to configure YARN or Impala
Pools created on demand are not preserved when changes are made using the Dynamic Resource Pools page. If the Dynamic Resource Pools page is used to configure YARN or Impala services in a cluster, it is possible to specify pool placement rules that create a pool if one does not already exist. If changes are made to the configuration using this page, pools created as a result of such rules are not preserved across the configuration change.
Cloudera bug: OPSAPS-19942
Workaround: Submit the YARN application or Impala query as before, and the pool will be created on demand once again.
User should be prompted to add the AMON role when adding MapReduce to a CDH 5 cluster
When the MapReduce service is added to a CDH 5 cluster, the user is not asked to add the AMON role. Then, an error displays when the user tries to view MapReduce activities.
Cloudera bug: OPSAPS-20379
Workaround: Manually add the AMON role after adding the MapReduce service.
Enterprise license expiration alert not displayed until Cloudera Manager Server is restarted
When an enterprise license expires, the expiration notification banner is not displayed until the Cloudera Manager Server has been restarted. The enterprise features of Cloudera Manager are not affected by an expired license.
Cloudera bug: OPSAPS-15711
Workaround: None.
Configurations for decommissioned roles not migrated from MapReduce to YARN
When the Import MapReduce Configuration wizard is used to import MapReduce configurations to YARN, decommissioned roles in the MapReduce service do not cause the corresponding imported roles to be marked as decommissioned in YARN.
Cloudera bug: OPSAPS-16670
Workaround: Delete or decommission the roles in YARN after running the import.
The HDFS command Roll Edits does not work in the UI when HDFS is federated
The HDFS command Roll Edits does not work in the Cloudera Manager UI when HDFS is federated because the command does not know which nameservice to use.
Cloudera bug: OPSAPS-10034
Workaround: Use the API, not the Cloudera Manager UI, to execute the Roll Edits command.
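A rough sketch of invoking the command through the Cloudera Manager REST API is shown below; the API version, cluster and service names, role name, and the hdfsRollEdits command name are assumptions here, so verify them against the API documentation for your Cloudera Manager release:
# Run Roll Edits against a specific NameNode role (placeholder names throughout)
curl -u admin:admin -X POST -H "Content-Type: application/json" \
    -d '{ "items": ["hdfs-NAMENODE-1"] }' \
    'http://cm-server-host:7180/api/v10/clusters/Cluster1/services/hdfs/roleCommands/hdfsRollEdits'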
Cloudera Manager reports a confusing version number if you have oozie-client, but not oozie installed on a CDH 4.4 node
In CDH versions before 4.4, the metadata identifying Oozie was placed in the client package rather than the server package. Consequently, if the client package is not installed but the server package is, Cloudera Manager reports that Oozie is present but as coming from CDH 3 instead of CDH 4.
Cloudera bug: OPSAPS-15778
Workaround: Either install the oozie-client package, or upgrade to at least CDH 4.4. Parcel based installations are unaffected.
Cloudera Manager does not work with CDH 5.0.0 Beta 1
When you upgrade from Cloudera Manager 5.0.0 Beta 1 with CDH 5.0.0 Beta 1 to Cloudera Manager 5.0.0 Beta 2, Cloudera Manager won't work with CDH 5.0.0 Beta 1 and there's no notification of that fact.
Cloudera bug: OPSAPS-17802
Workaround: None. Do a new installation of CDH 5.0.0 Beta 2.
On CDH 4.1 secure clusters managed by Cloudera Manager 4.8.1 and higher, the Impala Catalog server needs advanced configuration snippet update
Impala queries fail on CDH 4.1 when the Hive "Bypass Hive Metastore Server" option is selected.
Workaround: Add the following to the Impala Catalog Server advanced configuration snippet for hive-site.xml, replacing Hive_Metastore_Server_Host with the hostname of your Hive Metastore Server:
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://Hive_Metastore_Server_Host:9083</value>
</property>
Rolling Upgrade to CDH 5 is not supported.
Rolling upgrade between CDH 4 and CDH 5 is not supported. Incompatibilities between the major versions mean that rolling restarts are not possible. In addition, rolling upgrade will not be supported from CDH 5.0.0 Beta 1 to any later releases, and may not be supported between any future beta versions of CDH 5 and the General Availability release of CDH 5.
Workaround: None.
Error reading .zip file created with the Collect Diagnostic Data command.
After collecting diagnostic data and using the Download Diagnostic Data button to download the created zip file to the local system, the zip file cannot be opened using the Firefox browser on a Macintosh. This is because the zip file is created as a Zip64 file, and the unzip utility included with Macs does not support Zip64. The unzip utility must be version 6.0 or later; you can determine the version with unzip -v.
Cloudera bug: OPSAPS-13850
Workaround: Update the unzip utility to a version that supports Zip64.
After JobTracker failover, complete jobs from the previous active JobTracker are not visible.
When a JobTracker failover occurs and a new JobTracker becomes active, the new JobTracker UI does not show the completed jobs from the previously active JobTracker (that is now the standby JobTracker). For these jobs the "Job Details" link does not work.
Cloudera bug: OPSAPS-13864
Severity: Med
Workaround: None.
After JobTracker failover, information about rerun jobs is not updated in Activity Monitor.
When a JobTracker failover occurs while there are running jobs, the jobs are restarted by the new active JobTracker by default. For the restarted jobs, the Activity Monitor will not update the following:
- The start time of the restarted job remains the start time of the original job.
- Any Map or Reduce task that had finished before the failure is not updated with information about the corresponding task rerun by the new active JobTracker.
Cloudera bug: OPSAPS-13879
Severity: Med
Workaround: None.
Installing on AWS, you must use private EC2 hostnames.
When installing on an AWS instance, and adding hosts using their public names, the installation will fail when the hosts fail to heartbeat.
Severity: Med
Workaround:
- Use the Back button in the wizard to return to the original screen, where it prompts for a license.
- Rerun the wizard, but choose "Use existing hosts" instead of searching for hosts. Now those hosts show up with their internal EC2 names.
- Continue through the wizard and the installation should succeed.
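To confirm the internal EC2 hostname that Cloudera Manager should use for a host, you can query the instance metadata service from that instance, for example (instances that enforce IMDSv2 require a session token first):
curl http://169.254.169.254/latest/meta-data/local-hostname   # returns the private EC2 hostname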
If HDFS uses Quorum-based Storage without HA enabled, the SecondaryNameNode cannot checkpoint.
If HDFS is set up in non-HA mode, but with Quorum-based storage configured, the dfs.namenode.edits.dir is automatically configured to the Quorum-based Storage URI. However, the SecondaryNameNode cannot currently read the edits from a Quorum-based Storage URI, and will be unable to do a checkpoint.
Severity: Medium
Workaround: Add the dfs.namenode.edits.dir property to the NameNode's advanced configuration snippet, setting its value to both the Quorum-based Storage URI and a local directory, and restart the NameNode. For example:
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>qjournal://jn1HostName:8485;jn2HostName:8485;jn3HostName:8485/journalhdfs1,file:///dfs/edits</value>
</property>
Changing the rack configuration may temporarily cause mis-replicated blocks to be reported.
A rack re-configuration will cause HDFS to report mis-replicated blocks until HDFS rebalances the system, which may take some time. This is a normal side-effect of changing the configuration.
Severity: Low
Workaround: None
Cannot use '/' as a mount point with a Federated HDFS Nameservice.
A Federated HDFS Service does not support nested mount points, so it is impossible to mount anything at '/'. Because of this issue, the root directory will always be read-only, and any client application that requires a writeable root directory will fail.
Severity: Low
- In the CDH 4 HDFS Service > Configuration tab of the Cloudera Manager Admin Console, search for "nameservice".
- In the Mountpoints field, change the mount point from "/" to a list of mount points that are in the namespace that the Nameservice will manage. (You can enter this as a comma-separated list - for example, "/hbase, /tmp, /user" or by clicking the plus icon to add each mount point in its own field.) You can determine the list of mount points by running the command hadoop fs -ls / from the CLI on the NameNode host.
Historical disk usage reports do not work with federated HDFS.
Severity: Low
Workaround: None.
(CDH 4 only) Activity monitoring does not work on YARN activities.
Activity monitoring is not supported for YARN in CDH 4.
Severity: Low
Workaround: None
HDFS monitoring configuration applies to all Nameservices
The monitoring configurations at the HDFS level apply to all Nameservices. So, if there are two federated Nameservices, it's not possible to disable a check on one but not the other. Likewise, it's not possible to have different thresholds for the two Nameservices.
Severity: Low
Workaround: None
Supported and Unsupported Replication Scenarios and Limitations
See Data Replication.
Restoring snapshot of a file to an empty directory does not overwrite the directory
Restoring the snapshot of an HDFS file to an HDFS path that is an empty HDFS directory (using the Restore As action) results in the restored file being placed inside that directory instead of overwriting the empty directory.
Workaround: None.
HDFS Snapshot appears to fail if policy specifies duplicate directories.
In an HDFS snapshot policy, if a directory is specified more than once, the snapshot appears to fail with an error message on the Snapshot page. However, in the HDFS Browser, the snapshot is shown as having been created successfully.
Severity: Low
Workaround: Remove the duplicate directory specification from the policy.
Hive replication fails if "Force Overwrite" is not set.
The Force Overwrite option, if checked, forces overwriting data in the target metastore if there are incompatible changes detected. For example, if the target metastore was modified and a new partition was added to a table, this option would force deletion of that partition, overwriting the table with the version found on the source. If the Force Overwrite option is not set, recurring replications may fail.
Severity: Med
Workaround: Set the Force Overwrite option.
Change to the Impala Catalog Server default JVM heap size during upgrade to Cloudera Manager 5.8.2 can cause resource contention or out-of-memory errors
When upgrading from Cloudera Manager 5.7 or higher to Cloudera Manager 5.8.2, if the Impala Catalog Server Java Heap Size is set at the default (4GB), it is automatically changed to either 1/4 of the physical RAM on that host, or 32GB, whichever is lower. This can result in a higher or a lower heap, which could cause additional resource contention or out of memory errors, respectively.
Host Monitor Cannot Monitor NTP Status on SLES Hosts
In Cloudera Manager 5.11.0 and lower, Host Monitor queries the NTP status using the ntpdc command, which relies on NTP mode 7 queries. Later versions of SLES disable mode 7 queries by default.
Workaround: Add the following line to /etc/ntp.conf:
enable mode7
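After adding the line, restart the NTP daemon and confirm that mode 7 queries succeed again; the service name below is an assumption and differs between SLES releases:
systemctl restart ntpd    # on older SLES releases use "service ntp restart" instead
ntpdc -c sysinfo          # Host Monitor relies on mode 7 queries like this one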
Affected versions: Cloudera Manager 5.11.0 and lower
Fixed versions: Cloudera Manager 5.11.1 and higher, Cloudera Manager 5.12.0 and higher
Cloudera Issue: OPSAPS-38268