In some cases, after the QM webapp and config-service are restarted, Cloudera Manager
reports a healthy status after some time. However, the configuration service, and
therefore the QM webapp, might not yet be ready and usable when this status is reported.
Wait a minute or two after the restart before trying QM.
Known Issues identified in Cloudera Runtime 7.3.1.100 CHF 1
There are no new known issues identified in this release.
Known Issues in Cloudera Runtime 7.3.1
COMPX-14820: Delete Queue and its Children throws "Queue
capacity was reduced to zero, but failed to delete queue."
When you perform the "Delete Queue and its Children" operation on a queue that has one
or more siblings, the operation fails because of YARN constraints. If the queue on which
the operation is performed is a leaf node, the operation succeeds.
When setting the capacity scheduler configuration through the YARN/Cloudera Manager
configuration, capacity values may use multiple decimal places. This results in rounding
and floating-point precision discrepancies in the UI when validating that all sibling
capacities sum to 100%. The UI shows the numbers adding up to 100, but the validation
still displays an error and does not allow you to save the capacities. The backend may
calculate the capacity as, for example, 99.9999999991.
Create queues within the UI, or
Ensure that capacities configured through the Capacity Scheduler safety valve do
not have more than one decimal place.
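The class of error described above can be illustrated with a minimal Python sketch. The
values are only illustrative and are not Queue Manager's actual arithmetic: ten sibling
queues at a nominal 10% each, expressed as fractions, do not sum to exactly 1.0 in binary
floating point, so a strict equality check against the expected total fails.

```python
# Ten sibling queues, each nominally 10% of the parent (as a fraction).
# Values are illustrative, not Queue Manager's actual computation.
caps = [0.1] * 10

total = sum(caps)
print(total == 1.0)            # False: binary floating point cannot represent 0.1 exactly

# Comparing with limited precision (or a tolerance) avoids the false mismatch,
# which is why keeping configured capacities to one decimal place sidesteps the issue.
print(round(total, 6) == 1.0)  # True
```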
20202 Database migration after enabling opt-in migration
When migrating from an H2 database to a PostgreSQL database in
YARN Queue Manager after installation or upgrade, you might encounter an issue only when
you have followed this specific scenario:
New install or upgrade to Cloudera 7.1.9, forcing
migration from H2 to PostgreSQL database.
Upgrade to Cloudera 7.1.9 CHF2, moving back to H2
database.
Upgrade to Cloudera 7.1.9 SP1 with valid
PostgreSQL connection details in Queue Manager configurations.
To avoid any issues during the upgrade to Cloudera 7.1.9 SP1, ensure that PostgreSQL connection
details are removed from the YARN database configuration if you prefer to continue using
the H2 database.
CDPD-56559: MapReduce jobs can intermittently fail during a
rolling upgrade.
During a rolling upgrade between Cloudera versions 7.1.8 and 7.1.9, MapReduce jobs may fail
with the message RuntimeException: native snappy library not available.
Although the native Snappy compression library is not loaded, a checkmark is displayed
indicating that the Snappy compression library is loaded for NodeManagers that are
pending upgrade. This causes the MapReduce jobs associated with those NodeManagers to
fail. After the upgrade, the jobs work as expected. This issue only impacts rolling
upgrades from versions earlier than Cloudera 7.1.9 to a higher version.
None.
COMPX-12021 Queue Manager configurations on Scheduler
Configuration page are not working
When you set the following properties in the YARN Queue Manager
UI, they are written to the capacity-scheduler.xml file, where they
have no effect on YARN. These properties must be set in the
yarn-site.xml file, which does not happen when you set them
through YARN Queue Manager.
"Preemption: Maximum Wait Before Kill (ms)" –
"yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill"
"Preemption: Total Resources Per Round" –
"yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round"
"Preemption: Over Capacity Tolerance" –
"yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity"
"Preemption: Maximum Termination Factor" –
"yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor"
"Enable Intra Queue Preemption" –
"yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled"
In Cloudera Manager, select the YARN
service.
Click the Configuration tab.
Search for yarn-site.xml.
Under YARN Service Advanced Configuration Snippet (Safety Valve) for
yarn-site.xml, add the corresponding parameter and value you
need.
Click Save Changes.
Restart the YARN services.
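A safety-valve entry for one of the preemption properties listed above might look like
the following. The property name is taken from the list above; the value shown is only
an illustrative example, not a recommendation:

```xml
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
```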
COMPX-6214: When there are more than 600 queues in a cluster,
timeouts may occur in the Configuration Service due to performance limitations.
A Cloudera Manager proxy timeout configuration field has been
added; this is tracked in OPSAPS-60554. For this release, the timeout has been increased
from 20 seconds to 5 minutes. However, if this problem occurs, Cloudera recommends
increasing the proxy timeout value.
COMPX-5817: YARN Queue Manager UI is not able to present a view
of the pre-upgrade queue structure. The Cloudera Manager Store is not
supported, and therefore YARN does not preserve any of the pre-upgrade queue
structure.
When a Data Hub cluster is deleted, all saved configurations are
also deleted. All YARN configurations are saved in the Cloudera Manager
Store, which is not yet supported in Data Hub and Cloudera Manager. Hence, the YARN
queue structure is also lost when a Data Hub cluster is deleted, upgraded, or restored.
COMPX-6628: Unable to delete single leaf queue assigned to a
partition.
When a dynamic auto child creation enabled parent queue is
stopped in weight mode, its static and dynamically created child queues are also
stopped. However, when the dynamic auto child creation enabled parent queue is
restarted, its child queues remain stopped. In addition, the dynamically created child
queues cannot be restarted manually through the YARN Queue Manager UI either.
Delete the dynamic auto child creation enabled parent
queue. This action also deletes all its child queues, both static and dynamically
created child queues, including the stopped dynamic queues. Then recreate the parent
queue, enable the dynamic auto child creation feature for it and add the required static
child queues.
COMPX-5589: Unable to add new queue to leaf queue with partition
capacity in Weight/Absolute mode
In the current implementation, if there is an existing managed
queue in Relative mode, conversion to Weight mode is not allowed.
To proceed with the conversion from Relative mode to
Weight mode, there must not be any managed queues. You must first delete the managed
queues before conversion. In Weight mode, a parent queue can be converted into a managed
parent queue.
COMPX-5549: Queue Manager UI sets
maximum-capacity to null when you switch mode with multiple
partitions
If you associate a partition with one or more queues and then
switch the allocation mode before assigning capacities to the queues, an
Operation Failed error is displayed because
max-capacity is set to null.
After you associate a partition with one or more queues, in the YARN Queue Manager
UI, click Overview > [***PARTITION NAME***] from the drop-down list and distribute
capacity to the queues before switching allocation mode or creating placement
rules.
COMPX-4992: Unable to switch to absolute mode after deleting a
partition using YARN Queue Manager
If you delete a partition (node label) which has been
associated with queues and those queues have capacities configured for that partition
(node label), the CS.xml still contains the partition (node label) information. Hence,
you cannot switch to absolute mode after deleting the partition (node label).
It is recommended not to delete a partition (node label)
which has been associated with queues and those queues have capacities configured for
that partition (node label).
COMPX-1445: YARN Queue Manager operations are failing when Queue
Manager is installed separately from YARN
If Queue Manager is not selected during the YARN installation, Queue
Manager operations fail: Queue Manager reports that 0 queues are configured and several
failures are present. This is because the ZooKeeper configuration store is not enabled.
In Cloudera Manager, select the YARN
service.
Click the Configuration tab.
Find the Queue Manager Service property.
Select the Queue Manager service that the YARN service instance depends on.
Click Save Changes.
Restart all services that are marked stale in Cloudera Manager.
COMPX-3329: Autorestart is not enabled for Queue Manager in Data
Hub
The MapReduce application framework is loaded from HDFS instead of
being present on the NodeManagers. By default, the
mapreduce.application.framework.path property is set to the
appropriate value, but third-party applications with their own configurations do not
launch.
Set the
mapreduce.application.framework.path property to the appropriate
configuration for third-party applications.
After moving the JobHistory Server to a new host, the URLs
listed for the JobHistory Server on the ResourceManager web UI still point to the old
JobHistory Server. This affects existing jobs only. New jobs started after the move are
not affected.
For any existing jobs that have the incorrect JobHistory
Server URL, there is no option other than to allow the jobs to roll off the history over
time. For new jobs, make sure that all clients have an updated
mapred-site.xml file that references the correct JobHistory
Server.
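The client-side mapred-site.xml entries for the JobHistory Server location might look
like the following. The hostname is a placeholder for your new JobHistory Server host;
10020 and 19888 are the conventional default ports:

```xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>new-jhs-host.example.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>new-jhs-host.example.com:19888</value>
</property>
```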
CDH-49165: History link in ResourceManager web UI broken for
killed Spark applications
ResourceManager requires routable host:port
addresses for yarn.resourcemanager.scheduler.address, and does not
support using the wildcard 0.0.0.0 address.
Set the address, in the form host:port,
either in the client-side configuration, or on the command line when you submit the
job.
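In the client-side configuration, the entry might look like the following. The hostname
is a placeholder for your ResourceManager host; 8030 is the conventional default
scheduler port:

```xml
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>rm-host.example.com:8030</value>
</property>
```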
YARN cannot start if Kerberos principal name is changed
If the Kerberos principal name is changed in Cloudera Manager after launch, YARN does not start. In such cases, the
keytabs can be correctly generated but YARN cannot access ZooKeeper with the new
Kerberos principal name and old ACLs.
There are two possible workarounds:
Delete the znode and restart the YARN service.
Use the reset ZK ACLs command. This also sets the znodes below
/rmstore/ZKRMStateRoot to world:anyone:cdrwa,
which is less secure.
Queue Manager does not open on using a custom user with a default Kerberos
principal
When the Job ACL feature is enabled using Cloudera Manager (YARN > Configuration > Enable JOB ACL property), the mapreduce.cluster.acls.enabled property is
not propagated to all configuration files, including the yarn-site.xml
configuration file. As a result, the ResourceManager process uses the default
value of this property. The default value of
mapreduce.cluster.acls.enabled is false.
Enable the Job ACL feature using the Advanced
Configuration Snippet:
In Cloudera Manager select the YARN
service.
Click Configuration.
Find the YARN Service MapReduce Advanced Configuration Snippet (Safety
Valve) property.
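The safety-valve entry to enable the Job ACL feature might look like the following
(the property name is the one named in the description above):

```xml
<property>
  <name>mapreduce.cluster.acls.enabled</name>
  <value>true</value>
</property>
```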
When the parameter
mapreduce.cluster.acls.enabled is set to
true, the YARN ResourceManager UI does not display the logs, and the Logs
are not available message is displayed.
Set mapreduce.cluster.acls.enabled to
false.
Unsupported Features
The following YARN features are currently not supported in Cloudera:
Application Timeline Server v2 (ATSv2)
Auxiliary Services
Container Resizing
Distributed or Centralized Allocation of Opportunistic Containers