Known Issues in YARN, YARN Queue Manager and MapReduce
Learn about the known issues in YARN, YARN Queue Manager and MapReduce, the impact or changes to the functionality, and the workaround.
Known Issues
- COMPX-14820: Delete Queue and its Children throws "Queue capacity was reduced to zero, but failed to delete queue."
- When you perform the "Delete Queue and its Children" operation on a queue that has one or more siblings, the operation fails because of YARN constraints. If the queue on which you perform the operation is a leaf node, the operation succeeds.
- COMPX-13177: QueueManager webapp requests fail with 'HTTP ERROR 400 java.net.ConnectException: Unsupported ciphersuite TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA'
- Products:
- Cloudera Manager for CDP Private Cloud Base
- Cloudera Manager for CDP Public Cloud
- CentOS 7.8 and Red Hat 7.8 operating systems, when FIPS support is enabled.
- When attempting to display the YARN Queue Manager interface, Cloudera Manager displays an error: "HTTP ERROR 400 java.net.ConnectException: Unsupported ciphersuite TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA".
- COMPX-4644: Queue capacity rounding problem when configuration is initially set via YARN
- When you set the Capacity Scheduler configuration through the YARN or Cloudera Manager configuration, capacity values might use multiple decimal places. This causes rounding and floating-point precision discrepancies in the UI when it validates that all sibling capacities add up to 100%. The numbers shown in the UI appear to add up to 100, but the validation still displays an error and does not allow you to save the capacities. In the backend, the capacity is calculated as, for example, 99.9999999991.
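The discrepancy described here is ordinary binary floating-point behavior and can be reproduced outside the UI. The following is a minimal Python sketch, not Queue Manager's actual validation code; the function names and the tolerance value are assumptions chosen for illustration.

```python
import math
from decimal import Decimal


def siblings_sum_to_100(capacities, tolerance=1e-6):
    """Return True if sibling capacities (percent) add up to 100 within a tolerance.

    The tolerance is an illustrative assumption, not a documented YARN value.
    """
    return math.isclose(sum(capacities), 100.0, abs_tol=tolerance)


def siblings_sum_exactly(capacity_strings):
    """Sum capacities given as decimal strings without binary rounding error."""
    return sum((Decimal(value) for value in capacity_strings), Decimal("0"))


# A backend value such as 99.9999999991 fails a strict equality check
# but passes a tolerance-based comparison.
print(99.9999999991 == 100.0)                 # strict comparison
print(siblings_sum_to_100([99.9999999991]))   # tolerant comparison

# Decimal arithmetic keeps the values exactly as they were typed.
print(siblings_sum_exactly(["33.33", "33.33", "33.34"]))
```

Comparing with a tolerance, or summing the values as decimals, avoids rejecting capacities that differ from 100 only by representation error.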
- 20202 Database migration after enabling opt-in migration
- When migrating from an H2 database to a PostgreSQL database in YARN Queue Manager after installation or upgrade, you might encounter an issue, but only if you followed this specific scenario:
- New install or upgrade to CDP 7.1.9, forcing migration from H2 to PostgreSQL database.
- Upgrade to CDP 7.1.9 CHF2, moving back to H2 database.
- Upgrade to CDP 7.1.9 SP1 with valid PostgreSQL connection details in Queue Manager configurations.
- CDPD-56559: MapReduce jobs can intermittently fail during a rolling upgrade.
- During a rolling upgrade between CDP versions 7.1.8 and 7.1.9, MapReduce jobs may fail with the message "RuntimeException: native snappy library not available". Although the native Snappy compression library is not loaded, a checkmark indicates that the Snappy compression library is loading for NodeManagers that are pending upgrade. This causes the MapReduce jobs associated with those NodeManagers to fail. After the upgrade, the jobs work as expected. This issue only impacts rolling upgrades from a version before CDP 7.1.9 to a higher version.
- COMPX-12021 Queue Manager configurations on Scheduler Configuration page are not working
- When you set the following properties in the YARN Queue Manager UI, they are written to the capacity-scheduler.xml file, where they have no effect on YARN. The properties need to be set in the yarn-site.xml file, which does not happen when you set them through YARN Queue Manager.
- "Maximum Application Priority" – "yarn.cluster.max-application-priority"
- "Enable Monitoring Policies" – "yarn.resourcemanager.scheduler.monitor.enable"
- "Monitoring Policies" – "yarn.resourcemanager.scheduler.monitor.policies"
- "Preemption: Observe Only" – "yarn.resourcemanager.monitor.capacity.preemption.observe_only"
- "Preemption: Monitoring Interval (ms)" – "yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval"
- "Preemption: Maximum Wait Before Kill (ms)" – "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill"
- "Preemption: Total Resources Per Round" – "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round"
- "Preemption: Over Capacity Tolerance" – "yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity"
- "Preemption: Maximum Termination Factor" – "yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor"
- "Enable Intra Queue Preemption" – "yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled"
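Because the properties listed above take effect only in yarn-site.xml, it can help to prepare them as property entries in the standard Hadoop configuration format. The sketch below is illustrative only: the helper name render_yarn_site_properties is hypothetical, the values shown are placeholders rather than recommendations, and how the resulting snippet is applied to yarn-site.xml in a given deployment is not covered here.

```python
import xml.etree.ElementTree as ET


def render_yarn_site_properties(settings):
    """Render name/value pairs as Hadoop-style <property> elements.

    `settings` maps property names to string values. This helper is a
    hypothetical illustration, not part of YARN or Cloudera Manager.
    """
    snippets = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        snippets.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(snippets)


# Placeholder values for two of the properties listed above.
example_settings = {
    "yarn.cluster.max-application-priority": "10",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
}

print(render_yarn_site_properties(example_settings))
```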
- COMPX-6214: When there are more than 600 queues in a cluster, potential timeouts occur due to performance reasons that are visible in the Configuration Service.
- A Cloudera Manager proxy timeout configuration field has been added. This issue is tracked in OPSAPS-60554. For this release, the timeout is increased from 20 seconds to 5 minutes. If the problem still occurs, Cloudera recommends that you increase the proxy timeout value.
- COMPX-5817: YARN Queue Manager UI is not able to present a view of pre-upgrade queue structure. Cloudera Manager Store is not supported and therefore YARN does not have any of the pre-upgrade queue structure preserved.
- When a Data Hub cluster is deleted, all saved configurations are also deleted. All YARN configurations are saved in Cloudera Manager Store, which is not yet supported in Data Hub and Cloudera Manager. Hence, the YARN queue structure is also lost when a Data Hub cluster is deleted, upgraded, or restored.
- COMPX-6628: Unable to delete single leaf queue assigned to a partition.
- In the current implementation, you cannot delete a single leaf queue assigned to a partition.
- COMPX-5240: Restarting parent queue does not restart child queues in weight mode
- When a dynamic auto child creation enabled parent queue is stopped in weight mode, its static and dynamically created child queues are also stopped. However, when the dynamic auto child creation enabled parent queue is restarted, its child queues remain stopped. In addition, the dynamically created child queues cannot be restarted manually through the YARN Queue Manager UI either.
- COMPX-5589: Unable to add new queue to leaf queue with partition capacity in Weight/Absolute mode
- Scenario
- Create one or more partitions.
- Assign a partition to a parent queue that has children.
- Switch to the partition to distribute the capacities.
- Create a new child queue under one of the leaf queues. The following error is displayed:
Error : 2021-03-05 17:21:26,734 ERROR com.cloudera.cpx.server.api.repositories.SchedulerRepository: Validation failed for Add queue operation. Error message: CapacityScheduler configuration validation failed:java.io.IOException: Failed to re-init queues : Parent queue 'root.test2' have children queue used mixed of weight mode, percentage and absolute mode, it is not allowed, please double check, details: {Queue=root.test2.test2childNew, label= uses weight mode}. {Queue=root.test2.test2childNew, label=partition uses percentage mode}
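The error indicates that the children of one parent end up using different capacity modes for different labels. A rough pre-check can be sketched as follows, assuming the usual capacity-scheduler.xml value formats: a plain number for percentage, a "w" suffix for weight, and a bracketed resource list for absolute mode. The function names are hypothetical, and this is not the validation that YARN itself performs.

```python
def capacity_mode(value):
    """Classify a capacity value string by the mode its format implies."""
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        return "absolute"
    if text.endswith("w"):
        return "weight"
    float(text)  # raises ValueError if the value is not a plain number
    return "percentage"


def modes_in_use(capacities):
    """Return the set of modes used across (queue, label) capacity values.

    `capacities` maps (queue_path, label) tuples to raw value strings.
    """
    return {capacity_mode(value) for value in capacities.values()}


# Values loosely modeled on the error above: the default label uses a weight,
# while the label named "partition" uses a percentage.
example = {
    ("root.test2.test2childNew", ""): "1w",
    ("root.test2.test2childNew", "partition"): "0",
}

modes = modes_in_use(example)
if len(modes) > 1:
    print("Mixed capacity modes detected:", sorted(modes))
```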
- COMPX-5264: Unable to switch to Weight mode on creating a managed parent queue in Relative mode
- In the current implementation, if there is an existing managed queue in Relative mode, conversion to Weight mode is not allowed.
- COMPX-5549: Queue Manager UI sets maximum-capacity to null when you switch mode with multiple partitions
- If you associate a partition with one or more queues and then switch the allocation mode before assigning capacities to the queues, an Operation Failed error is displayed because max-capacity is set to null.
- COMPX-4992: Unable to switch to absolute mode after deleting a partition using YARN Queue Manager
- If you delete a partition (node label) that has been associated with queues, and those queues have capacities configured for that partition (node label), the capacity-scheduler.xml (CS.xml) file still contains the partition (node label) information. Hence, you cannot switch to absolute mode after deleting the partition (node label).
- COMPX-1445: YARN Queue Manager operations are failing when Queue Manager is installed separately from YARN
- If Queue Manager is not selected during YARN installation, Queue Manager operations fail. Queue Manager reports that 0 queues are configured and several failures are present, because the ZooKeeper configuration store is not enabled.
- COMPX-3329: Autorestart is not enabled for Queue Manager in Data Hub
- In a Data Hub cluster, Queue Manager is installed with autorestart disabled. Hence, if Queue Manager goes down, it does not restart automatically.
- Third-party applications do not launch if MapReduce framework path is not included in the client configuration
- The MapReduce application framework is loaded from HDFS instead of being present on the NodeManagers. By default, the mapreduce.application.framework.path property is set to the appropriate value, but third-party applications with their own configurations do not launch.
- JobHistory URL mismatch after server relocation
- After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.
- CDH-49165: History link in ResourceManager web UI broken for killed Spark applications
- When a Spark application is killed, the history link in the ResourceManager web UI does not work.
- CDH-6808: Routable IP address required by ResourceManager
- ResourceManager requires routable host:port addresses for yarn.resourcemanager.scheduler.address, and does not support using the wildcard 0.0.0.0 address.
- YARN cannot start if Kerberos principal name is changed
- If the Kerberos principal name is changed in Cloudera Manager after launch, YARN does not start. In such cases, the keytabs can be correctly generated but YARN cannot access ZooKeeper with the new Kerberos principal name and old ACLs.
- Queue Manager does not open on using a custom user with a default Kerberos principal
- If a custom user is used with the default Kerberos principal, the Queue Manager web UI displays an HTTP ERROR 400 error.
- COMPX-8687: Missing access check for getAppAttemps
- When the Job ACL feature is enabled through its property in Cloudera Manager, the mapreduce.cluster.acls.enabled property is not generated to all configuration files, including the yarn-site.xml configuration file. As a result, the ResourceManager process uses the default value of this property. The default value of mapreduce.cluster.acls.enabled is false.
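To see which value a given configuration file would supply, the property can be looked up with a fallback to the default of false. The sketch below parses an inline XML string rather than a real file; the helper name and the sample content are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ACLS_PROPERTY = "mapreduce.cluster.acls.enabled"


def read_property(config_xml, name, default):
    """Return the value of `name` from Hadoop-style configuration XML.

    Falls back to `default` when the property is absent, which mirrors the
    ResourceManager using the default when the property is not generated.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value", default)
    return default


# Sample yarn-site.xml content that lacks the ACL property.
sample_yarn_site = """
<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
</configuration>
"""

print(read_property(sample_yarn_site, ACLS_PROPERTY, "false"))
```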
Unsupported Features
- The following YARN features are currently not supported in Cloudera Data Platform:
- Application Timeline Server v2 (ATSv2)
- Auxiliary Services
- Container Resizing
- Distributed or Centralized Allocation of Opportunistic Containers
- Distributed Scheduling
- Docker on YARN (DockerContainerExecutor) on Data Hub clusters
- Fair Scheduler
- GPU support for Docker
- Hadoop Pipes
- Moving jobs between queues
- Native Services
- Pluggable Scheduler Configuration
- Queue Priority Support
- Reservation REST APIs
- Resource Estimator Service
- Resource Profiles
- (non-ZooKeeper) ResourceManager State Store
- Rolling Log Aggregation
- Shared Cache
- YARN Federation