Fixed Issues in Apache YARN

Review the list of YARN issues that are resolved in Cloudera Runtime 7.1.9.

COMPX-14340: YARN-11490 JMX QueueMetrics breaks after mutable config validation in CS
JMX QueueMetrics no longer break after two or more mutable configuration validations in the Capacity Scheduler.
COMPX-13959: Applications submitted to ambiguous queue fail during recovery if "Specified" Placement Rule is used
Fixed an issue where an application was killed if the "Specified" placement rule was used and the ResourceManager was restarted while the application was still running.
COMPX-13773: YARN-11461 NPE in determineMissingParents when the queue is invalid
Fixed the NullPointerException (NPE) that was logged as a warning when an application was submitted to an invalid queue.
COMPX-14120: Backport YARN-11463: Node Labels root directory creation doesn't have a retry logic
Retry logic has been implemented and backported for root directory creation during ResourceManager node label store initialization.
COMPX-10909: Investigate if placement rules are working fine if username contains dot, and default queue is set to that queue
Usernames that contain a dot now work correctly with Capacity Scheduler placement rules.
COMPX-13554: Backport YARN-10178 to 7.1.9 CHFx : Crash in global async scheduler thread
With this fix, the Capacity Scheduler global scheduler async thread no longer crashes when multiple async threads concurrently compare queue usage statistics while ResourceCommitterService applies leaf queue statistics changes.
COMPX-12661: YARN-11075 Explicitly declare serialVersionUID in LogMutation class
The serialVersionUID field is explicitly set for the LogMutation class.
COMPX-13392: HADOOP-18602 Remove netty3 dependency - CDH-7.1.9
The netty3 dependency has been removed.
COMPX-12815: Backport YARN-10178 to 7.1.8 CHFx : Crash in global async scheduler thread
With this fix, the Capacity Scheduler global scheduler async thread no longer crashes when multiple async threads concurrently compare queue usage statistics while ResourceCommitterService applies leaf queue statistics changes.
COMPX-12783: Backport YARN-11079 (Make an AbstractParentQueue to store common ParentQueue and ManagedParentQueue functionality)
Added an AbstractParentQueue class to store the functionality common to ParentQueue and ManagedParentQueue.
COMPX-14124: Backport YARN-10739 GenericEventHandler.printEventQueueDetails causes RM recovery to take too much time
GenericEventHandler.printEventQueueDetails caused ResourceManager recovery to take too much time. A thread pool was added to print event details asynchronously, so the ResourceManager no longer spends excessive time on this during recovery.
COMPX-14122: Backport YARN-11286: Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable
The AsyncDispatcher#printEventDetailsExecutor thread pool parameter is now configurable.
CDPD-41982: Yarn - Upgrade Guava: Google Core Libraries for Java to v28.2/31.1-jre due to CVEs
Upgraded Guava: Google Core Libraries for Java to v28.2 to address CVEs.
CDPD-57948: [7.1.9 ZDU Simulation] Hive Query is failing when YARN is into rolling restart
A YARN-side fix has been implemented and backported to cdpd-master and 7.1.9.x.
COMPX-6054: PlacementPolicy Rules(default rule) is not honoured in case limit 2 is breached for AQC
This issue is resolved.
COMPX-5244: Root queue should not be enabled for auto-queue creation
This issue is resolved.
COMPX-3181: Application logs does not work for AZURE and AWS cluster
Added support for automatically fetching delegation tokens for the YARN log aggregation path (S3 or Azure) in YarnClient.
OPSAPS-52066: Stacks under Logs Directory for Hadoop daemons are not accessible from Knox Gateway.
The issue was caused by an incorrect URL being displayed. Both the jstacks log viewer URL and the download URL have been fixed.
OPSAPS-57067: Yarn Service in Cloudera Manager reports stale configuration yarn.cluster.scaling.recommendation.enable.
This issue is resolved.
CDPD-2936: Application logs are not accessible in WebUI2 or Cloudera Manager
This issue is resolved.
OPSAPS-50291: Environment variables HADOOP_HOME, PATH, LANG, and TZ are not getting whitelisted
HADOOP_HOME, PATH, LANG, and TZ are now added by default to the yarn.nodemanager.env-whitelist YARN configuration property.
COMPX-3303: Auto queue deletion is not supported in relative and absolute resource allocation mode
This issue is resolved.
OPSAPS-68058: [CKP-4] YARN allowed system users are hardcoded
Allowed system users are now generated dynamically, based on the Kerberos principals, process users and auth-to-local rules.
OPSAPS-67682: [CKP-3, 4(unequal)] Yarn failed to start the resource manager
The permissions of the node label directory were relaxed to allow members of the process user's group to access it.
OPSAPS-67860: [BLOCKER] 718CHF9 to 719 | During rolling upgrade Delete the confstore on YARN Zookeeper nodes failed
The script was fixed to use Kerberos authentication instead of relying on digest authentication.
OPSAPS-68108: Upgrade failures from CDH6 to 7.1.9 because ACL is not the expected for znode after OPSAPS-67993
Fixed an issue with the ACL validator.
OPSAPS-67993: Upgrade failures from CDH6 to 7.1.9 because ACL is not the expected for znode after OPSAPS-63187
The bash script was updated to work in a secured environment.

Apache patch information

  • MAPREDUCE-7237
  • MAPREDUCE-7268
  • MAPREDUCE-7434
  • MAPREDUCE-7433
  • MAPREDUCE-7431
  • YARN-10930
  • YARN-11286
  • YARN-10739
  • YARN-10178
  • HADOOP-18602
  • YARN-11190
  • YARN-11463
  • YARN-11461
  • YARN-11513
  • YARN-10888
  • YARN-11533
  • YARN-11490