Fixed Issues in Apache YARN
Review the list of YARN issues that are resolved in Cloudera Runtime 7.1.9.
- COMPX-14340: YARN-11490 JMX QueueMetrics breaks after mutable config validation in CS
- JMX QueueMetrics no longer break after two or more configuration validations.
- COMPX-13959: Applications submitted to ambiguous queue fail during recovery if "Specified" Placement Rule is used
- Fixed an issue where an application was killed if the "Specified" placement rule was used and the ResourceManager was restarted while the application was still running.
- COMPX-13773: YARN-11461 NPE in determineMissingParents when the queue is invalid
- Fixed the NullPointerException (NPE) warning that was logged when submitting an application to an invalid queue.
- COMPX-14120: Backport YARN-11463: Node Labels root directory creation doesn't have a retry logic
- Retry logic is implemented and backported for root directory creation during RM node label store initialization.
- COMPX-10909: Investigate if placement rules are working fine if username contains dot, and default queue is set to that queue
- Usernames that contain a dot now work correctly with Capacity Scheduler placement rules.
- COMPX-13554: Backport YARN-10178 to 7.1.9 CHFx : Crash in global async scheduler thread
- With this fix, the Capacity Scheduler global scheduler async thread no longer crashes when multiple async threads concurrently compare queue usage statistics and ResourceCommitterService applies leaf queue change statistics.
- COMPX-12661: YARN-11075 Explicitly declare serialVersionUID in LogMutation class
- The serialVersionUID field is explicitly set for the LogMutation class.
- COMPX-13392: HADOOP-18602 Remove netty3 dependency - CDH-7.1.9
- The netty3 dependency is removed.
- COMPX-12815: Backport YARN-10178 to 7.1.8 CHFx : Crash in global async scheduler thread
- With this fix, the Capacity Scheduler global scheduler async thread no longer crashes when multiple async threads concurrently compare queue usage statistics and ResourceCommitterService applies leaf queue change statistics.
- COMPX-12783: Backport YARN-11079 (Make an AbstractParentQueue to store common ParentQueue and ManagedParentQueue functionality)
- AbstractParentQueue was added to store the functionality common to ParentQueue and ManagedParentQueue.
- COMPX-14124: Backport YARN-10739 GenericEventHandler.printEventQueueDetails causes RM recovery to take too much time
- GenericEventHandler.printEventQueueDetails caused RM recovery to take too long. A thread pool was added to print event details asynchronously, so the RM no longer spends that time during recovery.
- COMPX-14122: Backport YARN-11286: Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable
- The AsyncDispatcher#printEventDetailsExecutor thread pool parameter is now configurable.
- CDPD-41982: Yarn - Upgrade Guava: Google Core Libraries for Java to v28.2/31.1-jre due to CVEs
- Upgraded Guava (Google Core Libraries for Java) to v28.2 due to CVEs.
- CDPD-57948: [7.1.9 ZDU Simulation] Hive Query is failing when YARN is into rolling restart
- A YARN-side fix is implemented and backported to cdpd-master and 7.1.9.x.
- COMPX-6054: PlacementPolicy Rules(default rule) is not honoured in case limit 2 is breached for AQC
- This issue is resolved.
- COMPX-5244: Root queue should not be enabled for auto-queue creation
- This issue is resolved.
- COMPX-3181: Application logs does not work for AZURE and AWS cluster
- YarnClient now supports automatically fetching a delegation token for the YARN log aggregation path (S3 or Azure).
- OPSAPS-52066: Stacks under Logs Directory for Hadoop daemons are not accessible from Knox Gateway.
- The issue was caused by a wrong URL being displayed. Both the jstacks log viewer and download URLs have been fixed.
- OPSAPS-57067: Yarn Service in Cloudera Manager reports stale configuration yarn.cluster.scaling.recommendation.enable.
- This issue is resolved.
- CDPD-2936: Application logs are not accessible in WebUI2 or Cloudera Manager
- This issue is resolved.
- OPSAPS-50291: Environment variables HADOOP_HOME, PATH, LANG, and TZ are not getting whitelisted
- "HADOOP_HOME,PATH,LANG,TZ" is now added by default to the yarn.nodemanager.env-whitelist YARN configuration option.
- COMPX-3303: Auto queue deletion is not supported in relative and absolute resource allocation mode
- This issue is resolved.
- OPSAPS-68058: [CKP-4] YARN allowed system users are hardcoded
- Allowed system users are now generated dynamically, based on the Kerberos principals, process users and auth-to-local rules.
- OPSAPS-67682: [CKP-3, 4(unequal)] Yarn failed to start the resource manager
- The permissions of the node label directory were relaxed to allow members of the process user's group to access it.
- OPSAPS-67860: [BLOCKER] 718CHF9 to 719 | During rolling upgrade Delete the confstore on YARN Zookeeper nodes failed
- The script was fixed to use Kerberos authentication instead of relying on digest authentication.
- OPSAPS-68108: Upgrade failures from CDH6 to 7.1.9 because ACL is not the expected for znode after OPSAPS-67993
- Fixed an issue with the ACL validator.
- OPSAPS-67993: Upgrade failures from CDH6 to 7.1.9 because ACL is not the expected for znode after OPSAPS-63187
- The bash script was updated to work in a secured environment.
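
As an illustration of the COMPX-12661 (YARN-11075) entry above, the following is a minimal Java sketch of what explicitly declaring serialVersionUID looks like. It does not reproduce the real LogMutation class: QueueConfigMutation, its fields, and the value 1L are hypothetical names chosen for this example. When the field is declared, Java serialization uses that value instead of computing one from the class structure, so serialized data stays readable across compatible class changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SerialVersionDemo {

    // Hypothetical stand-in for a serializable mutation record.
    // This is NOT the real LogMutation class from Hadoop.
    static class QueueConfigMutation implements Serializable {
        // Explicit version ID (assumed value). Without this field, the JVM
        // derives an ID from the class structure at runtime.
        private static final long serialVersionUID = 1L;

        private final HashMap<String, String> updates;
        private final String user;

        QueueConfigMutation(Map<String, String> updates, String user) {
            this.updates = new HashMap<>(updates);
            this.user = user;
        }

        Map<String, String> getUpdates() {
            return updates;
        }

        String getUser() {
            return user;
        }
    }

    // Serializes and immediately deserializes the object, as a store and
    // reload cycle would.
    static QueueConfigMutation roundTrip(QueueConfigMutation mutation) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(mutation);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (QueueConfigMutation) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the version ID that Java serialization uses for the class.
    static long declaredVersion() {
        return ObjectStreamClass.lookup(QueueConfigMutation.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        Map<String, String> updates = new HashMap<>();
        updates.put("root.default.capacity", "50"); // illustrative key and value
        QueueConfigMutation copy = roundTrip(new QueueConfigMutation(updates, "admin"));
        System.out.println("user=" + copy.getUser() + " updates=" + copy.getUpdates());
        System.out.println("serialVersionUID=" + declaredVersion());
    }
}
```

Because the ID is fixed in the source, adding a method or recompiling with a different compiler does not change it, which is the usual motivation for declaring it explicitly on classes whose serialized form is persisted.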
Apache patch information
- MAPREDUCE-7237
- MAPREDUCE-7268
- MAPREDUCE-7434
- MAPREDUCE-7433
- MAPREDUCE-7431
- YARN-10930
- YARN-11286
- YARN-10739
- YARN-10178
- HADOOP-18602
- YARN-11190
- YARN-11463
- YARN-11461
- YARN-11513
- YARN-10888
- YARN-11533
- YARN-11490