Fixed Issues in Yarn and Yarn Queue Manager

Review the list of Yarn and Yarn Queue Manager issues that are resolved in Cloudera Runtime 7.3.1, its service packs, and its cumulative hotfixes.

Cloudera Runtime 7.3.1.400 SP2

There are no fixed issues in this release.

Cloudera Runtime 7.3.1.300 SP1 CHF 1

There are no fixed issues in this release.

Cloudera Runtime 7.3.1.200 SP1

COMPX-18663: Cgroup v2 support must fall back to v1 when there are no v2 controllers
7.3.1.200
Cgroup v1/v2 mixed-mode support was introduced previously. If cgroup v2 support was enabled (yarn.nodemanager.linux-container-executor.cgroups.v2.enabled set to true) but only cgroup v1 controllers were mounted on the NodeManagers, cgroup v2 support did not fall back to v1. This issue is now resolved, and cgroup v2 support falls back to v1 when there are no v2 controllers.

Apache Jira: YARN-11743

COMPX-18909: NodeManager marked as unhealthy if an application is terminated
7.3.1.200
NodeManagers are marked unhealthy if the container-executor has an unrecoverable or configuration error, such as a missing container-executor script. However, if an application was terminated just before one of the containers tried to access the localizer syslog file, the resulting IOException caused the NodeManager to be marked unhealthy incorrectly. This issue is now resolved: the error checking is more specific, which removes the false-positive unhealthy markings.

Apache Jira: YARN-11753

COMPX-18545: Setting maximum-application-lifetime using AQCv2 templates does not apply on the first submitted application
7.3.1.200
Setting the maximum-application-lifetime property using the AQC v2 templates did not apply to the first submitted application, but was applied to the subsequent ones. This issue is now resolved.

Apache Jira: YARN-11708

COMPX-18589: YARN ResourceManager raised an exception during comparison of queues
7.3.1.200
YARN ResourceManager raised the exception java.lang.IllegalArgumentException: Comparison method violates its general contract!. The root cause was an AND condition that caused the TimSort algorithm to raise the exception during the comparison of queues. This issue is now resolved.

Apache Jira: YARN-11745
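
For readers unfamiliar with this class of error, the following is a minimal sketch of how an AND condition in a comparator can break the contract that TimSort relies on. It is not the ResourceManager code: the int[] pairs standing in for queues, the BROKEN and CONSISTENT comparators, and the violatesContract helper are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ComparatorContractSketch {
    // Hypothetical stand-in for a queue: {priority, usedPercent}. Not a YARN type.
    static final List<int[]> SAMPLE = Arrays.asList(
        new int[] {1, 1}, new int[] {0, 5}, new int[] {2, 2});

    // Orders two items only when BOTH fields agree (the AND condition);
    // every other pair is reported as "equal", which is not transitive.
    static final Comparator<int[]> BROKEN = (a, b) -> {
        if (a[0] < b[0] && a[1] < b[1]) return -1;
        if (a[0] > b[0] && a[1] > b[1]) return 1;
        return 0;
    };

    // A consistent alternative: compare field by field, using the second as a tie-breaker.
    static final Comparator<int[]> CONSISTENT =
        Comparator.<int[]>comparingInt(q -> q[0]).thenComparingInt(q -> q[1]);

    // Brute-force check of the three rules stated in the Comparator documentation.
    static boolean violatesContract(Comparator<int[]> cmp, List<int[]> items) {
        for (int[] x : items) {
            for (int[] y : items) {
                int xy = Integer.signum(cmp.compare(x, y));
                // Rule 1: sgn(compare(x, y)) == -sgn(compare(y, x))
                if (xy != -Integer.signum(cmp.compare(y, x))) return true;
                for (int[] z : items) {
                    int xz = Integer.signum(cmp.compare(x, z));
                    int yz = Integer.signum(cmp.compare(y, z));
                    // Rule 2: x > y and y > z implies x > z
                    if (xy > 0 && yz > 0 && xz <= 0) return true;
                    // Rule 3: x "equals" y implies both compare the same way to z
                    if (xy == 0 && xz != yz) return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("broken violates contract: " + violatesContract(BROKEN, SAMPLE));
        System.out.println("consistent violates contract: " + violatesContract(CONSISTENT, SAMPLE));
    }
}
```

In the sample data, {1, 1} and {0, 5} compare as equal, and so do {0, 5} and {2, 2}, yet {1, 1} sorts before {2, 2}. Inconsistencies of this kind are what TimSort can detect and report as the exception above.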

CDPD-49702: NodeManager must be shut down when the program /var/lib/yarn-ce/bin/container-executor cannot be run
7.3.1.200
Previously, a job failed when NodeManager encountered the No such file or directory error while running the /var/lib/yarn-ce/bin/container-executor program. This issue is now resolved and NodeManager is marked as unhealthy and shut down when it cannot run the program.

Apache Jira: YARN-11709

Cloudera Runtime 7.3.1.100 CHF 1

Fixed the order of updating CPU controls with cgroup v1
7.3.1.100

This fix ensures that cpu.cfs_period_us is updated before cpu.cfs_quota_us, which preserves the ratio between the two values and avoids exceeding the limit defined at the parent level.

Apache Jira: YARN-11733
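
The ordering described above can be sketched as follows. This is an illustration only, not the NodeManager implementation: the CpuControlOrderSketch class, the updateCpuLimits helper, and the directory argument are hypothetical, and the sketch writes to whatever directory it is given rather than to a real cgroup hierarchy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class CpuControlOrderSketch {
    // Writes the period first and the quota second, and returns the
    // file names in the order in which they were written.
    static List<String> updateCpuLimits(Path cgroupDir, long periodUs, long quotaUs)
            throws IOException {
        String[][] orderedUpdates = {
            {"cpu.cfs_period_us", Long.toString(periodUs)}, // period goes first
            {"cpu.cfs_quota_us", Long.toString(quotaUs)}    // quota follows
        };
        List<String> written = new ArrayList<>();
        for (String[] update : orderedUpdates) {
            Path file = cgroupDir.resolve(update[0]);
            Files.write(file, update[1].getBytes(StandardCharsets.UTF_8));
            written.add(update[0]);
        }
        return written;
    }
}
```

For example, calling updateCpuLimits(dir, 100000L, 50000L) on a scratch directory writes a period of 100000 microseconds followed by a quota of 50000 microseconds, that is, half of one CPU per period.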

Cloudera Runtime 7.3.1

COMPX-17702: Backport - YARN-10345 - HsWebServices containerlogs does not honor ACLs for completed jobs
7.3.1
The following REST APIs now have ACL authorization:
  • /ws/v1/history/containerlogs/{containerid}/{filename}
  • /ws/v1/history/containers/{containerid}/logs

Apache Jira: YARN-10345

COMPX-16285: Optimize system credentials sent in node heartbeat responses
7.3.1
Previously, heartbeat responses included the tokens of all applications, even though not all applications were active on a given node. As a result, too many SystemCredentialsForAppsProto objects were created for each node and each heartbeat. This issue is now resolved and the system credentials sent in node heartbeat responses are optimized.

Apache Jira: YARN-6523

CDPD-73754: Yarn Application Master Node web link is broken on yarnuiv2 page
7.3.1
Previously, the RM did not open the Yarn Application Master node web link on the yarnuiv2 page because the URL ended with a /. This issue is now resolved and the trailing / is now removed from the URL.

Apache Jira: YARN-11729
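
The kind of normalization described in this fix can be sketched as below. The actual change lives in the YARN UI code and may look different. The TrailingSlashSketch class, the stripTrailingSlash helper, and the example host name are hypothetical.

```java
final class TrailingSlashSketch {
    // Removes a single trailing "/" if present; any other URL is returned unchanged.
    static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        // Example host and port are placeholders, not values from the release notes.
        System.out.println(stripTrailingSlash("http://nm-host.example.com:8042/"));
    }
}
```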