Known Issues in MapReduce and YARN

This topic describes known issues, unsupported features and limitations for using MapReduce and YARN in this release of Cloudera Runtime.

Known Issues

OPSAPS-57067: The YARN service in Cloudera Manager reports a stale configuration for yarn.cluster.scaling.recommendation.enable. Workaround: This issue does not affect functionality. Restarting the YARN service clears the stale configuration.

JobHistory URL mismatch after server relocation: After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected. Workaround: For existing jobs that have the incorrect JobHistory Server URL, the only option is to let the jobs roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.
CDH-49165: History link in ResourceManager web UI broken for killed Spark applications: When a Spark application is killed, the history link in the ResourceManager web UI does not work. Workaround: To view the history for a killed Spark application, use the Spark HistoryServer web UI instead.
CDH-6808: Routable IP address required by ResourceManager: ResourceManager requires routable host:port addresses for yarn.resourcemanager.scheduler.address, and does not support using the wildcard 0.0.0.0 address. Workaround: Set the address, in the form host:port, either in the client-side configuration or on the command line when you submit the job.
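
As a rough sketch of the command-line variant of the CDH-6808 workaround, the property can be passed as a Hadoop generic option with -D. This assumes the job's driver parses generic options (for example, through ToolRunner). The host name, port, JAR, class name, and paths below are placeholders, not values from this release.

```shell
# Placeholder: a routable ResourceManager host and scheduler port (not 0.0.0.0).
RM_SCHEDULER_ADDRESS="rm-host.example.com:8030"

# Hypothetical job: my-job.jar, com.example.MyDriver, and the paths are illustrative only.
# Generic options such as -D go after the class name and before the job arguments.
SUBMIT_CMD="hadoop jar my-job.jar com.example.MyDriver -D yarn.resourcemanager.scheduler.address=${RM_SCHEDULER_ADDRESS} /input/path /output/path"

# Print the command for review; run it on a host that has the Hadoop client installed.
echo "$SUBMIT_CMD"
```

Setting the same property in the client-side yarn-site.xml avoids repeating the option on every submission.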

OPSAPS-52066: Stacks under Logs Directory for Hadoop daemons are not accessible from Knox Gateway.

Stacks under the Logs directory for Hadoop daemons, such as NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer are not accessible from Knox Gateway.

Workaround: Administrators can SSH directly to the host running the Hadoop daemon to collect stacks under the Logs directory.

CDPD-2936: Application logs are not accessible in WebUI2 or Cloudera Manager

Logs of running containers in the NodeManager local directory cannot be accessed in either Cloudera Manager or WebUI2 because of log aggregation.

Workaround: Use the YARN log CLI to access application logs. For example:

yarn logs -applicationId <Application ID>

Apache Issue: YARN-9725

OPSAPS-50291: Environment variables HADOOP_HOME, PATH, LANG, and TZ are not getting whitelisted

It is possible to whitelist the environment variables HADOOP_HOME, PATH, LANG, and TZ, but the container launch environments do not have these variables set up automatically.

Workaround: You can manually add the required environment variables to the whitelist using Cloudera Manager.

In Cloudera Manager, select the YARN service.
Click the Configuration tab.
Search for Containers Environment Variable Whitelist.
Add the required environment variables (HADOOP_HOME, PATH, LANG, TZ) to the list.
Click Save Changes.
Restart all NodeManagers.
Check the YARN aggregated logs to ensure that newly whitelisted environment variables are set up for container launch.

COMPX-3181: Application logs do not work for Azure and AWS clusters

YARN application log aggregation fails for any YARN job (MR, Tez, Spark, and so on) that does not use cloud storage, or that uses a cloud storage location other than the one configured for YARN logs (yarn.nodemanager.remote-app-log-dir).

Workaround: Configure the following:

For MapReduce jobs, set mapreduce.job.hdfs-servers in the mapred-site.xml file to all filesystems required for the job, including the one set in yarn.nodemanager.remote-app-log-dir, for example hdfs://nn1/,hdfs://nn2/.
For Spark jobs, set spark.yarn.access.hadoopFileSystems at the job level to all filesystems required for the job, including the one set in yarn.nodemanager.remote-app-log-dir, for example hdfs://nn1/,hdfs://nn2/, and pass it through the --conf option of spark-submit.
For jobs submitted using the hadoop command, place a separate core-site.xml file, with fs.defaultFS set to the filesystem configured in yarn.nodemanager.remote-app-log-dir, in a directory. Pass that directory path with --config when running the hadoop command.
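
The job-level settings for MapReduce and Spark might look like the following sketch. It assumes the MapReduce driver accepts generic options (for example, through ToolRunner) so that the property can be passed with -D instead of editing mapred-site.xml, and it uses the --conf option of spark-submit. The filesystem list, JAR, class name, script name, and paths are placeholders.

```shell
# Placeholder list: every filesystem the job needs, including the one that
# yarn.nodemanager.remote-app-log-dir points to.
JOB_FILESYSTEMS="hdfs://nn1/,hdfs://nn2/"

# MapReduce: pass the list as a generic option (hypothetical JAR, class, and paths).
MR_CMD="hadoop jar my-job.jar com.example.MyDriver -D mapreduce.job.hdfs-servers=${JOB_FILESYSTEMS} /input/path /output/path"

# Spark: pass the list as a Spark property at submit time (hypothetical application script).
SPARK_CMD="spark-submit --master yarn --conf spark.yarn.access.hadoopFileSystems=${JOB_FILESYSTEMS} my_app.py"

# Print both commands for review; run them on a host with the respective clients installed.
echo "$MR_CMD"
echo "$SPARK_CMD"
```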

COMPX-1445: Queue Manager operations are failing when Queue Manager is installed separately from YARN

If Queue Manager is not selected during YARN installation, Queue Manager operations fail. Queue Manager reports that 0 queues are configured, and several failures are present, because the ZooKeeper configuration store is not enabled.

Workaround:

In Cloudera Manager, select the YARN service.
Click the Configuration tab.
Find the Queue Manager Service property.
Select the Queue Manager service that the YARN service instance depends on.
Click Save Changes.
Restart all services that are marked stale in Cloudera Manager.

COMPX-1451: Queue Manager does not support multiple ResourceManagers

When YARN High Availability is enabled, there are multiple ResourceManagers, and Queue Manager receives multiple ResourceManager URLs for the cluster. It picks the active ResourceManager URL only when the Queue Manager page is loaded. If the currently active ResourceManager goes down while the user is still using the Queue Manager UI, Queue Manager cannot handle the failover gracefully.

Workaround: Reload the Queue Manager page manually.

COMPX-3329: Autorestart is not enabled for Queue Manager in Data Hub

In a Data Hub cluster, Queue Manager is installed with autorestart disabled. Hence, if Queue Manager goes down, it will not restart automatically.

Workaround: If Queue Manager goes down in a Data Hub cluster, you must go to the Cloudera Manager Dashboard and restart the Queue Manager service.

Third party applications do not launch if MapReduce framework path is not included in the client configuration

The MapReduce application framework is loaded from HDFS instead of being present on the NodeManagers. By default, the mapreduce.application.framework.path property is set to the appropriate value, but third party applications that use their own configurations will not launch.

Workaround: Set the mapreduce.application.framework.path property to the appropriate value in the configuration of the third party applications.

COMPX-3181: Log aggregation fails for YARN jobs not using the cloud storage configured by yarn.nodemanager.remote-app-log-dir

Log aggregation fails for any YARN job (MapReduce, Tez, Spark, and so on) which does not use cloud storage, or does not use the cloud storage that is configured using the yarn.nodemanager.remote-app-log-dir property.

Workaround:

For MapReduce jobs - Set the mapreduce.job.hdfs-servers property in the mapred-site.xml configuration file at the job level to all filesystems required for the job. That includes the one set by the yarn.nodemanager.remote-app-log-dir property, for example hdfs://nn1/,hdfs://nn2/.
For Spark jobs - Set the spark.yarn.access.hadoopFileSystems property through --conf in spark-submit at the job level to all filesystems required for the job. That includes the one set by the yarn.nodemanager.remote-app-log-dir property, for example hdfs://nn1/,hdfs://nn2/.
For jobs submitted using the hadoop command - Place a separate core-site.xml configuration file, in which the fs.defaultFS property is set to the filesystem that is set by the yarn.nodemanager.remote-app-log-dir property, in a directory. Pass that directory path with --config as part of the command.
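
For the hadoop command case, the separate configuration directory could be prepared as in the sketch below. The filesystem URI, JAR, class name, and paths are placeholders. Whether the directory also needs copies of the other client configuration files depends on your setup, so treat this as a starting point rather than a complete client configuration.

```shell
# Placeholder: the filesystem that yarn.nodemanager.remote-app-log-dir points to.
LOG_FS="hdfs://nn1/"

# Create a dedicated configuration directory that holds the separate core-site.xml.
CONF_DIR="$(mktemp -d)"

cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${LOG_FS}</value>
  </property>
</configuration>
EOF

# Print the command that passes the directory with --config (hypothetical JAR, class, and paths).
echo "hadoop --config ${CONF_DIR} jar my-job.jar com.example.MyDriver /input/path /output/path"
```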

YARN cannot start if Kerberos principal name is changed

If the Kerberos principal name is changed in Cloudera Manager after launch, YARN will not be able to start. In this case, the keytabs can be generated correctly, but YARN cannot access ZooKeeper with the new Kerberos principal name and the old ACLs.

There are two possible workarounds:

Delete the znode and restart the YARN service.
Use the reset ZK ACLs command. This also sets the znodes below /rmstore/ZKRMStateRoot to world:anyone:cdrwa, which is less secure.

COMPX-8687: Missing access check for getAppAttempts

When the Job ACL feature is enabled using Cloudera Manager (YARN > Configuration > Enable JOB ACL property), the mapreduce.cluster.acls.enabled property is not generated in all configuration files, including the yarn-site.xml configuration file. As a result, the ResourceManager process uses the default value of this property. The default value of mapreduce.cluster.acls.enabled is false.

Workaround: Enable the Job ACL feature using an advanced configuration snippet:

In Cloudera Manager, select the YARN service.
Click Configuration.
Find the YARN Service MapReduce Advanced Configuration Snippet (Safety Valve) property.
Click the plus icon and add the following:
- Name: mapreduce.cluster.acls.enabled
- Value: true
Click Save Changes.

Unsupported Features

The following YARN features are currently not supported in Cloudera Data Platform:

GPU support for Docker
Hadoop Pipes
Fair Scheduler
Application Timeline Server (ATS 2 and ATS 1.5)
Container Resizing
Distributed or Centralized Allocation of Opportunistic Containers
Distributed Scheduling
Native Services
Pluggable Scheduler Configuration
Queue Priority Support
Reservation REST APIs
Resource Estimator Service
Resource Profiles
(non-ZooKeeper) ResourceManager State Store
Shared Cache
YARN Federation
Rolling Log Aggregation
Docker on YARN (DockerContainerExecutor) on Data Hub clusters
Moving jobs between queues
Dynamic Resource Pools
