Known Issues in MapReduce and YARN

This topic describes known issues, unsupported features and limitations for using MapReduce and YARN in this release of Cloudera Runtime.

Known Issues

JobHistory URL mismatch after server relocation
After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.
Workaround: For existing jobs that show the incorrect JobHistory Server URL, the only option is to let the jobs roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.
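One way to confirm that a client has picked up the updated file is to print the JobHistory Server entries from its mapred-site.xml. The following is a minimal sketch, not an official tool: the function name show_jhs_entries is hypothetical, the path in the example call is an assumption about where the client configuration lives, and the grep relies on the common layout in which each <value> element sits on the line after its <name> element.

```shell
# Print every JobHistory Server property found in a client-side mapred-site.xml.
# Assumes the usual Hadoop layout, with <value> on the line after <name>.
show_jhs_entries() {
  conf_file="$1"
  # -A 1 prints each matching <name> line plus the line that follows it
  grep -A 1 '<name>mapreduce\.jobhistory\.' "$conf_file"
}

# Example call (the path is an assumption; use your client's configuration directory):
# show_jhs_entries /etc/hadoop/conf/mapred-site.xml
```

If the output still names the old host, for example in mapreduce.jobhistory.address or mapreduce.jobhistory.webapp.address, that client has not yet received the updated configuration.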
CDH-49165: History link in ResourceManager web UI broken for killed Spark applications
When a Spark application is killed, the history link in the ResourceManager web UI does not work.
Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.
CDH-6808: Routable IP address required by ResourceManager
ResourceManager requires routable host:port addresses for yarn.resourcemanager.scheduler.address, and does not support using the wildcard 0.0.0.0 address.
Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
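As an illustration of the command-line variant, the sketch below builds a -D option that sets the scheduler address explicitly. The host name, jar name, and class name are placeholders, and port 8030 is the Apache Hadoop default for the scheduler address, which your cluster may override. The option takes effect only for jobs that parse Hadoop generic options, for example jobs launched through ToolRunner.

```shell
# Hypothetical host name; 8030 is the Apache Hadoop default scheduler port.
RM_HOST="rm-host.example.com"
RM_SCHEDULER_PORT="8030"

# Build the generic -D option. It is only honored by jobs that parse Hadoop
# generic options (for example, jobs run through ToolRunner).
SCHEDULER_OPT="-Dyarn.resourcemanager.scheduler.address=${RM_HOST}:${RM_SCHEDULER_PORT}"

# Print the submit command instead of running it; my-job.jar and MyJob are placeholders.
echo "hadoop jar my-job.jar MyJob ${SCHEDULER_OPT} <input> <output>"
```

The command is printed rather than executed so that the resulting address can be reviewed before the job is submitted.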
OPSAPS-52066: Stacks under Logs directory for Hadoop daemons are not accessible from Knox Gateway
Stacks under the Logs directory for Hadoop daemons, such as NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer, are not accessible from Knox Gateway.
Workaround: Administrators can SSH directly to the host that runs the Hadoop daemon to collect stacks under the Logs directory.
CDPD-2936: Application logs are not accessible in WebUI2 or Cloudera Manager
Logs of running containers in the NodeManager local directory cannot be accessed in either Cloudera Manager or WebUI2 because of log aggregation.
Workaround: Use the YARN log CLI to access application logs. For example:
yarn logs -applicationId <Application ID>
Apache Issue: YARN-9725
OPSAPS-50291: Environment variables HADOOP_HOME, PATH, LANG, and TZ are not getting whitelisted
It is possible to whitelist the environment variables HADOOP_HOME, PATH, LANG, and TZ, but the container launch environments do not have these variables set up automatically.
Workaround: You can manually add the required environment variables to the whitelist using Cloudera Manager.
  1. In Cloudera Manager, select the YARN service.
  2. Click the Configuration tab.
  3. Search for Containers Environment Variable Whitelist.
  4. Add the required environment variables (HADOOP_HOME, PATH, LANG, TZ) to the list.
  5. Click Save Changes.
  6. Restart all NodeManagers.
  7. Check the YARN aggregated logs to ensure that newly whitelisted environment variables are set up for container launch.
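For the final check, one possible approach is to filter the aggregated logs for lines that assign the whitelisted variables. This is only a sketch: the function name filter_whitelisted_env is hypothetical, and it finds something only if the application actually writes its environment to its logs (for example, a test job that prints its environment), which is an assumption about your workload.

```shell
# Keep only lines that assign one of the whitelisted variables.
# The leading group stops PATH from matching CLASSPATH or LD_LIBRARY_PATH.
filter_whitelisted_env() {
  grep -E '(^|[^A-Za-z0-9_])(HADOOP_HOME|PATH|LANG|TZ)='
}

# Example call:
# yarn logs -applicationId <Application ID> | filter_whitelisted_env
```

The filter reads from standard input, so it can also be applied to a log file saved earlier.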

Limitations

Capacity Scheduler
  • As Capacity Scheduler is the default scheduler, the Dynamic Resource Pool User Interface is not available by default.
  • Capacity Scheduler can be configured only through safety valves in Cloudera Manager.
COMPX-8687: Missing access check for getAppAttempts
When the Job ACL feature is enabled using Cloudera Manager (YARN > Configuration > Enable Job ACL property), the mapreduce.cluster.acls.enabled property is not generated in all configuration files, including the yarn-site.xml configuration file. As a result, the ResourceManager process uses the default value of this property. The default value of mapreduce.cluster.acls.enabled is false.
Workaround: Enable the Job ACL feature using an advanced configuration snippet:
  1. In Cloudera Manager, select the YARN service.
  2. Click Configuration.
  3. Find the YARN Service MapReduce Advanced Configuration Snippet (Safety Valve) property.
  4. Click the plus icon and add the following:
    • Name: mapreduce.cluster.acls.enabled
    • Value: true
  5. Click Save Changes.
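After the change is saved and deployed, a quick check like the following can indicate whether a given configuration file now carries the property. This is a sketch under assumptions: the function name job_acls_enabled_in is hypothetical, the file path in the example call is a placeholder for whichever configuration file the ResourceManager actually reads, and the grep expects the <value> element on the line after the <name> element.

```shell
# Succeed only if the given Hadoop configuration file sets
# mapreduce.cluster.acls.enabled to true.
# Assumes <value> sits on the line right after <name>.
job_acls_enabled_in() {
  conf_file="$1"
  grep -A 1 '<name>mapreduce\.cluster\.acls\.enabled</name>' "$conf_file" \
    | grep -q '<value>true</value>'
}

# Example call (the path is a placeholder):
# job_acls_enabled_in /path/to/yarn-site.xml && echo "Job ACLs enabled"
```

The function returns a non-zero exit status both when the property is missing and when it is set to false, which matches the default behavior described above.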

Unsupported Features

The following YARN features are currently not supported in Cloudera Data Platform:
  • GPU support for Docker
  • Hadoop Pipes
  • Fair Scheduler
  • Application Timeline Server (ATS 2 and ATS 1.5)
  • Container Resizing
  • Distributed or Centralized Allocation of Opportunistic Containers
  • Distributed Scheduling
  • Native Services
  • Pluggable Scheduler Configuration
  • Queue Priority Support
  • Reservation REST APIs
  • Resource Estimator Service
  • Resource Profiles
  • (non-ZooKeeper) ResourceManager State Store
  • Shared Cache
  • YARN Federation
  • Rolling Log Aggregation
  • Docker on YARN (DockerContainerExecutor) on Data Hub clusters
  • Moving jobs between queues
  • Dynamic Resource Pools