Known Issues in MapReduce and YARN
This topic describes known issues, limitations, and unsupported features of MapReduce and YARN in this release of Cloudera Runtime.
- JobHistory URL mismatch after server relocation
- After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.
- Workaround: For any existing jobs that have the incorrect JobHistory Server URL, there is no option other than to allow the jobs to roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.
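One way to confirm that a client has picked up the updated file is to check which host its mapred-site.xml names for the JobHistory Server. The following is a minimal sketch, not a Cloudera-provided tool: the function name check_jhs_host, the host new-jhs.example.com, and the path /etc/hadoop/conf/mapred-site.xml are assumptions for illustration. It relies on the standard mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address properties.

```shell
# check_jhs_host is a hypothetical helper. It returns 0 when a
# mapreduce.jobhistory.* property in the given file has a value that
# starts with "<expected host>:", and non-zero otherwise.
check_jhs_host() {
  conf_file="$1"
  expected_host="$2"
  # Look at the lines following each mapreduce.jobhistory.* <name> element
  # and test whether a <value> there starts with the expected host.
  grep -A 2 '<name>mapreduce\.jobhistory\.' "$conf_file" \
    | grep -qF "<value>${expected_host}:"
}

# Assumed client configuration path and new host; adjust for your cluster.
CLIENT_CONF="/etc/hadoop/conf/mapred-site.xml"
NEW_JHS_HOST="new-jhs.example.com"

if [ -f "$CLIENT_CONF" ]; then
  if check_jhs_host "$CLIENT_CONF" "$NEW_JHS_HOST"; then
    echo "mapred-site.xml references ${NEW_JHS_HOST}"
  else
    echo "mapred-site.xml does not reference ${NEW_JHS_HOST}"
  fi
fi
```

Running the check on each client host before submitting new jobs helps catch stale configuration that would otherwise produce more jobs with the old URL.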
- CDH-49165: History link in ResourceManager web UI broken for killed Spark applications
- When a Spark application is killed, the history link in the ResourceManager web UI does not work.
- Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.
- CDH-6808: Routable IP address required by ResourceManager
- ResourceManager requires a routable address for yarn.resourcemanager.scheduler.address and does not support the wildcard 0.0.0.0 address.
- Workaround: Set the address, in the form host:port, either in the client-side configuration or on the command line when you submit the job.
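The command-line variant of this workaround might look like the sketch below. The host name, JAR, driver class, and paths are hypothetical placeholders, and port 8030 is only the stock default for this property. The -D generic option takes effect only if the job driver parses generic options (for example, through ToolRunner).

```shell
# Hypothetical values: replace with your ResourceManager host and scheduler port.
RM_SCHEDULER_HOST="rm-host.example.com"
RM_SCHEDULER_PORT="8030"
SCHEDULER_ADDRESS="${RM_SCHEDULER_HOST}:${RM_SCHEDULER_PORT}"

# Hypothetical job JAR, driver class, and input/output paths.
SUBMIT_CMD="hadoop jar my-job.jar com.example.MyDriver -Dyarn.resourcemanager.scheduler.address=${SCHEDULER_ADDRESS} /input /output"

# Print the command for review instead of running it.
echo "$SUBMIT_CMD"
```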
- CDH-17955: Amazon S3 copy may time out
- The Amazon S3 filesystem does not support renaming files, and performs a copy operation instead. If the file to be moved is very large, the operation can time out because S3 does not report progress during the operation.
- Workaround: Use -Dmapred.task.timeout=15000000 to increase the MR task timeout.
- Apache Issue: MAPREDUCE-972
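The task timeout is expressed in milliseconds, so it can help to convert the value before settling on a limit. The sketch below does that arithmetic and prints one way the option might be passed to a copy job. The use of hadoop distcp, the source path, and the bucket name are assumptions for illustration and are not part of the documented workaround.

```shell
# Timeout value from the workaround, in milliseconds.
TIMEOUT_MS=15000000

# Convert to minutes: ms -> seconds -> minutes (integer arithmetic).
TIMEOUT_MINUTES=$((TIMEOUT_MS / 1000 / 60))
echo "Task timeout: ${TIMEOUT_MINUTES} minutes"

# Hypothetical copy command; the paths and bucket are placeholders.
COPY_CMD="hadoop distcp -Dmapred.task.timeout=${TIMEOUT_MS} hdfs:///data/large-file s3a://example-bucket/data/"

# Print the command for review instead of running it.
echo "$COPY_CMD"
```

With this value, a task is allowed 250 minutes without reporting progress before it is terminated, so choose a limit that comfortably covers the largest expected copy.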
- OPSAPS-52066: Stacks under Logs directory for Hadoop daemons are not accessible from Knox Gateway
- Stacks under the Logs directory for Hadoop daemons, such as NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer are not accessible from Knox Gateway.
- Workaround: Administrators can SSH directly to the host that runs the Hadoop daemon to collect stacks under the Logs directory.
- CDPD-2936: Application logs are not accessible in WebUI2 or Cloudera Manager
- Application logs cannot be accessed in either Cloudera Manager or WebUI2 due to log aggregation.
- Workaround: Use the YARN log CLI to access application logs. For example:
yarn logs -applicationId <Application ID>
- Apache Issue: YARN-9725
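Aggregated logs for a large application can be long, so it is often convenient to redirect the CLI output to a file. A minimal sketch, assuming a hypothetical application ID and output path:

```shell
# Hypothetical application ID; take the real one from the ResourceManager
# web UI or from the output of: yarn application -list
APP_ID="application_1234567890123_0001"

# Assumed location for the saved logs.
OUT_FILE="/tmp/${APP_ID}.log"

LOGS_CMD="yarn logs -applicationId ${APP_ID}"

# Print the full command, including the redirect, for review instead of running it.
echo "${LOGS_CMD} > ${OUT_FILE}"
```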
- Capacity Scheduler
- As Capacity Scheduler is the default scheduler, the Dynamic Resource Pool User Interface is not available by default.
- Capacity Scheduler can be configured only through safety valves in Cloudera Manager.
The following YARN features are currently not supported in Cloudera Data Platform:
- Docker on YARN (DockerContainerExecutor)
- GPU and other Custom Resources
- Hadoop Pipes