Apache Hadoop Known Issues
— Deprecated Properties
In Hadoop 2.0.0 and later, a number of Hadoop and HDFS properties have been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH 4 were based). A list of deprecated properties and their replacements can be found at http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html.
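For example, some commonly used property names and their current replacements from that list are:
  dfs.name.dir        ->  dfs.namenode.name.dir
  fs.default.name     ->  fs.defaultFS
  mapred.job.tracker  ->  mapreduce.jobtracker.address
The old names still work, but Hadoop logs a deprecation warning when they are used.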
HDFS
— If you install CDH using packages, HDFS NFS gateway works only on RHEL-compatible systems
Because of a bug in native versions of portmap/rpcbind, the HDFS NFS gateway does not work on SLES, Ubuntu, or Debian systems if you install CDH from the command line using packages. It does work on supported versions of RHEL-compatible systems on which rpcbind-0.2.0-10.el6 or later is installed, and it does work if you use Cloudera Manager to install CDH.
Bug: 731542 (Red Hat), 823364 (SLES), 594880 (Debian)
Severity: High
Workaround:
- On Red Hat and similar systems, make sure rpcbind-0.2.0-10.el6 or later is installed.
- On SLES, Debian, and Ubuntu systems, do one of the following:
- Install CDH using Cloudera Manager; or
- Start the NFS gateway without using packages; or
- Use the gateway by running rpcbind in insecure mode, using the -i option (see the sketch below), but keep in mind that this allows anyone on a remote host to bind to the portmap.
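A minimal sketch of the insecure-mode option (the service name and init commands vary by distribution; adjust for your system):
$ sudo service rpcbind stop
$ sudo rpcbind -i
Restart the NFS gateway services after rpcbind is running in insecure mode.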
— Upgrading from CDH 4 Requires an HDFS Upgrade
Upgrading from CDH 4 requires an HDFS Upgrade. See Upgrading to CDH 5 from CDH 4 for further information.
— Hadoop shell commands which reference the root directory ("/") do not work
— No error when changing permission to 777 on .snapshot directory
Snapshots are read-only; running chmod 777 on the .snapshot directory does not change this, but it does not produce an error either (though other illegal operations do).
Bug: HDFS-4981
Severity: Low
Workaround: None
— Snapshot operations are not supported by ViewFileSystem
Bug: None
Severity: Low
Workaround: None
— Snapshots do not retain directories' quota settings
— NameNode cannot use wildcard address in a secure cluster
In a secure cluster, you cannot use a wildcard for the NameNode's RPC or HTTP bind address. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0:<port>. This should affect you only if you are running a secure cluster and your NameNode needs to bind to multiple local addresses.
Bug: HDFS-4448
Severity: Medium
Workaround: None
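For example, in hdfs-site.xml, bind the NameNode HTTP address to a concrete host and port rather than the wildcard (namenode01.example.com is a placeholder; 50070 is the default HTTP port):
<property>
  <name>dfs.namenode.http-address</name>
  <value>namenode01.example.com:50070</value>
</property>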
— Permissions for dfs.namenode.name.dir incorrectly set.
Hadoop daemons should set permissions for the dfs.namenode.name.dir (or dfs.name.dir) directories to drwx------ (700), but in fact these permissions are set to the file-system default, usually drwxr-xr-x (755).
Bug: HDFS-2470
Severity: Low
Workaround: Use chmod to set permissions to 700. See Configuring Local Storage Directories for Use by HDFS for more information and instructions.
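For example, assuming the NameNode metadata directory is /data/1/dfs/nn (a placeholder; use your actual dfs.namenode.name.dir value):
$ sudo chmod 700 /data/1/dfs/nn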
— The default setting of dfs.client.block.write.replace-datanode-on-failure.policy can cause an unrecoverable error in small clusters
The default setting of dfs.client.block.write.replace-datanode-on-failure.policy (DEFAULT) can cause an unrecoverable error in a small cluster during HBase rolling restart.
Bug: HDFS-5131
Severity: Medium
Workaround: Set dfs.client.block.write.replace-datanode-on-failure.policy to NEVER for 1-, 2-, or 3-node clusters, and leave it as DEFAULT for all other clusters. Leave dfs.client.block.write.replace-datanode-on-failure.enable set to true.
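A minimal hdfs-site.xml sketch for a 1-, 2-, or 3-node cluster:
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>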
— hadoop fsck -move does not work in a cluster with host-based Kerberos
Bug: None
Severity: Low
Workaround: Use hadoop fsck -delete
— CDH 5 clients running releases 5.0.1 and earlier cannot use WebHDFS to connect to a CDH 4 cluster
For example, listing files over WebHDFS against a CDH 4 cluster produces an error such as the following:
Found 21 items
ls: Invalid value for webhdfs parameter "op": No enum const class org.apache.hadoop.hdfs.web.resources.GetOpParam.Op.GETACLSTATUS
Bug: HDFS-6326
Severity: Medium
Workaround: None; note that this is fixed as of CDH 5.0.2.
— HttpFS cannot get delegation token without prior authenticated request.
A request to obtain a delegation token cannot initiate an SPNEGO authentication sequence; it must be accompanied by an authentication cookie from a prior SPNEGO authentication sequence.
Bug: HDFS-3988
Severity: Low
Workaround: Make another WebHDFS request (such as GETHOMEDIRECTORY) to initiate an SPNEGO authentication sequence, and then make the delegation token request.
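A hedged sketch using curl with SPNEGO against HttpFS on its default port of 14000 (the hostname is a placeholder; the cookie file carries the authentication cookie from the first request to the second):
$ curl -i --negotiate -u : -c auth.cookie "http://httpfs01.example.com:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"
$ curl -i -b auth.cookie "http://httpfs01.example.com:14000/webhdfs/v1/?op=GETDELEGATIONTOKEN"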
— DistCp does not work between a secure cluster and an insecure cluster
— Using DistCp with Hftp on a secure cluster using SPNEGO requires that the dfs.https.port property be configured
In order to use DistCp with Hftp from a secure cluster that uses SPNEGO, you must configure the dfs.https.port property on the client to use the HTTP port (50070 by default).
Bug: HDFS-3983
Severity: Low
Workaround: Configure dfs.https.port to use the HTTP port on the client
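For example, in the client-side hdfs-site.xml (50070 is the default NameNode HTTP port; adjust if your cluster uses a different port):
<property>
  <name>dfs.https.port</name>
  <value>50070</value>
</property>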
— The same DataNodes may appear in the NameNode web UI in both the live and dead node lists
— The ulimits setting in /etc/security/limits.conf is applied to the wrong user if security is enabled.
Bug: https://issues.apache.org/jira/browse/DAEMON-192
Severity: Low
Anticipated Resolution: None
Workaround: To increase the ulimits applied to DataNodes, you must change the ulimit settings for the root user, not the hdfs user.
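For example, to raise the open-file limit that DataNodes inherit, an entry such as the following could be added to /etc/security/limits.conf for the root user (the value 32768 is illustrative, not a recommendation):
root  -  nofile  32768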
MapReduce
— No JobTracker becomes active if both JobTrackers are migrated to other hosts
If the JobTrackers in a High Availability configuration are shut down, migrated to new hosts, and then restarted, no JobTracker becomes active. The logs show a Mismatched address exception.
Bug: None
Severity: Low
Workaround: Delete the JobTracker HA state from ZooKeeper after shutting down the JobTrackers on the original hosts and before starting them on the new hosts:
$ zkCli.sh rmr /hadoop-ha/<logical name>
— Hadoop Pipes may not be usable in an MRv1 Hadoop installation done through tarballs
Under MRv1, MapReduce's C++ interface, Hadoop Pipes, may not be usable with a Hadoop installation done through tarballs unless you build the C++ code on the operating system you are using.
Bug: None
Severity: Medium
Workaround: Build the C++ code on the operating system you are using. The C++ code is present under src/c++ in the tarball.
— Task-completed percentage may be reported as slightly under 100% in the web UI, even when all of a job's tasks have successfully completed.
Bug: None
Severity: Low
Workaround: None
— Spurious warning in MRv1 jobs
The mapreduce.client.genericoptionsparser.used property is not correctly checked by JobClient and this leads to a spurious warning.
Bug: None
Severity: Low
Workaround: MapReduce jobs using GenericOptionsParser or implementing Tool can remove the warning by setting this property to true.
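A minimal sketch of setting the property from a job driver (assumes the driver implements Tool, as the workaround requires, so getConf() returns the job Configuration):
Configuration conf = getConf();
// Mark GenericOptionsParser as used so that JobClient does not log the spurious warning.
conf.setBoolean("mapreduce.client.genericoptionsparser.used", true);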
— Oozie workflows will not be recovered in the event of a JobTracker failover on a secure cluster
Delegation tokens created by clients (via JobClient#getDelegationToken()) do not persist when the JobTracker fails over. This limitation means that Oozie workflows will not be recovered successfully in the event of a failover on a secure cluster.
Bug: None
Severity: Medium
Workaround: Re-submit the workflow.
— Encrypted shuffle in MRv2 does not work if used with LinuxContainerExecutor and encrypted web UIs.
In MRv2, if the LinuxContainerExecutor is used (usually as part of Kerberos security), and hadoop.ssl.enabled is set to true (See Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport), then the encrypted shuffle does not work and the submitted job fails.
Bug: MAPREDUCE-4669
Severity: Medium
Workaround: Use encrypted shuffle with Kerberos security without encrypted web UIs, or use encrypted shuffle with encrypted web UIs without Kerberos security.
— Hadoop client JARs don't provide all the classes needed for clean compilation of client code
As a result, compiling client code against the client JARs can produce a warning such as the following:
$ javac -cp '/usr/lib/hadoop/client/*' -d wordcount_classes WordCount.java
org/apache/hadoop/fs/Path.class(org/apache/hadoop/fs:Path.class): warning: Cannot find annotation method 'value()' in type 'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
Bug:
Severity: Low
Workaround: None
— Must set yarn.resourcemanager.scheduler.address to routable host:port when submitting a job from the ResourceManager
When you submit a job from the ResourceManager, yarn.resourcemanager.scheduler.address must be set to a real, routable address, not the wildcard 0.0.0.0.
Bug: MAPREDUCE-972
Severity: Low
Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
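For example, in the client-side yarn-site.xml (the hostname is a placeholder; 8030 is the default scheduler port), or equivalently with -Dyarn.resourcemanager.scheduler.address=<host>:<port> on the command line:
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>rm-host.example.com:8030</value>
</property>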
— Amazon S3 copy may time out
The Amazon S3 filesystem does not support renaming files, and performs a copy operation instead. If the file to be moved is very large, the operation can time out because S3 does not report progress to the TaskTracker during the operation.
Bug: MAPREDUCE-972
Severity: Low
Workaround: Use -Dmapred.task.timeout=15000000 to increase the MR task timeout.
YARN
— Starting an unmanaged ApplicationMaster may fail
Starting a custom Unmanaged ApplicationMaster may fail due to a race in getting the necessary tokens.
Bug: YARN-1577
Severity: Low
Workaround: Try to get the tokens again; the custom unmanaged ApplicationMaster should be able to fetch the necessary tokens and start successfully.
— Job movement between queues does not persist across ResourceManager restart
CDH 5 adds the capability to move a submitted application to a different scheduler queue. This queue placement is not persisted across ResourceManager restart or failover; after a restart or failover, the application resumes in its original queue.
Bug: YARN-1558
Severity: Medium
Workaround: After ResourceManager restart, re-issue previously issued move requests.
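For example, a move request can be re-issued with the YARN CLI (the application ID and queue name are placeholders):
$ yarn application -movetoqueue application_1408640024753_0001 -queue highprio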
— ResourceManager High Availability with manual failover does not work on secure clusters
— Link from ResourceManager to Application Master does not work when the Web UI over HTTPS feature is enabled.
In MRv2 (YARN), if hadoop.ssl.enabled is set to true (use HTTPS for web UIs), then the link from the ResourceManager to the running MapReduce Application Master fails with an HTTP Error 500 because of a PKIX exception.
A job can still be run successfully, and, when it finishes, the link to the job history does work.
Bug: YARN-113
Severity: Low
Workaround: Don't use encrypted web UIs.
— Both ResourceManagers can end up in Standby mode
After a restart, if an application fails to recover, both ResourceManagers can end up in Standby mode.
Severity: High
Workaround: Do one of the following:
- Stop the ResourceManager and format the state store using yarn resourcemanager -format-state-store. Applications that were running before the ResourceManager went down will not be recovered.
- Limit the number of completed applications the state store retains (yarn.resourcemanager.state-store.max-completed-applications) to reduce the chances of running into this problem; see the example below.
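For example, in yarn-site.xml (the value shown is illustrative; pick a limit appropriate for your cluster):
<property>
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>1000</value>
</property>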
— HTTPS does not work on the configured HTTPS port
If you enable HTTPS (SSL) for YARN services, those services (including the ResourceManager, NodeManager, and Job History Server) stop serving non-secure HTTP, but HTTPS does not work on the configured HTTPS port either.
Bug: YARN-1553
Severity: High
Workaround: None. The problem is fixed as of CDH 5.1.4.