Apache Hadoop Incompatible Changes
HDFS
The following incompatible changes have been introduced in CDH 5:
- The getSnapshottableDirListing() method returns null when there are no snapshottable directories. This is a change from CDH 5 Beta 2, where the method returned an empty array instead.
- HDFS-5138 - The -finalize NameNode startup option has been removed. To finalize an in-progress upgrade, you should instead use the hdfs dfsadmin -finalizeUpgrade command while your NameNode is running, or while both NameNodes are running in a High Availability setup.
- HDFS-2832 - The HDFS internal layout version has changed between CDH 5 Beta 1 and CDH 5 Beta 2, so a file system upgrade is required to move an existing Beta 1 cluster to Beta 2.
- HDFS-4997 - libhdfs functions now return correct error codes in errno in case of an error, instead of always returning 255.
- HDFS-4451 - The HDFS balancer command now returns exit code 0 on success, instead of 1.
- HDFS-4659 - Support setting the execution bit for regular files.
- Impact: In CDH 5, files copied out of HDFS with copyToLocal may now have the executable bit set if it was set when they were created or copied into HDFS.
- HDFS-4594 - WebHDFS open sets the Content-Length header to the value specified by the length parameter rather than the amount of data actually returned.
- Impact: In CDH 5, the Content-Length header contains the number of bytes actually returned, rather than the requested length.
- HADOOP-10020 - Symlinks are temporarily disabled.
- Files named .snapshot or .reserved must not exist within HDFS.
Change in High-Availability Support
In CDH 5, the only high-availability (HA) implementation is Quorum-based storage; shared storage using NFS is no longer supported.
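With Quorum-based storage, the shared edits directory points at a set of JournalNodes rather than an NFS mount. A minimal illustrative hdfs-site.xml fragment is shown below; the nameservice ID (mycluster) and JournalNode host names are placeholders, not values from this document:

```xml
<!-- Quorum-based storage: shared edits live on JournalNodes, not on NFS.
     "mycluster" and the jn*.example.com hosts are placeholder values. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
```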
MapReduce
Important: There is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, and so on are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory. You need to do some additional configuration; follow the directions below.
To use MRv1 from a tarball installation, proceed as follows:
- Extract the files from the tarball. Note: In the steps that follow, install_dir is the name of the directory into which you extracted the files.
- Create a symbolic link as follows:
ln -s install_dir/bin-mapreduce1 install_dir/share/hadoop/mapreduce1/bin
- Create a second symbolic link as follows:
ln -s install_dir/etc/hadoop-mapreduce1 install_dir/share/hadoop/mapreduce1/conf
- Set the HADOOP_HOME and HADOOP_CONF_DIR environment variables in your execution environment as follows:
$ export HADOOP_HOME=install_dir/share/hadoop/mapreduce1
$ export HADOOP_CONF_DIR=$HADOOP_HOME/conf
- Copy your existing start-dfs.sh and stop-dfs.sh scripts to install_dir/bin-mapreduce1.
- For convenience, add install_dir/bin to the PATH variable in your execution environment.
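The steps above can be collected into a short shell sketch. Here install_dir stands in for your actual extraction directory; the script fabricates the expected tarball layout in a scratch directory purely so the linking steps can be illustrated end to end:

```shell
#!/bin/sh
# Sketch of the MRv1 tarball setup steps above.
# install_dir is a placeholder; we create the expected layout for illustration.
install_dir=$(mktemp -d)/hadoop
mkdir -p "$install_dir/bin-mapreduce1" \
         "$install_dir/etc/hadoop-mapreduce1" \
         "$install_dir/share/hadoop/mapreduce1" \
         "$install_dir/bin"

# Link the MRv1 scripts and configuration into place.
ln -s "$install_dir/bin-mapreduce1"        "$install_dir/share/hadoop/mapreduce1/bin"
ln -s "$install_dir/etc/hadoop-mapreduce1" "$install_dir/share/hadoop/mapreduce1/conf"

# Point the Hadoop environment variables at the MRv1 tree.
export HADOOP_HOME="$install_dir/share/hadoop/mapreduce1"
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

# For convenience, put the scripts in install_dir/bin on the PATH.
export PATH="$install_dir/bin:$PATH"

# Show that the links resolve where we expect.
ls -ld "$HADOOP_HOME/bin" "$HADOOP_CONF_DIR"
```

The start-dfs.sh/stop-dfs.sh copy step is omitted here because it depends on where your existing scripts live.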
Apache MapReduce 2.0 (YARN) Incompatible Changes
The following incompatible changes occurred for Apache MapReduce 2.0 (YARN) between CDH 4.x and CDH 5 Beta 2:
- The CATALINA_BASE variable no longer determines whether a component is configured for YARN or MRv1. Use the alternatives command instead, and make sure CATALINA_BASE is not set; see the Oozie and Sqoop2 configuration sections for instructions.
- YARN-1288 - YARN Fair Scheduler ACL change: the root queue's ACL now defaults to everybody, and other queues' ACLs default to nobody.
- YARN High Availability configurations have changed. Configuration keys have been renamed, among other changes.
- The YARN_HOME property has been changed to HADOOP_YARN_HOME.
- Note the following changes to configuration properties in yarn-site.xml:
- The value of yarn.nodemanager.aux-services should be changed from mapreduce.shuffle to mapreduce_shuffle.
- yarn.nodemanager.aux-services.mapreduce.shuffle.class has been renamed to yarn.nodemanager.aux-services.mapreduce_shuffle.class.
- yarn.resourcemanager.resourcemanager.connect.max.wait.secs has been renamed to yarn.resourcemanager.connect.max-wait.secs.
- yarn.resourcemanager.resourcemanager.connect.retry_interval.secs has been renamed to yarn.resourcemanager.connect.retry-interval.secs.
- yarn.resourcemanager.am.max-retries has been renamed to yarn.resourcemanager.am.max-attempts.
- The YARN_HOME environment variable used in yarn.application.classpath has been renamed to HADOOP_YARN_HOME. Make sure you include $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/* in the classpath. For more information, see Step 2: Configure YARN daemons in the instructions for deploying CDH with YARN in the CDH 5 Installation Guide.
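Taken together, the renamed yarn-site.xml properties would appear along these lines under the new CDH 5 names; the values shown are illustrative, not values taken from this document:

```xml
<!-- New CDH 5 property names; values here are illustrative. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>  <!-- was mapreduce.shuffle -->
</property>
<property>
  <!-- was yarn.nodemanager.aux-services.mapreduce.shuffle.class -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>  <!-- was ...am.max-retries -->
  <value>2</value>
</property>
```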
- A CDH 4 client cannot be used against a CDH 5 cluster, and vice versa. Note that YARN in CDH 4 is experimental, and suffers from the following major incompatibilities:
- Almost all of the proto files have been renamed.
- Several user-facing APIs have been modified as part of an API stabilization effort.