Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To successfully upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.
To migrate the HDP Configurations
Back up the following HDP 1.x configurations on all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
Note: With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
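The backup step above can be scripted so that every node produces the same archive. The following is a minimal sketch, not part of the HDP tooling: the function name `backup_configs`, the archive label, and the destination directory are assumptions chosen for illustration. It archives whichever of the listed directories exist and reports the ones it skipped.

```python
import tarfile
import time
from pathlib import Path

# Directories named in the backup step above.
CONF_DIRS = [
    "/etc/hadoop/conf",
    "/etc/hbase/conf",
    "/etc/hcatalog/conf",
    "/etc/hive/conf",
    "/etc/pig/conf",
    "/etc/sqoop/conf",
    "/etc/flume/conf",
    "/etc/mahout/conf",
    "/etc/oozie/conf",
    "/etc/zookeeper/conf",
]


def backup_configs(conf_dirs, dest_dir, label="hdp1-conf"):
    """Archive each existing directory in conf_dirs into one .tar.gz file.

    Returns (archive_path, archived, skipped). The label and the
    timestamped file name are illustrative choices, not HDP conventions.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{label}-{stamp}.tar.gz"
    archived, skipped = [], []
    # dereference=True stores the files behind symbolic links, so a conf
    # directory that is itself a symlink is backed up with its contents.
    with tarfile.open(archive, "w:gz", dereference=True) as tar:
        for conf_dir in conf_dirs:
            path = Path(conf_dir)
            if path.is_dir():
                # Keep the original layout, relative to the filesystem root.
                tar.add(path, arcname=str(path).lstrip("/"))
                archived.append(conf_dir)
            else:
                skipped.append(conf_dir)
    return archive, archived, skipped
```

For example, calling `backup_configs(CONF_DIRS, "/var/backups/hdp1")` on each node (the destination path is hypothetical) writes one timestamped archive and returns the directories that were missing, which is expected for components that are not installed on that node.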
Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication.
Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.
Download your HDP 2.x companion files from Download Companion Files and migrate your HDP 1.x configuration.
Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.
Copy these configurations to all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
Note: Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.
Review the following HDP 1.3.2 Hadoop Core configurations and the new configurations or locations in HDP 2.x.
Table 24.3. HDP 1.3.2 Hadoop Core Site (core-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
| --- | --- | --- | --- |
| fs.default.name | core-site.xml | fs.defaultFS | core-site.xml |
| fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
| fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
| fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
| io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml |
| dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
| hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml |
| hadoop.configured.node.mapping | | net.topology.configured.node.mapping | core-site.xml |
| topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml |
| topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml |
| topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml |
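The renames in Table 24.3 can be applied mechanically once the old core-site.xml is backed up. The sketch below is one possible approach, assuming a standard Hadoop `<configuration>` file with `<property>` children; the function name `migrate_core_site` is hypothetical. It renames properties that stay in core-site.xml and pulls out the ones that Table 24.3 moves to hdfs-site.xml so they can be added there by hand. Note that `xml.etree.ElementTree` drops XML comments, so compare the result against the original before deploying it.

```python
import xml.etree.ElementTree as ET

# Table 24.3 rows where the property stays in core-site.xml.
CORE_SITE_RENAMES = {
    "fs.default.name": "fs.defaultFS",
    "hadoop.native.lib": "io.native.lib.available",
    "hadoop.configured.node.mapping": "net.topology.configured.node.mapping",
    "topology.node.switch.mapping.impl": "net.topology.node.switch.mapping.impl",
    "topology.script.file.name": "net.topology.script.file.name",
    "topology.script.number.args": "net.topology.script.number.args",
}

# Table 24.3 rows where the property moves to hdfs-site.xml.
MOVED_TO_HDFS_SITE = {
    "fs.checkpoint.dir": "dfs.namenode.checkpoint.dir",
    "fs.checkpoint.edits.dir": "dfs.namenode.checkpoint.edits.dir",
    "fs.checkpoint.period": "dfs.namenode.checkpoint.period",
    "io.bytes.per.checksum": "dfs.bytes-per-checksum",
}


def migrate_core_site(xml_text):
    """Rename HDP 1.3.2 properties in a core-site.xml document.

    Returns (new_xml_text, moved), where moved maps each new hdfs-site.xml
    property name to the value removed from core-site.xml.
    """
    root = ET.fromstring(xml_text)
    moved = {}
    # Iterate over a copy because properties may be removed from root.
    for prop in list(root.findall("property")):
        name_el = prop.find("name")
        if name_el is None or name_el.text is None:
            continue
        name = name_el.text.strip()
        if name in CORE_SITE_RENAMES:
            name_el.text = CORE_SITE_RENAMES[name]
        elif name in MOVED_TO_HDFS_SITE:
            moved[MOVED_TO_HDFS_SITE[name]] = prop.findtext("value")
            root.remove(prop)
    return ET.tostring(root, encoding="unicode"), moved
```

The function only reports the moved properties instead of editing hdfs-site.xml, because that file may already define the new names and a silent overwrite would be hard to spot.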
Note: The hadoop.rpc.protection configuration property in core-site.xml needs to specify authentication, integrity and/or privacy. No value defaults to authentication, but an invalid value such as "none" causes an error.
Review the following HDP 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x.
Table 24.4. HDP 1.3.2 HDFS Site (hdfs-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
| --- | --- | --- | --- |
| dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml |
| dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml |
| dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml |
| dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml |
| dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml |
| dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml |
| dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml |
| dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml |
| dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml |
| session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml |
| dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml |
| dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml |
| dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml |
| fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
| fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
| fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
| dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml |
| heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml |
| dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml |
| dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml |
| dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml |
| dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml |
| dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml |
| dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml |
| dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml |
| dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml |
| dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml |
| dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml |
| dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml |
| dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml |
| dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml |
| dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml |
| dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml |
| dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
| dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml |
Review the following HDP 1.3.2 MapReduce configurations and their new HDP 2.x mappings.
Table 24.5. HDP 1.3.2 MapReduce Configs and their HDP 2.x Mappings (mapred-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
| --- | --- | --- | --- |
| mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml |
| mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml |
| mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml |
| mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml |
| security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml |
Review the following HDP 1.3.2 Configs and their new HDP 2.x Capacity Scheduler mappings.
Table 24.6. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (capacity-scheduler.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
| --- | --- | --- | --- |
| mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml |
| mapred.queue.default.acl-submit-job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml |
| mapred.queue.default.acl-administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml |
| mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml |
Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.
Table 24.7. HDP 1.3.2 Configs and HDP 2.x for hadoop-env.sh
| HDP 1.3.2 config | HDP 2.1 config | Description |
| --- | --- | --- |
| JAVA_HOME | JAVA_HOME | Java implementation to use |
| HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS | |
| HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop Configuration Directory |
| Not in hadoop-env.sh. | HADOOP_HOME | |
| Not in hadoop-env.sh. | HADOOP_LIBEXEC_DIR | |
| HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE | |
| HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options. Empty by default. |
| HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command specific options appended to HADOOP_OPTS. |
| HADOOP_JOBTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS. |
| HADOOP_TASKTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS. |
| HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command specific options appended to HADOOP_OPTS. |
| HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command specific options appended to HADOOP_OPTS. |
| HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command specific options appended to HADOOP_OPTS. |
| HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp etc). |
| HADOOP_SECURE_DN_USER | Not in hadoop-env.sh. | Secure datanodes, user to run the datanode as |
| HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options. |
| HADOOP_LOG_DIR | HADOOP_LOG_DIR | Where log files are stored. $HADOOP_HOME/logs by default. |
| HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Where log files are stored in the secure data environment. |
| HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored, /tmp by default. |
| HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored, /tmp by default. |
| HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of hadoop. $USER by default. |
| Not in hadoop-env.sh. | HADOOP_MAPRED_LOG_DIR | |
| Not in hadoop-env.sh. | HADOOP_MAPRED_PID_DIR | |
| Not in hadoop-env.sh. | JAVA_LIBRARY_PATH | |
| Not in hadoop-env.sh. | JSVC_HOME | For starting the datanode on secure cluster. |
Note: Some of the configuration settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts. HADOOP_HOME is the parent directory of the bin directory that holds the Hadoop scripts. In many instances this is $HADOOP_INSTALL/hadoop.
Add the following properties to the yarn-site.xml file:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>$resourcemanager.full.hostname:8025</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>$resourcemanager.full.hostname:8030</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>$resourcemanager.full.hostname:8050</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>$resourcemanager.full.hostname:8141</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
  <description>Comma separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/hadoop/yarn/log</value>
  <description>Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</description>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
  <description>URL for job history server</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$resourcemanager.full.hostname:8088</value>
  <description>Address of the ResourceManager web application.</description>
</property>
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
  <description>Restrict the number of memory arenas to prevent excessive VMEM use by the glibc arena allocator. For example, MALLOC_ARENA_MAX=4</description>
</property>
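The yarn-site.xml properties above contain placeholders such as $resourcemanager.full.hostname, $jobhistoryserver.full.hostname, and $MALLOC_ARENA_MAX that must be replaced with real values before the file is deployed. The following sketch shows one way to fill them in and to check that none are left behind. The function names and the example hostname are assumptions for illustration only. Plain string replacement is used because the placeholder names contain dots, which `string.Template` would not treat as part of the name.

```python
# Placeholders that appear in the yarn-site.xml values above.
KNOWN_PLACEHOLDERS = (
    "$resourcemanager.full.hostname",
    "$jobhistoryserver.full.hostname",
    "$MALLOC_ARENA_MAX",
)


def fill_placeholders(template_text, values):
    """Replace each placeholder key in template_text with its value.

    Longer keys are replaced first so that a placeholder whose name is a
    prefix of another one cannot clobber it.
    """
    for key in sorted(values, key=len, reverse=True):
        template_text = template_text.replace(key, values[key])
    return template_text


def unresolved(text, known=KNOWN_PLACEHOLDERS):
    """Return the known placeholders that still occur in text."""
    return [key for key in known if key in text]


# Hypothetical usage with a made-up hostname:
snippet = "<value>$resourcemanager.full.hostname:8025</value>"
filled = fill_placeholders(
    snippet, {"$resourcemanager.full.hostname": "rm.example.com"}
)
```

Running `unresolved()` on the final file before copying it to the cluster is a cheap guard against a ResourceManager address that still reads `$resourcemanager.full.hostname:8025`.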
Add the following properties to the mapred-site.xml file:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$jobhistoryserver.full.hostname:10020</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$jobhistoryserver.full.hostname:19888</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
For a secure cluster, add the following properties to mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
  <description>Kerberos principal name for the MapReduce JobHistory Server.</description>
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/keytabs/jhs.service.keytab</value>
  <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
</property>
For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule regarding the mapreduce.jobhistory.principal value you set in the previous step.
RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/
where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
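As a rough illustration of the rule above, the sketch below formats the rule string from a principal and realm and adds it to an existing hadoop.security.auth_to_local value. The helper names, the `jhs` principal, and the `EXAMPLE.COM` realm are assumptions for the example. The rule is placed before a trailing DEFAULT entry because auth_to_local rules are evaluated in order and the first match wins.

```python
def jobhistory_auth_rule(principal, realm, local_user="mapred"):
    """Format the auth_to_local rule shown above.

    principal and realm are the strings you would write in place of
    PRINCIPAL and the realm in the rule; local_user is the mapped account.
    """
    return f"RULE:[2:$1@$0]({principal}@{realm})s/.*/{local_user}/"


def add_rule(existing_rules, rule):
    """Return existing_rules with rule added, one rule per line.

    The rule is inserted before DEFAULT if DEFAULT is present, and is not
    added a second time if it is already listed.
    """
    lines = [line.strip() for line in existing_rules.splitlines() if line.strip()]
    if rule in lines:
        return "\n".join(lines)
    if "DEFAULT" in lines:
        lines.insert(lines.index("DEFAULT"), rule)
    else:
        lines.append(rule)
    return "\n".join(lines)


# Hypothetical values, for illustration only.
example_rule = jobhistory_auth_rule("jhs", "EXAMPLE.COM")
```

The returned text is what would go inside the `<value>` element of hadoop.security.auth_to_local in core-site.xml. Verify the resulting mapping on a test principal before rolling it out.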
Delete any remaining HDP 1 properties in the mapred-site.xml file.
Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.
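For the step that deletes remaining HDP 1 properties from mapred-site.xml, a quick scan can list leftovers before you edit the file by hand. This is a minimal sketch, assuming the file uses the standard `<configuration>`/`<property>` layout; the function name is hypothetical, and the name set covers only the HDP 1.3.2 names listed for mapred-site.xml in Tables 24.5 and 24.6, so it is not an exhaustive list of Hadoop 1 properties.

```python
import xml.etree.ElementTree as ET

# HDP 1.3.2 names that Tables 24.5 and 24.6 list for mapred-site.xml.
HDP1_MAPRED_NAMES = frozenset({
    "mapred.map.child.java.opts",
    "mapred.job.map.memory.mb",
    "mapred.reduce.child.java.opts",
    "mapred.job.reduce.memory.mb",
    "security.task.umbilical.protocol.acl",
    "mapred.queue.names",
})


def find_hdp1_properties(xml_text, old_names=HDP1_MAPRED_NAMES):
    """Return the HDP 1 property names still present, in file order."""
    root = ET.fromstring(xml_text)
    found = []
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in old_names:
            found.append(name)
    return found
```

The function only reports names and leaves the file untouched, so each leftover can be reviewed against the tables before it is removed.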