Migrate the HDP Configurations
Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.
To migrate the HDP Configurations
Back up the following HDP 1.x configurations on all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf (Note: With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.)
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
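The backup step above could be scripted on each node roughly as follows. This is a minimal sketch, not part of the HDP tooling: the function name backup_conf_dirs, the archive naming scheme, and the destination directory are all assumptions made for illustration.

```python
import tarfile
import time
from pathlib import Path

# Directories named in the backup step above.
HDP1_CONF_DIRS = [
    "/etc/hadoop/conf", "/etc/hbase/conf", "/etc/hcatalog/conf",
    "/etc/hive/conf", "/etc/pig/conf", "/etc/sqoop/conf",
    "/etc/flume/conf", "/etc/mahout/conf", "/etc/oozie/conf",
]


def backup_conf_dirs(conf_dirs, dest_dir, stamp=None):
    """Archive each existing directory into dest_dir.

    Returns (archives, missing): the archive paths written and the
    directories that were skipped because they do not exist on this node.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    archives, missing = [], []
    for conf in map(Path, conf_dirs):
        if not conf.is_dir():
            missing.append(str(conf))
            continue
        # Hypothetical naming: /etc/hadoop/conf -> etc_hadoop_conf-<stamp>.tar.gz
        parts = conf.parts[1:] if conf.is_absolute() else conf.parts
        archive = dest / f"{'_'.join(parts)}-{stamp}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # resolve() so that a symlinked conf directory is archived
            # by content rather than as a bare link.
            tar.add(conf.resolve(), arcname=conf.name)
        archives.append(archive)
    return archives, missing
```

Calling backup_conf_dirs(HDP1_CONF_DIRS, "/root/hdp1-conf-backup") on a node would write one archive per directory that exists there; the destination path is only an example.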
Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication.
Note Hadoop lets cluster administrators control the quality of protection in the configuration parameter “hadoop.rpc.protection” in core-site.xml. It is an optional parameter in HDP 2.2. If not present, the default QOP setting of “auth” is used, which implies “authentication only”.
Valid values for this parameter are:
“authentication”: corresponds to “auth”
“integrity”: corresponds to “auth-int”
“privacy”: corresponds to “auth-conf”
The default setting is authentication-only because integrity checks and encryption incur a performance cost.
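The edit to hadoop.rpc.protection can be sketched with Python's standard XML library. This is an illustration under assumptions: the helper names set_property and set_rpc_protection are hypothetical, the file content is passed in as a string, and a comma-separated list of the three documented values is accepted.

```python
import xml.etree.ElementTree as ET

# Documented values and the SASL QOP each one corresponds to.
VALID_QOP = {"authentication": "auth", "integrity": "auth-int", "privacy": "auth-conf"}


def set_property(root, name, value):
    """Set a <property> under a <configuration> root, adding it if absent."""
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def set_rpc_protection(core_site_xml, level="authentication"):
    """Return core-site.xml text with hadoop.rpc.protection set to level."""
    requested = [item.strip() for item in level.split(",")]
    invalid = [item for item in requested if item not in VALID_QOP]
    if invalid:
        # "none" ends up here, mirroring the note that it is not a valid value.
        raise ValueError(f"invalid hadoop.rpc.protection value(s): {invalid}")
    root = ET.fromstring(core_site_xml)
    set_property(root, "hadoop.rpc.protection", ",".join(requested))
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown values before writing keeps an HDP 1.x setting such as none from being carried over unnoticed.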
Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.
Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.
Download your HDP 2.x companion files (see "Download the Companion Files" in Chapter 1 of the Manual Install Guide) and migrate your HDP 1.x configuration.
Copy these configurations to all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
Note Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.
Review the following HDP 1.3.2 Hadoop Core configurations and the new configurations or locations in HDP 2.x.
Table 3.4. HDP 1.3.2 Hadoop Core Site (core-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file
fs.default.name | core-site.xml | fs.defaultFS | core-site.xml
fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml
dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml
hadoop.configured.node.mapping | -- | net.topology.configured.node.mapping | core-site.xml
topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml
topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml
topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml
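Several rows of Table 3.4 change both the property name and the file it lives in. The sketch below shows one way to route an HDP 1.3.2 core-site.xml property set to its HDP 2.x names and files. The mapping is a subset of the table, and the function name migrate and the dictionary-based representation of a configuration are assumptions for illustration.

```python
# old name -> (new name, new file); a subset of the rows in Table 3.4.
CORE_SITE_MAP = {
    "fs.default.name": ("fs.defaultFS", "core-site.xml"),
    "fs.checkpoint.dir": ("dfs.namenode.checkpoint.dir", "hdfs-site.xml"),
    "fs.checkpoint.edits.dir": ("dfs.namenode.checkpoint.edits.dir", "hdfs-site.xml"),
    "fs.checkpoint.period": ("dfs.namenode.checkpoint.period", "hdfs-site.xml"),
    "io.bytes.per.checksum": ("dfs.bytes-per-checksum", "hdfs-site.xml"),
    "hadoop.native.lib": ("io.native.lib.available", "core-site.xml"),
    "topology.script.file.name": ("net.topology.script.file.name", "core-site.xml"),
}


def migrate(props, mapping, default_file):
    """Group properties by target file, renaming those found in mapping.

    props is {name: value} read from one HDP 1.x file; properties that are
    not in the mapping keep their name and stay in default_file.
    """
    by_file = {}
    for name, value in props.items():
        new_name, new_file = mapping.get(name, (name, default_file))
        by_file.setdefault(new_file, {})[new_name] = value
    return by_file


old_core_site = {
    "fs.default.name": "hdfs://namenode.example.com:8020",  # example value
    "fs.checkpoint.dir": "/hadoop/hdfs/namesecondary",      # example value
}
for target, values in migrate(old_core_site, CORE_SITE_MAP, "core-site.xml").items():
    print(target, values)
```

Grouping by target file makes it visible which values have to be moved out of core-site.xml rather than renamed in place.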
Note The hadoop.rpc.protection configuration property in core-site.xml needs to specify authentication, integrity and/or privacy. No value defaults to authentication, but an invalid value such as "none" causes an error.
Review the following 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x.
Table 3.5. HDP 1.3.2 Hadoop Core Site (hdfs-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file
dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml
dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml
dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml
dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml
dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml
dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml
dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml
dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml
dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml
session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml
dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml
dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml
dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml
fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml
heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml
dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml
dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml
dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml
dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml
dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml
dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml
dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml
dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml
dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml
dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml
dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml
dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml
dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml
dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml
dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml
dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml
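For rows of Table 3.5 where only the name changes and the file stays hdfs-site.xml, the rename can be applied directly to the XML. The following is a minimal sketch using a handful of rows from the table; the function name rename_properties is hypothetical, and a real migration would still need a manual review of each value.

```python
import xml.etree.ElementTree as ET

# A few name-only changes from Table 3.5 (file stays hdfs-site.xml).
HDFS_RENAMES = {
    "dfs.block.size": "dfs.blocksize",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.http.address": "dfs.namenode.http-address",
    "dfs.permissions": "dfs.permissions.enabled",
}


def rename_properties(xml_text, renames):
    """Rename <name> entries found in renames.

    Returns (new_xml_text, changed) where changed lists (old, new) pairs
    in document order. Values are left untouched.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for name_el in root.iter("name"):
        old = (name_el.text or "").strip()
        if old in renames:
            name_el.text = renames[old]
            changed.append((old, renames[old]))
    return ET.tostring(root, encoding="unicode"), changed
```

Returning the list of changed pairs gives a record that can be compared against the table after the edit.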
Review the following HDP 1.3.2 MapReduce Configs and their new HDP 2.x mappings.
Table 3.6. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (mapred-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file
mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml
mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml
mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml
mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml
security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml
Review the following HDP 1.3.2 Configs and their new HDP 2.x Capacity Scheduler mappings.
Table 3.7. HDP 1.3.2 Configs now in capacity scheduler for HDP 2.x (capacity-scheduler.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file
mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml
mapred.queue.default.acl-submit.job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml
mapred.queue.default.acl.administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml
mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml
Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.
Paths have changed in HDP 2.2 to /usr/hdp/current. You must remove lines such as:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
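Finding such lines in hadoop-env.sh can be sketched as below. The matching rule is an assumption, not something the guide defines: a line is treated as stale when it is an export statement whose text contains the old /usr/lib/hadoop prefix. The function name drop_stale_exports is hypothetical.

```python
def drop_stale_exports(text, old_prefix="/usr/lib/hadoop"):
    """Split hadoop-env.sh text into kept text and dropped lines.

    Assumption: only `export ...` lines that mention old_prefix are
    dropped; comments and all other lines are kept unchanged.
    """
    kept, dropped = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("export ") and old_prefix in stripped:
            dropped.append(line)
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", dropped
```

Reviewing the returned dropped list before writing the file back is advisable, since the prefix test is deliberately broad.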
Table 3.8. HDP 1.3.2 Configs and HDP 2.x for hadoop-env.sh
HDP 1.3.2 config | HDP 2.2 config | Description
JAVA_HOME | JAVA_HOME | Java implementation to use
HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS | --
HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop configuration directory
not in hadoop-env.sh. | HADOOP_HOME | --
not in hadoop-env.sh. | HADOOP_LIBEXEC_DIR | --
HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE | --
HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options; empty by default
HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS
HADOOP_JOBTRACKER_OPTS | not in hadoop-env.sh. | Command-specific options appended to HADOOP_OPTS
HADOOP_TASKTRACKER_OPTS | not in hadoop-env.sh. | Command-specific options appended to HADOOP_OPTS
HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command-specific options appended to HADOOP_OPTS
HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command-specific options appended to HADOOP_OPTS
HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS
HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp, etc.)
HADOOP_SECURE_DN_USER | not in hadoop-env.sh. | Secure datanodes, user to run the datanode as
HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options.
HADOOP_LOG_DIR | HADOOP_LOG_DIR | Directory where log files are stored in the secure data environment.
HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Directory where pid files are stored; /tmp by default.
HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored; /tmp by default.
HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored; /tmp by default.
HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of hadoop. $USER by default
not in hadoop-env.sh. | HADOOP_MAPRED_LOG_DIR | --
not in hadoop-env.sh. | HADOOP_MAPRED_PID_DIR | --
not in hadoop-env.sh. | JAVA_LIBRARY_PATH | --
not in hadoop-env.sh. | JSVC_HOME | For starting the datanode on a secure cluster
Note Some of the configuration settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts. HADOOP_HOME is the parent directory of the bin directory that holds the Hadoop scripts. In many instances this is $HADOOP_INSTALL/hadoop.
Add the following properties to the yarn-site.xml file:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>$resourcemanager.full.hostname:8025</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>$resourcemanager.full.hostname:8030</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>$resourcemanager.full.hostname:8050</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>$resourcemanager.full.hostname:8141</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
  <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/hadoop/yarn/log</value>
  <description>Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</description>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
  <description>URL for job history server</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$resourcemanager.full.hostname:8088</value>
  <description>URL for job history server</description>
</property>
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
  <description>Restrict the number of memory arenas to prevent excessive VMEM use by the glibc arena allocator. For example, MALLOC_ARENA_MAX=4</description>
</property>
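The $resourcemanager.full.hostname and $jobhistoryserver.full.hostname placeholders in these properties have to be replaced with real hostnames before the file is usable. A minimal sketch of that substitution follows; the function name fill_hostnames and the example hostnames are assumptions, and other placeholders such as $MALLOC_ARENA_MAX are deliberately left alone.

```python
import re

# Matches placeholders of the form $<role>.full.hostname used in the properties above.
PLACEHOLDER = re.compile(r"\$([a-z]+)\.full\.hostname")


def fill_hostnames(xml_text, hosts):
    """Replace $<role>.full.hostname with hosts[role].

    Raises KeyError if the text uses a role that hosts does not define,
    so a placeholder is never silently left behind.
    """
    def substitute(match):
        role = match.group(1)
        if role not in hosts:
            raise KeyError(f"no hostname given for role {role!r}")
        return hosts[role]

    return PLACEHOLDER.sub(substitute, xml_text)


snippet = "<value>$resourcemanager.full.hostname:8025</value>"
# Hypothetical hostname, for illustration only.
print(fill_hostnames(snippet, {"resourcemanager": "rm.example.com"}))
```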
Add the following properties to the mapred-site.xml file:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$jobhistoryserver.full.hostname:10020</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$jobhistoryserver.full.hostname:19888</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
For a secure cluster, add the following properties to mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
  <description>Kerberos principal name for the MapReduce JobHistory Server.</description>
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/keytabs/jhs.service.keytab</value>
  <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
</property>
For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule for the mapreduce.jobhistory.principal value you set in the previous step:
RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/
where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
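To make the effect of this rule concrete, the sketch below builds the rule string and evaluates it against a sample principal. It is a simplified emulation written for illustration, not Hadoop's implementation: it handles only rules of the exact shape shown above, the function names jhs_rule and apply_rule are hypothetical, and the principal and realm values are examples.

```python
import re

# Parses only the rule shape used above: RULE:[n:format](filter)s/pattern/replacement/
RULE_RE = re.compile(r"^RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)s/([^/]*)/([^/]*)/$")


def jhs_rule(principal, realm, local_user="mapred"):
    """Build the auth_to_local rule for a two-component principal."""
    return f"RULE:[2:$1@$0]({principal}@{realm})s/.*/{local_user}/"


def apply_rule(rule, full_principal):
    """Return the local user the rule yields, or None if it does not apply."""
    parsed = RULE_RE.match(rule)
    if parsed is None:
        raise ValueError(f"unsupported rule: {rule}")
    count, fmt, accept, pattern, replacement = parsed.groups()
    name, _, realm = full_principal.partition("@")
    components = name.split("/")
    if len(components) != int(count):
        return None  # rule is written for a different number of components

    def expand(match):
        index = int(match.group(1))
        return realm if index == 0 else components[index - 1]

    # $0 is the realm, $1 the first component of the principal name.
    short = re.sub(r"\$(\d)", expand, fmt)
    if not re.fullmatch(accept, short):
        return None  # filter in parentheses did not match
    return re.sub(pattern, replacement, short, count=1)


rule = jhs_rule("jhs", "EXAMPLE.COM")
print(rule)
print(apply_rule(rule, "jhs/host1.example.com@EXAMPLE.COM"))
```

In this sketch the host component of the principal is discarded by the [2:$1@$0] format, so the same rule covers the JobHistory Server regardless of which host it runs on.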
Delete any remaining HDP 1.x properties in the mapred-site.xml file.
Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.