Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To successfully upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.
To migrate the HDP configurations:
Back up the following HDP 1.x configurations on all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
Note: With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication.
Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.
Download your HDP 2.x companion files from Download Companion Files and migrate your HDP 1.x configuration.
Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.
Copy these configurations to all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
Note: Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.
Review the following HDP 1.3.2 Hadoop Core configurations and the new configurations or locations in HDP 2.x.
Table 24.3. HDP 1.3.2 Hadoop Core Site (core-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file
fs.default.name | core-site.xml | fs.defaultFS | core-site.xml
fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml
dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml
hadoop.configured.node.mapping | core-site.xml | net.topology.configured.node.mapping | core-site.xml
topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml
topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml
topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml
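For example, a renamed entry from this table could be carried over to your HDP 2.x core-site.xml as in the following sketch; the hostname and port in the value are placeholders, not values from this guide:

<!-- HDP 1.3.2 name: fs.default.name; HDP 2.x name: fs.defaultFS (same value, new key). -->
<property>
  <name>fs.defaultFS</name>
  <!-- Placeholder; use your own NameNode host and port. -->
  <value>hdfs://namenode.example.com:8020</value>
</property>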
Note: The hadoop.rpc.protection config must specify authentication, integrity, and/or privacy. Leaving it unset defaults to authentication, but an invalid value such as "none" causes an error.
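A minimal sketch of the corresponding core-site.xml entry, assuming the default authentication level is all you need:

<property>
  <name>hadoop.rpc.protection</name>
  <!-- Valid values: authentication, integrity, privacy. Do not use "none". -->
  <value>authentication</value>
</property>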
Review the following HDP 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x.
Table 24.4. HDP 1.3.2 HDFS Site (hdfs-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file
dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml
dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml
dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml
dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml
dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml
dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml
dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml
dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml
dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml
session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml
dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml
dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml
dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml
fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml
heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml
dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml
dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml
dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml
dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml
dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml
dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml
dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml
dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml
dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml
dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml
dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml
dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml
dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml
dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml
dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml
dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml
Review the following HDP 1.3.2 MapReduce configs and their new HDP 2.x mappings.
Table 24.5. HDP 1.3.2 MapReduce Configs and Their HDP 2.x Mappings (mapred-site.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file
mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml
mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml
mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml
mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml
security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml
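As an illustration only, a mapred-site.xml fragment using the HDP 2.x names from this table; the memory and heap sizes shown are placeholder values, not recommendations:

<property>
  <name>mapreduce.map.memory.mb</name>
  <!-- Placeholder container size in MB; tune for your cluster. -->
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <!-- JVM heap for map tasks; keep it below the container size above. -->
  <value>-Xmx768m</value>
</property>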
Review the following HDP 1.3.2 Configs and their new HDP 2.x Capacity Scheduler mappings.
Table 24.6. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (capacity-scheduler.xml)
HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file
mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml
mapred.queue.default.acl-submit-job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml
mapred.queue.default.acl-administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml
mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml
mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml
Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.
Table 24.7. HDP 1.3.2 and HDP 2.x Configs for hadoop-env.sh
HDP 1.3.2 config | HDP 2.1 config | Description
JAVA_HOME | JAVA_HOME | Java implementation to use
HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS |
HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop Configuration Directory
Not in hadoop-env.sh. | HADOOP_HOME |
Not in hadoop-env.sh. | HADOOP_LIBEXEC_DIR |
HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE |
HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options. Empty by default.
HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command specific options appended to HADOOP_OPTS.
HADOOP_JOBTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS.
HADOOP_TASKTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS.
HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command specific options appended to HADOOP_OPTS.
Not in hadoop-env.sh. | YARN_RESOURCEMANAGER_OPTS | Command specific options appended to HADOOP_OPTS.
HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command specific options appended to HADOOP_OPTS.
HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command specific options appended to HADOOP_OPTS.
HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp etc).
HADOOP_SECURE_DN_USER | Not in hadoop-env.sh. | Secure datanodes, user to run the datanode as
HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options.
HADOOP_LOG_DIR | HADOOP_LOG_DIR | Where log files are stored. $HADOOP_HOME/logs by default.
HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Where log files are stored in the secure data environment.
HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored, /tmp by default.
HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored, /tmp by default.
HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of hadoop. $USER by default.
MALLOC_ARENA_MAX | MALLOC_ARENA_MAX | Newer versions of glibc use an arena memory allocator that causes virtual memory usage to explode. This interacts badly with the many threads that we use in Hadoop. Tune the variable down to prevent vmem explosion.
Not in hadoop-env.sh. | HADOOP_MAPRED_LOG_DIR |
Not in hadoop-env.sh. | HADOOP_MAPRED_PID_DIR |
Not in hadoop-env.sh. | JAVA_LIBRARY_PATH |
Not in hadoop-env.sh. | JSVC_HOME | For starting the datanode on secure cluster.
Add the following properties to the yarn-site.xml file:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>$resourcemanager.full.hostname:8025</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>$resourcemanager.full.hostname:8030</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>$resourcemanager.full.hostname:8050</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>$resourcemanager.full.hostname:8141</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
  <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/hadoop/yarn/log</value>
  <description>Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</description>
</property>

<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
  <description>URL for job history server</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$resourcemanager.full.hostname:8088</value>
  <description>URL for the ResourceManager web application.</description>
</property>
Add the following properties to the mapred-site.xml file:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$jobhistoryserver.full.hostname:10020</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$jobhistoryserver.full.hostname:19888</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>

<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
For a secure cluster, add the following properties to mapred-site.xml:

<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
  <description>Kerberos principal name for the MapReduce JobHistory Server.</description>
</property>

<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/keytabs/jhs.service.keytab</value>
  <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
</property>
For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule for the mapreduce.jobhistory.principal value you set in the previous step:

RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/

where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
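A minimal sketch of the updated property in core-site.xml, assuming your existing configuration ends with the DEFAULT rule; keep any rules you already have and substitute your own principal and realm:

<property>
  <name>hadoop.security.auth_to_local</name>
  <!-- Maps the JobHistory Server principal to the local mapred user; retain existing rules. -->
  <value>
    RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/
    DEFAULT
  </value>
</property>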
Delete any remaining HDP 1.x properties in the mapred-site.xml file.