3. Migrate the HDP Configurations

Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To successfully upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.

To migrate the HDP Configurations

  1. Back up the following HDP 1.x configurations on all nodes in your clusters.

    • /etc/hadoop/conf

    • /etc/hbase/conf

    • /etc/hcatalog/conf

      Note

      With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.

    • /etc/hive/conf

    • /etc/pig/conf

    • /etc/sqoop/conf

    • /etc/flume/conf

    • /etc/mahout/conf

    • /etc/oozie/conf

    • /etc/zookeeper/conf
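The backup in step 1 could be scripted on each node along the following lines. This is a minimal sketch rather than part of the HDP tooling: the function name `backup_conf_dirs`, the archive name, and the destination directory are assumptions made for illustration, while the directory list comes from the step above. Directories that do not exist on a node are skipped and reported.

```python
import os
import tarfile
import time

# Configuration directories listed in step 1.
HDP1_CONF_DIRS = [
    "/etc/hadoop/conf",
    "/etc/hbase/conf",
    "/etc/hcatalog/conf",
    "/etc/hive/conf",
    "/etc/pig/conf",
    "/etc/sqoop/conf",
    "/etc/flume/conf",
    "/etc/mahout/conf",
    "/etc/oozie/conf",
    "/etc/zookeeper/conf",
]


def backup_conf_dirs(conf_dirs, dest_dir, stamp=None):
    """Archive every existing directory into one tarball.

    Returns (archive_path, skipped), where skipped lists missing directories.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, "hdp1-conf-backup-%s.tar.gz" % stamp)
    skipped = []
    # dereference=True follows symlinks, so a conf directory that is a
    # symlink is stored with its real contents rather than as a link.
    with tarfile.open(archive, "w:gz", dereference=True) as tar:
        for path in conf_dirs:
            if os.path.isdir(path):
                # Store paths relative to / so the archive unpacks anywhere.
                tar.add(path, arcname=path.lstrip("/"))
            else:
                skipped.append(path)
    return archive, skipped

# Hypothetical usage (the destination path is an assumption):
#   archive, skipped = backup_conf_dirs(HDP1_CONF_DIRS, "/var/backups/hdp1")
```

Keeping the archive outside /etc means a later package upgrade cannot overwrite it.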

  2. Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication.
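If you prefer to script the edit in step 2 instead of changing the file by hand, something like the following could work. It is a sketch under assumptions: the helper name `set_property` is hypothetical, and it relies on Python's standard `xml.etree.ElementTree`, which discards XML comments and processing instructions when it rewrites the file, so run it only after the backup in step 1.

```python
import xml.etree.ElementTree as ET


def set_property(conf_path, name, value):
    """Set a property in a Hadoop *-site.xml file.

    Returns the previous value, or None if the property was not present.
    """
    tree = ET.parse(conf_path)
    root = tree.getroot()  # the <configuration> element
    previous = None
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            previous = value_el.text
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(conf_path, encoding="utf-8", xml_declaration=True)
    return previous

# Usage for step 2:
#   set_property("/etc/hadoop/conf/core-site.xml",
#                "hadoop.rpc.protection", "authentication")
```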

  3. Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.

  4. Download your HDP 2.x companion files from Download Companion Files and migrate your HDP 1.x configuration.

  5. Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.
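One way to make the copied file match its neighbors, as step 5 asks, is to mirror the mode and owner of a file that already lives in the target directory. The sketch below assumes core-site.xml is a suitable reference file; the function name `copy_with_matching_owner` is hypothetical. Changing ownership normally requires root, so the function reports whether that part succeeded.

```python
import os
import shutil
import stat


def copy_with_matching_owner(src, dest_dir, reference_name="core-site.xml"):
    """Copy src into dest_dir and mirror mode/owner of an existing file there.

    Returns (dest_path, owner_applied).
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    ref = os.stat(os.path.join(dest_dir, reference_name))
    # Copy only the permission bits, not the file type bits.
    os.chmod(dest, stat.S_IMODE(ref.st_mode))
    try:
        os.chown(dest, ref.st_uid, ref.st_gid)  # usually needs root
    except PermissionError:
        return dest, False
    return dest, True

# Hypothetical usage (the companion-files path is an assumption):
#   copy_with_matching_owner("companion_files/hadoop/log4j.properties",
#                            "/etc/hadoop/conf")
```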

  6. Copy these configurations to all nodes in your clusters.

    • /etc/hadoop/conf

    • /etc/hbase/conf

    • /etc/hive-hcatalog/conf and /etc/hive-webhcat (these replace /etc/hcatalog/conf in HDP 2.1)

    • /etc/hive/conf

    • /etc/pig/conf

    • /etc/sqoop/conf

    • /etc/flume/conf

    • /etc/mahout/conf

    • /etc/oozie/conf

    • /etc/zookeeper/conf

    Note

    Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.

  7. Review the following HDP 1.3.2 Hadoop Core configurations and their new configurations or locations in HDP 2.x.

     

    Table 24.3. HDP 1.3.2 Hadoop Core Site (core-site.xml)

    | HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
    |---|---|---|---|
    | fs.default.name | core-site.xml | fs.defaultFS | core-site.xml |
    | fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
    | fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
    | fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
    | io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml |
    | dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
    | hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml |
    | hadoop.configured.node.mapping | | net.topology.configured.node.mapping | core-site.xml |
    | topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml |
    | topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml |
    | topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml |


    Note

    The hadoop.rpc.protection property in core-site.xml must specify authentication, integrity, and/or privacy. If no value is set, it defaults to authentication, but an invalid value such as "none" causes an error.

  8. Review the following HDP 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x.

     

    Table 24.4. HDP 1.3.2 HDFS Site (hdfs-site.xml)

    | HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
    |---|---|---|---|
    | dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml |
    | dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml |
    | dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml |
    | dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml |
    | dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml |
    | dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml |
    | dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml |
    | dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml |
    | dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml |
    | session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml |
    | dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml |
    | dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml |
    | dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml |
    | fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
    | fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
    | fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
    | dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml |
    | heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml |
    | dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml |
    | dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml |
    | dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml |
    | dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml |
    | dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml |
    | dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml |
    | dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml |
    | dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml |
    | dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml |
    | dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml |
    | dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml |
    | dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml |
    | dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml |
    | dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml |
    | dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml |
    | dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
    | dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml |
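The mappings in Tables 24.3 and 24.4 lend themselves to an automated check of your backed-up files. The sketch below scans one configuration file and reports every property that still uses an HDP 1.3.2 name, together with its HDP 2.1 name and target file. It is illustrative only: the function name `find_renamed_properties` is hypothetical, and the `RENAMES` dictionary holds only a handful of rows from the tables, so extend it before relying on the result.

```python
import xml.etree.ElementTree as ET

# A few rows from Tables 24.3 and 24.4: old name -> (new name, new file).
# This subset is for illustration; add the remaining rows as needed.
RENAMES = {
    "fs.default.name": ("fs.defaultFS", "core-site.xml"),
    "fs.checkpoint.dir": ("dfs.namenode.checkpoint.dir", "hdfs-site.xml"),
    "dfs.block.size": ("dfs.blocksize", "hdfs-site.xml"),
    "dfs.data.dir": ("dfs.datanode.data.dir", "hdfs-site.xml"),
    "dfs.name.dir": ("dfs.namenode.name.dir", "hdfs-site.xml"),
    "dfs.df.interval": ("fs.df.interval", "core-site.xml"),
}


def find_renamed_properties(conf_path, renames=RENAMES):
    """Return (old_name, new_name, new_file, value) for each old-style property."""
    root = ET.parse(conf_path).getroot()
    hits = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in renames:
            new_name, new_file = renames[name]
            hits.append((name, new_name, new_file, prop.findtext("value")))
    return hits

# Hypothetical usage against a backed-up HDP 1.x file:
#   for old, new, new_file, value in find_renamed_properties("hdfs-site.xml"):
#       print("%s -> %s in %s (value: %s)" % (old, new, new_file, value))
```

The function only reports; it does not rewrite anything, which keeps the comparison step reviewable before you change the HDP 2.1 files.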


  9. Review the following HDP 1.3.2 MapReduce configurations and their new HDP 2.x mappings.

     

    Table 24.5. HDP 1.3.2 MapReduce Configs and HDP 2.x Mappings (mapred-site.xml)

    | HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
    |---|---|---|---|
    | mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml |
    | mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml |
    | mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml |
    | mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml |
    | security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml |


  10. Review the following HDP 1.3.2 Configs and their new HDP 2.x Capacity Scheduler mappings.

     

    Table 24.6. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (capacity-scheduler.xml)

    | HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.1 config | HDP 2.1 config file |
    |---|---|---|---|
    | mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml |
    | mapred.queue.default.acl-submit-job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml |
    | mapred.queue.default.acl-administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml |
    | mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml |
    | mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml |
    | mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml |
    | mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml |


  11. Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.

     

    Table 24.7. HDP 1.3.2 Configs and HDP 2.x for hadoop-env.sh

    | HDP 1.3.2 config | HDP 2.1 config | Description |
    |---|---|---|
    | JAVA_HOME | JAVA_HOME | Java implementation to use |
    | HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS | |
    | HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop Configuration Directory |
    | Not in hadoop-env.sh. | HADOOP_HOME | |
    | Not in hadoop-env.sh. | HADOOP_LIBEXEC_DIR | |
    | HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE | |
    | HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options. Empty by default. |
    | HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_JOBTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_TASKTRACKER_OPTS | Not in hadoop-env.sh. | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command specific options appended to HADOOP_OPTS. |
    | HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp etc). |
    | HADOOP_SECURE_DN_USER | Not in hadoop-env.sh. | Secure datanodes, user to run the datanode as |
    | HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options. |
    | HADOOP_LOG_DIR | HADOOP_LOG_DIR | Where log files are stored. $HADOOP_HOME/logs by default. |
    | HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Where log files are stored in the secure data environment. |
    | HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored, /tmp by default. |
    | HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored, /tmp by default. |
    | HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of hadoop. $USER by default. |
    | Not in hadoop-env.sh. | HADOOP_MAPRED_LOG_DIR | |
    | Not in hadoop-env.sh. | HADOOP_MAPRED_PID_DIR | |
    | Not in hadoop-env.sh. | JAVA_LIBRARY_PATH | |
    | Not in hadoop-env.sh. | JSVC_HOME | For starting the datanode on secure cluster. |


    Note

    Some of the configuration settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts. HADOOP_HOME is the parent directory of the bin directory that holds the Hadoop scripts. In many instances this is $HADOOP_INSTALL/hadoop.

  12. Add the following properties to the yarn-site.xml file:

    <property> 
     <name>yarn.resourcemanager.scheduler.class</name> 
     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>  
    </property>

    <property> 
     <name>yarn.resourcemanager.resource-tracker.address</name> 
     <value>$resourcemanager.full.hostname:8025</value>  
     <description>Enter your ResourceManager hostname.</description>
    </property>

    <property> 
     <name>yarn.resourcemanager.scheduler.address</name> 
     <value>$resourcemanager.full.hostname:8030</value>  
     <description>Enter your ResourceManager hostname.</description>
    </property>

    <property> 
     <name>yarn.resourcemanager.address</name> 
     <value>$resourcemanager.full.hostname:8050</value>  
     <description>Enter your ResourceManager hostname.</description>
    </property>

    <property> 
     <name>yarn.resourcemanager.admin.address</name> 
     <value>$resourcemanager.full.hostname:8141</value>  
     <description>Enter your ResourceManager hostname.</description>
    </property>

    <property> 
     <name>yarn.nodemanager.local-dirs</name> 
     <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>  
     <description>Comma separated list of paths. Use the list of directories from $YARN_LOCAL_DIR.  
    For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
    </property>

    <property> 
     <name>yarn.nodemanager.log-dirs</name> 
     <value>/grid/hadoop/yarn/log</value>
     <description>Use the list of directories from $YARN_LOCAL_LOG_DIR.  
                    For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</description>
    </property>

    <property> 
     <name>yarn.log.server.url</name> 
     <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
     <description>URL for job history server</description>
    </property>

    <property> 
     <name>yarn.resourcemanager.webapp.address</name> 
     <value>$resourcemanager.full.hostname:8088</value>
     <description>Web UI address of the ResourceManager.</description>
    </property>

    <property> 
     <name>yarn.nodemanager.admin-env</name> 
     <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
     <description>Restrict the number of memory arenas to prevent excessive VMEM use by the glibc arena allocator.
                    For example, MALLOC_ARENA_MAX=4</description>
    </property>
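The values in step 12 contain placeholders such as $resourcemanager.full.hostname and $jobhistoryserver.full.hostname that you must replace with real host names. If you keep the properties in a template file, a small helper could fill them in and fail loudly when a host name is missing. This is a sketch: the function name `fill_placeholders` and the example host names are assumptions, and other variables such as $MALLOC_ARENA_MAX are deliberately left untouched.

```python
import re

# Matches placeholders written like $resourcemanager.full.hostname
PLACEHOLDER = re.compile(r"\$([a-z]+)\.full\.hostname")


def fill_placeholders(template, hosts):
    """Replace $<role>.full.hostname placeholders using the hosts mapping.

    Raises KeyError if the template names a role that hosts does not cover.
    """
    def replace(match):
        role = match.group(1)  # for example "resourcemanager"
        if role not in hosts:
            raise KeyError("no hostname given for %s" % match.group(0))
        return hosts[role]

    return PLACEHOLDER.sub(replace, template)

# Hypothetical usage (host names are made up):
#   text = fill_placeholders(open("yarn-site.template.xml").read(),
#                            {"resourcemanager": "rm.example.com",
#                             "jobhistoryserver": "jhs.example.com"})
```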

  13. Add the following properties to the mapred-site.xml file:

    <property>  
     <name>mapreduce.jobhistory.address</name> 
     <value>$jobhistoryserver.full.hostname:10020</value>  
     <description>Enter your JobHistoryServer hostname.</description>
    </property>

    <property>  
     <name>mapreduce.jobhistory.webapp.address</name>  
     <value>$jobhistoryserver.full.hostname:19888</value>  
     <description>Enter your JobHistoryServer hostname.</description>
    </property>

    <property>
      <name>mapreduce.shuffle.port</name>
      <value>13562</value>
    </property>

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>                 

  14. For a secure cluster, add the following properties to mapred-site.xml:

    <property>
        <name>mapreduce.jobhistory.principal</name>
        <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
        <description>Kerberos principal name for the MapReduce JobHistory Server.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.keytab</name>
        <value>/etc/security/keytabs/jhs.service.keytab</value>
        <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
    </property>

  15. For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule for the mapreduce.jobhistory.principal value you set in the previous step.

    RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/

    where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
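To make the rule in step 15 concrete, the sketch below builds the rule string from a principal name and realm, and includes a simplified check of which principals the rule would select. The check reflects an assumption about how such rules are evaluated: `[2:$1@$0]` applies only to two-component principals and formats them as first-component@realm, which must then match the expression in parentheses in full. The function names and the `jhs` / `EXAMPLE.COM` values are illustrative, not taken from your cluster.

```python
import re


def jhs_auth_to_local_rule(primary, realm, local_user="mapred"):
    """Build the step 15 rule for principals of the form primary/host@REALM."""
    return "RULE:[2:$1@$0](%s@%s)s/.*/%s/" % (primary, realm, local_user)


def rule_selects(rule, principal):
    """Simplified check: would this rule's filter select the given principal?"""
    filter_regex = re.search(r"\((.*)\)s/", rule).group(1)
    name, realm = principal.split("@", 1)
    parts = name.split("/")
    if len(parts) != 2:
        # [2:...] only applies to two-component principals.
        return False
    formatted = "%s@%s" % (parts[0], realm)  # what $1@$0 expands to
    return re.fullmatch(filter_regex, formatted) is not None

# Hypothetical usage:
#   rule = jhs_auth_to_local_rule("jhs", "EXAMPLE.COM")
#   rule_selects(rule, "jhs/host1.example.com@EXAMPLE.COM")
```

Because the filter is a regular expression, an unescaped dot in the realm matches any character; that is usually harmless here but worth knowing when realms are similar.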

  16. Delete any remaining HDP 1.x properties in the mapred-site.xml file.

  17. Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.

