Rolling Upgrade Guide

Cluster Prerequisites

To perform a manual rolling upgrade, your cluster must meet the following prerequisites:

Item                       Description
Cluster Stack Version      Must be running the HDP 2.2 Stack.
Cluster Target Version     All HDP nodes must have HDP 2.2.9 installed alongside 2.2.0. (See "Preparing the Cluster" for installation instructions.)
Services                   All 2.2.0 services must be started and running. All previous upgrade operations must be finalized.
HDFS                       NameNode HA must be enabled and running, with an active NameNode and a standby NameNode. No components should be in a decommissioning or decommissioned state.
Hadoop, Hive, Oozie        Enable client retry for Hadoop, Hive, and Oozie.
YARN                       Enable Work-Preserving Restart (WPR). (Optional) Enable YARN ResourceManager High Availability.
Shared client libraries    Shared client libraries must be loaded into HDFS.
Hive, Tez                  Confirm configuration settings for rolling upgrade.

The following paragraphs describe component prerequisites in more detail. Examples assume that you are upgrading from 2.2.0 to 2.2.9.

  • Enable HDFS NameNode High Availability. See NameNode High Availability for Hadoop (in the Hadoop High Availability Guide) for more information.

  • Enable client retry properties for HDFS, Hive, and Oozie. These properties are not included by default, so you might need to add them to the site files.

    • For HDFS, set dfs.client.retry.policy.enabled to true in hdfs-site.xml on all nodes with HDFS services.
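
      A minimal hdfs-site.xml fragment for this setting (shown only to illustrate the format):

      <property>
        <name>dfs.client.retry.policy.enabled</name>
        <value>true</value>
      </property>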

    • For Hive, specify hive.metastore.failure.retries and hive.metastore.client.connect.retry.delay in hive-site.xml (for example, /usr/hdp/2.2.0.0-2041/hive/conf/hive-site.xml). The default value for retries is 24; the default for retry delay is 5s.
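
      For illustration, the corresponding hive-site.xml entries with the default values mentioned above:

      <property>
        <name>hive.metastore.failure.retries</name>
        <value>24</value>
      </property>
      <property>
        <name>hive.metastore.client.connect.retry.delay</name>
        <value>5s</value>
      </property>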

    • For Oozie, export OOZIE_CLIENT_OPTS="${OOZIE_CLIENT_OPTS} -Doozie.connection.retry.count=<number of retries>" in oozie-env.sh (for example, /usr/hdp/2.2.0.0-2041/oozie/conf/oozie-env.sh). A typical value for number of retries is 5.
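
      For example, with the typical value of 5 retries, the line in oozie-env.sh would read:

      export OOZIE_CLIENT_OPTS="${OOZIE_CLIENT_OPTS} -Doozie.connection.retry.count=5"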

  • Enable work-preserving ResourceManager/NodeManager restart in the yarn-site.xml file for each node. For more information, see Work-Preserving Restart in the YARN Resource Management Guide.

    Additional notes:

    • If yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms is set, the ResourceManager will wait for the specified number of milliseconds after each restart before accepting new jobs.

    • After editing ResourceManager settings in the yarn-site.xml file, restart the ResourceManager and all NodeManagers. (Changes will not take effect until you restart the processes.)
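
    As a sketch only, a yarn-site.xml fragment that enables work-preserving recovery might look like the following. The exact set of properties (state store, recovery directories, and so on) depends on your cluster, so follow the Work-Preserving Restart documentation rather than copying this verbatim; the scheduling-wait value shown is illustrative.

      <property>
        <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <!-- Optional delay (milliseconds) before a restarted ResourceManager accepts new jobs -->
        <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
        <value>10000</value>
      </property>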

  • (Optional) Enable YARN ResourceManager High Availability. Enabling ResourceManager HA reduces service degradation while YARN is upgraded. If ResourceManager HA is not enabled, your active jobs will pause when the ResourceManager restarts, and new job requests will wait to be scheduled. For more information, see Resource Manager High Availability for Hadoop.
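
    If you do enable ResourceManager HA, the core yarn-site.xml settings follow this general shape. The rm IDs and host names below are placeholders, and additional properties (such as the ZooKeeper address) are also required; see the Resource Manager High Availability documentation for the full configuration:

      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>rm-host-1.example.com</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>rm-host-2.example.com</value>
      </property>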

  • To prevent disruption to MapReduce, Tez, and Oozie jobs, existing jobs must reference the client libraries of the version they started with. Make sure shared client Hadoop libraries are available from the distributed cache; this is typically set up during the HDP 2.2.0 installation process. For more information, see Running Multiple MapReduce Versions Using the YARN Distributed Cache in the YARN Resource Management Guide.
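
    As an illustration of what this looks like once configured, mapred-site.xml typically points jobs at a versioned MapReduce tarball in HDFS. The path below is the conventional HDP layout and is set during installation, so treat it as a placeholder and verify it against your cluster:

      <property>
        <name>mapreduce.application.framework.path</name>
        <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
      </property>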

  • (Optional) When upgrading to HDP version 2.2.9 or later, remove the following two properties from the hive-site.xml configuration file (or set them to false):

    • fs.file.impl.disable.cache

    • fs.hdfs.impl.disable.cache
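
    If you choose to set them to false instead of removing them, the hive-site.xml entries look like this:

      <property>
        <name>fs.file.impl.disable.cache</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.hdfs.impl.disable.cache</name>
        <value>false</value>
      </property>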

  • Make sure HiveServer2 is configured for rolling upgrade. Set or confirm the following server-side properties:

    • Set hive.server2.support.dynamic.service.discovery to true

    • Set hive.zookeeper.quorum to a comma-separated list of ZooKeeper host:port pairs in the ZooKeeper ensemble (e.g. host1:port1, host2:port2, host3:port3). By default this value is blank.

    • Add the hive.zookeeper.session.timeout property to the hive-site.xml file (if necessary), and specify the length of time that ZooKeeper will wait to hear from HiveServer2 before closing the client connection. The default value is 60 seconds.

    • Set hive.server2.zookeeper.namespace to the value for the root namespace on ZooKeeper. (The root namespace is the parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery.) Each HiveServer2 instance with dynamic service discovery enabled will create a znode within this namespace. The default value is hiveserver2.

      Note: you can specify the location of the hive-site.xml file via a HiveServer2 startup command line --config option:

      hive --config <my_config_path> --service hiveserver2
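
    Taken together, a hive-site.xml fragment for the server-side settings above might look like the following. The ZooKeeper hosts and ports are placeholders, and the session-timeout value is only illustrative:

      <property>
        <name>hive.server2.support.dynamic.service.discovery</name>
        <value>true</value>
      </property>
      <property>
        <name>hive.zookeeper.quorum</name>
        <value>zk-host-1:2181,zk-host-2:2181,zk-host-3:2181</value>
      </property>
      <property>
        <name>hive.zookeeper.session.timeout</name>
        <value>60s</value>
      </property>
      <property>
        <name>hive.server2.zookeeper.namespace</name>
        <value>hiveserver2</value>
      </property>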

    JDBC considerations:

    • The JDBC driver connects to ZooKeeper and selects a HiveServer2 instance at random. A JDBC client that picks up a HiveServer2 instance via ZooKeeper should use the following connection string:

      jdbc:hive2://<zookeeper_ensemble>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_zookeeper_namespace>

      where <zookeeper_ensemble> is a comma-separated list of ZooKeeper host:port pairs, as described in the hive.zookeeper.quorum property, and <hiveserver2_zookeeper_namespace> is the namespace on ZooKeeper under which HiveServer2 znodes are added. The selected HiveServer2 instance is then used by the connecting client for the entire session.
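
      As a usage sketch (host names are placeholders), a Beeline client can connect through ZooKeeper with:

      beeline -u "jdbc:hive2://zk-host-1:2181,zk-host-2:2181,zk-host-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"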

  • Check the following two settings to make sure that Tez is configured for rolling upgrade:

    • tez.lib.uris (in the tez-site.xml file) should contain only a single value pointing to a version-specific Tez tarball file. For 2.2 installations, the Tez app jars are in /hdp/apps/${hdp.version}.

    • Set tez.use.cluster.hadoop-libs to false. (If true, the deployment will expect Hadoop jar files to be available on all nodes.)
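
    For illustration, the corresponding tez-site.xml entries are shown below. The tarball path is the conventional HDP location and should be verified against your installation:

      <property>
        <name>tez.lib.uris</name>
        <value>/hdp/apps/${hdp.version}/tez/tez.tar.gz</value>
      </property>
      <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>false</value>
      </property>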

If Kerberos is enabled, it will continue to operate throughout the rolling upgrade process.