To perform a manual rolling upgrade, your cluster must meet the following prerequisites:
| Item | Description |
|------|-------------|
| Cluster Stack Version | Must be running the HDP 2.2 Stack. |
| Cluster Target Version | All HDP nodes must have HDP 2.2.8 installed alongside 2.2.0. (See "Preparing the Cluster" for installation instructions.) |
| Services | All 2.2.0 services must be started and running. All previous upgrade operations must be finalized. |
| HDFS | NameNode HA must be enabled and running with an active NameNode and a standby NameNode. No components should be in a decommissioning or decommissioned state. |
| Hadoop, Hive, Oozie | Enable client retry for Hadoop, Hive, and Oozie. |
| YARN | Enable Work-Preserving Restart (WPR). (Optional) Enable YARN ResourceManager High Availability. |
| Shared client libraries | Shared client libraries must be loaded into HDFS. |
| Hive, Tez | Confirm configuration settings for rolling upgrade. |
The following paragraphs describe component prerequisites in more detail. Examples assume that you are upgrading from 2.2.0 to 2.2.8.
Enable HDFS NameNode High Availability. See NameNode High Availability for Hadoop (in the Hadoop High Availability Guide) for more information.
Enable client retry properties for HDFS, Hive, and Oozie. These properties are not included by default, so you might need to add them to the site files.
- For HDFS, set `dfs.client.retry.policy.enabled` to `true` in `hdfs-site.xml` on all nodes with HDFS services.
- For Hive, specify `hive.metastore.failure.retries` and `hive.metastore.client.connect.retry.delay` in `hive-site.xml` (for example, `/usr/hdp/2.2.0.0-2041/hive/conf/hive-site.xml`). The default value for retries is 24; the default retry delay is 5 seconds.
- For Oozie, export `OOZIE_CLIENT_OPTS="${OOZIE_CLIENT_OPTS} -Doozie.connection.retry.count=<number of retries>"` in `oozie-env.sh` (for example, `/usr/hdp/2.2.0.0-2041/oozie/conf/oozie-env.sh`). A typical value for the number of retries is 5.
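Taken together, the client retry settings above might look like the following site-file fragments (the values shown are the defaults from the text, except the HDFS flag, which must be set explicitly):

```xml
<!-- hdfs-site.xml: enable the HDFS client retry policy on all nodes with HDFS services -->
<property>
  <name>dfs.client.retry.policy.enabled</name>
  <value>true</value>
</property>

<!-- hive-site.xml: metastore client retry settings (defaults: 24 retries, 5s delay) -->
<property>
  <name>hive.metastore.failure.retries</name>
  <value>24</value>
</property>
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>
```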
Enable work-preserving ResourceManager/NodeManager restart in the `yarn-site.xml` file on each node. For more information, see Work-Preserving Restart in the YARN Resource Management Guide. Additional notes:

- If `yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms` is set, the ResourceManager will wait the specified number of milliseconds after each restart before accepting new jobs.
- After editing ResourceManager settings in the `yarn-site.xml` file, restart the ResourceManager and all NodeManagers. (Changes do not take effect until you restart the processes.)
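As a sketch, a minimal work-preserving restart configuration might look like the following. The recovery-enable and recovery-dir property names follow the standard YARN work-preserving restart feature and the recovery directory is illustrative; verify both against the YARN Resource Management Guide for your version:

```xml
<!-- yarn-site.xml: work-preserving restart (sketch; confirm property names in the YARN guide) -->
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <!-- illustrative local path where NodeManager state is persisted across restarts -->
  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>
<!-- Optional: wait 10 seconds after a ResourceManager restart before accepting new jobs -->
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
  <value>10000</value>
</property>
```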
(Optional) Enable YARN ResourceManager High Availability. Enabling ResourceManager HA reduces service degradation while YARN is upgraded. If ResourceManager HA is not enabled, active jobs will pause when the ResourceManager restarts, and new job requests will wait to be scheduled. For more information, see ResourceManager High Availability for Hadoop.
To prevent disruption to MapReduce, Tez, and Oozie jobs, your existing jobs must reference the client libraries of the version they started with. Make sure shared client Hadoop libraries are available from distributed cache. This was probably set up during the HDP 2.2.0 installation process. For more information, see Running Multiple MapReduce Versions Using the YARN Distributed Cache in the YARN Resource Management Guide.
(Optional) When upgrading to HDP version 2.2.8 or later, remove the following two properties from the `hive-site.xml` configuration file (or set them to `false`):

- `fs.file.impl.disable.cache`
- `fs.hdfs.impl.disable.cache`
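If you choose to set the two properties to `false` rather than remove them, the entries in `hive-site.xml` would look like this:

```xml
<!-- hive-site.xml: keep FileSystem object caching enabled for rolling upgrade -->
<property>
  <name>fs.file.impl.disable.cache</name>
  <value>false</value>
</property>
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>false</value>
</property>
```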
Make sure HiveServer2 is configured for rolling upgrade. Set or confirm the following server-side properties:

- Set `hive.server2.support.dynamic.service.discovery` to `true`.
- Set `hive.zookeeper.quorum` to a comma-separated list of ZooKeeper host:port pairs in the ZooKeeper ensemble (e.g., `host1:port1, host2:port2, host3:port3`). By default this value is blank.
- Add the `hive.zookeeper.session.timeout` property to the `hive-site.xml` file (if necessary), and specify the length of time that ZooKeeper waits to hear from HiveServer2 before closing the client connection. The default value is 60 seconds.
- Set `hive.server2.zookeeper.namespace` to the value for the root namespace on ZooKeeper. (The root namespace is the parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery.) Each HiveServer2 instance with dynamic service discovery enabled creates a znode within this namespace. The default value is `hiveserver2`.

Note: you can specify the location of the `hive-site.xml` file via the `--config` option on the HiveServer2 startup command line: `hive --config <my_config_path> --service hiveserver2`
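The four server-side properties above might be set in `hive-site.xml` as follows (the ZooKeeper hostnames are hypothetical placeholders; the other values are the documented defaults):

```xml
<!-- hive-site.xml: HiveServer2 dynamic service discovery for rolling upgrade -->
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <!-- hypothetical three-node ensemble; substitute your ZooKeeper hosts -->
  <name>hive.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<property>
  <name>hive.zookeeper.session.timeout</name>
  <value>60000</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>
```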
JDBC considerations:

The JDBC driver connects to ZooKeeper and selects a HiveServer2 instance at random. When a JDBC client tries to pick up a HiveServer2 instance via ZooKeeper, use the following JDBC connection string:

`jdbc:hive2://<zookeeper_ensemble>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_zookeeper_namespace>`

where:

- `<zookeeper_ensemble>` is a comma-separated list of ZooKeeper host:port pairs, as described in the `hive.zookeeper.quorum` property.
- `<hiveserver2_zookeeper_namespace>` is the namespace on ZooKeeper under which HiveServer2 znodes are added.

The selected instance is then used by the connecting client for the entire session.
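For example, with a hypothetical three-node ensemble and the default `hiveserver2` namespace, a Beeline connection using this discovery string might look like:

```
beeline -u "jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```

Note that the URL lists the ZooKeeper ensemble rather than a specific HiveServer2 host, which is what allows clients to fail over to another instance during the upgrade.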
Check the following two settings to make sure that Tez is configured for rolling upgrade:

- `tez.lib.uris` (in the `tez-site.xml` file) should contain only a single value pointing to a version-specific Tez tarball file. For HDP 2.2 installations, the Tez app jars are in `/hdp/apps/${hdp.version}`.
- Set `tez.use.cluster.hadoop-libs` to `false`. (If set to `true`, the deployment expects Hadoop jar files to be available on all nodes.)
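The two Tez settings might appear in `tez-site.xml` as follows (the tarball filename under `/hdp/apps/${hdp.version}` is illustrative; confirm the exact path in your installation):

```xml
<!-- tez-site.xml: single version-specific Tez tarball for rolling upgrade -->
<property>
  <name>tez.lib.uris</name>
  <!-- illustrative HDFS path; verify the tarball name for your HDP version -->
  <value>/hdp/apps/${hdp.version}/tez/tez.tar.gz</value>
</property>
<property>
  <name>tez.use.cluster.hadoop-libs</name>
  <value>false</value>
</property>
```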
If Kerberos is enabled, Kerberos authentication continues to operate throughout the rolling upgrade process; no additional configuration is required.