Upgrading HBase
To see which version of HBase is shipping in CDH 5, check the Version and Packaging Information. For important information on new and changed components, see the CDH 5 Release Notes.
Check the Known Issues and Work Arounds in CDH 5 and Incompatible Changes for HBase before proceeding.
CDH 5 HBase Compatibility
CDH 5 HBase is based on the Apache HBase 0.96 release and is not wire compatible with CDH 4 (based on 0.92 and 0.94 releases). Consequently, rolling upgrades from CDH 4 to CDH 5 are not possible because existing CDH 4 HBase clients cannot make requests to CDH 5 servers and CDH 5 HBase clients cannot make requests to CDH 4 servers. Clients of the Thrift and REST proxy servers, however, retain wire compatibility between CDH 4 and CDH 5.
The upgrade from CDH 4 HBase to CDH 5 HBase is irreversible and requires HBase to be shut down completely. Executing the upgrade script reorganizes existing HBase data stored on HDFS into new directory structures, converts CDH 3 0.90 HFile v1 files to the newer, improved and optimized CDH 4/CDH 5 HFile v2 file format and rewrites the hbase.version file. This upgrade also removes transient data stored in ZooKeeper so that it can be converted to the new data format.
These changes were made to reduce the impact of future major upgrades. Previously, HBase relied on brittle custom data formats; this change moves HBase's RPC and persistent data to the more evolvable Protocol Buffers format.
The HBase User API (Get, Put, Result, Scanner, and so on; see the Apache HBase API documentation) has evolved, and effort has been made to keep HBase clients source-code compatible, so they should recompile without source code modifications. This cannot be guaranteed, however, because with the conversion to Protocol Buffers some relatively obscure APIs have been removed. Rudimentary efforts have also been made to preserve recompile compatibility with advanced APIs such as Filters and Coprocessors. These advanced APIs are still evolving, and the compatibility guarantees for them are weaker.
As of 0.96, the User API has been explicitly marked, and every attempt will be made to maintain compatibility with it in future versions. A version of the javadoc that contains only the User API can be found here.
Checksums in CDH 5
In CDH 4, HBase relied on HDFS checksums to protect against data corruption. When you upgrade to CDH 5, HBase checksums are turned on by default. This default configuration may result in different performance characteristics (latency and throughput) and possible degradation for certain workloads. To regain performance, modify the following configuration properties in hbase-site.xml:
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>false</value>
  <description>
    If set to true, HBase will read data and then verify checksums for
    hfile blocks. Checksum verification inside HDFS will be switched off.
    If the hbase-checksum verification fails, then it will switch back to
    using HDFS checksums.
  </description>
</property>

<property>
  <name>hbase.hstore.checksum.algorithm</name>
  <value>NULL</value>
  <description>
    Name of an algorithm that is used to compute checksums. Possible values
    are NULL, CRC32, CRC32C.
  </description>
</property>
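With CDH packages, the server configuration typically lives under /etc/hbase/conf/; adjust the path if your deployment differs, and restart the RegionServers for the change to take effect:
$ sudo vi /etc/hbase/conf/hbase-site.xml        # CDH package default location; yours may differ
$ sudo service hbase-regionserver restart       # run on each RegionServer host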
Upgrading HBase from CDH 4 to CDH 5
CDH 5.0 HBase is based on Apache HBase 0.96.1.1. Remember that once a cluster has been upgraded to CDH 5, it cannot be reverted to CDH 4. To ensure a smooth upgrade, this section guides you through the steps involved in upgrading HBase from earlier CDH 4.x releases to CDH 5.
Prerequisites
HDFS and ZooKeeper should be available while upgrading HBase.
Steps to Upgrade
CDH 5 comes with an upgrade script for HBase. You can run bin/hbase --upgrade to see its Help section. The script runs in two modes: -check and -execute.
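For reference, the two modes are invoked as shown below; the commands assume you run them from the HBase installation directory, as in the examples later in this section:
$ bin/hbase upgrade -check      # scan a live CDH 4 cluster for HFile v1 and corrupt files
$ bin/hbase upgrade -execute    # perform the actual upgrade; the cluster must be fully shut down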
Step 1: Run the HBase upgrade script in -check mode
When you run the HBase upgrade script in -check mode on your running CDH 4 cluster, it looks for all occurrences of HFile v1. HFile v1 is not supported by HBase 0.96, so every file in this format must be rewritten to HFile v2 before the upgrade.
The rewrite is done by major compaction of the regions that contain these files. Major compaction merges and optimizes files for efficient reads, and in the process converts the older HFile v1 format to the improved and optimized HFile v2 format. The script also detects corrupt files, that is, files with an undefined major version (neither 1 nor 2) that are most likely not readable at all. Corrupt files should be removed.
The -check mode prints statistics at the end of the run. It also prints the absolute paths of the tables it has scanned, HFile v1 files if any, regions containing such files (to major compact), and corrupted files.
$ bin/hbase upgrade -check
Your output should be similar to the following:
Tables Processed:
hdfs://localhost:41020/myHBase/.META.
hdfs://localhost:41020/myHBase/usertable
hdfs://localhost:41020/myHBase/TestTable
hdfs://localhost:41020/myHBase/t

Count of HFileV1: 2
HFileV1:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512

Count of corrupted files: 1
Corrupted Files:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1

Count of Regions with HFileV1: 2
Regions to Major Compact:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
In the example above, the script has detected two HFile v1 files, one corrupt file, and the regions to major compact. By default, the script scans the root directory, as defined by hbase.rootdir. To scan a specific directory, use the --dir option. For example, the following command scans the /myHBase/testTable directory:
bin/hbase upgrade --check --dir /myHBase/testTable
You should then major compact all the reported regions to rewrite the files from HFile v1 to HFile v2 format. Once all HFile v1 files have been rewritten, that is, once the -check command returns a "No HFile v1 found" message, it is safe to proceed with the upgrade.
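For example, you can trigger major compaction from the HBase shell; the table name below is illustrative, and REGION_NAME stands for a region name reported by the -check run:
$ bin/hbase shell
hbase> major_compact 'usertable'      # major compact an entire table
hbase> major_compact 'REGION_NAME'    # or a single region, identified by its region name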
Step 2: Gracefully shut down the CDH 4 HBase cluster
Shut down your CDH 4 HBase cluster before you run the upgrade script in -execute mode.
To shut down HBase gracefully:
- Stop the REST and Thrift servers and clients, then stop the cluster.
  - Stop the Thrift server and clients:
    sudo service hbase-thrift stop
  - Stop the REST server:
    sudo service hbase-rest stop
  - Stop the cluster by shutting down the master and the region servers:
    - Use the following command on the master node:
      sudo service hbase-master stop
    - Use the following command on each node hosting a region server:
      sudo service hbase-regionserver stop
- Stop the ZooKeeper Server:
  $ sudo service zookeeper-server stop
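Before running the upgrade script, you can optionally verify on each node that no HBase daemons remain; jps (shipped with the JDK) lists running Java processes, in which the master and region servers appear as HMaster and HRegionServer:
$ sudo jps | grep -E 'HMaster|HRegionServer'    # should produce no output on any node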
Step 3: Run the HBase upgrade script in -execute mode
This step executes the actual upgrade process. It includes a verification step that checks whether the Master, RegionServer, and backup Master znodes have expired; if they have not, the upgrade is aborted. This ensures that no upgrade occurs while an HBase process is still running. If your upgrade is aborted even after you have shut down the HBase cluster, wait for the znodes to expire and retry; the default znode expiry time is 300 seconds.
As mentioned earlier, ZooKeeper and HDFS must be available during the upgrade. If ZooKeeper is managed by HBase, use the following command to start ZooKeeper:
./hbase/bin/hbase-daemon.sh start zookeeper
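If you want to confirm that the RegionServer and Master znodes have expired before running the script, you can inspect them with the HBase zkcli tool; the znode paths below assume the default /hbase parent znode:
$ bin/hbase zkcli
ls /hbase/rs                  # should be empty or absent: no live RegionServers
ls /hbase/backup-masters      # should be empty or absent: no backup Masters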
The upgrade involves three steps:
- Upgrade Namespace: This step upgrades the directory layout of HBase files.
- Upgrade Znodes: This step upgrades /hbase/replication (znodes corresponding to peers, log queues, and so on) and the table znodes (which track table enable/disable state). It deletes other znodes.
- Log Splitting: If the shutdown was not clean, there may be Write Ahead Logs (WALs) left to split. This step splits such WAL files. It runs in a non-distributed mode, which can lengthen the upgrade if there are many logs to split. To expedite the upgrade, ensure that you complete a clean shutdown.
$ bin/hbase upgrade -execute
Your output should be similar to the following:
Starting Namespace upgrade
Created version file at hdfs://localhost:41020/myHBase with version=7
Migrating table testTable to hdfs://localhost:41020/myHBase/.data/default/testTable
…..
Created version file at hdfs://localhost:41020/myHBase with version=8
Successfully completed NameSpace upgrade.
Starting Znode upgrade
….
Successfully completed Znode upgrade
Starting Log splitting
…
Successfully completed Log splitting
The -execute command either returns a success message, as in the example above, or, in the case of a clean shutdown where no log splitting was required, a "No log directories to split, returning" message. Either message indicates that your upgrade was successful.
Step 4 (Optional): Move Tables to Namespaces
CDH 5 introduces namespaces for HBase tables. As a result of the upgrade, all tables are automatically assigned to namespaces. The root, meta, and acl tables are added to the hbase system namespace. All other tables are assigned to the default namespace.
To move a table to a different namespace, take a snapshot of the table and clone the snapshot into the new namespace. Perform the snapshot and clone operations after the upgrade, before turning the modified application back on.
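For example, the following HBase shell sequence is a minimal sketch; the table name testTable and the namespace ns1 are illustrative placeholders, and the target namespace must exist before you clone into it:
$ bin/hbase shell
hbase> create_namespace 'ns1'
hbase> snapshot 'testTable', 'testTable_snap'
hbase> clone_snapshot 'testTable_snap', 'ns1:testTable'
hbase> delete_snapshot 'testTable_snap'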
FAQ
In order to prevent upgrade failures because of unexpired znodes, is there a way to check/force this before an upgrade?
The upgrade script performs the upgrade when it is run with the -execute option. As its first step, it checks for any live HBase processes (RegionServer, Master, and backup Master) by looking at their znodes. If any such znode is still present, it aborts the upgrade and prompts you to stop those processes and wait until their znodes have expired. This can be considered a built-in check.
The -check option has a different use case: checking for HFile v1 files. Run it on a live CDH 4 cluster to detect HFile v1 files and identify the regions that must be major compacted.
What are the steps for Cloudera Manager to do the upgrade?
- Check Mode: On your running CDH 4 cluster, run the upgrade script in -check mode. Check the logs to see whether any HFile v1 or corrupt files have been found. If there are any HFile v1 files, you are prompted to major compact the affected regions; you must perform the major compaction from the command line (for example, from the HBase shell). Re-run the upgrade script in -check mode until you see a “No HFile v1 found” message.
- Shutdown CDH 4 HBase: Once ALL HFile v1 are upgraded to HFile v2 (via major compaction), shut down the CDH 4 cluster. Try to complete a clean shutdown in order to minimize the upgrade time. Wait for znode session expiry time (default, 300 seconds) to let the znodes for the RegionServers, HMaster and backup Masters expire. (You can investigate these znodes using the HBase zkcli tool).
- Rename .snapshot folder: Rename the .snapshot folder to .hbase-snapshot; see the example command after this list. (After this step, you can upgrade HDFS and ZooKeeper.)
- Execute Mode: Run the upgrade script in -execute mode. To ensure that all HBase processes have stopped, the script checks for any live processes. If any znode is still present for the Master, backup Masters, or RegionServers, the upgrade is aborted. One way to avoid unwanted aborts is to wait until the znodes' sessions have expired before running the upgrade command. This step upgrades the namespace, replication, and table znodes, and performs log splitting of any WALs left over from an improper shutdown of the CDH 4 cluster. The WAL splitting is single threaded, so a proper shutdown minimizes the number of WALs to process in this step and reduces the upgrade time.
- Start HBase: Check the UI to ensure HBase is now running on CDH 5.
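As an illustration of the rename step above, assuming the default hbase.rootdir of /hbase and that the directory is owned by the hbase user (adjust both to match your cluster), the rename could be performed with:
$ sudo -u hbase hdfs dfs -mv /hbase/.snapshot /hbase/.hbase-snapshot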
Upgrading HBase from an Earlier CDH 5 Release
To upgrade HBase from an earlier CDH 5 release, proceed as follows.
The instructions that follow assume that you are upgrading HBase as part of an upgrade to the latest CDH 5 release, and have already performed the steps under Upgrading from a CDH 5 Beta Release to the Latest Release.
Step 1: Perform a Graceful Cluster Shutdown
Upgrading via rolling restart is not supported.
To shut HBase down gracefully:
- Stop the Thrift server and clients, then stop the cluster.
  - Stop the Thrift server and clients:
    sudo service hbase-thrift stop
  - Stop the cluster by shutting down the master and the region servers:
    - Use the following command on the master node:
      sudo service hbase-master stop
    - Use the following command on each node hosting a region server:
      sudo service hbase-regionserver stop
- Stop the ZooKeeper Server:
  $ sudo service zookeeper-server stop
Step 2: Install the new version of HBase
You may want to take this opportunity to upgrade ZooKeeper, but you do not have to upgrade ZooKeeper before upgrading HBase; the new version of HBase will run with the older version of ZooKeeper. For instructions on upgrading ZooKeeper, see Upgrading ZooKeeper from an Earlier CDH 5 Release.
It is a good idea to back up the /hbase znode before proceeding. By default, ZooKeeper stores its data, including the /hbase znode, in /var/lib/zookeeper.
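One simple way to do this, assuming the default data directory and that the ZooKeeper server is stopped (as it is after Step 1), is to copy the directory aside:
$ sudo cp -rp /var/lib/zookeeper /var/lib/zookeeper.backup    # keep until the upgrade is verified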
To install the new version of HBase, follow directions in the next section, Installing HBase.
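As an illustration only (the package names shown are the standard CDH HBase package names; see Installing HBase for complete, distribution-specific instructions), on a RHEL-compatible system with the CDH 5 repository configured the package upgrade might look like this:
$ sudo yum clean all
$ sudo yum install hbase hbase-master hbase-regionserver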
- If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
- If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version; for details, see Automatic handling of configuration files by dpkg.