Preparing HBase for upgrade

Before upgrading to CDP Private Cloud Base, you must ensure that you have transitioned all the data to a supported encoding type.

Perform this task after installing the HDP intermediate bits on the hosts, but before you upgrade your cluster to the HDP intermediate bits. Run these commands on the cluster that has the HDP intermediate bits. The HBase validation tool is packaged within HDP 7.1.X. You can get the HDP intermediate bits from the Cloudera Archive site. For more information about getting the intermediate bits, see Register software repositories.

Place these binaries into your home directory before you run the validation command. If your cluster is Kerberized, you must kinit as the hbase user. Use the HFile content validator tool to check for corrupted HFiles, and rectify them before the upgrade. Note that you can no longer use the PREFIX_TREE Data Block Encoding type in the HDP intermediate bits. Although PREFIX_TREE Data Block Encoding was never supported in HDP, you previously had the choice to use it as an unsupported option.

The PREFIX_TREE data block encoding code is removed in CDP Private Cloud Base, meaning that HBase clusters with PREFIX_TREE enabled will fail. Therefore, before upgrading to CDP Private Cloud Base you must ensure that all data has been transitioned to a supported encoding type.
The following pre-upgrade commands are used for validation:
  • Data Block Encoding validation: hbase pre-upgrade validate-dbe
  • HFiles validation: hbase pre-upgrade validate-hfile
  1. Run the hbase pre-upgrade validate-dbe command using the new installation.
    For example, if you are using the HDP intermediate bits, you must run the following command:
    /usr/hdp/ pre-upgrade validate-dbe

    The command checks whether your tables or snapshots use the PREFIX_TREE Data Block Encoding. It does not take much time to run, because it validates only the table-level descriptors.

    If PREFIX_TREE Data Block Encoding is not used, the following message is displayed:
    The used Data Block Encodings are compatible with HBase 2.0.

    If you see this message, your data block encodings are compatible, and no further steps are required.

    If PREFIX_TREE Data Block Encoding is used, an error message similar to the following is displayed:
    2018-07-13 09:58:32,028 WARN  [main] tool.DataBlockEncodingValidator: Incompatible DataBlockEncoding for table: t, cf: f, encoding: PREFIX_TREE

    If you see this error message, continue to Step 2 and fix all your PREFIX_TREE encoded tables.
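When many tables are involved, it can help to reduce the validator output to a list of affected tables and column families. The following is a minimal sketch, assuming the warning lines follow the format shown in the example above. The function name list_incompatible_cfs is hypothetical and is not part of HBase.

```shell
# Sketch: print one "table cf" pair per line for every column family that
# the validator flags as PREFIX_TREE encoded.
# Assumption: warning lines match the example format shown above.
list_incompatible_cfs() {
  sed -n 's/.*Incompatible DataBlockEncoding for table: \([^,]*\), cf: \([^,]*\), encoding: PREFIX_TREE.*/\1 \2/p'
}
```

One possible use is to pipe the validator output into the function, for example hbase pre-upgrade validate-dbe 2>&1 | list_incompatible_cfs. Redirecting stderr is a precaution here, on the assumption that log lines may not be written to stdout.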

  2. Fix your PREFIX_TREE encoded tables using the old installation.

    You can change the Data Block Encoding type to PREFIX, DIFF, or FAST_DIFF in your source cluster.

    For example, if your validation output reported that column family f of table t is invalid, change its Data Block Encoding type to FAST_DIFF:

    hbase> alter 't', { NAME => 'f', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
    1. Run the hbase pre-upgrade validate-hfile command using the new installation. This command checks every HFile (including those that belong to snapshots) to ensure that none of them use PREFIX_TREE data block encoding. It opens every HFile, so this operation can take a long time.
      If there is no HFile with PREFIX_TREE encoding, a confirmation message similar to the following is displayed:
      Checked n HFiles, none of them are corrupted. 
      There are no incompatible HFiles.

      If you have an HFile with PREFIX_TREE encoding, an error message similar to the following is displayed:

      INFO  [main] tool.HFileContentValidator: Corrupted file: hdfs://
      2018-06-05 16:20:47,383 INFO  [main] tool.HFileContentValidator: Corrupted file: hdfs://
    2. Get the table name from the output. Our example output showed the /hbase/data/default/t/… file path, which means that the HFile belongs to the t table which is in the default namespace.
    3. Rewrite the HFiles by running major compaction in the old installation using this command.
      hbase> major_compact 't'
      When the major compaction is complete, the invalid HFile disappears.
    4. Run the hbase pre-upgrade validate-hfile command again to confirm that all the HFiles have been rewritten in the new encoding format.
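Reading the namespace and table name out of a reported path can be scripted. The following is a minimal sketch, assuming the HBase root directory is /hbase as in the example path, and that the two path components after /hbase/data are the namespace and the table. The function name table_from_hfile_path and the path in the comment are hypothetical.

```shell
# Sketch: print "<namespace> <table>" for an HFile path under /hbase/data.
# Assumption: the HBase root directory is /hbase, as in the example path.
# Paths under /hbase/archive/data do not match and produce no output.
table_from_hfile_path() {
  printf '%s\n' "$1" | sed -n 's|.*/hbase/data/\([^/]*\)/\([^/]*\)/.*|\1 \2|p'
}

# Hypothetical usage:
#   table_from_hfile_path "/hbase/data/default/t/<region>/<cf>/<hfile>"
```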
  3. Identify what kind of HFiles were reported in the error message.
    The report can contain two different kinds of HFiles, which differ in their path:
    • If an HFile is in /hbase/data then it belongs to a table.
    • If an HFile is located under /hbase/archive/data then it belongs to a snapshot.
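The two path rules above can be expressed as a small shell helper. This is a sketch only, assuming the HBase root directory is /hbase as in the examples. The function name hfile_kind is hypothetical.

```shell
# Sketch: classify a reported HFile path as belonging to a table or a snapshot.
# Assumption: the HBase root directory is /hbase, as in the examples above.
hfile_kind() {
  case "$1" in
    */hbase/archive/data/*) echo snapshot ;;  # archived file, referenced by a snapshot
    */hbase/data/*)         echo table ;;     # live file of a table
    *)                      echo unknown ;;   # path matches neither rule
  esac
}
```

The archive pattern is checked first only for readability; the two patterns do not overlap, because an archive path contains /hbase/archive/data rather than /hbase/data.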
  4. Fix the HFiles that belong to a snapshot that you still need. Run these commands in the old HDP cluster.
    The operation requires additional disk space roughly equal to the size of the snapshot until the conversion is complete and the old snapshot can be dropped.
    1. Find the snapshot that refers to the invalid HFile. In our example output, the invalid HFile was 29c641ae91c34fc3bee881f45436b6d1:
      $ for snapshot in $(hbase snapshotinfo -list-snapshots 2> /dev/null | tail -n -1 | cut -f 1 -d \|); do
          echo "checking snapshot named '${snapshot}'"
          hbase snapshotinfo -snapshot "${snapshot}" -files 2> /dev/null | grep 29c641ae91c34fc3bee881f45436b6d1
        done
      The following output means that the invalid file belongs to the t_snap snapshot:
      checking snapshot named 't_snap'
       1.0 K t/56be41796340b757eb7fff1eb5e2a905/f/29c641ae91c34fc3bee881f45436b6d1 (archive)
    2. Convert the snapshot to a supported Data Block Encoding:
      # creating a new namespace for the cleanup process
      hbase> create_namespace 'pre_upgrade_cleanup'
      # cloning the invalid snapshot into a temporary table
      hbase> clone_snapshot 't_snap', 'pre_upgrade_cleanup:t'
      hbase> alter 'pre_upgrade_cleanup:t', { NAME => 'f', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
      hbase> major_compact 'pre_upgrade_cleanup:t'
      # removing the invalid snapshot
      hbase> delete_snapshot 't_snap'
      # creating a new snapshot
      hbase> snapshot 'pre_upgrade_cleanup:t', 't_snap'
      # removing temporary table
      hbase> disable 'pre_upgrade_cleanup:t'
      hbase> drop 'pre_upgrade_cleanup:t'
      hbase> drop_namespace 'pre_upgrade_cleanup'
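If several snapshots need conversion, the HBase shell commands above can be generated per snapshot. The following is a minimal sketch that only prints the commands for review and does not run them. The function names print_conversion_prepare and print_conversion_finalize are hypothetical. The sketch is split into two phases on the assumption that the major compaction may still be running when the shell prompt returns, so the second phase should be run only after the compaction is complete.

```shell
# Sketch: print the HBase shell commands that clone a snapshot into a
# temporary table and rewrite its HFiles with a supported encoding.
# Arguments: snapshot name, table name, column family, encoding (default FAST_DIFF).
print_conversion_prepare() {
  cat <<EOF
create_namespace 'pre_upgrade_cleanup'
clone_snapshot '$1', 'pre_upgrade_cleanup:$2'
alter 'pre_upgrade_cleanup:$2', { NAME => '$3', DATA_BLOCK_ENCODING => '${4:-FAST_DIFF}' }
major_compact 'pre_upgrade_cleanup:$2'
EOF
}

# Sketch: print the commands that replace the old snapshot and clean up.
# Run these only after the major compaction is complete.
# Arguments: snapshot name, table name.
print_conversion_finalize() {
  cat <<EOF
delete_snapshot '$1'
snapshot 'pre_upgrade_cleanup:$2', '$1'
disable 'pre_upgrade_cleanup:$2'
drop 'pre_upgrade_cleanup:$2'
drop_namespace 'pre_upgrade_cleanup'
EOF
}
```

For the example in this section, print_conversion_prepare t_snap t f followed later by print_conversion_finalize t_snap t prints the same commands as listed above, without the hbase> prompt.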