Remove PREFIX_TREE Data Block Encoding

Before upgrading to CDP Private Cloud Base, ensure that you have transitioned all the data to a supported encoding type.

The PREFIX_TREE data block encoding code is removed in CDH 6, meaning that HBase clusters with PREFIX_TREE enabled will fail. Therefore, before upgrading to CDH 6 you must ensure that all data has been transitioned to a supported encoding type.

The following pre-upgrade commands are used for validation:
  • Data Block Encoding validation: hbase pre-upgrade validate-dbe
  • HFiles validation: hbase pre-upgrade validate-hfile
  1. Download and distribute parcels for the target version of CDH 6.

    If the downloaded parcel version is higher than the current Cloudera Manager version, the following error message is displayed:

    Error for parcel CDH-6.x.parcel : Parcel version 6.X is not supported by this version of Cloudera Manager. Upgrade Cloudera Manager to at least 6.X before using this version of the parcel.

    You can safely ignore this error message.

  2. Find the installed parcel at /opt/cloudera/parcels.

Use the CDH 6 parcel to run the pre-upgrade commands. Cloudera recommends that you run them on an HMaster host.
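
As a minimal sketch of locating the new installation, the helper below searches a parcel directory for a CDH 6 parcel and prints the path of its hbase launcher. The function name find_cdh6_hbase, the CDH-6* directory naming pattern, and the bin/hbase location inside the parcel are assumptions made for illustration; verify them on your own hosts before relying on the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hbase launcher of a CDH 6 parcel.
# Assumptions: parcels are unpacked as CDH-6* directories under the parcel
# root, and each parcel ships its launcher at bin/hbase.
find_cdh6_hbase() {
  local parcel_root
  local candidate
  local found
  parcel_root="${1:-/opt/cloudera/parcels}"
  found=""
  # The glob expands in lexical order; the last usable match is kept.
  for candidate in "${parcel_root}"/CDH-6*; do
    if [ -x "${candidate}/bin/hbase" ]; then
      found="${candidate}/bin/hbase"
    fi
  done
  if [ -z "${found}" ]; then
    echo "no CDH 6 parcel with bin/hbase found under ${parcel_root}" >&2
    return 1
  fi
  echo "${found}"
}
```

With the helper defined, the validation commands could be run through the returned path, for example "$(find_cdh6_hbase)" pre-upgrade validate-dbe.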

  1. Run the hbase pre-upgrade validate-dbe command using the new installation.

    This command checks whether your tables or snapshots use the PREFIX_TREE Data Block Encoding.

    It is fast because it validates only the table-level descriptors.

    If PREFIX_TREE Data Block Encoding is not used, the following message is displayed:
    The used Data Block Encodings are compatible with HBase 2.0.
    If PREFIX_TREE Data Block Encoding is used, an error message similar to the following is displayed:
    2018-07-13 09:58:32,028 WARN  [main] tool.DataBlockEncodingValidator: Incompatible DataBlockEncoding for table: t, cf: f, encoding: PREFIX_TREE

    If you got an error message, continue with Step 2; otherwise, skip to Step 3.

  2. Fix your PREFIX_TREE encoded tables using the old installation.

    You can change the Data Block Encoding type to PREFIX, DIFF, or FAST_DIFF in your source cluster.

    The example validation output reports that column family f of table t uses an incompatible encoding. The following command changes its Data Block Encoding type to FAST_DIFF:

    hbase> alter 't', { NAME => 'f', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
  3. Run the hbase pre-upgrade validate-hfile command using the new installation.

    This command checks every HFile (including those referenced by snapshots) to ensure that none of them use PREFIX_TREE data block encoding. It opens every HFile, so this operation can take a long time.

    If there is no HFile with PREFIX_TREE encoding, the following message is displayed:
    Checked 3 HFiles, none of them are corrupted.
       There are no incompatible HFiles.
    If any HFile uses PREFIX_TREE encoding, an error message similar to the following is displayed:
    2018-06-05 16:20:47,322 INFO  [main] tool.HFileContentValidator: Corrupted file: hdfs://example.com:8020/hbase/data/default/t/72ea7f7d625ee30f959897d1a3e2c350/prefix/7e6b3d73263c4851bf2b8590a9b3791e
    2018-06-05 16:20:47,383 INFO  [main] tool.HFileContentValidator: Corrupted file: hdfs://example.com:8020/hbase/archive/data/default/t/56be41796340b757eb7fff1eb5e2a905/f/29c641ae91c34fc3bee881f45436b6d1
    

    If you got an error message, continue with Step 4; otherwise, skip to Step 7.

  4. Identify what kind of HFiles were reported in the error message.
    The report can contain two kinds of HFiles, which differ in their path:
    • If an HFile is located under /hbase/data, it belongs to a table.
    • If an HFile is located under /hbase/archive/data, it belongs to a snapshot.
  5. Fix the HFiles that belong to a table.
    1. Get the table name from the output.
      Our example output showed the /hbase/data/default/t/… path, which means that the HFile belongs to the t table which is in the default namespace.
    2. Rewrite the HFiles by issuing a major compaction. Use the old installation.
      shell> major_compact 't'
      When compaction is finished, the invalid HFile disappears.
  6. Fix the HFiles that belong to a snapshot that is needed. Use the old installation.
    1. Find the snapshot that refers to the invalid HFile. In our example output, the HFile was 29c641ae91c34fc3bee881f45436b6d1:
      # tail -n +2 skips the header line of the snapshot listing
      $ for snapshot in $(hbase snapshotinfo -list-snapshots 2> /dev/null | tail -n +2 | cut -f 1 -d \|);
      do
      echo "checking snapshot named '${snapshot}'"
       hbase snapshotinfo -snapshot "${snapshot}" -files 2> /dev/null | grep 29c641ae91c34fc3bee881f45436b6d1
      done
      
      The following output means that the invalid file belongs to the t_snap snapshot:
      checking snapshot named 't_snap'
       1.0 K t/56be41796340b757eb7fff1eb5e2a905/f/29c641ae91c34fc3bee881f45436b6d1 (archive)
    2. Convert the snapshot to another HFile format:
      # creating a new namespace for the cleanup process
      hbase> create_namespace 'pre_upgrade_cleanup'
      # cloning the snapshot into a new table
      hbase> clone_snapshot 't_snap', 'pre_upgrade_cleanup:t'
      hbase> alter 'pre_upgrade_cleanup:t', { NAME => 'f', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
      hbase> major_compact 'pre_upgrade_cleanup:t'
      # removing the invalid snapshot
      hbase> delete_snapshot 't_snap'
      # creating a new snapshot
      hbase> snapshot 'pre_upgrade_cleanup:t', 't_snap'
      # removing temporary table
      hbase> disable 'pre_upgrade_cleanup:t'
      hbase> drop 'pre_upgrade_cleanup:t'
      hbase> drop_namespace 'pre_upgrade_cleanup'
      
  7. Check the "Yes, I have run HBase pre-upgrade checks" upgrade checkbox.
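
The path rule from Step 4 can be sketched as a small filter over the validate-hfile output. The function name classify_reported_hfiles and its output format are hypothetical, and the sketch assumes the HBase root directory is /hbase, as in the example output. It reads log lines on standard input and prints, for each reported HFile, either the table to compact (Step 5) or the HFile name to search for in the snapshots (Step 6).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the HFiles reported by validate-hfile.
# Reads validator log lines on stdin and prints one line per reported HFile:
#   table <namespace>:<table>   for paths under /hbase/data
#   snapshot <hfile name>       for paths under /hbase/archive/data
# Assumption: the HBase root directory is /hbase, as in the example output.
classify_reported_hfiles() {
  # Keep only the reported paths; all other log lines are dropped.
  sed -n 's/^.*Corrupted file: //p' | while IFS= read -r path; do
    case "${path}" in
      */hbase/archive/data/*)
        # The last path component is the HFile name to look up in snapshots.
        echo "snapshot ${path##*/}"
        ;;
      */hbase/data/*)
        # Layout after the prefix: <namespace>/<table>/<region>/<cf>/<hfile>
        rest="${path#*/hbase/data/}"
        namespace="${rest%%/*}"
        rest="${rest#*/}"
        echo "table ${namespace}:${rest%%/*}"
        ;;
      *)
        echo "unknown ${path}"
        ;;
    esac
  done
}
```

Saving the validator output to a file and piping it through classify_reported_hfiles would list the tables and HFile names to handle in the steps above.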

Ensure the co-processor classes are compatible with the upgrade.