HBase migration prerequisites

Before migrating HBase data from Cloudera/HDP to Cloudera Operational Database Public Cloud, you have to understand and perform all the necessary migration preparation tasks.

  • If you are migrating from CDH, configure Ranger ACLs in Cloudera corresponding to the HBase ACLs in your existing CDH cluster.

    For more information, see Configure a resource-based service:HBase.

  • Migrate all applications to use the new HBase-Spark connector because the Spark-HBase connector that you were using in CDH or HDP is no longer supported in Cloudera.

    For more information, see Using the HBase-Spark connector.

  • Ensure that all data has been migrated to a supported encoding type before the migration. For more information, see Removing PREFIX_TREE Data Block Encoding . This step is applicable for HBase 1.x when you are migrating data from HDP 2 or CDH 5. This step is not applicable for HBase 2.x.
  • Ensure that your HFiles are compatible with the version Apache HBase in Cloudera. For more information, see Validating HFiles.
  • Ensure that you upgrade any external or third-party co-processors manually because they are not automatically upgraded during migration. Ensure that your co-processor classes are compatible with Cloudera. For more information, see Checking co-processor classes.
  • Review the deprecated APIs and incompatibilities when upgrading from HDP 2.x or CDH 5.x to Cloudera. For more information, see HBase in Deprecation notices in Cloudera Runtime.
  • Install Cloudera Replication Plugin on the source cluster.
  • On the target Cloudera Operational Database cluster, make sure ZooKeeper (2181) and RegionServer (16020) ports are opened for the IP addresses of all RegionServers in the source cluster.

    For AWS, this involves editing the Security Groups which are in use by the Cloudera Operational Database cluster.

  • Ensure that all the RegionServers' hosts in the Cloudera Operational Database cluster are resolvable by both the IP and name addresses by all the RegionServers' hosts in the source cluster.