Preparing to create a Hive replication policy

Before you create the Hive replication policies, you must prepare the clusters and verify cluster access and cloud credentials.

  • Do the source cluster and target cluster meet the requirements to create a Hive replication policy?
  • Is the source CDH cluster or source CDP Private Cloud Base cluster registered as a classic cluster on the Management Console?
    CDH clusters and CDP Private Cloud Base clusters are managed by Cloudera Manager. To enable these on-premises clusters for Replication Manager, you must register them as classic clusters on the Management Console. After registration, you can use them for data migration purposes.

    For information about registering an on-premises cluster as a classic cluster, see Adding a CDH cluster and Adding a CDP Private Cloud Base cluster.

  • Have you configured the all-database, table, column Ranger policy for the hdfs user on the target cluster to perform all the operations on all databases and tables?
    The hdfs user role is used to import Hive Metastore and must have access to all Hive datasets, including all operations. Otherwise, Hive import fails during the replication process. On the target cluster, the hive user must have Ranger admin privileges. The same hive user performs the metadata import operation.
    To provide access, navigate to the Ranger Admin UI > Service Manager > Hadoop_SQL Policies > Access section, and grant the hdfs user permission in the all-database, table, column policy.
  • Is an external account that allows the CDH cluster to access CDP cloud storage configured in the Cloudera Manager of the source CDH cluster?
  • Do you have the required cluster access to create replication policies?
    Power users, the user who onboarded the source and target clusters, and users with the ClassicClusterAdmin or ClassicClusterUser resource role can create replication policies on clusters to which they have access. For more information, see Understanding account roles and resource roles.
  • Do you have the required cluster access to view the replication policies?
    Existing Hive replication policies are visible to users who have access to the source cluster in the replication policy. A warning appears if you do not have access to the source cluster.

    If you can view a policy, you can also perform other actions on it, including update and delete operations.

  • Is the required cloud credential that you want to use in the replication policy registered with the Replication Manager service?
    For more information, see Working with cloud credentials.
  • Do you need to replicate data securely?
    If so, ensure that SSL/TLS certificate exchange is configured between the two Cloudera Manager instances that manage the source and target clusters. For more information, see Configuring SSL/TLS certificate exchange between two Cloudera Manager instances.
  • Are the following ports (minimum port configuration) open and available for Replication Manager?
    Table 1. Minimum ports required for Hive replication policies
    Port | Service | Description
    7180 or 7183 | Cloudera Manager Admin Console HTTP | Open on the source cluster to enable Data lake Cloudera Manager to communicate to the on-premises Cloudera Manager. Connects to destination SDX Data Lake Cloudera Manager.
    9000 | Cloudera Manager Agent | Open on the source and target cluster to retrieve diagnostic and log information.
    9083 | Hive Metastore | Open on source and target cluster for Hive/Impala replication to query or access Hive Metastore.
    6000-6049 | CCM | Required for SSL connections to the Control Plane via Cluster Connectivity Manager (CCM) to communicate with Replication Manager.
    80 or 443 | Data transfer from secondary node for AWS / ADLS Gen2 | Open on all the HDFS nodes for AWS and ADLS Gen2.
    8443 | Data Lake cluster | Configure the port on the Data Lake cluster as the outgoing port for CDP Management Console to communicate with Cloudera Manager and Knox.
    8032 | YARN Resource Manager | Open on the source and target cluster to access the YARN ResourceManager.
    The following network security diagram shows the minimum port configuration required to create replication policies:
    Figure 1. Network security diagram for Replication Manager in CDP Public Cloud

    For more information, see Ports for Replication Manager on CDP Public Cloud.
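
As a quick sanity check of the port requirements listed in this topic, the following Python sketch attempts a TCP connection to a subset of the listed ports on a given host. It is an illustration only, not a Cloudera tool: the function names (is_port_open, check_host) and the MINIMUM_PORTS mapping are hypothetical, and the mapping covers only the ports that this topic describes as open on the source and target clusters. A successful connection shows only that the port is reachable from the machine where the script runs; it does not confirm that the firewall rules between the clusters and the Control Plane are correct.

```python
import socket

# Subset of the ports listed in Table 1 (hypothetical mapping for illustration).
# Where the table gives alternatives ("7180 or 7183"), both are listed and
# only one of them is expected to be in use on a given cluster.
MINIMUM_PORTS = {
    "Cloudera Manager Admin Console": [7180, 7183],
    "Cloudera Manager Agent": [9000],
    "Hive Metastore": [9083],
    "YARN ResourceManager": [8032],
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_host(host, ports_by_service=MINIMUM_PORTS, timeout=3.0):
    """Map each service name to {port: reachable} for the given host."""
    return {
        service: {port: is_port_open(host, port, timeout) for port in ports}
        for service, ports in ports_by_service.items()
    }
```

Calling check_host with the host name of a cluster node returns a nested dictionary keyed by service name and port, which you can print or compare against the expected configuration. Run the check from a machine that sits on the same network path as the peer cluster, because reachability from an unrelated workstation says little about cluster-to-cluster connectivity.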

After the clusters and cloud storage requirements are met, you can create a Hive replication policy.