Support matrix for Cloudera Replication Manager

Use Replication Manager to replicate HDFS, Hive external tables, HBase data, Ranger policies and roles for HDFS, Hive, and HBase services, and Iceberg tables from Cloudera Base on premises clusters to Cloudera on cloud clusters in Amazon S3 (AWS), Microsoft Azure ADLS Gen2 (ABFS), and Google Cloud Platform (GCP).

Prerequisites for creating replication policies

Before you create replication policies, you must perform the following actions:
  • Verify that the on-premises cluster versions are supported by Replication Manager.
  • Register the on-premises clusters as classic clusters in the Cloudera Management Console.
  • Register the cloud account credentials in the Replication Manager service.
  • Verify cluster access and configure the minimum ports required for replication.
  • Verify that the Cloudera on cloud and Cloudera Manager versions of the target cluster match or are higher than the versions of the source cluster.
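The last prerequisite above is a version comparison, which can be sketched in a few lines of Python. This is only an illustrative sketch, not part of Replication Manager: the function names `parse_version` and `target_is_compatible` are hypothetical, and the sketch assumes that only the leading dotted-numeric part of a version string matters, so suffixes such as SP1 or CHF7 are ignored.

```python
import re


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple of ints.

    Assumption: suffixes such as "SP1" or "CHF7" are ignored in this sketch.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def target_is_compatible(source_version, target_version):
    """Return True when the target version matches or is higher than the source version."""
    src = parse_version(source_version)
    tgt = parse_version(target_version)
    # Pad the shorter tuple with zeros so that 7.12.0 and 7.12.0.0 compare as equal.
    width = max(len(src), len(tgt))
    src += (0,) * (width - len(src))
    tgt += (0,) * (width - len(tgt))
    return tgt >= src


if __name__ == "__main__":
    print(target_is_compatible("7.11.3", "7.12.0"))
    print(target_is_compatible("7.13.2", "7.12.0.100"))
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating 7.9 as higher than 7.11.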

Supported replication policies and features in Cloudera on cloud Replication Manager

Cloudera on cloud Replication Manager provides replication policies that you can create, edit, and manage to accomplish your data replication goals. You can use alternate replication methods for scenarios that Replication Manager does not support. Certain features are available only if the Cloudera Manager versions of the source and target clusters support them.

Supported replication policies

You can use the following replication policies in Replication Manager:
  • HDFS replication policies

    Replicates data and metadata between the following environments:
    • From Cloudera Base on premises to cloud storage.
    • From cloud storage to classic clusters that are Cloudera Base on premises clusters.
  • Hive replication policies

    • Supports table-level replication.
    • Replicates Hive external tables from Cloudera Base on premises to cloud storage and to Data Hubs.
    • Replicates data stored in Hive tables, Hive metadata, data in Hive metastore, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.
    • Migrates Sentry permissions to Ranger.
  • HBase replication policies

    Replicates all the data from the specified tables and continues to synchronize the changed data automatically without user intervention. The replication policies support the following data and environments:
    • HBase data from a source classic cluster that is a Cloudera Base on premises cluster, Cloudera Operational Database (COD), or Data Hub to a target Data Hub or COD cluster.
    • HBase data between different environments within a Virtual Private Cloud (VPC).
    • HBase data in SFT-enabled clusters for target clusters.
    • Phoenix tables.
    Table 1. Lowest supported cluster and runtime versions for HBase replication policies

    | Source cluster type | Lowest supported Runtime source version | Lowest supported source Cloudera Manager version | Target cluster type | Lowest supported target Cloudera version | Lowest supported target Cloudera Manager version |
    |---|---|---|---|---|---|
    | Kerberos-enabled Cloudera Base on premises | 7.1.9 | 7.11.3 | COD (AWS/Azure) | 7.2.18 | - |
    | Kerberos-enabled Cloudera Base on premises | 7.1.9 | 7.11.3 | Data Hub in Cloudera on cloud (AWS/Azure) | 7.2.18 | 7.12.0 |
    | Kerberos-enabled Cloudera Base on premises | 7.1.9 SP1 | 7.11.3 CHF7 | GCP | 7.2.18 | - |
    | COD (AWS/Azure) | 7.2.18 | - | COD (AWS/Azure) | 7.2.18 | - |
    | COD (GCP) | 7.2.18 | - | COD (GCP) | 7.2.18 | - |
  • Iceberg replication policies

    Replicates Iceberg tables between Data Lakes through Data Hubs in Cloudera on cloud 7.3.2 or higher versions using AWS. The Data Lakes can be located in a single AWS region or across multiple regions.

    The replication policies replicate the following components:
    • Metadata and catalog from the source cluster Hive Metastore (HMS) to the target cluster HMS.
    • Data files from the source cluster to the target cluster. The Iceberg replication policy can replicate only between AWS S3 storage in public cloud environments.
    • All snapshots incrementally from the source cluster by default. This allows you to run time travel queries on the target cluster.
    You can use Iceberg replication policies in the following use cases:
    • Implementing disaster recovery.
    • Implementing passive disaster recovery with incremental replication at regular intervals between two similar systems.
  • Ranger replication policies

    Migrates the following components from Kerberos-enabled Cloudera Base on premises 7.3.2 or higher clusters using Cloudera Manager 7.13.2 to Cloudera on cloud 7.3.2 clusters:
    • Ranger policies and roles for HDFS, Hive, and HBase services.

      These policies include Ranger tag-based and Ranger resource-based policies. The replication policy always performs a complete export and import of Ranger policies.

    • Ranger audit logs in HDFS (using superuser credentials).

      The Ranger audit log directory on the source cluster must be snapshot-enabled. Replication Manager uses DistCp jobs to replicate Ranger HDFS audit log directories. Therefore, the first Ranger replication policy run to replicate the Ranger audit log directory is a bootstrap job and the subsequent runs are incremental.

    You can use Ranger replication policies in the following use cases:
    • When Ranger is used for file system-level access control for HDFS and Hive and you want to copy the Ranger policies to another cluster for backup purposes.
    • When you want to move or replicate Ranger policies for Hive (SQL) or HBase data to another cluster for disaster recovery purposes.

Supported features

The following table lists the features and the lowest supported Cloudera Manager versions that are required for source and target clusters to use them:
Table 2. Replication functionalities and lowest supported Cloudera Manager versions

| Replication functionality | Lowest supported source Cloudera Manager version | Lowest supported target Cloudera Manager version |
|---|---|---|
| Register the GCP credentials to use in Replication Manager on the Cloud Credentials page. | 7.12.0.0 and higher | Supports all Cloudera on cloud Cloudera Manager versions. |
| Replicate HBase data simultaneously between multiple clusters. To enable this feature, contact your Cloudera Account team. | 7.12.0.0 and higher | 7.12.0.0 and higher |
| Replicate only the HBase tables for which the replication scope is already enabled using the Select Source > Replicate only tables where replication is already enabled option during the HBase replication policy creation process. To enable this feature, contact your Cloudera Account team. | Supports all Cloudera on cloud Cloudera Manager versions. | 7.9.0-h7 and higher; 7.11.0-h3 and higher; 7.12.0.0 and higher |
| Specify the network load balancer (NLB) Endpoint after you enable the Select Destination > Replicate via a Network Load Balancer* option during the HBase replication policy creation process if the on-premises cluster uses NLB to communicate with the COD clusters. | 7.1.9 | 7.12.0.100 |
| Specify the YARN queue bandwidth using the Initial Snapshot Settings > Maximum Bandwidth option during the HBase replication policy creation process to export the HBase initial snapshot. To enable this feature, contact your Cloudera Account team. | 7.12.0.100 | 7.12.0.100 |
| Enter Initial Snapshot Settings > Maximum parallel snapshots to specify the maximum number of tables to process in parallel during the initial snapshot export and import step for an HBase replication policy. If you do not enter any value, Replication Manager sets an appropriate value, depending on the resources in the source and target clusters, to optimize the performance. To enable this feature, contact your Cloudera Account team. | Supports all Cloudera on cloud Cloudera Manager versions. | 7.12.0.100 |
| Add IDBroker credentials to use in Replication Manager on the Cloud Credentials page. To enable this feature, contact your Cloudera Account team. | 7.11.3 CHF7 | 7.11.3 CHF7 |
| Enter the Select Source > Export snapshot user field during the HBase replication policy creation process to specify the username to export the initial snapshot to the target. To enable this feature, contact your Cloudera Account team. | 7.11.3 CHF7 | 7.11.3 CHF7 |
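
The version thresholds in Table 2 lend themselves to a small lookup. The following Python sketch transcribes four rows whose thresholds are plain dotted versions and reports which of them a given pair of Cloudera Manager versions would satisfy. The names `FEATURES`, `as_tuple`, and `available_features` and the shortened feature labels are hypothetical, rows that involve hotfix or CHF builds are left out, and the check says nothing about whether the Cloudera Account team has enabled a feature.

```python
ANY = None  # stands for "Supports all Cloudera on cloud Cloudera Manager versions"

# (feature label, lowest source CM version, lowest target CM version),
# transcribed from four rows of Table 2. Labels are shortened for this sketch.
FEATURES = [
    ("Register GCP credentials", (7, 12, 0, 0), ANY),
    ("Replicate HBase data between multiple clusters", (7, 12, 0, 0), (7, 12, 0, 0)),
    ("YARN queue bandwidth for the initial snapshot", (7, 12, 0, 100), (7, 12, 0, 100)),
    ("Maximum parallel snapshots", ANY, (7, 12, 0, 100)),
]


def as_tuple(version):
    """Convert a dotted version such as "7.12.0" into a four-part tuple of ints."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (4 - len(parts))


def available_features(source_cm, target_cm):
    """Return the labels of the features whose version thresholds are both met."""
    src, tgt = as_tuple(source_cm), as_tuple(target_cm)
    result = []
    for name, min_src, min_tgt in FEATURES:
        source_ok = min_src is ANY or src >= min_src
        target_ok = min_tgt is ANY or tgt >= min_tgt
        if source_ok and target_ok:
            result.append(name)
    return result


if __name__ == "__main__":
    for feature in available_features("7.12.0.0", "7.12.0.0"):
        print(feature)
```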

Replicate data from Cloudera Base on premises and Cloudera on cloud source clusters

Replication Manager replicates HDFS, Hive external tables, HBase data, Ranger policies and roles for HDFS, Hive, and HBase services, and Iceberg tables from Cloudera Base on premises clusters to Cloudera on cloud clusters in Amazon S3 (AWS), Microsoft Azure ADLS Gen2 (ABFS), and Google Cloud Platform (GCP).

The following tables list the lowest source and destination cluster versions, lowest Cloudera Manager versions, supported cloud providers, and supported scenarios:

Replicate data from Cloudera Base on premises source clusters

Table 3. Cloudera Base on premises source cluster support matrix

| Lowest supported source Cloudera Manager version | Lowest supported source Cloudera Runtime version | Target cluster | Supported services on Replication Manager | Services that require other replication methods |
|---|---|---|---|---|
| 7.11.3 | 7.1.9 | Cloudera on cloud AWS/Azure | HDFS | HBase. To replicate HBase data, see COD replication in a Nutshell and HBase data replication. |
| 7.11.3 | 7.1.9 | Data Lake in Cloudera on cloud AWS/Azure | Hive external tables | |
| 7.11.3 | 7.1.9 | Data Hub in Cloudera on cloud AWS/Azure | Hive external tables | None |
| 7.11.3 | 7.1.9 | Data Hub in Cloudera on cloud AWS/Azure | HBase | None |
| 7.11.3 CHF7 | 7.1.9 SP1 | Data Hub in Cloudera on cloud GCP | HDFS, Hive external tables, HBase | None |
| 7.13.2 | 7.3.2 | Data Hub in Cloudera on cloud AWS | Ranger policies, roles for HDFS, Hive, and HBase services, and audit logs in HDFS | None |
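
For planning purposes, the rows of Table 3 can be kept as data and filtered by target. The sketch below is one possible encoding in Python, not an interface that Replication Manager exposes. The names `MATRIX` and `rows_for`, the short target labels, and the split of each target into a cluster type and a set of cloud providers are assumptions made for illustration.

```python
# Rows transcribed from Table 3:
# (source CM version, source Runtime version, target type, providers, services).
MATRIX = [
    ("7.11.3", "7.1.9", "Cloudera on cloud", {"AWS", "Azure"}, ["HDFS"]),
    ("7.11.3", "7.1.9", "Data Lake", {"AWS", "Azure"}, ["Hive external tables"]),
    ("7.11.3", "7.1.9", "Data Hub", {"AWS", "Azure"}, ["Hive external tables"]),
    ("7.11.3", "7.1.9", "Data Hub", {"AWS", "Azure"}, ["HBase"]),
    ("7.11.3 CHF7", "7.1.9 SP1", "Data Hub", {"GCP"},
     ["HDFS", "Hive external tables", "HBase"]),
    ("7.13.2", "7.3.2", "Data Hub", {"AWS"},
     ["Ranger policies, roles, and audit logs in HDFS"]),
]


def rows_for(target, provider):
    """Return the matrix rows that apply to a target type on a cloud provider."""
    return [row for row in MATRIX if row[2] == target and provider in row[3]]


if __name__ == "__main__":
    for cm, runtime, _, _, services in rows_for("Data Hub", "AWS"):
        print(f"CM {cm}, Runtime {runtime}: {', '.join(services)}")
```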

Replicate data from Cloudera on cloud source clusters

Consider the following limitations while using Cloudera on cloud source and Cloudera on cloud target clusters:
  • Replication across cloud providers, that is, from AWS to Azure and vice versa, is not supported.
  • The source and target clusters must use the same account.
Table 4. Cloudera on cloud source cluster support matrix
Lowest supported source cluster version Lowest supported target cluster version Supported services on Replication Manager
Cloudera on cloud AWS*/Azure 7.2.18 Cloudera Base on premises 7.1.9 HDFS
Cloudera on cloud GCP 7.2.18 Cloudera Base on premises 7.1.9 SP1 and higher HDFS
COD version 7.2.18 - Cloudera on cloud AWS*/Azure/GCP AWS HBase
Iceberg Replication Data Hub 7.3.2 on Cloudera on cloud AWS* Iceberg Replication Data Hub 7.3.2 on Cloudera on cloud AWS Iceberg tables stored on AWS S3
*Replication Manager does not support S3 as a source or destination when S3 is configured to use SSE-KMS.
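
The limitations listed in this section, together with the SSE-KMS footnote, can be expressed as a pre-flight check. The following Python sketch is illustrative only. The `CloudCluster` class, its fields, and `replication_blockers` are hypothetical names, `account_id` is a placeholder for however the shared account is identified, and extending the cross-provider rule to every provider pair (not only AWS and Azure) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CloudCluster:
    provider: str                  # "AWS", "Azure", or "GCP"
    account_id: str                # hypothetical identifier for the account
    s3_uses_sse_kms: bool = False  # only meaningful for AWS S3 storage


def replication_blockers(source, target):
    """Return a list of reasons why cloud-to-cloud replication would not be supported."""
    problems = []
    # Assumption: the cross-provider limitation applies to every provider pair.
    if source.provider != target.provider:
        problems.append("source and target use different cloud providers")
    if source.account_id != target.account_id:
        problems.append("source and target are not in the same account")
    for role, cluster in (("source", source), ("target", target)):
        if cluster.provider == "AWS" and cluster.s3_uses_sse_kms:
            problems.append(f"{role} S3 storage is configured with SSE-KMS")
    return problems


if __name__ == "__main__":
    src = CloudCluster("AWS", "acct-1")
    dst = CloudCluster("Azure", "acct-1")
    print(replication_blockers(src, dst))
```

Returning a list of reasons rather than a single boolean lets a caller report every blocker in one pass.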

To view all supported clusters and features, including earlier and end of support (EOS) versions, see the Replication Manager support matrix.