Designating a Replication Source
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
The Cloudera Manager Server that you are logged into is the destination for replications set up using that Cloudera Manager instance. From the Admin Console of this destination Cloudera Manager instance, you can designate a peer Cloudera Manager Server as a source of HDFS and Apache Hive data for replication.
Configuring a Peer Relationship
- Go to the Peers page. If no peers exist, the page shows only an Add Peer button and a short message. If peers already exist, they are displayed in the Peers list.
- Click the Add Peer button.
- In the Add Peer dialog box, provide a name, the URL (including the port) of the Cloudera Manager Server source for the data to be replicated, and the login credentials for that server. Cloudera recommends that TLS/SSL be used. A warning is shown if the URL scheme is http instead of https. After configuring both peers to use TLS/SSL, add the remote source Cloudera Manager TLS/SSL certificate to the local Cloudera Manager truststore, and vice versa. See Configuring TLS Encryption for Cloudera Manager.
- Click the Add Peer button in the dialog box to create the peer relationship.
The peer is added to the Peers list. Cloudera Manager automatically tests the connection between the Cloudera Manager Server and the peer. You can also click Test Connectivity to test the connection. Test Connectivity also tests the Kerberos configuration for the clusters. For more information about this part of the test, see Kerberos Connectivity Test.
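If you script peer creation instead of using the Add Peer dialog box, the same checks can be applied before any request is sent. The following is a minimal sketch, not Cloudera's implementation: the function name build_peer_request is hypothetical, the payload field names (name, url, username, password) are an assumption about what a peer-creation request might carry, and the host names and ports are example values only. It mirrors two points from the steps above: the URL must include the port, and a warning is raised for http instead of https.

```python
from urllib.parse import urlsplit


def build_peer_request(name, url, username, password):
    """Validate a peer URL and return (payload, warnings).

    Hypothetical helper: the payload keys are assumed field names,
    not a confirmed Cloudera Manager API schema.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    if parts.hostname is None:
        raise ValueError("peer URL must include a host")
    if parts.port is None:
        # The Add Peer dialog asks for the URL including the port.
        raise ValueError("peer URL must include the port")

    warnings = []
    if parts.scheme == "http":
        # Mirrors the warning shown in the dialog for non-TLS URLs.
        warnings.append("peer URL uses http; TLS/SSL (https) is recommended")

    payload = {
        "name": name,
        "url": url,
        "username": username,
        "password": password,
    }
    return payload, warnings


if __name__ == "__main__":
    # Example values only; replace with your source Cloudera Manager details.
    payload, warnings = build_peer_request(
        "source-cm", "http://cm-source.example.com:7180", "admin", "secret"
    )
    for message in warnings:
        print("WARNING:", message)
```

The helper only prepares and validates the data. Sending it to the destination Cloudera Manager Server, and testing connectivity afterward, is left to whatever client you use.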
Modifying Peers
- Go to the Peers page. If no peers exist, the page shows only an Add Peer button and a short message. If peers already exist, they are displayed in the Peers list.
- Do one of the following:
- Edit - In the row for the peer, select Edit. Make your changes, then click Update Peer to save them.
- Delete - In the row for the peer, click Delete.
HDFS and Hive/Impala Replication To and From Cloud Storage
Minimum Required Role: User Administrator (also provided by Full Administrator)
To configure Amazon S3 as a source or destination for HDFS or Hive/Impala replication, you configure AWS Credentials that specify the type of authentication to use, the Access Key ID, and Secret Key. See How to Configure AWS Credentials.
To configure Microsoft ADLS as a source or destination for HDFS or Hive/Impala replication, you configure the service principal for ADLS. See Configuring ADLS Gen1 Connectivity.
After configuring S3 or ADLS, you can click the Replication Schedules link to define a replication schedule. See HDFS Replication or Hive/Impala Replication for details about creating replication schedules. You can also click Close and create the replication schedules later. Select the AWS Credentials account in the Source or Destination drop-down lists when creating the schedules.
Configuring Peers with SAML Authentication
- Create a Cloudera Manager user account that has the User Administrator or Full Administrator role.
You can also use an existing user that has one of these roles. Since you will only use this user to create the peer relationship, you can delete the user account after adding the peer.
- Create or modify the peer, as described in this topic.
- (Optional) Delete the Cloudera Manager user account you just created.