Prerequisites
Before you begin the installation process, verify the following:
- You must have root access to the nodes on which the DLM App and DLM Engine will be installed.
- Ensure that the required services (Knox, Ranger, HDFS, YARN, and Hive) are installed.
- Before you install DLM, verify that you can copy files between the Hadoop clusters or endpoints. Because cluster environments vary, it is recommended that you run the distributed copy command distcp to confirm that data can be copied between the clusters. For more information, see Using DistCp.
- Global LDAP is configured to share user-group mappings across clusters
- If using Kerberos with different KDCs, two-way trust is configured between the KDCs
- If using AD, there is no support for trust relationships across multiple domains or forests through domain and forest
- Ensure that one of the following external databases is installed: MySQL or Postgres.
See the Hortonworks Support Matrix for the compatible versions of DataPlane (DP) Platform, HDP, and DLM.
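The distcp check recommended in the list above can be scripted so that it is easy to repeat for each cluster pair. The following is a minimal sketch, not part of DLM: the helper names `build_distcp_command` and `run_distcp_check` are hypothetical, and it assumes the `hadoop` client is on the PATH of the node where you run it and that the user has access to both paths.

```python
import shlex
import subprocess


def build_distcp_command(source_uri, target_uri):
    """Return the argument list for a basic `hadoop distcp <source> <target>` copy."""
    if not source_uri or not target_uri:
        raise ValueError("both a source and a target URI are required")
    return ["hadoop", "distcp", source_uri, target_uri]


def run_distcp_check(source_uri, target_uri):
    """Run the copy and return True if distcp exits with status 0."""
    command = build_distcp_command(source_uri, target_uri)
    print("Running:", shlex.join(command))
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Example call (hypothetical NameNode hosts and paths; replace with your own):
# run_distcp_check(
#     "hdfs://source-nn.example.com:8020/tmp/dlm-precheck",
#     "hdfs://target-nn.example.com:8020/tmp/dlm-precheck",
# )
```

Copying a small test directory first keeps the check quick. A return value of False only indicates that distcp reported a failure; the distcp output itself shows the cause.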
- Knox SSO
DP Platform and DLM leverage Knox SSO to provide users and services with simplified and consistent access to clusters, data, and other services. You must configure Knox SSO on the HDP clusters that you plan to use with DLM.
Note: The Knox SSO of your cluster must be configured to use the same LDAP/AD as your DP instance for user identity to match and propagate between the systems.
Refer to the following documentation on how to configure your cluster for Knox SSO:

| Resource | Documentation |
| --- | --- |
| Install Knox and enable in Ambari | HDP Security Guide, Install Knox |
| Configure SSO topology | HDP Security Guide, Identity Providers |
| Configure Knox SSO for Ambari | HDP Security Guide, Setting up Knox SSO for Ambari |
| Configure LDAP with Ambari | Ambari Security Guide, Configuring Ambari Authentication with LDAP or Active Directory Authentication |

- Perform the DataPlane Platform pre-installation tasks. For more information, see Prepare your clusters.
- Install or upgrade to the supported version of Ambari. See Support Matrix for details of the supported Ambari versions. See Apache Ambari installation for more details.
- Install or upgrade to the supported versions of HDP on your cluster using Ambari. See DLM Support Matrix for details of the supported HDP versions. See the HDP installation documentation for more details.
- Ranger
Ranger enables you to create services for specific Hadoop resources (HDFS, HBase, Hive) and add access policies to those services. If you use Ranger for authorization in your cluster for LDAP users:
- Configure LDAP for Ranger usersync. For more information, see Advanced Usersync Settings.
- Configure LDAP Hadoop group mapping. For more information, see Setting Up Hadoop Group Mapping.
- Knox Gateway
Configuring Knox Gateway is required if your cluster is configured with Kerberos or with wire encryption. This simplifies certificate management for DP and cross-cluster communication, as the only security certificate that needs to be managed is for Knox.
Refer to the following documentation on how to configure your cluster for Knox Gateway:

| Resource | Documentation |
| --- | --- |
| Configure a reverse proxy with Knox | HDP Security Guide, Configuring the Knox Gateway |
| Configure LDAP with Knox for proxy authentication | HDP Security Guide, Setting Up LDAP Authentication |

- Hive
You must configure Hive with the Ranger authorizer. For more information, see Authorization using Apache Ranger Policies and hive.server2.enable.doAs=false
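As an illustration of the Hive requirement, the sketch below checks whether hive.server2.enable.doAs is set to false in Hadoop-style site XML. It is an assumption-laden example rather than a DLM tool: the function names are hypothetical, the sample XML stands in for the contents of your cluster's hive-site.xml, and on an Ambari-managed cluster you would normally confirm the value in Ambari instead.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(site_xml_text, name):
    """Return the value of a property from Hadoop-style *-site.xml text, or None."""
    root = ET.fromstring(site_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def doas_is_disabled(site_xml_text):
    """Return True only if hive.server2.enable.doAs is explicitly set to false."""
    value = get_hadoop_property(site_xml_text, "hive.server2.enable.doAs")
    return value is not None and value.lower() == "false"


# Sample content standing in for a real hive-site.xml (illustrative only).
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>"""

print(doas_is_disabled(SAMPLE_HIVE_SITE))
```

The check treats a missing property as a failure, so it does not rely on whatever default your Hive version applies.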
- YARN
DLM runs the replication jobs using YARN. For on-premise to on-premise replication, the replication job runs on the target cluster. For on-premise to cloud replication, the replication job runs on the source cluster. Make sure YARN is installed on the cluster where the replication job runs.
- Ensure that the HDP clusters involved in replication have symmetric configuration. Each cluster in a replication relationship must be configured exactly the same for security (Kerberos), user management (LDAP/AD), and Knox Proxy. Cluster services such as HDFS, Hive, Knox, Ranger, and Atlas can differ in their High Availability (HA) configuration; for example, the source cluster can have an HA setup while the target cluster has a non-HA setup.
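The symmetry requirement can be thought of as a comparison over a fixed set of settings. The sketch below shows one way to express that, assuming you have already collected each cluster's settings into a dictionary. The key names (`kerberos`, `user_management`, `knox_proxy`, `hdfs_ha`) and values are hypothetical labels chosen for the example, not DLM or Ambari property names.

```python
# Settings that must match exactly between clusters, per the requirement above.
# The key names are hypothetical labels, not DLM or Ambari property names.
MUST_MATCH = ("kerberos", "user_management", "knox_proxy")


def find_asymmetries(source, target, keys=MUST_MATCH):
    """Return {key: (source_value, target_value)} for every key that differs."""
    return {
        key: (source.get(key), target.get(key))
        for key in keys
        if source.get(key) != target.get(key)
    }


source_cluster = {
    "kerberos": "enabled",
    "user_management": "ldap://ldap.example.com",
    "knox_proxy": "enabled",
    "hdfs_ha": True,   # HA settings are allowed to differ, so they are not compared
}
target_cluster = {
    "kerberos": "enabled",
    "user_management": "ldap://ldap.example.com",
    "knox_proxy": "disabled",
    "hdfs_ha": False,
}

print(find_asymmetries(source_cluster, target_cluster))
# prints {'knox_proxy': ('enabled', 'disabled')}
```

An empty result means the compared settings match. HA-related keys are deliberately left out of `MUST_MATCH` because HA configuration may differ between the source and target clusters.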