Kerberos setup guidelines for Distcp between secure clusters
There are specific guidelines to consider while setting up Kerberos on secure CDP
        clusters for successfully performing distcp between them.
- You have two clusters, each in a different Kerberos realm
                        (
SOURCEandDESTINATIONin this example) - You have data that needs to be copied from 
SOURCEtoDESTINATION - A Kerberos realm trust exists, either between 
SOURCEandDESTINATION(in either direction), or between bothSOURCEandDESTINATIONand a common third realm (such as an Active Directory domain). 
If your environment matches the one described above, use the following table to configure
            Kerberos delegation tokens on your cluster so that you can successfully
                distcp across two secure clusters. Based on the direction of the
            trust between the SOURCE and DESTINATION clusters, you
            can use the mapreduce.job.hdfs-servers.token-renewal.exclude property
            to instruct ResourceManagers on either cluster to skip or perform delegation token
            renewal for NameNode hosts.
| Environment Type | Kerberos Delegation Token Setting | |
|---|---|---|
SOURCE trusts
                                DESTINATION | 
                        Distcp job runs on the DESTINATION cluster | 
                        You do not need to set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property.  | 
                    
Distcp job runs on the SOURCE cluster | 
                        Set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property to a comma-separated list of the hostnames of the NameNodes of
                            the DESTINATION cluster.  | 
                    |
DESTINATION trusts
                                SOURCE | 
                        Distcp job runs on the DESTINATION cluster | 
                        Set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property to a comma-separated list of the hostnames of the NameNodes of
                            the SOURCE cluster. | 
                    
Distcp job runs on the SOURCE cluster | 
                        You do not need to set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property. | 
                    |
Both SOURCE and DESTINATION trust
                            each other | 
                        Set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property to a comma-separated list of the hostnames of the NameNodes of
                            the DESTINATION cluster. | 
                    |
Neither SOURCE nor DESTINATION
                            trusts the other | 
                        If a common realm is usable (such as
                            Active Directory), set the
                                mapreduce.job.hdfs-servers.token-renewal.exclude
                            property to a comma-separated list of hostnames of the NameNodes of the
                            cluster that is not running the distcp job. For example, if you
                            are running the job on the DESTINATION cluster:
  | 
                    |
