Copying Data between a Secure and an Insecure Cluster using DistCp and WebHDFS

You can use DistCp and WebHDFS to copy data between a secure cluster and an insecure cluster. Note that when doing this, the distcp commands should be run from the secure cluster. by doing the following:
  1. On the secure cluster, set ipc.client.fallback-to-simple-auth-allowed to true in core-site.xml:
    <property>
      <name>ipc.client.fallback-to-simple-auth-allowed</name>
      <value>true</value> 
    </property>

    Alternatively, you can also pass this as a parameter when you run the distcp command. If you want to do that, move onto step 2.

  2. On the insecure cluster, add the secured cluster's realm name to the insecure cluster's configuration:
    1. In the Cloudera Manager Admin Console for the insecure cluster, navigate to Clusters > <HDFS cluster>.
    2. On the Configuration tab, search for Trusted Kerberos Realms and add the secured cluster's realm name.

      Note that his does not require Kerberos to be enabled but is a necessary step to allow the simple auth fallback to happen in the hdfs:// protocol.

    3. Save the change.
  3. Use commands such as the following from the secure cluster side only:
    #This example uses the insecure cluster as the source and the secure cluster as the destination
    distcp webhdfs://<insecure_namenode>:50070 webhdfs://<secure_namenode>:50470
    
    #This example uses the sefcure cluster as the source and the insecure cluster as the destination
    distcp webhdfs://<secure_namenode>:500470 webhdfs://<insecure_namenode>:50070

    If TLS is enabled, replace webhdfs with swebhdfs.

    If you did not configure ipc.client.fallback-to-simple-auth-allowed and want to pass it as a parameter, run commands such as the following from the secure cluster:
    #This example uses the insecure cluster as the source and the secure cluster (with TLS enabled) as the destination cluster. swebhdfs is used instead of webhdfs when TLS is enabled.
    hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true webhdfs://<insecure_namenode>:50070 swebhdfs://<secure_namenode>:50470
    
    #This example uses the secure cluster (with TLS enabled) as the source cluster and the insecure cluster as the destination. swebhdfs is used instead of webhdfs when TLS is enabled.
    hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true swebhdfs://<secure_namenode>:50470 webhdfs://<insecure_namenode>:50070