Copying data between a secure and an insecure cluster using DistCp and WebHDFS

You can use distcp and WebHDFS to copy data between a secure cluster and an insecure cluster.

When copying data, ensure that you run the distcp commands from the secure cluster.
  1. On the secure cluster, set ipc.client.fallback-to-simple-auth-allowed to true in core-site.xml.
    <property>
      <name>ipc.client.fallback-to-simple-auth-allowed</name>
      <value>true</value> 
    </property>
    Alternatively, you can also pass this as a parameter when you run the distcp command. If you want to do that, move onto step 2.
  2. On the insecure cluster, add the secured cluster's realm name to the insecure cluster's configuration.
    1. In the Cloudera Manager Admin Console for the insecure cluster, navigate to Clusters > <HDFS cluster>.
    2. On the Configuration tab, search for Trusted Kerberos Realms and add the secured cluster's realm name.
    3. Save the change.
  3. Use commands such as the following only from the secure cluster side.
    #This example uses the insecure cluster as the source and the secure cluster as the destination
    distcp webhdfs://<insecure_namenode>:9870 webhdfs://<secure_namenode>:9871
    
    #This example uses the sefcure cluster as the source and the insecure cluster as the destination
    distcp webhdfs://<secure_namenode>:9871 webhdfs://<insecure_namenode>:9870

    If TLS is enabled, replace webhdfs with swebhdfs.

    If you did not configure ipc.client.fallback-to-simple-auth-allowed and want to pass it as a parameter, run commands such as the following from the secure cluster:

    #This example uses the insecure cluster as the source and the secure cluster (with TLS enabled) as the destination cluster. swebhdfs is used instead of webhdfs when TLS is enabled.
    hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true webhdfs://<insecure_namenode>:9870 swebhdfs://<secure_namenode>:9871
    
    #This example uses the secure cluster (with TLS enabled) as the source cluster and the insecure cluster as the destination. swebhdfs is used instead of webhdfs when TLS is enabled.
    hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true swebhdfs://<secure_namenode>:9871 webhdfs://<insecure_namenode>:9870