This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Requirements and Restrictions

  1. The CDH 5 cluster must have a MapReduce service running on it, either MRv1 or YARN (MRv2).
  2. All the MapReduce nodes in the CDH 5 cluster should have full network access to all the nodes of the source cluster. This allows DistCp to perform the copy in a distributed manner, with each map task reading directly from the source cluster.
  3. To copy data between a secure and an insecure cluster, you must run the distcp command on the secure cluster.
  4. To copy data from a CDH 4 to a CDH 5 cluster, you can do one of the following:
      Note:

    The term source in this case refers to the CDH 4 (or other Hadoop) cluster you want to migrate or copy data from, and destination refers to the CDH 5 cluster.
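Requirement 2 can be spot-checked before starting a large copy. The following Python sketch, meant to be run on each MapReduce node of the CDH 5 cluster, tries to open a TCP connection to each source-cluster host and reports the ones it cannot reach. The host names and the port list (8020 and 50070 for the NameNode, 50010 and 50075 for DataNodes) are assumptions based on common Hadoop 2 defaults; substitute the values from your source cluster's configuration. A successful connection only shows that a port is reachable, not that HDFS itself is healthy.

```python
import socket
from typing import Iterable, List, Mapping, Sequence, Tuple

# Assumed default HDFS ports; verify against the source cluster's configuration.
DEFAULT_SOURCE_PORTS: Mapping[str, Sequence[int]] = {
    "namenode": (8020, 50070),   # NameNode RPC and HTTP
    "datanode": (50010, 50075),  # DataNode data transfer and HTTP
}


def source_endpoints(
    namenodes: Iterable[str],
    datanodes: Iterable[str],
    ports: Mapping[str, Sequence[int]] = DEFAULT_SOURCE_PORTS,
) -> List[Tuple[str, int]]:
    """Expand (hypothetical) source host lists into (host, port) pairs to probe."""
    pairs: List[Tuple[str, int]] = []
    for host in namenodes:
        pairs.extend((host, port) for port in ports["namenode"])
    for host in datanodes:
        pairs.extend((host, port) for port in ports["datanode"])
    return pairs


def unreachable_endpoints(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> List[Tuple[str, int]]:
    """Return the (host, port) pairs this node cannot open a TCP connection to."""
    failed: List[Tuple[str, int]] = []
    for host, port in endpoints:
        try:
            # Resolves the host name and opens (then immediately closes) a TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            failed.append((host, port))
    return failed
```

For example, `unreachable_endpoints(source_endpoints(["nn1.source.example.com"], ["dn1.source.example.com"]))` (hypothetical host names) should return an empty list on every MapReduce node before you run the copy.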

The following restrictions currently apply (see Apache Hadoop Known Issues):
  • DistCp does not work between a secure cluster and an insecure cluster in some cases.

    As of CDH 5.1.3, DistCp works between a secure and an insecure cluster if you use the WebHDFS protocol, set ipc.client.fallback-to-simple-auth-allowed to true, and run the command from the secure cluster, as described under Copying Data between a Secure and an Insecure Cluster using DistCp and webHDFS.

  • To run DistCp over Hftp from a secure cluster that uses SPNEGO, you must set the dfs.https.port property on the client to the HTTP port (50070 by default).
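The two restrictions above can be expressed as command lines roughly as in this Python sketch, which only assembles argument lists for hadoop distcp and does not run them. The Cluster class, host names, and paths are hypothetical. The property names and the default HTTP port 50070 come from the text above; the webhdfs:// and hftp:// URI schemes and the -D generic option (which sets a configuration property for a single command) are standard Hadoop usage. Which port to pass to dfs.https.port in your environment is an assumption here (the source NameNode's HTTP port); check it against your cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical description of a cluster; not a Hadoop or Cloudera API."""
    namenode_host: str
    http_port: int = 50070  # NameNode HTTP port (50070 by default)
    secure: bool = False


def webhdfs_uri(cluster: Cluster, path: str) -> str:
    return f"webhdfs://{cluster.namenode_host}:{cluster.http_port}{path}"


def distcp_secure_insecure(src: Cluster, src_path: str,
                           dst: Cluster, dst_path: str) -> List[str]:
    """Build argv for copying between a secure and an insecure cluster.

    Both ends use WebHDFS and the client is allowed to fall back to simple
    authentication. The resulting command must be run on the secure cluster.
    """
    if src.secure == dst.secure:
        raise ValueError("expected exactly one of the two clusters to be secure")
    return [
        "hadoop", "distcp",
        "-D", "ipc.client.fallback-to-simple-auth-allowed=true",
        webhdfs_uri(src, src_path),
        webhdfs_uri(dst, dst_path),
    ]


def distcp_hftp_from_secure(src: Cluster, src_path: str, dst_uri: str) -> List[str]:
    """Build argv for reading the source over Hftp from a secure (SPNEGO) cluster.

    dfs.https.port is set to the HTTP port, as the restriction requires.
    """
    return [
        "hadoop", "distcp",
        "-D", f"dfs.https.port={src.http_port}",
        f"hftp://{src.namenode_host}:{src.http_port}{src_path}",
        dst_uri,
    ]
```

On a host in the secure cluster, after obtaining a Kerberos ticket, the resulting list could be passed to subprocess.run(argv, check=True). Keeping the command as a list rather than a single string avoids shell-quoting problems with paths.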
Page generated September 3, 2015.