Administering HDFS

DistCp Driver

The DistCp driver parses the arguments passed to the DistCp command on the command line.

The DistCp driver components are responsible for:

  • Parsing the arguments passed to the DistCp command on the command line, via:

      • OptionsParser

      • DistCpOptionSwitch

  • Assembling the command arguments into an appropriate DistCpOptions object and initializing DistCp. These arguments include:

      • Source paths

      • Target location

      • Copy options (for example, whether to perform an update copy or overwrite, which file attributes to preserve, and so on)

  • Orchestrating the copy operation by:

      • Invoking the copy-listing generator to create the list of files to be copied.

      • Setting up and launching the Hadoop MapReduce job to carry out the copy.

      • Depending on the options, either returning a handle to the Hadoop MapReduce job immediately or waiting until the job completes.
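The parsing and assembly steps can be illustrated with a minimal toy model in plain Java, with no Hadoop dependency. ToySwitch, ToyOptions and ToyOptionsParser are hypothetical stand-ins for the switch enum, DistCpOptions and OptionsParser; they are not the Hadoop classes. Only the flag names -update, -overwrite and -async mirror real DistCp options, and the "last path is the target" rule is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical flag table: each constant ties one command-line flag to an option.
// This is NOT the real Hadoop switch enum, only an illustration of the idea.
enum ToySwitch {
    UPDATE("-update"),
    OVERWRITE("-overwrite"),
    ASYNC("-async");

    final String flag;

    ToySwitch(String flag) {
        this.flag = flag;
    }

    // Returns the matching switch, or null if the argument is not a known flag.
    static ToySwitch fromFlag(String arg) {
        for (ToySwitch s : values()) {
            if (s.flag.equals(arg)) {
                return s;
            }
        }
        return null;
    }
}

// Hypothetical immutable options object: source paths, target path and switches.
final class ToyOptions {
    final List<String> sourcePaths;
    final String targetPath;
    final Set<ToySwitch> switches;

    ToyOptions(List<String> sourcePaths, String targetPath, Set<ToySwitch> switches) {
        this.sourcePaths = Collections.unmodifiableList(new ArrayList<>(sourcePaths));
        this.targetPath = targetPath;
        EnumSet<ToySwitch> copy = EnumSet.noneOf(ToySwitch.class);
        copy.addAll(switches);
        this.switches = Collections.unmodifiableSet(copy);
    }

    boolean has(ToySwitch s) {
        return switches.contains(s);
    }
}

// Hypothetical parser: known flags become switches, other arguments are paths,
// and the last path is taken as the copy target.
final class ToyOptionsParser {
    static ToyOptions parse(String[] args) {
        List<String> paths = new ArrayList<>();
        EnumSet<ToySwitch> switches = EnumSet.noneOf(ToySwitch.class);
        for (String arg : args) {
            ToySwitch s = ToySwitch.fromFlag(arg);
            if (s != null) {
                switches.add(s);
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("Unknown option: " + arg);
            } else {
                paths.add(arg);
            }
        }
        if (paths.size() < 2) {
            throw new IllegalArgumentException("Expected at least one source path and a target path");
        }
        String target = paths.remove(paths.size() - 1);
        return new ToyOptions(paths, target, switches);
    }
}
```

For example, parsing `-update hdfs://nn1/a hdfs://nn1/b hdfs://nn2/dst` with this sketch yields two source paths, the target hdfs://nn2/dst, and only the UPDATE switch set. Keeping the flag-to-option mapping in an enum means adding a new option touches one table rather than the parsing loop.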

The parser elements are executed only when DistCp is run from the command line (or when DistCp::run() is invoked). The DistCp class can also be used programmatically, bypassing the parser, by constructing a DistCpOptions object directly and initializing a DistCp object with it.
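The programmatic path and the orchestration steps can be pictured with another toy model in plain Java; it is not Hadoop code. CopyOptions, ToyCopyDriver and the CompletableFuture standing in for the MapReduce job handle are all hypothetical. The options object is built directly in code with no parser involved, and a `blocking` flag decides whether the driver returns the handle at once or waits for completion. For the real DistCpOptions and DistCp API, consult the Hadoop javadoc for your release, since it differs between versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical options object, constructed directly in code (no argument parsing).
final class CopyOptions {
    final List<String> sources;
    final String target;
    final boolean blocking; // true: wait for the "job"; false: return the handle at once

    CopyOptions(List<String> sources, String target, boolean blocking) {
        this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
        this.target = target;
        this.blocking = blocking;
    }
}

// Hypothetical driver: copy listing -> job launch -> return handle or wait.
final class ToyCopyDriver {
    private final CopyOptions options;
    private final ExecutorService executor;

    ToyCopyDriver(CopyOptions options, ExecutorService executor) {
        this.options = options;
        this.executor = executor;
    }

    // Step 1: stand-in for the copy-listing generator (one entry per source path).
    List<String> createListing() {
        List<String> listing = new ArrayList<>();
        for (String src : options.sources) {
            listing.add(src + " -> " + options.target);
        }
        return listing;
    }

    // Step 2: "launch the job" on the executor; it pretends to copy every listing
    // entry and completes with the number of entries.
    // Step 3: wait for completion only if the options ask for blocking behaviour.
    CompletableFuture<Integer> execute() {
        List<String> listing = createListing();
        CompletableFuture<Integer> job =
                CompletableFuture.supplyAsync(() -> listing.size(), executor);
        if (options.blocking) {
            job.join();
        }
        return job;
    }
}
```

A caller that passes `blocking = false` gets the future back immediately and can poll or join it later, which mirrors the choice described above between returning a job handle and waiting until the copy completes. Remember to shut down the executor when finished.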