Flag | Description | Notes |
-p[rbugpca] | Preserve the selected attributes: r (replication number), b (block size), u (user), g (group), p (permission), c (checksum-type), a (ACL) | Modification times are not preserved. Also, when -update is specified, status updates are not synchronized unless the file sizes also differ (i.e. unless the file is re-created). If -pa is specified, DistCp also preserves the permissions, because ACLs are a superset of permissions. |
-i | Ignore failures | This option will keep more accurate statistics about the copy than the default case. It also preserves logs from failed copies, which can be valuable for debugging. Finally, a failing map will not cause the job to fail before all splits are attempted. |
-log <logdir> | Write logs to <logdir> | DistCp keeps logs of each file it attempts to copy as map output. If a map fails and is re-executed, the log output from the failed attempt is not retained. |
-m <num_maps> | Maximum number of simultaneous copies | Specify the number of maps to copy data. Note that more maps may not necessarily improve throughput. |
-overwrite | Overwrite destination | If a map fails and -i is not specified, all the files in the split, not only those that failed, will be recopied. As discussed in the Usage documentation, it also changes the semantics for generating destination paths, so users should use this carefully. |
-update | Overwrite if src size different from dst size | As noted earlier, this is not a “sync” operation. The only criterion examined is the source and destination file sizes; if they differ, the source file replaces the destination file. As discussed in the Usage documentation, it also changes the semantics for generating destination paths, so users should use this carefully. |
-f <urilist_uri> | Use list at <urilist_uri> as src list | This is equivalent to listing each source on the command line. The urilist_uri value should be a fully qualified URI. |
-filelimit <n> | Limit the total number of files to be <= n | Deprecated! Ignored in DistCp v2. |
-sizelimit <n> | Limit the total size to be <= n bytes | Deprecated! Ignored in DistCp v2. |
-delete | Delete the files existing in the dst but not in src | The deletion is done by FS Shell, so the trash will be used if it is enabled. |
-strategy {dynamic|uniformsize} | Choose the copy-strategy to be used in DistCp. | By default, uniformsize is used (i.e. maps are balanced on the total size of files copied by each map, similar to legacy). If dynamic is specified, DynamicInputFormat is used instead. (This is described in the Architecture section, under InputFormats.) |
-bandwidth | Specify bandwidth per map, in MB/second. | Each map will be restricted to consume only the specified bandwidth. This is not always exact. The map throttles back its bandwidth consumption during a copy, such that the net bandwidth used tends towards the specified value. |
-atomic {-tmp <tmp_dir>} | Specify atomic commit, with optional tmp directory. | -atomic instructs DistCp to copy the source data to a temporary target location, and then move the temporary target to the final location atomically. Data will either be available at final target in a complete and consistent form, or not at all. Optionally, -tmp may be used to specify the location of the tmp-target. If not specified, a default is chosen. Note: tmp_dir must be on the final target cluster. |
-mapredSslConf <ssl_conf_file> | Specify SSL Config file, to be used with HSFTP source | When using the hsftp protocol with a source, the security-related properties may be specified in a config file and passed to DistCp. <ssl_conf_file> needs to be in the classpath. |
-async | Run DistCp asynchronously. Quits as soon as the Hadoop Job is launched. | The Hadoop Job-id is logged, for tracking. |
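
As a rough illustration of how several of the flags in the table combine, the sketch below assembles a DistCp command line and prints it instead of running it. The cluster URIs, the map count, and the bandwidth value are hypothetical placeholders, and the DRY_RUN variable is an invention of this sketch, not a DistCp feature.

```shell
#!/bin/sh
# Hypothetical source and destination; substitute real NameNode addresses.
SRC="hdfs://nn1:8020/data/logs"
DST="hdfs://nn2:8020/backup/logs"

# Flags taken from the table:
#   -update            replace destination files whose size differs from the source
#   -delete            remove destination files that do not exist in the source
#   -pugp              preserve user, group, and permission
#   -m 20              at most 20 simultaneous copies (illustrative value)
#   -bandwidth 10      about 10 MB/second per map (illustrative value)
#   -strategy dynamic  use DynamicInputFormat instead of uniformsize
DISTCP_ARGS="-update -delete -pugp -m 20 -bandwidth 10 -strategy dynamic"
DISTCP_CMD="hadoop distcp $DISTCP_ARGS $SRC $DST"

# DRY_RUN is specific to this sketch: by default the command is only printed.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$DISTCP_CMD"
else
  $DISTCP_CMD
fi
```

Printing the command first is a deliberate choice here: -update and -overwrite change how destination paths are generated, and -delete removes data, so reviewing the exact invocation before launching the job is a cheap safeguard. Setting DRY_RUN=0 runs the command, which assumes the hadoop executable is on the PATH.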