Examples of DistCp commands using the S3 protocol and hidden credentials
You can use various distcp command options to copy files between your CDP clusters and Amazon S3.
- Copying files to Amazon S3
  hadoop distcp /user/hdfs/mydata s3a://myBucket/mydata_backup
- Copying files from Amazon S3
  hadoop distcp s3a://myBucket/mydata_backup /user/hdfs/mydata
- Copying files to Amazon S3 using the -filters option to exclude specified source files
  You specify a file name with the -filters option. The referenced file contains regular expressions, one per line, that define file name patterns to exclude from the distcp job. The pattern specified in the regular expression should match the fully-qualified path of the intended files, including the scheme (hdfs, webhdfs, s3a, etc.). For example, the following are valid expressions for excluding files:
  hdfs://x.y.z:8020/a/b/c
  webhdfs://x.y.z:50070/a/b/c
  s3a://bucket/a/b/c
  Reference the file containing the filter expressions using the -filters option. For example:
  hadoop distcp -filters /user/joe/myFilters /user/hdfs/mydata s3a://myBucket/mydata_backup
  Contents of the sample myFilters file:
  .*foo.*
  .*/bar/.*
  hdfs://x.y.z:8020/tmp/.*
  hdfs://x.y.z:8020/tmp1/file1
  The regular expressions in the myFilters file exclude the following files:
  - .*foo.* : excludes paths that contain the string "foo".
  - .*/bar/.* : excludes paths that include a directory named bar.
  - hdfs://x.y.z:8020/tmp/.* : excludes all files in the /tmp directory.
  - hdfs://x.y.z:8020/tmp1/file1 : excludes the file /tmp1/file1.
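Before you run a filtered copy, it can help to check which paths a filter file would exclude. The following is a minimal local sketch, not part of DistCp itself. It assumes that each pattern must match the whole fully-qualified path, and it uses grep with POSIX extended regular expressions as a stand-in for the regular expression engine that DistCp uses, which should behave the same only for simple patterns such as the sample ones. The function name preview_excluded and the sample input paths are hypothetical.

```shell
#!/bin/sh
# Local preview of the sample filter file; no cluster access is needed.
# Assumption: a pattern excludes a path only when it matches the entire
# path string, so grep -x (whole-line match) is used here.

filters_file=$(mktemp)
trap 'rm -f "$filters_file"' EXIT

# Same four patterns as the sample myFilters file, one per line.
cat > "$filters_file" <<'EOF'
.*foo.*
.*/bar/.*
hdfs://x.y.z:8020/tmp/.*
hdfs://x.y.z:8020/tmp1/file1
EOF

# Reads candidate paths on standard input and prints the ones that
# at least one pattern in the filter file would exclude.
preview_excluded() {
  grep -E -x -f "$filters_file"
}

# Hypothetical fully-qualified source paths, including the scheme.
excluded=$(printf '%s\n' \
  'hdfs://x.y.z:8020/user/hdfs/mydata/foo.txt' \
  'hdfs://x.y.z:8020/user/hdfs/mydata/bar/part-0' \
  'hdfs://x.y.z:8020/tmp/scratch' \
  'hdfs://x.y.z:8020/tmp1/file1' \
  'hdfs://x.y.z:8020/tmp1/file2' \
  'hdfs://x.y.z:8020/user/hdfs/mydata/keep.txt' \
  | preview_excluded)

printf '%s\n' "$excluded"
```

In this sketch the first four paths are printed as excluded, while /tmp1/file2 and keep.txt are not matched by any pattern. Treat the result as a rough check only; the distcp job itself is the authority on what gets copied.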
- Copying files to Amazon S3 with the -overwrite option
  The -overwrite option overwrites destination files that already exist.
  hadoop distcp -overwrite /user/hdfs/mydata s3a://user/mydata_backup
For more information about the -filters, -overwrite, and other options, see DistCp Guide: Command Line Options (Apache Software Foundation).