Examples of DistCp commands using the S3 protocol and hidden credentials
You can use various distcp command options to copy files between your CDP clusters and Amazon S3.
- Copying files to Amazon S3:

  hadoop distcp /user/hdfs/mydata s3a://myBucket/mydata_backup

- Copying files from Amazon S3:

  hadoop distcp s3a://myBucket/mydata_backup /user/hdfs/mydata

- Copying files to Amazon S3 using the -filters option to exclude specified source files:

  You specify a file name with the -filters option. The referenced file contains regular expressions, one per line, that define file name patterns to exclude from the distcp job. Each pattern should match the fully qualified path of the files to exclude, including the scheme (hdfs, webhdfs, s3a, and so on). For example, the following are valid expressions for excluding files:

    hdfs://x.y.z:8020/a/b/c
    webhdfs://x.y.z:50070/a/b/c
    s3a://bucket/a/b/c

  Reference the file containing the filter expressions with the -filters option. For example:

    hadoop distcp -filters /user/joe/myFilters /user/hdfs/mydata s3a://myBucket/mydata_backup

  Contents of the sample myFilters file:

    .*foo.*
    .*/bar/.*
    hdfs://x.y.z:8020/tmp/.*
    hdfs://x.y.z:8020/tmp1/file1

  The regular expressions in myFilters exclude the following files:

  - .*foo.* – excludes paths that contain the string "foo".
  - .*/bar/.* – excludes paths that include a directory named bar.
  - hdfs://x.y.z:8020/tmp/.* – excludes all files in the /tmp directory.
  - hdfs://x.y.z:8020/tmp1/file1 – excludes the file /tmp1/file1.

  A short end-to-end sketch of this filter workflow follows this list.
- Copying files to Amazon S3 with the -overwrite option:

  The -overwrite option overwrites destination files that already exist.

  hadoop distcp -overwrite /user/hdfs/mydata s3a://myBucket/mydata_backup
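The following is a minimal end-to-end sketch of the filter workflow described above. The local path /tmp/myFilters is illustrative; use any location that the host running the distcp command can read.

# Create a filter file with one exclusion pattern per line
# (same patterns as the sample myFilters file above).
cat > /tmp/myFilters <<'EOF'
.*foo.*
.*/bar/.*
hdfs://x.y.z:8020/tmp/.*
hdfs://x.y.z:8020/tmp1/file1
EOF

# Run the copy; source paths matching any of the patterns are excluded.
hadoop distcp -filters /tmp/myFilters /user/hdfs/mydata s3a://myBucket/mydata_backup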
For more information about -filters, -overwrite, and other distcp options, see the DistCp Guide: Command Line Options (Apache Software Foundation).
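The examples above keep the S3 credentials out of the command line ("hidden credentials"). One common way to do this is the Hadoop credential provider framework; the following is a minimal sketch of that approach, and the JCEKS path /user/admin/awscreds.jceks is only an illustrative location.

# Store the S3A access and secret keys in a Hadoop credential store
# (each command prompts for the corresponding value).
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/admin/awscreds.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/admin/awscreds.jceks

# Point distcp at the credential store instead of passing keys on the command line.
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/user/admin/awscreds.jceks \
  /user/hdfs/mydata s3a://myBucket/mydata_backup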
