Commands That May Be Slower with S3
Some commands tend to be significantly slower with Amazon S3 than when invoked against HDFS or other filesystems. This includes renaming files, listing files, find, mv, cp, and rm.
Renaming Files
Unlike in a normal filesystem, renaming files and directories in an object store usually takes time proportional to the size of the objects being manipulated. Because many of the filesystem shell operations use a rename as their final stage, skipping that stage can avoid long delays.
In particular, we recommend that when using the put and copyFromLocal commands, you set the -d option for a direct upload. For example:
# Upload a file from the cluster filesystem
hadoop fs -put -d /datasets/example.orc s3a://bucket1/datasets/

# Upload a file from the local filesystem
hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket1/datasets/

# Create a file from stdin
echo "hello" | hadoop fs -put -d -f - s3a://bucket1/datasets/hello.txt
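For comparison, a default upload without -d typically stages the data under a temporary name (the destination name with a ._COPYING_ suffix in the shell implementation) and then renames it into place; on S3 that final rename is itself a copy of the whole object. A sketch of the slower default form:

# Without -d, the data is staged as example.orc._COPYING_ and then
# renamed to example.orc, so the object is effectively written twice
hadoop fs -put /datasets/example.orc s3a://bucket1/datasets/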
Listing Files
Commands which list many files tend to be significantly slower with Amazon S3 than when invoked against HDFS or other filesystems. For example:
hadoop fs -count s3a://bucket1/
hadoop fs -du s3a://bucket1/
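Where possible, scope these commands to the narrowest path you need rather than the bucket root; every extra directory tree enumerated means more paged LIST requests against S3. An illustrative sketch (the datasets/ path is just an example):

# Counting one directory tree issues far fewer LIST calls than
# scanning the whole bucket
hadoop fs -count s3a://bucket1/datasets/

# -s summarizes the tree; -h prints human-readable sizes
hadoop fs -du -s -h s3a://bucket1/datasets/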
Find
The find command can be very slow on a large store with many directories under the path supplied.
# Enumerate all files in the bucket
hadoop fs -find s3a://bucket1/ -print

# List *.txt in the bucket.
# Remember to escape the wildcard to stop the bash shell trying to expand it
hadoop fs -find s3a://bucket1/datasets/ -name \*.txt -print
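The same scoping advice applies here: starting find at the deepest directory you can cuts the number of objects that must be enumerated. A sketch (the 2024/ subdirectory is hypothetical):

# Searching one subtree instead of the whole bucket
hadoop fs -find s3a://bucket1/datasets/2024/ -name \*.txt -print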
Rename
The time to rename a file depends on its size. The time to rename a directory depends on the number and size of all files beneath that directory. If the operation is interrupted, the object store will be in an undefined state.
hadoop fs -mv s3a://bucket1/datasets s3a://bucket1/historical
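For a large directory tree, one alternative sketch is to run the copy in parallel with distcp, which executes as a MapReduce job so files are copied concurrently, and then delete the source (the paths here are illustrative):

# Copy the tree in parallel across the cluster, then remove the source
hadoop distcp s3a://bucket1/datasets s3a://bucket1/historical
hadoop fs -rm -r -skipTrash s3a://bucket1/datasets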
Copy
The copy operation reads each file and then writes it back to the object store; the time to complete depends on the amount of data to copy, and on the bandwidth in both directions between the local computer and the object store.
hadoop fs -cp s3a://bucket1/datasets s3a://bucket1/historical
Note: The further the VMs are from the object store, the longer the copy process takes.
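As with rename, distcp can spread a large copy across the cluster; recent Hadoop releases also offer a -direct option that writes straight to the destination paths instead of renaming from a temporary directory (check hadoop distcp -help in your release before relying on it):

# Parallel copy; -direct, where available, avoids the final rename
hadoop distcp -direct s3a://bucket1/datasets s3a://bucket1/historical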