Working with Amazon S3
The Amazon S3 object store is the standard mechanism to store, retrieve, and share large quantities of data in AWS.
Cloudera recommends using unique S3 bucket names across all endpoints to avoid conflicts with other services in CDP.
Amazon S3 has the following characteristics:
- Object store model for storing, listing, and retrieving data.
- Support for objects up to 5 terabytes, with many petabytes of data allowed in a single "bucket".
- Data is stored in buckets, each of which is hosted in a particular AWS region.
- Buckets can be restricted to different users or IAM roles.
- Data stored in an Amazon S3 bucket is billed based on the size of the data, how long it is stored, and the operations that access it. In addition, you are billed when you transfer data between regions:
  - Data transfers between an Amazon S3 bucket and a cluster running in the same region are free of download charges (except in the special case of buckets in which data is served on a user-pays basis).
  - Data downloaded from an Amazon S3 bucket to a host located outside the region in which the bucket is hosted is billed per megabyte.
  - Data downloaded from an Amazon S3 bucket to any host over the internet is also billed per megabyte.
- Data stored in Amazon S3 can be backed up with Amazon Glacier.
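
The billing rules described above can be sketched as a small calculation. This is only an illustration under stated assumptions, not AWS's actual pricing formula: the `Rates` class, the `estimate_cost` function, and every numeric rate below are hypothetical placeholders, and real prices must be taken from the current AWS price list. The sketch also ignores the user-pays special case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit prices (placeholders, not real AWS rates)."""
    storage_per_mb_month: float  # price to keep 1 MB stored for one month
    per_request: float           # price of one operation accessing the data
    cross_region_per_mb: float   # price per MB downloaded to another region
    internet_per_mb: float       # price per MB downloaded over the internet


def estimate_cost(
    rates: Rates,
    stored_mb: float,
    months_stored: float,
    requests: int,
    same_region_mb: float = 0.0,
    cross_region_mb: float = 0.0,
    internet_mb: float = 0.0,
) -> float:
    """Estimate a bill from data size, storage duration, operations, and transfers."""
    if min(stored_mb, months_stored, requests,
           same_region_mb, cross_region_mb, internet_mb) < 0:
        raise ValueError("quantities must be non-negative")

    # Storage is billed on the size of the data and how long it is stored.
    storage = stored_mb * months_stored * rates.storage_per_mb_month

    # Operations that access the data are billed separately.
    operations = requests * rates.per_request

    # Same-region downloads carry no download charge, so same_region_mb
    # contributes nothing; only cross-region and internet downloads are billed.
    transfer = (cross_region_mb * rates.cross_region_per_mb
                + internet_mb * rates.internet_per_mb)

    return storage + operations + transfer


# Made-up rates chosen only to keep the arithmetic easy to follow.
example_rates = Rates(
    storage_per_mb_month=0.5,
    per_request=0.25,
    cross_region_per_mb=2.0,
    internet_per_mb=4.0,
)

example_cost = estimate_cost(
    example_rates,
    stored_mb=10,
    months_stored=2,
    requests=4,
    same_region_mb=100,
    cross_region_mb=3,
    internet_mb=1,
)
```

With these placeholder rates, the 100 MB read from a cluster in the bucket's own region adds nothing to `example_cost`, while the cross-region and internet downloads are charged per megabyte. This is why running the cluster in the same region as the bucket is usually the cheaper arrangement.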