Chapter 3. Getting Started with Amazon S3
The following table provides an overview of tasks related to configuring and using HDP with S3. Click on the linked topics to get more information about specific tasks.
> **Note**
>
> If you are looking for data sets to experiment with, you can use the Landsat 8 data sets made available by AWS in a public Amazon S3 bucket called "landsat-pds". For more information, refer to Landsat on AWS.
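Because the "landsat-pds" bucket is public, it can be browsed without account credentials. A sketch of one way to do this, assuming the S3A connector is available in your HDP installation:

```
# List the public Landsat bucket anonymously (no AWS credentials needed);
# assumes the S3A connector is on the classpath in your HDP version.
hadoop fs \
  -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
  -ls s3a://landsat-pds/
```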
| Task | Description |
|---|---|
| Meet the prerequisites | To use S3 storage, you must meet several prerequisites. |
| Configure authentication | In order for Hadoop applications to access data stored in your private S3 buckets, you must configure authentication with your Amazon S3 account. |
| Configure optional features | You can optionally configure additional features such as bucket-specific settings. |
| Work with S3 data | Once you have configured authentication with your S3 bucket(s), you can access S3 data from Hive (via external tables) and from Spark, and perform related tasks such as copying data between HDFS and S3 when needed. |
| | You can optionally work with S3 data that is protected with server-side encryption: SSE-S3, SSE-KMS, or SSE-C. |
| | You can optionally configure and fine-tune performance-related features to optimize HDP performance for specific tasks, including accessing S3 data from Hive and Spark and copying data with DistCp. |
| Troubleshoot | Refer to this section if you experience issues while configuring or using S3 with HDP. |
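One common way to configure authentication is to set the S3A access and secret key properties in `core-site.xml`. A minimal sketch — the placeholder values are not real credentials, and your cluster may instead use a Hadoop credential provider to avoid storing keys in plain text:

```xml
<!-- core-site.xml: minimal S3A authentication sketch.
     Replace the placeholder values with your own AWS credentials. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```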
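Bucket-specific settings follow the `fs.s3a.bucket.<bucketname>.*` property pattern. For example, to point a single bucket (the name "mybucket" is hypothetical) at a different S3 endpoint while all other buckets keep the global defaults:

```xml
<!-- Per-bucket override: only the hypothetical bucket "mybucket"
     uses the Frankfurt endpoint; other buckets keep global settings. -->
<property>
  <name>fs.s3a.bucket.mybucket.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
```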
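To illustrate accessing S3 data from Hive via an external table, here is a sketch with a hypothetical bucket, path, and schema; it assumes authentication for the bucket is already configured:

```sql
-- Hypothetical external table over delimited data in S3.
CREATE EXTERNAL TABLE access_logs (
  event_time STRING,
  user_id    STRING,
  url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://mybucket/logs/';
```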
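Copying data between HDFS and S3 is typically done with DistCp. A sketch with hypothetical paths, assuming S3A authentication is already configured cluster-wide:

```
# Copy a directory from HDFS into S3 (namenode address, bucket,
# and paths are all hypothetical placeholders).
hadoop distcp hdfs://namenode:8020/apps/data s3a://mybucket/backup/data
```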
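When working with encrypted data, the S3A connector is told which server-side encryption scheme to use through configuration properties. A sketch for SSE-KMS — the key ARN below is a placeholder, and the exact property names may vary with your Hadoop version:

```xml
<!-- SSE-KMS sketch: encryption algorithm plus the KMS key to use. -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID</value>
</property>
```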
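Performance-related S3A features are likewise plain configuration properties. A sketch of a few commonly tuned knobs — the values shown are illustrative starting points, not recommendations, and appropriate settings depend on your workload:

```xml
<!-- Illustrative S3A tuning knobs; tune per workload. -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>200</value> <!-- max simultaneous connections to S3 -->
</property>
<property>
  <name>fs.s3a.threads.max</name>
  <value>20</value> <!-- threads for parallel uploads -->
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>104857600</value> <!-- 100 MB parts for multipart uploads -->
</property>
```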