Accessing Cloud Data
Connectivity Problems
You may encounter the following S3 connectivity issues.
© 2012–2020, Cloudera, Inc. Document licensed under the Creative Commons Attribution ShareAlike 4.0 License.