Minimum Required Role: User Administrator (also provided by Full Administrator)
This feature is not available when using Cloudera Manager to manage Data Hub clusters.
Amazon S3 (Simple Storage Service) can be used in a CDP cluster managed
by Cloudera Manager in the following ways:
- As storage for Impala tables
- As a source or destination for HDFS and Hive/Impala replication and for
cluster storage
- To enable Cloudera Navigator to extract metadata from Amazon S3
storage
- To browse S3 data using Hue
To provide access to Amazon S3, you configure AWS Credentials that specify the
authentication type (role-based, for example) and the access and secret keys. Amazon offers
two types of authentication you can use with Amazon S3:
- IAM Role-based Authentication
- Amazon Identity and Access Management (IAM) can be used to create users, groups, and
roles for use with Amazon Web Services, such as EC2 and Amazon S3. IAM role-based
access provides the same level of access to all clients that use the role. All jobs on
the cluster will have the same level of access to Amazon S3, so this is better suited
for single-user clusters, or where all users of a cluster should have the same
privileges to data in Amazon S3.
If you are setting up a peer to copy data to and from Amazon S3 using Cloudera Manager
Hive or HDFS replication, select this option.
If you are configuring Amazon S3 access for a cluster deployed to Amazon Elastic Compute
Cloud (EC2) instances using the IAM role for the EC2 instance profile, you do not need to
configure IAM role-based authentication for services such as Impala, Hive, or Spark.
- Access Key Credentials
- This type of authentication requires an AWS Access Key and an AWS Secret Key that you
obtain from Amazon. It is better suited for environments where you have multiple users
or multi-tenancy. You must enable Kerberos when using the S3 Connector service. Enabling
these services allows you to configure selective access for different data paths.
Cloudera Manager stores these values securely and does not store them in world-readable
locations. The credentials are masked in the Cloudera Manager Admin console, encrypted in
the configurations passed to processes managed by Cloudera Manager, and redacted from the
logs.
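For the IAM role-based option, every client that uses the role inherits whatever the role's
policy allows. The following Python sketch builds a minimal IAM policy document of the kind
that might be attached to such a role. It is illustrative only: the helper name
`s3_role_policy`, the bucket name `example-bucket`, and the specific set of S3 actions are
assumptions, not values taken from Cloudera Manager or from this page.

```python
import json


def s3_role_policy(bucket, read_only=True):
    """Build a minimal IAM policy document for a single S3 bucket.

    Hypothetical helper for illustration; the actions a real cluster
    needs depend on the services that read from or write to the bucket.
    """
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions are granted on the keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name.
    print(json.dumps(s3_role_policy("example-bucket"), indent=2))
```

Because the policy belongs to the role rather than to an individual user, all jobs that run
under the role receive the same permissions, which is why this option fits clusters where
all users should have the same privileges.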
The client configuration files generated by Cloudera Manager based on configured services
do not include AWS credentials. These clients must manage access to these credentials
outside of Cloudera Manager. Cloudera Manager uses credentials stored in Cloudera Manager
for trusted clients such as the Impala daemon and Hue. For access from YARN, MapReduce, or
Spark, see Using S3 Credentials with YARN, MapReduce, or Spark.