Cloudera Altus provides mechanisms to ensure security of data and network traffic in Altus and in the clusters that Altus manages.
Access Permission to Customer Resources
Altus creates clusters and runs jobs in your cloud service provider account on your behalf. Altus requires your permission to be able to use the resources required by the clusters and jobs in your cloud provider account.
How you provide Altus access to the resources in your cloud provider account depends on the cloud service provider.
Altus Access to AWS Resources
To allow Altus to create clusters or run jobs in your AWS account, an administrator for your AWS account must create a cross-account access role and grant Altus access to the role as a trusted principal. The policy defined for the cross-account access role must include permissions to allow Altus to create and manage instances and to perform the tasks and access the resources required for the Altus clusters and jobs. The AWS administrator also must set up the roles and resources in AWS that an Altus administrator can use in an Altus environment.
For more information about the AWS cross-account access role, see Cross-Account Access Role.
For more information about the permissions required in the cross-account access role for Altus, see AWS Permissions.
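As an illustration of what "grant Altus access to the role as a trusted principal" means in IAM terms, the following minimal sketch builds a role trust policy in Python. The account ID and external ID are placeholders, not real Altus values, and the use of an external ID condition is an assumption; take the actual values from the Cross-Account Access Role instructions.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that lets one external AWS account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted principal: the account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Assumption: an external ID condition restricts who can assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The trust policy only controls who can assume the role. The permissions that the role grants are defined in a separate policy, described in AWS Permissions.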
Altus Access to Azure Resources
Following the Azure multi-tenant application model, Cloudera has registered Altus as an application in Azure to enable Altus to access resources in your Azure subscription on your behalf. The Cloudera Altus application is integrated with Azure Active Directory to ensure secure sign-in and authorization of Altus services.
The administrator for your Azure subscription must assign a role to the Cloudera Altus application to enable Altus to create and manage clusters in your subscription. With role-based access control (RBAC), the Azure administrator determines how much access Altus can have in your Azure subscription. The Azure administrator also must set up the network resources in your Azure subscription that an Altus administrator can use in an Altus environment.
For more information about setting up role-based access for Altus, see Setting up Role-Based Access for Cloudera Altus.
For more information about managing role-based access to Azure accounts, see Manage access using RBAC and the Azure portal in the Microsoft Azure documentation.
Cluster Connection to Altus Services
When Altus creates a cluster in your cloud provider account, Altus needs the ability to manage the cluster life cycle and monitor its functions. For example, Altus must communicate with the cluster to configure it after it is created or to submit jobs to an Altus Data Engineering cluster.
When the Altus Data Engineering service or Altus Data Warehouse service creates a cluster, it sets up an SSH server and generates a unique SSH key for the cluster, shared by all nodes in the cluster. The Altus service sets up an agent in each cluster node that creates an outbound SSH connection from the cluster back to the Altus service, authorized by the SSH key. Any communication from the Altus service to the cluster tunnels through the SSH connection created by the agent. If the connection drops, the agent in the cluster node immediately re-establishes the connection.
The nodes in the cluster require outbound internet connectivity to establish the connection to Altus services. Altus uses a network load balancer to manage the cluster connection requests. Because Altus creates the load balancer dynamically at the time that it creates the cluster, the IP address for the connection cannot be determined at configuration time.
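The reconnect behavior described above can be pictured as a small supervising loop. The following is a generic sketch, not the Altus agent: `open_tunnel` and `should_stop` are hypothetical callables, and `open_tunnel` is assumed to block for as long as the outbound connection is alive.

```python
import time
from typing import Callable


def keep_tunnel_open(
    open_tunnel: Callable[[], None],
    should_stop: Callable[[], bool],
    retry_delay: float = 0.0,
) -> int:
    """Re-open an outbound connection each time it drops.

    Returns the number of connection attempts that were made.
    """
    attempts = 0
    while not should_stop():
        try:
            open_tunnel()  # blocks while the outbound connection is alive
        except ConnectionError:
            pass  # a failed attempt is handled like a dropped connection
        attempts += 1
        time.sleep(retry_delay)  # 0.0 means reconnect immediately
    return attempts


# Simulated tunnel that drops twice and then ends cleanly.
calls = []


def fake_tunnel() -> None:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("connection dropped")


print(keep_tunnel_open(fake_tunnel, lambda: len(calls) >= 3))
```

Because the node initiates the connection, only outbound connectivity is needed for this path.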
Access to Altus Clusters
Altus provides security mechanisms for access to Altus Data Engineering and Data Warehouse clusters, and provides a read-only user account in the Cloudera Manager instance in each cluster.
Accessing the Altus Cluster Through SSH
If you want to connect to the cluster through SSH, you must provide an SSH public key when you create a cluster. Altus adds the public key to the authorized_keys file on each node in the cluster. When you access an Altus cluster, use the private key that corresponds to the public key to connect to the cluster through SSH.
To access the cluster through SSH, you must configure rules in your AWS security group or Azure Network Security Group (NSG) to allow access to the listening port on the cluster node that you want to connect to. You can configure a port in the cluster instance to allow connections from IP addresses that you specify. For example, on AWS, you can configure the security group for the instance to allow connections to the SSH port 22. Depending on the network configuration of your organization, you might have to set up public IP addresses for the connection.
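For the AWS case, the following sketch builds an inbound rule for the SSH port in the dictionary shape that the EC2 API uses for `IpPermissions`. The CIDR range is a documentation placeholder, and the helper name and description text are hypothetical.

```python
import ipaddress


def ssh_ingress_rule(allowed_cidrs, port=22):
    """Build one inbound TCP rule limited to the given IPv4 CIDR ranges."""
    ranges = []
    for cidr in allowed_cidrs:
        network = ipaddress.IPv4Network(cidr)  # raises ValueError if malformed
        if network.prefixlen == 0:
            raise ValueError("refusing to open the port to every address")
        ranges.append({"CidrIp": str(network), "Description": "SSH to cluster node"})
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": ranges}


# Placeholder range; replace with the addresses your organization connects from.
rule = ssh_ingress_rule(["203.0.113.0/24"])
print(rule)
```

If you use boto3, a rule in this shape can be passed in the `IpPermissions` list of the EC2 client's `authorize_security_group_ingress` call, together with the ID of your security group. On Azure, the equivalent is an inbound security rule in the NSG.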
Accessing an Altus Data Warehouse Cluster Through ODBC or JDBC
You can connect to an Altus Data Warehouse cluster and run your SQL queries from business intelligence or data integration client tools using JDBC or ODBC. To access the cluster securely, set up your client tool to use a secure JDBC or ODBC connection.
You must configure rules in your AWS security group or Azure NSG to allow access to the Impala listening port in the Data Warehouse cluster for the JDBC or ODBC connection. The default listening port for the Impala service in the Data Warehouse cluster is port 21050. Set up the data source connections in your client tool to point to port 21050.
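Before you configure a client tool, it can help to confirm that the security group or NSG rule actually lets your machine reach the Impala port. The following sketch uses only the Python standard library; the function name is hypothetical, and a successful TCP connection shows only that the port is reachable, not that authentication will succeed.

```python
import socket

IMPALA_PORT = 21050  # default Impala listening port in the Data Warehouse cluster


def port_is_reachable(host: str, port: int = IMPALA_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_reachable("dw-node.example.com")` returns False while the port is still blocked, where the host name is a placeholder for the address of a node in your cluster.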
Starting in version 2.6.9, the Impala JDBC Connector for Cloudera Enterprise can connect to an Altus Data Warehouse cluster using the Altus credentials file. The JDBC driver uses the credentials file to access the Altus API, get the cluster connection information, and verify your login credentials.
For more information about accessing Altus Data Warehouse clusters through JDBC or ODBC, see Client Access to Altus Data Warehouse Clusters.
Accessing the Cloudera Manager Instance in the Cluster
When you create the cluster, you can specify, or Altus can create, a user name and password for a Cloudera Manager user account. Altus creates a read-only user account in the Cloudera Manager instance with the user name and password. You can log in to Cloudera Manager using the read-only user account.
You can access the Cloudera Manager instance in an Altus cluster using SOCKS over SSH. You can also connect to the Cloudera Manager instance directly if your network allows direct connectivity.
For more information about connecting to the Cloudera Manager instance, see Cloudera Manager Connection.
For more information about using a SOCKS proxy in Altus, see SOCKS Proxy.
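As a rough illustration of SOCKS over SSH, the following sketch assembles an OpenSSH command that opens a local SOCKS proxy through a cluster node. The key path, user name, and host name are placeholders, and the local port 1080 is an arbitrary choice; see SOCKS Proxy for the supported procedure.

```python
import shlex


def socks_proxy_command(key_path, user, host, local_port=1080):
    """Build an OpenSSH command that opens a local SOCKS proxy through a node."""
    return [
        "ssh",
        "-i", key_path,         # private key matching the public key given at cluster creation
        "-N",                   # run no remote command; forwarding only
        "-D", str(local_port),  # dynamic (SOCKS) port forwarding on this local port
        f"{user}@{host}",
    ]


# Placeholder values for illustration only.
cmd = socks_proxy_command("/home/me/.ssh/altus_key", "centos", "cm-node.example.com")
print(shlex.join(cmd))
```

With the proxy running, a browser configured to use `localhost:1080` as its SOCKS host sends its requests through the SSH connection, so the Cloudera Manager port does not need to be opened to your network.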
Secure Cluster Option in Altus
To create secure clusters in Altus, enable the Secure Cluster option in the Altus environment. When the option is enabled, Altus configures the following in the clusters that it creates:
- Altus sets up a Kerberos realm and a KDC in the cluster and turns on Kerberos authentication and encryption for the CDH services running in the cluster.
- Altus sets up a certificate authority (CA) and generates a certificate for each node in the cluster. Altus turns on TLS encryption for the CDH services running in the cluster.
For more information about the secure cluster option in Altus, see Enable Secure Clusters.
Cluster Data Encryption
Altus handles data encryption differently for clusters in AWS and clusters in Azure.
For clusters on AWS, Altus encrypts data stored in all EBS volumes in the cluster except the root volume. Altus clusters are configured so that no sensitive data is written to the root volume. This enables Altus to exclude the root volume from data encryption without compromising the security of the cluster. You can provide the AWS KMS key to encrypt the volumes. If you do not provide a key, Altus uses the default AWS managed key.
For clusters on Azure, all data stored in attached Azure Managed Disks is encrypted by default. Azure encrypts the data using an encryption key that it manages internally. You cannot disable or reconfigure the encryption for data in local disks.
- Altus Data Engineering clusters
Altus Data Engineering clusters are single-tenant clusters.
For Data Engineering clusters, Altus uses Kerberos authentication for job submission. It uses one Kerberos principal to run jobs in Data Engineering clusters. Because Altus uses the same user principal to run all jobs, Altus does not require additional authorization to run jobs. Altus runs jobs with the same permissions on secure and unsecure clusters.
- Altus Data Warehouse clusters
Altus Data Warehouse clusters are multi-tenant clusters.
For Data Warehouse clusters, Altus enables authentication of users who connect to the clusters to access data in cloud object storage. Altus sets up an LDAP directory of Altus users and groups to authenticate users who access Altus Data Warehouse clusters. As part of creating the cluster, Altus generates a user ID and a password for each user. The user ID and password are unique to the cluster. A user who accesses the cluster from a client tool must use these cluster credentials to be authenticated and allowed access to the cluster.
For more information about user access to Data Warehouse clusters, see User Access to Altus Data Warehouse Clusters.
Data Encryption in Cloud Storage
If you require secure data in your cloud object storage, Altus assumes that you encrypt the data using the encryption mechanism provided by the cloud service provider. For data in AWS S3, use the S3 bucket encryption provided by AWS, which encrypts data using the key you specify as the default key for the S3 bucket. For more information, see Amazon S3 Default Encryption for S3 Buckets. For data in Azure Data Lake Store, Azure encrypts all data by default.
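To make the S3 case concrete, the following sketch builds the default encryption configuration document that S3 accepts for a bucket. The helper name is hypothetical and the key ARN is a placeholder; when no key is given, the sketch falls back to S3-managed keys (AES256), which is an assumption about what you might want rather than an Altus requirement.

```python
def default_bucket_encryption(kms_key_id=None):
    """Build a ServerSideEncryptionConfiguration document for an S3 bucket."""
    if kms_key_id:
        # Encrypt new objects with the specified KMS key by default.
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        # Fall back to S3-managed keys.
        default = {"SSEAlgorithm": "AES256"}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


# Placeholder key ARN for illustration only.
config = default_bucket_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(config)
```

If you use boto3, a document in this shape is what the S3 client's `put_bucket_encryption` call takes as its `ServerSideEncryptionConfiguration` argument.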
Altus runs your workloads the same way whether or not your data in object storage is encrypted, relying on the permissions you set up for the cluster to access encrypted data in object storage. To enable Altus jobs to read data from and write data to object storage, configure the cluster with the appropriate permission to access the encrypted data. For clusters in AWS, configure the cluster with an IAM role that can read and write the encrypted data in the S3 buckets. For clusters in Azure, configure the cluster with a managed service identity (MSI) that can read and write the encrypted data in Azure Data Lake Store (ADLS).
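For the AWS case, the following sketch shows the kind of permissions policy an IAM role might carry to read and write KMS-encrypted data in one bucket. The bucket name and key ARN are placeholders, the helper name is hypothetical, and the exact action list is an assumption; adjust it to what your jobs need.

```python
def s3_read_write_policy(bucket, kms_key_arn=None):
    """Build an IAM policy that allows reading and writing objects in one bucket."""
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            "Effect": "Allow",
            # DeleteObject is included on the assumption that jobs overwrite or
            # rename output files; remove it if your jobs only append new objects.
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]
    if kms_key_arn:
        # Needed only when the bucket encrypts objects with a KMS key.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder values for illustration only.
policy = s3_read_write_policy(
    "example-bucket", "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(len(policy["Statement"]))
```

Without the KMS statement, a role can hold the S3 permissions and still fail to read objects that were encrypted with a customer managed key.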