Configuring ADLS Access Using Cloudera Manager

Minimum Required Role: User Administrator (also provided by Full Administrator)

To configure access to Microsoft Azure Data Lake Store (ADLS) using Cloudera Manager, you configure ADLS credentials using the Cloudera Manager External Accounts page and then add the ADLS Connector Service to your cluster. Adding the ADLS Connector Service allows users and administrators of the cluster to seamlessly and securely access ADLS in the following ways:
  • Run Hive and Impala queries on tables backed by data stored in ADLS.
  • Browse ADLS stores using Hue.
Other cluster services may also gain access to ADLS by having their users provide their own credentials directly using the Hadoop Credential Provider mechanism.
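
For services that rely on the Hadoop Credential Provider mechanism, the following minimal Python sketch shows how the command that stores an ADLS secret in a credential store might be assembled. The `hadoop credential create` subcommand and the `fs.adl.oauth2.credential` alias come from Apache Hadoop; the helper name, the user name, and the JCEKS path are hypothetical. The sketch only builds the argument list and does not run it.

```python
def build_credential_command(alias, provider_path):
    """Return the argument list for `hadoop credential create`.

    The -value option is deliberately omitted so that the CLI prompts
    for the secret instead of exposing it on the command line.
    """
    if not provider_path.startswith("jceks://"):
        raise ValueError("expected a jceks:// provider path")
    return ["hadoop", "credential", "create", alias, "-provider", provider_path]


# Hypothetical per-user keystore location; adjust for your cluster.
cmd = build_credential_command(
    "fs.adl.oauth2.credential",
    "jceks://hdfs/user/alice/adls.jceks",
)
print(" ".join(cmd))
```

A job that needs the secret would then typically reference the same store through the `hadoop.security.credential.provider.path` property.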

Configuring credentials through Cloudera Manager is a more secure way to access ADLS because the credentials are not stored in plain-text files. The client configuration files that Cloudera Manager generates for configured services do not include ADLS credentials, so command-line and API clients must manage access to these credentials outside of Cloudera Manager. Cloudera Manager provides credentials directly only to trusted clients such as the Impala daemon and Hue. For access from YARN, MapReduce, or Spark, see Configuring ADLS Connectivity.

Configuring ADLS Credentials in Cloudera Manager

If you have already created your ADLS account and configured ADLS credentials in Cloudera Manager, skip this section and continue with Adding the ADLS Connector Service.

  1. Create your ADLS account. See the Microsoft documentation.
  2. Create the Active Directory service principal in the Azure portal. See the Microsoft documentation on creating a service principal. You will need the following to configure ADLS credentials in Cloudera Manager:
    • Client ID
    • Client secret
    • Tenant ID
  3. Grant the service principal permission to access the ADLS account. See the Microsoft documentation on Authorization and access control, in particular the section "Using ACLs for operations on file systems". The service principal should have read, write, and execute permissions.

    You can skip the section on RBAC (role-based access control): RBAC governs account management, and this procedure requires only data access.

  4. Open Cloudera Manager and go to Administration > External Accounts.
  5. Select the Azure Credentials tab.
  6. Click Add AD Service Principal.
  7. In the Name field, enter a unique name to identify the credentials in your cluster.
  8. Enter the Client ID, Client Secret Key, and Tenant ID that you obtained when creating the ADLS account and service principal.
  9. Click Save.

    The Connect to Azure Data Lake Storage dialog box displays.

  10. Click Enable for Cluster_Name to add the ADLS Connector Service, as described in the next section.
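
For clients that manage credentials outside of Cloudera Manager, it can help to see how the three values collected in step 2 relate to the Apache Hadoop ADLS connector settings. The sketch below is illustrative only: the `fs.adl.oauth2.*` property names are those of the Hadoop ADLS connector, while the function name and the placeholder IDs are hypothetical. Cloudera Manager does not require you to set these properties by hand when you use the External Accounts page.

```python
import uuid


def adls_oauth2_properties(client_id, client_secret, tenant_id):
    """Map service principal values to Hadoop ADLS connector properties."""
    # Client ID and tenant ID are GUIDs; uuid.UUID raises ValueError otherwise.
    uuid.UUID(client_id)
    uuid.UUID(tenant_id)
    if not client_secret:
        raise ValueError("client secret must not be empty")
    return {
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
        "fs.adl.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values for illustration; never hard-code a real secret.
props = adls_oauth2_properties(
    client_id="00000000-0000-0000-0000-000000000001",
    client_secret="example-secret",
    tenant_id="00000000-0000-0000-0000-000000000002",
)
# Print only the property names so the secret is not echoed.
print(sorted(props))
```

Validating the GUID format up front catches a common mistake, swapping the client ID with the client secret, before the values reach the cluster.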

Adding the ADLS Connector Service

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

Use this procedure to add the ADLS Connector Service using Cloudera Manager. If you have not already configured ADLS credentials in Cloudera Manager, see Configuring ADLS Credentials in Cloudera Manager before continuing.

  1. In the Cloudera Manager Admin console, go to the cluster where you want to add the ADLS Connector Service.
  2. Click Actions > Add Service.
  3. Select ADLS Connector.
  4. Click Continue.

    The Add ADLS Connector Service to Cluster_Name wizard displays.

  5. Select the ADLS credential to use with this service from the Name drop-down list.
  6. Click Continue.

    The wizard checks your configuration for compatibility with ADLS and reports any issues. The wizard does not allow you to continue if you have an invalid configuration. Fix any issues, and then repeat these steps to add the ADLS Connector Service.

  7. Select a Credentials Protection Policy. Choose one of the following:
    • Less Secure

      Credentials can be stored in plain text in some configuration files for specific services (currently Impala, Hive, and Hue) in the cluster.

      This configuration is appropriate for single-user clusters or clusters where strict fine-grained access control is not required.

    • More Secure

      The More Secure option requires that you enable Kerberos and the Apache Sentry Service in the cluster.

      Cloudera Manager distributes secrets to a limited set of services (currently Impala and Hue) and enables those services to access ADLS securely, using encrypted credentials. It does not distribute these credentials to any other clients or services.

      Other ADLS configuration settings that are not sensitive are included in the configuration of all services and clients as needed.

      This configuration is appropriate for secure, multi-tenant clusters that provide fine-grained access control to data stored in ADLS. You can use the Apache Sentry Service to limit access to specific users and applications.

  8. Click Continue.
  9. If you have enabled the Hue service, the Additional Configuration for Hue screen displays. Enter the domain name of the Hue Browser Data Lake Store in the form: store_name.azuredatalakestore.net
  10. Click Continue.

    The Restart Dependent Services page displays and indicates the dependent services that need to be restarted.

  11. Select Restart Now to restart these services. You can also restart these services later.
  12. Click Continue to complete the addition of the ADLS Connector Service. If Restart Now is selected, the dependent services are restarted. The progress of the restart commands displays.
  13. When the commands finish executing, click Continue.
  14. Click Finish.
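
The domain name entered for Hue in step 9 is the same host that appears in `adl://` URIs used by cluster services. The following sketch, with a hypothetical helper name and store name, checks that a value has the store_name.azuredatalakestore.net form and builds a URI from it. The character set allowed for the store name (lowercase letters and digits) is an assumption; confirm it against the Microsoft naming rules.

```python
import re

# Assumption: store names consist of lowercase letters and digits only.
_STORE_DOMAIN = re.compile(r"^[a-z0-9]+\.azuredatalakestore\.net$")


def adl_uri(store_domain, path="/"):
    """Build an adl:// URI after validating the store domain name."""
    if not _STORE_DOMAIN.match(store_domain):
        raise ValueError(
            "expected a domain of the form store_name.azuredatalakestore.net"
        )
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return f"adl://{store_domain}{path}"


# "mystore" is a placeholder store name.
uri = adl_uri("mystore.azuredatalakestore.net", "/warehouse/sales")
print(uri)
```

Rejecting values that include a scheme or trailing path, such as a full https:// URL pasted from the Azure portal, keeps the check aligned with the bare domain name form that the wizard asks for.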

Managing ADLS Credentials in Cloudera Manager

To edit or remove ADLS credentials:
  1. Open Cloudera Manager and go to Administration > External Accounts.
  2. Select the Azure Credentials tab.
  3. To remove a credential, in the row for the credential you want to remove, click Actions > Remove.

    You cannot remove a credential that is currently being used by the ADLS Connector Service; you must first remove the Connector Service from the cluster.

  4. To edit a credential, in the row for the credential you want to edit, click Actions > Edit Credential.
  5. Edit the fields of the credential as needed and click Save.

Removing the ADLS Connector Service

To remove the service:
  1. Open Cloudera Manager and go to Administration > External Accounts.
  2. Select the Azure Credentials tab.
  3. In the row for the credential used for the service, click Actions > Edit Connectivity.

    The Connect to Azure Data Lake Storage dialog box displays.

  4. Click Disable for Cluster_Name.
  5. Click OK.

    The message "The configuration has been updated" displays. Restart any services that now have stale configurations: click the View Stale Configurations link to open the Stale Configurations page, and then click Restart Stale Services.

You can also delete the ADLS Connector Service from the Cloudera Manager home page for the cluster. See Deleting Services.