Using External Databases for Cloudera SDX

Cloudera Shared Data Experience (SDX) for Altus enables you to share cluster metadata and security policies between workloads and clusters running in the cloud. You can set up Cloudera SDX to share data between CDH clusters in an Altus Director deployment and Altus Data Warehouse and Altus Data Engineering clusters.

Cloudera SDX requires external databases for Hive metastore and Sentry that can be accessed by clusters that need to share metadata and security policies. You can set up Cloudera SDX for Altus clusters that share metadata on AWS or on Azure.

To set up Cloudera SDX, complete the following steps:
  1. In Altus Director, deploy clusters with external databases for Hive metastore and Sentry.
  2. In Altus services, create secure clusters that use a configured SDX namespace pointing to the external databases for Hive metastore and Sentry.

For more information about setting up Cloudera SDX to share cluster metadata and security policies between CDH clusters, see Cloudera Altus Director SDX Integration on the Cloudera Engineering Blog.

Deploying Clusters with Cloudera SDX in Altus Director

When you deploy clusters in Altus Director that will share data with Altus Data Warehouse and Altus Data Engineering clusters, you must configure the clusters to use external databases for the Hive metastore and for the Sentry service. You can set up the services to use existing databases, or you can configure Altus Director to set up an external database server and create the databases and schemas.

For instructions on how to set up external databases for the CDH services, see Using an External Database for Cloudera Manager and CDH.

When you deploy clusters with the Sentry service, enable Kerberos so that the clusters can share Sentry access policies. For instructions on how to set up the Sentry service in an Altus Director deployment, see Enabling Sentry Service Authorization.

To deploy clusters in Altus Director to use Cloudera SDX, complete the following steps:
  1. Create a cluster that uses external databases for the Hive metastore and Sentry.

    On AWS, you can configure Altus Director to create an Amazon Relational Database Service (RDS) instance and set up databases for the Hive metastore and Sentry data. On Azure, install a database server and manually create databases for the Hive metastore and Sentry.

    Define the databases in the Altus Director configuration file. Altus Director provides an example configuration file that includes the parameters you need to set to use Cloudera SDX to share data between clusters in AWS.

    For more information about using external databases for CDH services in Altus Director deployments, see Using an External Database for Cloudera Manager and CDH.

    For more information about setting up a Hive metastore database using RDS, see the blog post, How To Set Up a Shared Amazon RDS as Your Hive Metastore.

  2. Note the URI and administrator credentials for the Hive metastore and Sentry databases.

    You must provide the URI and database credentials for the Hive metastore and Sentry databases when you set up the cluster in Altus services.

    You can find the database URI in the Altus Director web UI. The Cluster Details page displays the database servers that are configured for the cluster, with a note about setting up SDX namespaces in Altus services. Click the Learn More link in the note to display the JDBC URLs for the Hive metastore and Sentry databases.

    The following image shows the database URIs on the details page of a cluster:

  3. Set up access to the object storage where you created the Hive metastore and Sentry databases.

    The cluster must have read and write privileges to the object storage so that workloads can create and update data and metadata in the Hive metastore and Sentry databases.

  4. Grant administrator privileges to the Sentry administrator group.

    You can use Hue, Beeline, or impala-shell to grant Sentry permissions to the group that accesses the Hive metastore. You can create a Sentry role with the appropriate permissions and assign the role to the group. Clusters that belong to the same group have the same access permissions to data and metadata in the Hive metastore.

    For more information about setting up authorization with Sentry, see Authorization with Apache Sentry.
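
The grant in Step 4 can be sketched as a short sequence of Sentry SQL statements that you would run in Beeline, Hue, or impala-shell. The Python helper below only builds those statements as text. It is a minimal sketch, not a definitive procedure: the function name, the role name `sdx_admin_role`, and the group name `sdx_admin_group` are hypothetical, and the server name `server1` is an assumption (it is a common default, so confirm the value configured for your Hive service).

```python
from typing import List


def sentry_grant_statements(role: str, group: str, server: str = "server1") -> List[str]:
    """Build the statements that create a role, grant it full privileges
    on the Sentry server, and assign the role to a group.

    The default server name "server1" is an assumption; check your
    Hive and Sentry configuration for the actual value.
    """
    for name in (role, group, server):
        # Sketch-level safety check: accept only letters, digits, and underscores.
        # Real group names may need quoting that this helper does not handle.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"CREATE ROLE {role};",
        f"GRANT ALL ON SERVER {server} TO ROLE {role};",
        f"GRANT ROLE {role} TO GROUP {group};",
    ]


if __name__ == "__main__":
    # Hypothetical role and group names, for illustration only.
    for statement in sentry_grant_statements("sdx_admin_role", "sdx_admin_group"):
        print(statement)
```

Because the role is assigned to a group rather than to individual users, the same statements can be reused with a different group name when another set of clusters needs the same access.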

Creating Clusters with a Configured SDX Namespace in Altus Services

You can set up a configured SDX namespace in Altus services that points to the Hive metastore and Sentry databases that you set up in Altus Director. The configured SDX namespace allows you to share cluster metadata and security policies between the Altus Director deployment and Altus Data Engineering and Altus Data Warehouse clusters.

To set up a cluster with a configured SDX namespace in Altus services, complete the following steps:
  1. Identify the Hive metastore and Sentry databases that you want to use for the configured SDX namespace.

    Get the connection URI and administrator credentials for the Hive metastore and Sentry databases that you set up for the cluster in Altus Director. See Step 2 in Deploying Clusters with Cloudera SDX in Altus Director.

  2. Create a configured SDX namespace in Altus.

    When you create the configured SDX namespace, you must provide the connection URI and administrator credentials for the Hive metastore and Sentry databases.

    For more information about creating a configured SDX namespace, see Creating an SDX Namespace in the Altus documentation.

  3. Create a secure Altus Data Engineering or Data Warehouse cluster and set up the cluster to use the configured SDX namespace.

    When you create the Altus cluster:
    • Specify the configured SDX namespace that uses the Hive metastore and Sentry databases for the clusters in Altus Director.
    • Specify an environment with the Secure Clusters option enabled.

  4. Grant administrator privileges to the Sentry administrator group.

    When you create a configured SDX namespace, Altus creates an Altus group for use as an administrator group in Sentry. Membership in the administrator group enables you to create roles and manage Hive metastore and Sentry privileges.

    For an Altus Data Engineering cluster, you can submit a Hive job to run the commands to grant privileges.

    For an Altus Data Warehouse cluster, use the Query Editor to grant the privileges.

    For more information, see Setting up a Cluster with a Configured SDX Namespace in the Altus documentation.

    If you created a Sentry role for the clusters deployed by Altus Director, you can assign the same Sentry role to the Altus groups that access the shared Hive metastore. See Step 4 in Deploying Clusters with Cloudera SDX in Altus Director.
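
Steps 1 and 2 depend on copying the JDBC URLs for the Hive metastore and Sentry databases correctly from Altus Director. As a quick sanity check before you enter them in the namespace, the following Python sketch splits a URL of the common `jdbc:<engine>://<host>:<port>/<database>` form into its parts. The helper name `parse_jdbc_url` and the example URL are hypothetical, and URLs that use a different layout are not handled.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class JdbcEndpoint(NamedTuple):
    engine: str           # database type, for example "mysql"
    host: str
    port: Optional[int]   # None when the URL does not specify a port
    database: str


def parse_jdbc_url(url: str) -> JdbcEndpoint:
    """Split a jdbc:<engine>://<host>:<port>/<database> URL into its parts."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {url!r}")
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(url[len(prefix):])
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"missing engine or host in: {url!r}")
    return JdbcEndpoint(
        engine=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        database=parts.path.lstrip("/"),
    )


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(parse_jdbc_url("jdbc:mysql://db.example.com:3306/metastore"))
```

A check like this catches a truncated host or a missing database name before the values are saved in the configured SDX namespace. It does not verify that the database is reachable or that the credentials are valid.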