Integrating Cloudera Data Catalog with AWS Glue Data Catalog

Integrating Cloudera Data Catalog with AWS Glue Data Catalog lets users browse and discover data and register it in Cloudera Shared Data Experience (through metadata translation or copy), so that it can be used with Cloudera Data Hub and other relevant experiences.

When you use AWS Glue in Cloudera Data Catalog, you get a complete snapshot view of the metadata, along with other visible attributes that can support your data governance capabilities.

How integration works

The integration assumes that Cloudera Shared Data Experience is running in the user’s AWS account, and that this is the same AWS account that holds the Glue Data Catalog and the data to be discovered. The credentials must be shared with the ExternalDataDiscoveryService (which is hosted in Cloudera Shared Data Experience), so that the two entities can interact with each other. These credentials are used to launch Cloudera Shared Data Experience and other workload clusters in the user’s AWS account.

Prerequisites:
  • You must have full access to AWS Glue Data Catalog and access to the EMR cluster’s Hive Metastore instance.
  • You must set up the Cloudera platform.
  • You must have access to your AWS IT Admin and CDP Admin user credentials, which are required to enable the Cloudera platform to access AWS/EMR-managed data.
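
One way to check the first prerequisite before starting is to confirm that your AWS credentials can actually read the Glue Data Catalog. The following is a minimal sketch, not part of the Cloudera product: it assumes the boto3 Glue client and its `get_databases` and `get_tables` paginators, and the helper name `list_glue_tables` is hypothetical. It only tests read access to databases and tables; it does not verify full Glue access or access to the Hive Metastore.

```python
from typing import Any, Dict, List


def list_glue_tables(glue_client: Any) -> Dict[str, List[str]]:
    """Return {database_name: [table_name, ...]} for the catalog the client can see.

    `glue_client` is assumed to be a boto3 Glue client (or any object that
    exposes the same `get_paginator` interface).
    """
    catalog: Dict[str, List[str]] = {}
    # Walk every page of databases in the Glue Data Catalog.
    for db_page in glue_client.get_paginator("get_databases").paginate():
        for database in db_page["DatabaseList"]:
            name = database["Name"]
            # Walk every page of tables that belong to this database.
            table_pages = glue_client.get_paginator("get_tables").paginate(
                DatabaseName=name
            )
            catalog[name] = [
                table["Name"]
                for table_page in table_pages
                for table in table_page["TableList"]
            ]
    return catalog


# Example call (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(list_glue_tables(boto3.client("glue")))
```

If the call fails with an access-denied error, the credentials most likely lack the Glue read permissions that the integration depends on.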