Accessing AWS Glue

To access the Glue metadata in Data Catalog, perform the following tasks in your Data Catalog instance:

  • List the Glue metadata by selecting the Glue data lake.
  • Select one or more Glue assets and register them with CDP.
  • Verify that the registered Glue assets are listed in the Data Catalog owned data lake.
  • Select a registered Glue asset and click it to open the Asset Details page.

Listing the Glue assets

In Data Catalog, when you select the AWS Glue data lake, you can view the list of Glue assets. These metadata assets are directly sourced from Glue.
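To cross-check what Data Catalog displays against the source, you can list the same tables directly from AWS Glue. The following is a minimal sketch, not part of Data Catalog itself. It assumes a boto3 Glue client; the helper names iter_glue_tables and list_glue_tables are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Iterator, List


def iter_glue_tables(glue_client: Any) -> Iterator[Dict[str, str]]:
    """Yield one {"database", "table"} record per table in the AWS Glue Data Catalog.

    glue_client is expected to behave like boto3.client("glue"): it must expose
    get_paginator("get_databases") and get_paginator("get_tables").
    """
    db_pages = glue_client.get_paginator("get_databases").paginate()
    for db_page in db_pages:
        for database in db_page.get("DatabaseList", []):
            db_name = database["Name"]
            # get_tables is scoped to a single database, so page through each one.
            table_pages = glue_client.get_paginator("get_tables").paginate(
                DatabaseName=db_name
            )
            for table_page in table_pages:
                for table in table_page.get("TableList", []):
                    yield {"database": db_name, "table": table["Name"]}


def list_glue_tables(glue_client: Any) -> List[str]:
    """Return sorted "database.table" names for every table Glue reports."""
    return sorted(
        f'{record["database"]}.{record["table"]}'
        for record in iter_glue_tables(glue_client)
    )


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for name in list_glue_tables(boto3.client("glue")):
#       print(name)
```

Passing the client in as a parameter keeps the helper independent of credentials and region settings, so the same function can be run against any account whose Glue metadata you want to compare with the list shown in Data Catalog.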

When you click on one of the assets, the Asset Details page is displayed.

Next, on the main Data Catalog page, select the Glue data lake, select one of the Glue assets, and click Register to register the asset with CDP.

Alternatively, you can click the Glue asset and register it from the Asset Details page.

Once the Glue asset is registered, the asset is imported into CDP.

Next, navigate back to the main Data Catalog page, select the Data Catalog owned data lake, and select Hive Table as the type. The search results list all the Hive table assets, including the registered Glue assets. The registered Glue asset can be identified using

The Asset Details page for the Glue asset is populated by Atlas. When the Glue asset was registered, its data was written to the Hive Metastore, and Atlas later synchronised the metadata.
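Because the registered asset reaches Atlas through the Hive Metastore, one way to confirm that synchronisation has completed is to look for the table with the Atlas basic search REST endpoint. The following is a rough sketch under stated assumptions: the Atlas base URL and the authentication method depend on your environment and are not specified here, and the helper names build_hive_table_search_url and qualified_names are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_hive_table_search_url(atlas_base_url: str, query: str, limit: int = 25) -> str:
    """Build an Atlas v2 basic-search URL restricted to hive_table entities.

    atlas_base_url is an assumption: use whatever base URL your environment
    exposes for the Atlas REST API.
    """
    params = {"typeName": "hive_table", "query": query, "limit": limit}
    return f"{atlas_base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"


def qualified_names(search_response: Dict[str, Any]) -> List[str]:
    """Extract qualifiedName values from a parsed basic-search JSON response.

    A response with no matches may omit the "entities" key, so treat a
    missing or null value as an empty result.
    """
    return [
        entity.get("attributes", {}).get("qualifiedName", "")
        for entity in search_response.get("entities") or []
    ]
```

Send a GET request to the generated URL with your usual HTTP client and credentials, parse the JSON body, and pass it to qualified_names. An empty list may simply mean that Atlas has not yet synchronised the metadata.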

Go back to the main Data Catalog page and select the Glue data lake. Note that the registered Glue assets are greyed out and cannot be selected again.

You can still view a registered Glue asset (powered by Atlas) by clicking it, which opens the Asset Details page shown in the image above.

Working with a Ranger Authorization Service (RAZ) enabled AWS environment

In a RAZ-enabled AWS environment, you must apply the following permission settings to work with the Data Catalog and Glue integration.