Setting up AWS Glue Catalog with Cloudera Data Catalog

You must map your Cloudera Data Catalog instance to AWS Glue Catalog.

Enable the entitlement for your Cloudera Data Catalog instance by running the following command on your Cloudera environment. For example:
```
$ cdp coreadmin grant-entitlement \
    --entitlement-name DATA_CATALOG_ENABLE_AWS_GLUE \
    --account-id {account_id}
```
Add the relevant permissions in the corresponding AWS account:
1. Include permission to access Glue Catalog service by editing the policy accordingly.
  
  Make a note of the Assumer Instance Profile role that you intend to use and include full access authorization for AWS Glue.
  
  Refer to the following images for guidance on completing the setup.
  
  Note: For Role ARN and Instance Profile ARNs, you must include the appropriate account number and role, respectively.
2. Search for the role attached to the Instance Profile of the Cloudera environment. In your AWS environment creation command, use the Instance Profile that you configured above with the Glue-related policy.
  Use the following examples to set up the AWS environment and the AWS data lake as part of the Glue setup:
```
cdp environments create-aws-environment --profile default --cli-input-json '
{"environmentName":"ab-ds-cli-7321",
 "credentialName":"cd2d-1234",
 "region":"us-region-2",
 "securityAccess":{<insert the value>},
 "authentication":{<insert the value>},
 "logStorage":{"storageLocationBase":"s3a://demo-e2e-test-state-bucket/ab-ds-cli-7321/logs","instanceProfile":"arn:aws:iam::<xxxxxxxxxxx>:instance-profile/<role-name>"},
 "vpcId":"vpc-0123456",
 "subnetIds":["subnet-04fe923b902aa5cf2","subnet-099c7a631f0ebed3c"],
 "s3GuardTableName":"dc-pro-cli-7210",
 "description":"ab-ds-cli-7321",
 "enableTunnel":false,
 "workloadAnalytics":false,
 "freeIpa":{"instanceCountByGroup":1}
 }'

cdp environments set-id-broker-mappings \
--environment-name "ab-ds-cli-7321" \
--profile default \
--set-empty-mappings \
--data-access-role arn:aws:iam::<xxxxxxxxxxxx>:role/add-role \
--ranger-audit-role arn:aws:iam::<xxxxxxxxxxxx>:role/add-role
```
  Similarly, when setting up the data lake, use the Instance Profile that you configured above with the Glue-related policy in your data lake creation command:
```
cdp datalake create-aws-datalake --profile default --runtime 7.2.12 --cli-input-json '
{"datalakeName":"ab-ds-cli-7321-sdx",
 "environmentName":"ab-ds-cli-7321",
 "cloudProviderConfiguration":{"instanceProfile":"arn:aws:iam::<xxxxxxxxxxx>:instance-profile/<role-name>","storageBucketLocation":"s3a://demo-e2e-test-state-bucket/ab-ds-cli-7321"},
 "scale":"LIGHT_DUTY"
 }'
```
  For more information, see Creating an AWS environment with a medium duty data lake using the CLI.
3. Navigate to the attached policy for the role.
4. When you manually create tables in AWS Glue Data Catalog, you must set the fully qualified path for the table location.
  For example: s3://my-aws-server-node-1/something/something.amazonaws.com/dc-pro-721-storage/glue/
5. Set up the AWS Glue Data Catalog. For more information, see Populating the Glue Data Catalog. Select only the CSV format, which is the format currently supported for Cloudera Data Catalog, and specify the delimiter used in the data.
  When you create tables in AWS Glue Data Catalog manually, set the fully qualified path for the location. For example: s3://my-aws-server-node-1/something/something/dc-pro-721-storage/glue/
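
The table location requirement can be checked before you create a table. The following is a minimal sketch, not part of the Cloudera or AWS tooling: the function name `is_fully_qualified_s3_path` is hypothetical, and the rule it applies (an `s3://` scheme, a non-empty bucket name, and at least one path segment) is an assumption drawn from the example path above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the location looks like
# a fully qualified S3 path, that is "s3://<bucket>/<path>".
# The rule is an assumption based on the example path in this document.
is_fully_qualified_s3_path() {
  case "$1" in
    s3://*/*)
      rest="${1#s3://}"     # drop the scheme
      bucket="${rest%%/*}"  # text before the first slash
      key="${rest#*/}"      # text after the first slash
      # Both the bucket and the path below it must be non-empty.
      [ -n "$bucket" ] && [ -n "$key" ]
      ;;
    *)
      return 1
      ;;
  esac
}

# Usage: the first location is the example from this page,
# the second one is a relative path and is rejected.
for location in \
  "s3://my-aws-server-node-1/something/something/dc-pro-721-storage/glue/" \
  "dc-pro-721-storage/glue/"
do
  if is_fully_qualified_s3_path "$location"; then
    echo "ok: $location"
  else
    echo "not fully qualified: $location"
  fi
done
```

The check is purely syntactic. It does not confirm that the bucket exists or that the role has access to it.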

AWS Glue metadata must be registered with Cloudera Data Catalog.
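
For the permissions in step 1, the ARNs and the policy attachment can be sketched as follows. This is an illustration under stated assumptions: the account ID and role name are placeholders, the helper function names are hypothetical, and `AWSGlueConsoleFullAccess` is one AWS managed policy that could provide Glue access. Your organization may require a narrower custom policy. The script only prints the `aws iam attach-role-policy` command so that you can review it before running it.

```shell
#!/bin/sh
# Build the IAM ARNs in the format used by the CLI examples in this document.
instance_profile_arn() {
  printf 'arn:aws:iam::%s:instance-profile/%s\n' "$1" "$2"
}

role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Print (do not run) the AWS CLI command that attaches a Glue policy to a role.
# Assumption: AWSGlueConsoleFullAccess is used here only as an example policy.
print_attach_glue_policy() {
  printf 'aws iam attach-role-policy --role-name %s --policy-arn %s\n' \
    "$1" "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess"
}

ACCOUNT_ID="123456789012"    # placeholder account number
ROLE_NAME="my-assumer-role"  # placeholder role name

instance_profile_arn "$ACCOUNT_ID" "$ROLE_NAME"
role_arn "$ACCOUNT_ID" "$ROLE_NAME"
print_attach_glue_policy "$ROLE_NAME"
```

The printed instance profile ARN is the value expected in the `instanceProfile` fields of the environment and data lake creation commands.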
