Introduction to RAZ on AWS environments
CDP Public Cloud uses cloud storage by default, which can make it challenging to manage data access across teams and individual users. The Ranger Authorization Service (RAZ) addresses this challenge by letting Amazon S3 users apply the fine-grained access policies and audit capabilities available in Apache Ranger, similar to those used with HDFS files in an on-premises or IaaS deployment.
Core RAZ for AWS is available for production use platform-wide for Data Lakes and several Data Hub templates. RAZ for S3 supports use cases such as the following:
- Per-user home directories.
- Data engineering (Spark) efforts that require access to cloud storage objects and directories.
- Data warehouse queries (Hive/Impala) that use external tables.
- Access to Ranger's rich access control policies, such as date-based access revocation and user/group/role-based controls, along with corresponding auditing.
- Tag-based access control using the classification propagation feature that originates from directories.
In HDP and CDH deployments, files and directories are protected with HDFS Access Control Lists (ACLs) in both CDH and HDP, and additionally with Ranger HDFS policies in HDP. Similarly, in an AWS CDP Public Cloud environment with RAZ for S3 enabled, Ranger's rich access control policies can be applied to CDP's access to S3 buckets, directories, and files, and these policies can be managed with admin-level access to CDP alone.
You can back up and restore the metadata maintained in the Data Lake services of RAZ-enabled environments. For more information, see Data Lake Backup and Restore.
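To make the idea of a Ranger policy on an S3 path more concrete, the following is a minimal sketch in Python that builds a policy document in the general shape of the Apache Ranger policy model, with a validity period to illustrate date-based access revocation. It only constructs and prints JSON; it does not contact Ranger. The service name `cm_s3`, the resource keys `bucket` and `path`, the access type names, and the bucket, path, and group values are all assumptions for illustration and should be checked against the S3 service definition in your own Ranger instance. The helper `build_s3_policy` is hypothetical and not part of any CDP or Ranger library.

```python
import json
from datetime import datetime


def build_s3_policy(name, bucket, path, groups, accesses, start, end):
    """Return a dict shaped like an Apache Ranger access policy for an S3 path.

    Hypothetical helper: the service name and resource keys below are
    assumptions, not values confirmed by this document.
    """
    time_format = "%Y/%m/%d %H:%M:%S"  # timestamp layout assumed for validity schedules
    return {
        "service": "cm_s3",  # assumed name of the S3 service in Ranger
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,  # keep audit records for access decisions
        "policyType": 0,  # 0 denotes an access policy in the Ranger policy model
        "resources": {
            "bucket": {"values": [bucket], "isExcludes": False, "isRecursive": False},
            "path": {"values": [path], "isExcludes": False, "isRecursive": True},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
                "delegateAdmin": False,
            }
        ],
        # The policy applies only inside this window, which gives
        # date-based revocation once the end time has passed.
        "validitySchedules": [
            {
                "startTime": start.strftime(time_format),
                "endTime": end.strftime(time_format),
                "timeZone": "UTC",
            }
        ],
    }


# Example values are placeholders, not real resources.
example = build_s3_policy(
    name="analysts-read-sales",
    bucket="example-bucket",
    path="/warehouse/sales",
    groups=["analysts"],
    accesses=["read"],
    start=datetime(2025, 1, 1, 0, 0, 0),
    end=datetime(2025, 12, 31, 23, 59, 59),
)
print(json.dumps(example, indent=2))
```

In practice, policies like this are typically created in the Ranger Admin UI; the sketch is meant only to show which pieces (resource, principals, permissions, validity period) a policy on S3 data combines.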
Limitations of using RAZ in AWS environments
- Currently, there is no automated way to enable RAZ in an existing CDP environment that does not have RAZ enabled.