About Ranger RMS for S3 (HIVE-S3 ACL Sync)

Ranger Resource Mapping Server (RMS) enables automatic translation of access policies from HIVE to S3. This feature is available on Cloudera Base on premises from release 7.3.2.0 onwards, where installing RAZ on premises for Amazon S3-compatible object stores is supported. It co-exists with RMS support for the HDFS and Ozone file systems on Cloudera Base on premises.

Ranger RMS allows you to authorize access to S3 locations using policies defined for Hive table and database resources. The permissions defined in these Hive policies are propagated to the S3 locations of those databases and tables, and to all files and directories under them.

RMS periodically connects to the Hive Metastore and pulls the mapping of Hive metadata (database name, table name) to S3 file names. Enforcement is performed by a RAZ chained plugin (running in the Ranger RAZ service), which has an additional HivePolicyEnforcer module. After you enable RMS for S3 authorization, the RAZ chained plugin downloads Hive policies, tags, and roles from Ranger Admin, along with the mappings from Ranger RMS. S3 access is determined by both S3 policies and Hive policies.

Figure 1. HIVE-S3 ACL Sync using Ranger RMS

Phase I (items 1-3 above)

Ranger RMS periodically connects to the HIVE Metastore and pulls the mapping of HIVE metadata (database name, table name) to S3 file names.

Phase II (items 4-9 above)

The Ranger RAZ S3 Chained Plugin (running in the RAZ service) periodically pulls S3 policies from Ranger Admin. With the introduction of Ranger RMS, the plugin has been extended with an additional HivePolicyEnforcer module. It now also pulls the HIVE-S3 mappings from RMS and the HIVE policies from Ranger Admin.

After phase II completes, the requested S3 access is determined in the RAZ service by the S3 and HIVE policies defined by the Ranger Administrator.

The S3-to-HIVE access type mappings are as follows:
  • Access Type mapping for S3 to HIVE for Database:
    • _any=[_any]
    • read=[_any]
    • write=[create, drop, alter]
  • Access Type mapping for S3 to HIVE for Table:
    • _any=[_any]
    • read=[select]
    • write=[update, alter]
If you create tables under a database but the S3 location of a table does not reside under the S3 location of that database (for example, the table uses an external location), HIVE policies on the resource (database name, table = *, column = *) translate into S3 access rules that the RAZ S3 chained plugin enforces. If a policy is created only for the database resource, the access translates to the S3 location of that database only, not to the locations of the tables in that database.

Read access for any user in the database location is allowed through the default HIVE policy for all databases. This behavior is treated as _any access, similar to the HIVE command show tables. If a user has no HIVE policy that allows access to the database, access to the corresponding S3 location of that database is denied. This access evaluation aligns with the HDFS db-level grants feature.

Ranger RMS assumptions and limitations

  • All partitions of a table are assumed to be under the location specified for the table. Therefore, table permissions will not authorize access to partitions that store data outside the location specified for the table. For example, if a table is located in a /warehouse/foo S3 directory, all partitions of the table must have locations that are under the /warehouse/foo directory.
  • The Ranger RMS ACL-sync feature supports a single logical HMS to evaluate S3 access through HIVE permissions. This is aligned with the Sentry implementation in CDH.
  • When Ranger RAZ and RMS services are installed in Cloudera Manager, the RMS HIVE-S3 ACL Sync feature to authorize S3 locations is enabled by default, as it is pre-configured with chained plugin settings.
  • Permissions granted on views (traditional and materialized) do not extend to S3 access. This is aligned with the Sentry implementation in CDH.
  • RMS ACL-sync is designed to work on a specific pair of S3<->Hive Ranger services. Ranger RMS supports only one pair of Hive and S3 services. By default, cm_s3 is configured as the source service and cm_hive as the target service.
  • If a Cloudera Base on premises deployment supports multiple logical HMS with a single Ranger, Ranger RMS (Hive-S3 ACL-Sync) works with only one logical HMS. Permissions granted on databases/tables in other logical HMS instances will not be considered to authorize S3 access.
  • Ranger RAZ memory requirements must be increased based on the number of HIVE table mappings downloaded to the S3 Ranger plugin. Additionally, maintaining HIVE policies in the memory cache will also require additional memory.
  • Ranger RMS service will use the same database as Ranger Admin to store mappings downloaded from HMS.
  • Ranger RAZ service will have an S3 chained plugin, and it will perform authorization based on the policies and mappings downloaded and stored in the policy-cache directory of the RAZ service. Even if the RMS service is stopped, authorization will continue to work based on the files available in the policy-cache directory.
  • Expect Ranger RAZ CPU load to increase due to additional access evaluation performed to enforce HIVE policies and periodic downloading and processing of the HIVE table mappings. The latter increase is proportional to the number of table mappings downloaded to the RAZ S3 chained plugin.
  • When multiple databases are mapped to a single S3 location and a HIVE policy allows a user to access one of those databases, the user can access that S3 location and all files and directories under it. This may include the directories of other databases and their tables. However, the user cannot access those other databases or tables through Hive queries.

    For example,

    The databases music_a, music_b, and music_c are created at S3 path '/data'. Policy-A allows the 'sam' user 'all' access on resource = {database=music_a; table= * ; column= * ; }. The 'sam' user now gets all access to the S3 path '/data' and the files and directories under it. Therefore, the 'sam' user can access the S3 locations of tables under the music_b and music_c databases as long as those locations reside under the '/data' directory. However, the 'sam' user cannot access the music_b and music_c databases, or any tables under them, through Hive queries.
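The shared-location example can be sketched as a toy model. This is an assumption-laden illustration, not the RMS or RAZ implementation: the dictionaries and function names are hypothetical, and a simple path-prefix check stands in for the real mapping lookup.

```python
# Toy model of the example above; not the RMS or RAZ implementation.
# Hypothetical mapping of Hive databases to their S3 locations.
DB_LOCATIONS = {"music_a": "/data", "music_b": "/data", "music_c": "/data"}

# Policy-A: user 'sam' has 'all' access on database=music_a, table=*, column=*.
HIVE_GRANTS = {"sam": {"music_a"}}


def is_under(path, location):
    """True if path equals location or lies beneath it."""
    location = location.rstrip("/")
    return path == location or path.startswith(location + "/")


def s3_access_allowed(user, path):
    """Allowed if a database the user may access is located at or above path."""
    return any(
        is_under(path, DB_LOCATIONS[db]) for db in HIVE_GRANTS.get(user, set())
    )


def hive_access_allowed(user, database):
    """Allowed only if a HIVE grant names this database for the user."""
    return database in HIVE_GRANTS.get(user, set())
```

In this model, `s3_access_allowed("sam", "/data/music_b_table/file")` is true because music_a is located at '/data', while `hive_access_allowed("sam", "music_b")` is false. The same `is_under` check illustrates the partition assumption: a partition at '/warehouse/foo/p=1' lies under a table located at '/warehouse/foo', whereas one at '/warehouse/bar/p=1' does not.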

Comparison with Sentry HDFS ACL sync

The Ranger RMS (Hive-S3 ACL-Sync) feature resembles the Sentry HDFS ACL-Sync feature in the way it downloads and keeps track of the HIVE table-to-S3 location mapping.

It differs from Sentry in the way it completely and transparently supports all features that Ranger policies express. Therefore, support for tag-based policies, security zones, masking, row-filtering, and audit logging is included with this implementation.

Also, the feature is enabled or disabled by a simple configuration on the Ranger RAZ side, allowing each installation the option of turning this feature on or off.