Configuring Ranger RMS (Hive-S3 ACL Sync)

Ranger Resource Mapping Server (RMS) is fully configured upon installation in 7.2.18 public cloud. This topic provides additional information about RMS configuration settings and workflows.

Important configuration information - Hive S3

  • Ranger RMS enables S3 access through Ranger Hive policies. Ranger RMS is pre-configured with the names of the S3 and Hive services (also known as repos). Your installation may contain multiple Ranger services for S3 and Hive; you can view them in the Ranger Admin web UI. RMS ACL sync is designed to work on one specific pair of S3 and Hive Ranger services, so it is important to identify those service names before Ranger RMS is installed. The default Ranger S3 service name is cm_s3, and the default Ranger Hive service name is cm_hive.

  • After Ranger RMS installation, ensure that the Hive service identified during installation grants the rangerrms user select access to all tables in all databases, both in the default (no-zone) context and in all security zones for the Hive service.

  • If you use a custom Kerberos principal or user, ensure that the Hive service identified during installation grants the custom user (for example, rangerrmsfoo0) select access to all tables in all databases, both in the default (no-zone) context and in all security zones for the Hive service.

  • In public cloud, Ranger RMS by default tracks both external and managed tables in Hive. To configure Ranger RMS to track only external Hive tables, add the following configuration setting to Ranger RMS.

    ranger-rms.HMS.map.managed.tables=false

  • To disable RMS for S3 authorization, go to Cloudera Manager > Ranger-Ranger RAZ > Configuration > Advanced Configuration Snippet (Safety Valve) for ranger-raz-conf/ranger-raz-site.xml, then add empty values to the following settings:

    ranger.raz.service-type.S3.chained.services = 
    ranger.raz.service-type.S3.chained.services.cm_hive.impl = 
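
If the safety valve is edited as XML rather than as key-value pairs, the two empty settings above could be expressed as Hadoop-style property blocks. The following Python sketch generates such blocks. It assumes that the snippet accepts standard <property> elements and that the Hive service name is the default cm_hive. The helper name safety_valve_xml is hypothetical and is not part of Cloudera Manager or Ranger.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the settings above; the values are left empty.
# "cm_hive" is the default Hive service name and may differ in your installation.
DISABLE_RMS_FOR_S3 = {
    "ranger.raz.service-type.S3.chained.services": "",
    "ranger.raz.service-type.S3.chained.services.cm_hive.impl": "",
}


def safety_valve_xml(properties):
    """Render name/value pairs as Hadoop-style <property> blocks (hypothetical helper)."""
    blocks = []
    for name, value in properties.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        # short_empty_elements=False keeps an explicit <value></value> pair
        # instead of a self-closing <value /> element.
        blocks.append(
            ET.tostring(prop, encoding="unicode", short_empty_elements=False)
        )
    return "\n".join(blocks)


print(safety_valve_xml(DISABLE_RMS_FOR_S3))
```

Each printed line is one property block with an empty value, which can be compared against what the safety valve editor shows after you save the change.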

Advanced configurations

S3 plugin-side configurations

  • ranger.raz.service-type.s3.mapping.hive.authorize.with.only.chained.policies
    • true: Enforce strict Sentry semantics.
    • false: If there is no applicable Hive policy, let S3 determine access.
    • Default setting: false
  • ranger.raz.service-type.s3.accesstype.mapping.read
    • A comma-separated list of Hive access types that S3 "read" maps to.
    • Default setting: select
  • ranger.raz.service-type.s3.accesstype.mapping.write
    • A comma-separated list of Hive access types that S3 "write" maps to.
    • Default setting: update,alter
  • ranger.raz.service-type.s3.privileged.user.names
    • Default setting: admin,dpprofiler,hue,beacon,hive,impala
  • ranger.raz.service-type.s3.mapping.source.download.interval
    • The time in milliseconds between mapping download requests from the S3 Ranger plugin to RMS.
    • Default setting: 30 seconds (30000 milliseconds)

      By default, the RMS plugin checks for new mapping downloads every 30 seconds, based on this configuration. If the mapping data (stored in the raz_cm_hive_resource_mapping.json file) is approximately 360 MB in size, performing this operation every 30 seconds can cause excessive load on the NameNode. With performance logs enabled, you can see that the saveToCache operation takes about 11 seconds and the loadFromCache operation takes about 7 seconds. Together, the caching process takes approximately 18 to 19 seconds to complete, as shown in the following example performance logs:

      DEBUG org.apache.ranger.perf.resourcemapping.init: [PERF] RangerMappingRefresher.loadFromCache(serviceName=cm_hive): 7449
      DEBUG org.apache.ranger.perf.resourcemapping.init: [PERF] RangerMappingRefresher.saveToCache(serviceName=cm_hive): 11787

      In this case, increase the interval between RMS mapping downloads to at least twice the caching time: 18 × 2 = 36 seconds (36000 milliseconds). A more conservative value is 45 seconds (45000 milliseconds). Tuning the interval in this way helps optimize performance in the RAZ S3 chained plugin.
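
The interval calculation above can be sketched in Python. This is an illustration only: it parses the two example [PERF] lines shown above, treats the trailing number as milliseconds (consistent with the 7-second and 11-second figures), and applies the factor of two. The helper name suggested_interval_ms and the regular expression are assumptions made for this example and are not part of Ranger.

```python
import re

# Example [PERF] lines copied from the performance logs above.
PERF_LOG = """\
DEBUG org.apache.ranger.perf.resourcemapping.init: [PERF] RangerMappingRefresher.loadFromCache(serviceName=cm_hive): 7449
DEBUG org.apache.ranger.perf.resourcemapping.init: [PERF] RangerMappingRefresher.saveToCache(serviceName=cm_hive): 11787
"""

# Captures the operation name and its duration (assumed to be milliseconds).
PERF_LINE = re.compile(r"RangerMappingRefresher\.(\w+)\(serviceName=[^)]*\):\s*(\d+)")


def suggested_interval_ms(log_text, factor=2):
    """Sum the two cache timings and multiply by a safety factor (hypothetical helper)."""
    durations = {op: int(ms) for op, ms in PERF_LINE.findall(log_text)}
    cache_ms = durations.get("loadFromCache", 0) + durations.get("saveToCache", 0)
    return cache_ms, cache_ms * factor


cache_ms, interval_ms = suggested_interval_ms(PERF_LOG)
print(cache_ms, interval_ms)  # 19236 38472
```

With the exact logged values, the doubled figure is about 38.5 seconds, slightly above the rounded 36 seconds, so the more conservative 45-second value still leaves headroom.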

Hive service configuration

  • ranger.plugin.audit.exclude.users
    • This configuration, added in the Hive service configs, lists the users whose access to Hive or Hive Metastore does not generate audit records. A large number of audit records may be created when the rangerrms user makes requests to the Hive Metastore while downloading Hive table data. Adding the rangerrms user to the comma-separated list of users in this configuration prevents these audit records from being generated.
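
When the list already contains other users, rangerrms should be appended rather than replacing the existing value. The following Python sketch shows one way to build the new comma-separated value without introducing duplicates. The helper name add_excluded_user and the sample user names in the calls are assumptions made for this example.

```python
def add_excluded_user(current_value, user="rangerrms"):
    """Return the comma-separated list with `user` appended once (hypothetical helper)."""
    # Split on commas, trim whitespace, and drop empty entries.
    users = [u.strip() for u in current_value.split(",") if u.strip()]
    if user not in users:
        users.append(user)
    return ",".join(users)


print(add_excluded_user("hive, hue"))       # hive,hue,rangerrms
print(add_excluded_user("hive,rangerrms"))  # hive,rangerrms
print(add_excluded_user(""))                # rangerrms
```

The returned string is the value to enter for ranger.plugin.audit.exclude.users in the Hive service configuration.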

RMS side configurations

  • ranger-rms.HMS.source.service.name
    • The Ranger S3 service name (default: cm_s3).
  • ranger-rms.HMS.target.service.name
    • The Ranger Hive service name (default: cm_hive).
  • ranger-rms.HMS.map.managed.tables
    • true: Track managed and external tables.
    • false: Track only external tables.
    • Default setting: true
  • ranger-rms.polling.notifications.frequency.ms
    • The time in milliseconds between polls from RMS to HMS for changes to tables.
    • Default setting: 30 seconds (30000 milliseconds)
  • ranger-rms.supported.uri.scheme
    • A comma-separated list of URI schemes supported by RMS.
    • Default setting: s3a
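
To illustrate what the supported URI scheme list means in practice, the following Python sketch checks whether a table location uses one of the listed schemes. It does not reproduce how RMS performs this check internally. The helper name is_tracked_location and the sample locations are assumptions made for this example.

```python
from urllib.parse import urlparse


def is_tracked_location(location, supported_schemes="s3a"):
    """Check a location's URI scheme against a comma-separated list (hypothetical helper)."""
    # Normalize the configured list: trim whitespace, lowercase, drop empty entries.
    schemes = {s.strip().lower() for s in supported_schemes.split(",") if s.strip()}
    return urlparse(location).scheme.lower() in schemes


print(is_tracked_location("s3a://example-bucket/warehouse/sales"))  # True
print(is_tracked_location("hdfs://ns1/warehouse/sales"))            # False
```

With the default value of s3a, only the first sample location matches.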