Before You Install Sentry

Before you install Sentry, verify that your cluster meets the prerequisites and review the performance guidelines.

Prerequisites

Verify the following prerequisites:

  • CDH 5.1.0 or higher managed by Cloudera Manager 5.1.0 or higher.
  • If you want to configure high availability for Sentry, you must have CDH 5.13.0 or higher and Cloudera Manager 5.13.0 or higher installed.
  • If you want to enable Sentry high availability, you must use a relational database, not a flat file, for the Sentry service database.
  • The installed Java version must include the fix for JDK-8055949.
  • HiveServer2 and the Hive Metastore (HMS) running with strong authentication. For HiveServer2, strong authentication is either Kerberos or LDAP. For the Hive Metastore, only Kerberos is considered strong authentication (to override, see Securing the Hive Metastore).
  • If you want to use Sentry with Impala, you must have Impala 1.4.0 or higher running with strong authentication. With Impala, either Kerberos or LDAP can be configured to achieve strong authentication.
  • If you want to use Sentry with Cloudera Search, the Sentry service must be configured with a database, and you must have Cloudera Search for CDH 5.1.0 or higher installed. Solr has supported Sentry since CDH 5.1.0, with individual features added in the following releases:
    • Sentry with policy files was added in CDH 5.1.0. Note that you cannot configure Sentry high availability with policy files because high availability requires Sentry to use a relational database.
    • Sentry with config support was added in CDH 5.5.0.
    • Sentry with a relational database-backed Sentry service was added in CDH 5.8.0. If you want to use high availability for Sentry with Solr, you must use the Solr version in CDH 5.8.0 or higher because Sentry must be configured with a relational database.
  • Implement Kerberos authentication on your cluster. For instructions, see Enabling Kerberos Authentication for CDH.
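
The version requirements above lend themselves to a quick scripted check. The following is a minimal Python sketch, not a Cloudera tool: the MINIMUMS table and the at_least function are hypothetical names, the minimums are copied from the list above, and the code assumes plain dotted version strings such as 5.13.0 with no build suffix.

```python
# Minimum versions copied from the prerequisites list above.
# The dictionary keys are illustrative labels, not Cloudera identifiers.
MINIMUMS = {
    "sentry": "5.1.0",            # CDH and Cloudera Manager
    "sentry_ha": "5.13.0",        # CDH and Cloudera Manager
    "impala": "1.4.0",            # Impala with strong authentication
    "solr_db_backed": "5.8.0",    # CDH release with database-backed Sentry for Solr
}


def parse_version(text):
    """Convert '5.13.0' to (5, 13, 0) so that comparison is numeric."""
    return tuple(int(part) for part in text.split("."))


def at_least(installed, minimum):
    """Return True if the installed version is the minimum or higher."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    cdh = "5.9.0"  # example value, replace with your own release
    for feature in ("sentry", "sentry_ha", "solr_db_backed"):
        print(feature, at_least(cdh, MINIMUMS[feature]))
```

Comparing parsed tuples rather than raw strings matters here: as strings, 5.9.0 sorts after 5.13.0, which would wrongly suggest that CDH 5.9.0 satisfies the high availability requirement.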

Performance Guidelines

Use the following guidelines for optimal performance:
  • Creating a large number of roles in Sentry can slow all aspects of Sentry performance. Use 5,000 or fewer roles for best performance.
  • Set the HMS heap size to at least 10 GB. This is required because by default, Sentry uses 12 connections to communicate with HMS. To verify the HMS heap size, open the Hive service, click the Configuration tab, and search for the Java Heap Size of Hive Metastore Server in Bytes property.
  • Cloudera recommends that for each Sentry host, you have 2.25 GB memory per million objects in the Hive database. Hive objects include servers, databases, tables, partitions, columns, URIs, and views.

    Make sure that the JVM heap size is set to a value that is appropriate for the memory requirements. You can check the heap size in Cloudera Manager. Open the Sentry service, click the Configuration tab, and search for the Java Heap Size of Sentry Server in Bytes property. Set that property to the maximum size for the Java process heap memory.

    The amount of memory that Sentry requires increases linearly as the number of objects in the Hive database increases. The graph below shows the memory required for Sentry based on the number of Hive objects.

    Figure: Sentry Memory Usage Based on Hive Objects

  • You can limit the number of notifications that Sentry fetches from HMS at a time, which reduces the overhead of getting all HMS notifications at once. This is especially useful when running large DDL jobs.

    To configure the number of notifications, open the Sentry service in Cloudera Manager and click the Configuration tab. Search for the Sentry Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml property. Click the plus sign (+) to add a new parameter. In the Name field, enter sentry.hms.fetch.size. In the Value field, enter the number of events that you want Sentry to fetch at a time.

    For example, if Sentry needs to fetch 1,000 events and you enter 100 in the Value field, Sentry fetches 100 events 10 times instead of fetching all 1,000 events in a single fetch.
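
The sizing arithmetic in these guidelines can be worked through with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, the 2.25 GB per million objects figure comes from the recommendation above, and the conversion to bytes assumes 1 GB = 1024^3 bytes, which the guidelines do not specify.

```python
import math

GB_PER_MILLION_OBJECTS = 2.25  # recommendation from the guidelines above
BYTES_PER_GB = 1024 ** 3       # assumption: binary gigabytes


def sentry_memory_gb(hive_objects):
    """Estimate the memory per Sentry host for a given number of Hive objects."""
    return hive_objects / 1_000_000 * GB_PER_MILLION_OBJECTS


def heap_size_bytes(memory_gb):
    """Convert an estimate in GB to bytes, the unit used by the heap size property."""
    return int(memory_gb * BYTES_PER_GB)


def fetch_count(total_events, fetch_size):
    """Number of fetches needed when sentry.hms.fetch.size is set to fetch_size."""
    return math.ceil(total_events / fetch_size)


if __name__ == "__main__":
    estimate = sentry_memory_gb(4_000_000)
    print(estimate)                   # 9.0 (GB for 4 million Hive objects)
    print(heap_size_bytes(estimate))  # value to compare against the heap size property
    print(fetch_count(1_000, 100))    # 10, matching the example above
```

Because the memory requirement grows linearly with the number of Hive objects, doubling the object count doubles the estimate. Treat the result as a starting point and confirm it against the actual memory usage of your Sentry hosts.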