Row-level filtering in Hive with Ranger policies
Row-level filtering helps simplify Hive queries. By moving the access restriction logic down into the Hive layer, Hive applies the access restrictions every time data access is attempted. This helps simplify authoring of the Hive query, and provides seamless behind-the-scenes enforcement of row-level segmentation without having to add this logic to the predicate of the query.
Row-level filtering also improves the reliability and robustness of Hadoop. By providing row-level security to Hive tables and reducing the security surface area, Hive data access can be restricted to specific rows based on user characteristics (such as group membership) and the runtime context in which this request is issued.
Typical use cases where row-level filtering can be beneficial include:
- A hospital can create a security policy that allows doctors to view data rows only for their own patients, and that allows insurance claims administrators to view only specific rows for their specific site.
- A bank can create a policy to restrict access to rows of financial data based on the employee's business division, locale, or based on the employee's role (for example: only employees in the finance department are allowed to see customer invoices, payments, and accrual data; only European HR employees can see European employee data).
- A multi-tenant application can create logical separation of each tenant's data so that each tenant can see only their own data rows.
You can use Apache Ranger row-level filters to set access policies for rows in Hive tables. Row-level filter policies are similar to other Ranger access policies. You can set filters for specific users, groups, and conditions.
The following conditions apply when using row-level filters:
- The filter expression must be a valid WHERE clause for the table or view.
- Each table or view should have its own row-level filter policy.
- Wildcard matching is not supported on database or table names.
- Filters are evaluated in the order listed in the policy.
- An audit log entry is generated each time a row-level filter is applied to a table or view.
- On the Service Manager page, select an existing Hive Service.
Select the Row Level Filter tab, then click Add New Policy.
On the Create Policy page, add the following information for the row-level filter:
Table 1. Policy Details Field Description
Enter an appropriate policy name. This name cannot be duplicated across the system. The policy is enabled by default. normal/override Enables you to specify an override policy. When override is selected, the access permissions in the policy override the access permissions in existing policies. This feature can be used with Add Validity Period to create temporary access policies that override existing policies.
Type in the applicable database name. The auto-complete feature displays available databases based on the entered text.
Type in the applicable table name. The auto-complete feature displays available tables based on the entered text.
Audit Logging Audit Logging is set to Yes by default. Select No to turn off audit logging. Description Enter an optional description for the policy. Add Validity Period Specify a start and end time for the policy. Table 2. Row Filter Conditions
Select Role Specify the roles to which this policy applies. Select Group
Specify the groups to which this policy applies.
The public group contains all users, so granting access to the public group grants access to all users.
Select User Specify one or more users to which this policy applies. Access Types Currently select is the only available access type. This will be used in conjunction with the WHERE clause specified in the Row Level Filter field. Add Row Filter
- To create a row filter for the specified users, groups, and roles, Click Add Row Filter, then type a valid WHERE clause in the Enter filter expression box.
- To allow Select access for the specified users and groups without row-level restrictions, do not add a row filter (leave the setting as "Add Row Filter").
- Filters are evaluated in the order listed in the policy. The filter at the top of the Row Filter Conditions list is applied first, then the second, then the third, and so on.
To move a condition in the Row Filter Conditions list (and therefore change the
order in which it is evaluated), click the dotted rows icon at the left of the
condition row, then drag the condition to a new position in the list.
- Click Add to add the new row-level filter policy.