Dynamic tag-based column masking in Hive with Ranger policies

Where Ranger resource-based masking policy for Hive anonymizes data from a Hive column identified by the database, table, and column, Ranger tag-based masking policy anonymizes Hive column data based on tags and tag attribute values associated with Hive column (usually specified as metadata classification in Atlas).

The following conditions apply when using Ranger column masking policies to mask data returned in Hive query results:

  • A variety of masking types are available, such as show last 4 characters, show first 4 characters, Hash, Nullify, and date masks (show only year).
  • You can specify a masking type for specific users, groups, and conditions.

  • Wildcard matching is not supported.

  • If there are multiple tag masking policies applied to the same Hive column, the masking policy with the lexicographically smallest policy-name is chosen for enforcement, E.G., policy "a" is enforced before policy "aa".

  • Masks are evaluated in the order listed in the policy.

  • An audit log entry is generated each time a masking policy is applied to a column.

  1. Select Access Manager > Tag Based Policies, then select a tag-based service.
  2. Select the Masking tab, then click Add New Policy.
  3. In Create Policy, add the following information for the column-masking filter:
    Table 1. Policy Details
    Field Description

    Policy Type

    (required)

    Set to Masking by default.

    Policy Name

    (required)

    Enter an appropriate policy name. This name cannot be duplicated across the system. The policy is enabled by default.
    normal/override Enables you to specify an override policy. When override is selected, the access permissions in the policy override the access permissions in existing policies. This feature can be used with Add Validity Period to create temporary access policies that override existing policies.

    TAG

    (required)

    Enter the applicable tag name, for example MASK.

    Audit Logging Audit Logging is set to Yes by default. Select No to turn off audit logging.
    Description Enter an optional description for the policy.
    Add Validity Period Specify a start and end time for the policy.
    Policy Conditions (applied at the policy level)

    Click the + icon to add policy conditions. Currently "Accessed after expiry_date? (yes/no)" is the only available policy condition.

    "Accessed after expiry_date (yes/no)?": To set this condition, type yes in the text box, then click check mark to add the condition.

    Enter boolean expression: Available for allow or deny conditions on tag-based policies. For examples and details, see Using Tag Attributes and Values in Ranger Tag-Based Policy Conditions.

    Click Save to save the policy condition.

    Table 2. Mask Conditions

    Label

    Description

    Select Role Specify the roles to which this policy applies.
    Select Group

    Specify the groups to which this policy applies.

    The public group contains all users, so granting access to the public group grants access to all users.

    Select User Specify one or more users to which this policy applies.
    Policy Conditions (applied at the item level)

    Click Add Conditions to add policy conditions. Currently "Accessed after expiry_date? (yes/no)" is the only available policy condition.

    "Accessed after expiry_date (yes/no)?": To set this condition, type yes in the text box, then click check mark to add the condition.

    Enter boolean expression: Available for allow or deny conditions on tag-based policies. For examples and details, see Using Tag Attributes and Values in Ranger Tag-Based Policy Conditions.

    Access Types Currently select is the only available access type for the hive component.
    Select Masking Option
    To create a row filter for the specified users, groups, and roles, click Select Masking Option, then select a masking type:
    • Redact – mask all alphabetic characters with "x" and all numeric characters with "n".
    • Partial mask: show last 4 – Show only the last four characters.
    • Partial mask: show first 4 – Show only the first four characters.
    • Hash – Replace all characters with a hash of entire cell value.
    • Nullify – Replace all characters with a NULL value.
    • Unmasked (retain original value) – No masking is applied.
    • Date: show only year – Show only the year portion of a date string and default the month and day to 01/01
    • Custom – Specify a custom masked value or expression. Custom masking can use any valid Hive UDF (Hive that returns the same data type as the data type in the column being masked).

    Masking conditions are evaluated in the order listed in the policy. The condition at the top of the Masking Conditions list is applied first, then the second, then the third, and so on.

  4. Click + to add additional conditions. Conditions are evaluated in the order listed in the policy. The condition at the top of the list is applied first, then the second, then the third, and so on.
  5. Click Add to add the new policy.