Dynamic resource-based column masking in Hive with Ranger policies
You can use the dynamic resource-based column masking capabilities of Apache Ranger to protect sensitive data in Hive in near real time. You can set policies that dynamically mask or anonymize sensitive columns (such as those containing PII, PCI, and PHI data) in Hive query output. For example, you can mask sensitive data within a column so that only the first or last four characters are shown.
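To make the masking types concrete, the following is a minimal Python sketch of what each kind of mask does to a single value. It is an illustration only, not how Ranger implements masking: Ranger applies masks inside Hive at query time. The function names and the `x` mask character are assumptions made for this example, and for simplicity every hidden character is replaced, including punctuation.

```python
import hashlib
from datetime import date


def show_last_4(value: str, mask_char: str = "x") -> str:
    """Keep the last four characters and replace everything before them."""
    if len(value) <= 4:
        return value
    return mask_char * (len(value) - 4) + value[-4:]


def show_first_4(value: str, mask_char: str = "x") -> str:
    """Keep the first four characters and replace everything after them."""
    if len(value) <= 4:
        return value
    return value[:4] + mask_char * (len(value) - 4)


def hash_mask(value: str) -> str:
    """Replace the value with a hex digest (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def nullify(value: object) -> None:
    """Replace the value with NULL (None in Python)."""
    return None


def show_only_year(value: date) -> date:
    """Keep the year and reset month and day to January 1."""
    return date(value.year, 1, 1)
```

For example, `show_last_4("4111111111111111")` returns `"xxxxxxxxxxxx1111"`, and `show_only_year(date(1987, 6, 15))` returns `date(1987, 1, 1)`.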
Dynamic column masking policies are similar to other Ranger access policies for Hive. You can set filters for specific users, groups, and conditions. With dynamic column-level masking, sensitive information never leaves Hive, and no changes are required in the consuming application or at the Hive layer. There is also no need to produce additional protected copies of datasets.
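Because a masking policy is structured like other Ranger policies, it can also be described as a JSON document. The sketch below builds such a document in Python. The field names (`policyType`, `resources`, `dataMaskPolicyItems`, `dataMaskInfo`) and the mask type `MASK_SHOW_LAST_4` are assumptions based on the policy model of the Ranger public REST API and should be checked against your Ranger version. The service, database, table, column, and group names are hypothetical.

```python
import json
from typing import Dict, List


def build_masking_policy(
    service: str,
    database: str,
    table: str,
    column: str,
    groups: List[str],
    mask_type: str,
) -> Dict:
    """Build a masking-policy document for one Hive column (illustrative only)."""
    return {
        "service": service,
        "name": f"mask_{database}_{table}_{column}",
        "policyType": 1,  # assumption: 1 denotes a masking policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": [column]},  # one column per masking policy
        },
        "dataMaskPolicyItems": [
            {
                "groups": groups,
                "accesses": [{"type": "select", "isAllowed": True}],
                "dataMaskInfo": {"dataMaskType": mask_type},
            }
        ],
    }


# Hypothetical names, used only to show the shape of the document.
policy = build_masking_policy(
    service="cm_hive",
    database="sales",
    table="customers",
    column="ssn",
    groups=["analysts"],
    mask_type="MASK_SHOW_LAST_4",
)
policy_json = json.dumps(policy, indent=2)
```

The sketch only constructs the document. Submitting it to Ranger, and the endpoint and authentication that requires, is outside its scope.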
The following conditions apply when using Ranger column masking policies to mask data returned in Hive query results:
- A variety of masking types are available, such as show last 4 characters, show first 4 characters, Hash, Nullify, and date masks (show only year).
- You can specify a masking type for specific users, groups, and conditions.
- Wildcard matching is not supported.
- Each column should have its own masking policy.
- Masks are evaluated in the order listed in the policy.
- An audit log entry is generated each time a masking policy is applied to a column.
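The ordering and audit conditions above can be sketched as follows. This is a simplified Python model, not Ranger's policy engine: it assumes that the first mask condition matching the user or one of the user's groups is the one applied, and that each application appends one audit record. `MaskCondition`, `apply_column_mask`, and the audit record fields are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class MaskCondition:
    """One entry in a column's ordered list of mask conditions."""
    name: str
    users: FrozenSet[str]
    groups: FrozenSet[str]
    mask: Callable[[str], Optional[str]]


def apply_column_mask(
    conditions: List[MaskCondition],
    user: str,
    user_groups: Iterable[str],
    column: str,
    value: str,
    audit_log: List[Dict[str, str]],
) -> Optional[str]:
    """Apply the first matching condition, in listed order, and audit it."""
    groups = set(user_groups)
    for condition in conditions:
        if user in condition.users or condition.groups & groups:
            audit_log.append(
                {"user": user, "column": column, "mask": condition.name}
            )
            return condition.mask(value)
    # No condition matched: the value is returned unmasked.
    return value


# Order matters: the auditor exemption is listed before the general mask.
conditions = [
    MaskCondition("unmasked", frozenset({"auditor"}), frozenset(), lambda v: v),
    MaskCondition(
        "show_last_4",
        frozenset(),
        frozenset({"analysts"}),
        lambda v: "x" * max(len(v) - 4, 0) + v[-4:],
    ),
    MaskCondition("nullify", frozenset(), frozenset({"public"}), lambda v: None),
]
```

With this ordering, a user named `auditor` who also belongs to the `analysts` group sees the unmasked value, because the first condition matches before the second is considered. Reversing the first two entries would mask the value for that user.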