Configuring Data Anonymization Rules
Anonymization rules define regular expressions that identify sensitive data to anonymize, such as IP addresses and domain names. Each rule is written in JSON and defines what to match and what value to replace it with.
Note: Anonymization rule formats vary between SmartSense versions. Make sure that you consult the documentation that matches your SmartSense version.
To define regular expression-based rules, refer to the following sample:
{ "name":"ip_address", "path":null, "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?", "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?", "shared": true }
Key reference:
name
- The rule name.
path
- An optional regular expression path of files on which to apply this rule. The default, null, means all files.
pattern
- A regular expression that defines the pattern to match within the file.
extract
- An optional regular expression to extract the data from the matched pattern. Each of the extracts is marked as a regular expression group.
shared
- A flag that indicates which key (shared or private) is used for masking. If the shared key is used, the Hortonworks support team can unmask data if needed for diagnostic purposes: for example, host names and IP addresses for resolving issues on specific hosts or communication between hosts. Note that unmasked data is not stored in Hortonworks repositories; it is discarded as soon as the analysis finishes.
value
- An optional constant value to use as the replacement. Note that the value chosen should not be matchable by the pattern specified earlier. For example, if the pattern is “.*dfs.datanode.*”, the value should not contain “dfs.datanode”. Also, note that if the value is specified, the shared flag is ignored.
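The following Python sketch illustrates one way the pattern, extract, and value keys could interact. It is not the SmartSense implementation: the functions `placeholder_mask`, `anonymize_line`, and `value_is_safe` are hypothetical names, and the hash-based mask merely stands in for the real shared or private key masking, whose algorithm is not described here.

```python
import hashlib
import json
import re

# The sample rule from this page, parsed from its JSON form.
RULE = json.loads(r'''
{ "name": "ip_address", "path": null,
  "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?",
  "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?",
  "shared": true }
''')


def placeholder_mask(text):
    """Hypothetical stand-in for key-based masking (NOT the SmartSense algorithm)."""
    return "MASKED_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def anonymize_line(line, rule):
    """Replace the sensitive part of every pattern match in one line of text."""
    pattern = re.compile(rule["pattern"])
    extract = re.compile(rule["extract"]) if rule.get("extract") else None

    def substitute(match):
        matched = match.group(0)
        start, end = 0, len(matched)  # without "extract", replace the whole match
        if extract is not None:
            inner = extract.search(matched)
            if inner is None or inner.re.groups == 0 or inner.group(1) is None:
                return matched  # nothing to extract, leave the text unchanged
            start, end = inner.span(1)  # the first group marks the data to mask
        constant = rule.get("value")
        if constant is not None:
            replacement = constant  # a constant value takes precedence over masking
        else:
            replacement = placeholder_mask(matched[start:end])
        return matched[:start] + replacement + matched[end:]

    return pattern.sub(substitute, line)


def value_is_safe(rule):
    """Return True if the constant value cannot be matched by the rule's own pattern."""
    value = rule.get("value")
    return value is None or re.search(rule["pattern"], value) is None


masked = anonymize_line("connect to 10.20.30.40:8020 failed", RULE)
constant = anonymize_line("host 10.0.0.1 up", {**RULE, "value": "x.x.x.x"})
```

In this sketch, the pattern matches the address together with its surrounding delimiters, while the group in extract narrows the replacement to the address itself, so `masked` has the form `connect to MASKED_<8 hex digits>:8020 failed`. The `value_is_safe` check reflects the guidance above that a constant value should not be matchable by the rule's pattern.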
To use property-based rules, use the following example:
{ "name":"delete_oozie_jdbc_password", "path":"oozie-site.xml", "property": "oozie.service.JPAService.jdbc.password", "operation":"DELETE", "shared": false }
name
- The rule name.
path
- A regular expression path of files on which to apply this rule.
property
- The name of a specific property within the matching files.
operation
- Either DELETE or REPLACE (the default). If DELETE is specified, the property is removed from the configuration file. If REPLACE is specified, the property value is replaced by either a constant value or a masked value.
value
- An optional value for the REPLACE operation. If a value is not specified, a private or shared key is used to mask the data to replace.
enabled
- A flag used to enable or disable the rule definition. The default is true.
excludes
- A set of path patterns to be excluded by the rule: for example, “excludes”: [“oozie-site.xml”, “core-site.xml”].
shared
- A flag that allows anonymized data to be reversed by Hortonworks. If shared is true, anonymized data is reversible by Hortonworks; if false, that data cannot be reversed.
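As a rough illustration of the keys above, the Python sketch below applies a property-based rule to a Hadoop-style XML configuration. It rests on several assumptions that this page does not confirm: that the target file uses the `<configuration>`/`<property>`/`<name>`/`<value>` layout, that entries in excludes can be treated as regular expressions like path, and that a missing path means all files. The functions `rule_applies` and `apply_property_rule` and the fixed `"MASKED"` placeholder are hypothetical and do not reproduce the SmartSense key-based masking.

```python
import re
import xml.etree.ElementTree as ET


def rule_applies(rule, filename):
    """Decide whether a rule should run against a file (illustrative logic only)."""
    if not rule.get("enabled", True):  # "enabled" defaults to true
        return False
    # Assumption: each "excludes" entry is treated as a regular expression.
    if any(re.search(p, filename) for p in rule.get("excludes", [])):
        return False
    path = rule.get("path")
    # Assumption: a missing or null path means the rule applies to all files.
    return path is None or re.search(path, filename) is not None


def apply_property_rule(xml_text, rule, mask=lambda value: "MASKED"):
    """Apply DELETE or REPLACE to the named property and return the new XML text."""
    root = ET.fromstring(xml_text)
    for prop in list(root.findall("property")):
        if prop.findtext("name") != rule["property"]:
            continue
        if rule.get("operation", "REPLACE") == "DELETE":
            root.remove(prop)  # drop the whole property element
            continue
        value = prop.find("value")
        if value is not None:
            constant = rule.get("value")
            # A constant value wins; otherwise fall back to the placeholder mask.
            value.text = constant if constant is not None else mask(value.text or "")
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    "<configuration>"
    "<property><name>oozie.service.JPAService.jdbc.password</name>"
    "<value>secret</value></property>"
    "<property><name>oozie.service.JPAService.jdbc.username</name>"
    "<value>oozie</value></property>"
    "</configuration>"
)

DELETE_RULE = {
    "name": "delete_oozie_jdbc_password",
    "path": "oozie-site.xml",
    "property": "oozie.service.JPAService.jdbc.password",
    "operation": "DELETE",
    "shared": False,
}

deleted = apply_property_rule(SAMPLE, DELETE_RULE)
replaced = apply_property_rule(SAMPLE, {**DELETE_RULE, "operation": "REPLACE", "value": "HIDDEN"})
```

With the DELETE rule, the password property disappears from the output while the username property is left untouched. Switching the operation to REPLACE with a constant value keeps the property and overwrites only its value.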
Note: Rules configured with shared = false cannot be unmasked by Hortonworks, and in some cases might become a roadblock for support case analysis.