User Guide

Fields Used for Defining Anonymization Rules

To define anonymization rules, use the following fields:

Table 3.1. 

Field    Description

name

Provides a descriptive name for data anonymized by the rule. It has to be unique across all rules.

description

Provides a description for the rule.

rule_id

Defines the class of rules the current rule belongs to.

The supported rule IDs are: PATTERN, PROPERTY, XPATH, JSONPATH. This parameter is case-insensitive.

patterns

Defines a list of data patterns to be anonymized. It is applicable only to Pattern rules, where rule_id=PATTERN.

These patterns are matched in a case-insensitive manner, which means that the pattern keystore.pass=([^\\s]*) matches any of the following values:

  • keystore.pass=123

  • KeyStore.Pass=123

  • KEYSTORE.PASS=123

extract

Specifies a pattern to extract data matched through the list of patterns. The extract pattern is matched in a case-insensitive manner.

For example, in order to anonymize the oozie.https.keystore.pass password, the following pattern and extract values are used:

"patterns": ["oozie.https.keystore.pass=([^\\s]*)"]

"extract": "=([^\\s]*)",

This pattern matches values such as oozie.https.keystore.pass=1234.

The extract pattern is used to extract and anonymize only the value after the = (which in this example is 1234). The [^\\s]* denotes any run of non-whitespace characters, and the capturing group () is used to exclude the = from the anonymized value.

If the extract pattern is not configured, the entire value matched with the pattern is anonymized (which in this example is oozie.https.keystore.pass=1234), regardless of capturing groups used in the patterns.
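
Putting the two fields together, a complete Pattern rule might look like the following sketch. Only the patterns and extract values come from the example above; the rule name is hypothetical, and the choice of the REPLACE action is an assumption made for illustration.

```json
{
  "name": "Oozie HTTPS Keystore Password",
  "rule_id": "Pattern",
  "patterns": ["oozie.https.keystore.pass=([^\\s]*)"],
  "extract": "=([^\\s]*)",
  "action": "REPLACE",
  "replace_value": "Hidden"
}
```

With a rule like this, a line such as oozie.https.keystore.pass=1234 would be expected to keep its key, with only 1234 replaced.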

properties

Specifies a list of property name patterns to anonymize. These patterns are matched in a case-insensitive manner. It is applicable only to Property rules, where rule_id=PROPERTY.

parentNode

This field is applicable to property anonymization in XML files. It allows you to define the parent node of the property that you want to anonymize. The default is "parentNode": "property", because the XML block to anonymize typically has the parent node property, as in the following example:

<property>
  <name>fs.s3a.proxy.password</name>
  <value>Abc7j*4$aTh</value>
  <description>Password for authenticating with proxy server.</description>
</property>
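
For a block like the one above, a Property rule can presumably rely on the default parentNode and leave the field out. The following is a sketch only: the rule name is hypothetical, and the remaining fields follow the descriptions in this table.

```json
{
  "name": "S3A Proxy Password",
  "rule_id": "Property",
  "properties": ["fs.s3a.proxy.password"],
  "action": "REPLACE",
  "replace_value": "Hidden"
}
```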

For example, you can anonymize main.ldapRealm.contextFactory.systemPassword in the following XML block that has a parent node called param by setting "parentNode": "param" in the anonymization rule:

<param>
  <name>main.ldapRealm.contextFactory.systemPassword</name>
  <value>pass</value>
</param>

The rule to anonymize the above content sets param as the parent node by using "parentNode": "param":

  {
    "name": "KNOX LDAP Password",
    "rule_id": "Property",
    "properties": ["main.ldapRealm.contextFactory.systemPassword"],
    "include_files": ["topologies/*.xml"],
    "action" : "REPLACE",
    "parentNode": "param",
    "replace_value": "Hidden"
  }

action

The supported actions are: ANONYMIZE, DELETE, REPLACE.

The action value is not case sensitive, so Anonymize and delete are also accepted values.

The ANONYMIZE action encrypts the data using the key indicated by the shared flag, DELETE deletes the data, and REPLACE replaces the data with a predefined value, which can be customized using replace_value.

replace_value

This field is used by the REPLACE action to specify a replacement for the data to anonymize. The default value is Hidden.

shared

Indicates which key to use for anonymization (shared or private).

This value is used when the anonymization action is set to ANONYMIZE. It is a boolean property (true/false). If it is set to true, the Hortonworks support team can unmask data if needed for diagnostic purposes; for example, host names and IP addresses can be unmasked to resolve issues on specific hosts or communication between hosts. Note that unmasked data is not stored in Hortonworks repositories; it is discarded as soon as the analysis finishes. The default value is true.

Rules configured with shared = false cannot be unmasked by Hortonworks, which in some cases might become a roadblock for support case analysis.
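
As an illustration, a rule that encrypts matched data with the private key would set shared to false. This sketch reuses the keystore.pass pattern shown under patterns; the rule name is hypothetical.

```json
{
  "name": "Keystore Password (private key)",
  "rule_id": "Pattern",
  "patterns": ["keystore.pass=([^\\s]*)"],
  "extract": "=([^\\s]*)",
  "action": "ANONYMIZE",
  "shared": false
}
```

Omitting shared, or setting it to true, would leave the data unmaskable by the support team, as described above.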

include_files

Specifies a list of glob file patterns to which the rule applies. If not configured, the rule is applicable to all files.

exclude_files

Specifies a list of glob file patterns that are excluded from anonymization. If not configured, no file is excluded from the rule application.

enabled

A flag (true/false) that specifies whether the rule is enabled to be executed. By default, it is set to true.
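
The three fields above can be combined in a single rule. The following sketch adapts the KNOX example from the parentNode description: the include_files glob comes from that example, while the rule name and the exclude_files entry are made up for illustration.

```json
{
  "name": "KNOX LDAP Password (disabled variant)",
  "rule_id": "Property",
  "properties": ["main.ldapRealm.contextFactory.systemPassword"],
  "parentNode": "param",
  "include_files": ["topologies/*.xml"],
  "exclude_files": ["topologies/sandbox.xml"],
  "action": "DELETE",
  "enabled": false
}
```

Because enabled is set to false, a rule like this would stay in the configuration without being executed until the flag is switched back to true.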