Pattern-Based Anonymization Rules
Pattern-based rules anonymize data that matches one or more regular-expression patterns. The optional extract pattern narrows each match down to the content that should be anonymized.
Required and Optional Fields
name
rule_id (should be set to Pattern)
patterns
extract (optional)
include_files (optional)
exclude_files (optional)
action (optional, default value is ANONYMIZE)
replace_value (optional, applicable only when action=REPLACE)
shared (optional, default value is true)
enabled (optional, default value is true)
For more information on each field, refer to Fields Used for Defining Anonymization Rules.
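The required fields and documented defaults above can be summarized in a short sketch. This is only an illustration: normalize_rule is a hypothetical helper, not part of the product, and it assumes a rule is available as a plain dictionary parsed from JSON.

```python
# Hypothetical helper (not a product API): checks the required fields of a
# pattern-based rule and fills in the defaults listed above.
REQUIRED_FIELDS = ("name", "rule_id", "patterns")
DEFAULTS = {"action": "ANONYMIZE", "shared": True, "enabled": True}


def normalize_rule(rule: dict) -> dict:
    """Return a copy of the rule with documented defaults applied."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    # Values given in the rule override the defaults.
    normalized = {**DEFAULTS, **rule}

    # replace_value is applicable only when action=REPLACE.
    if "replace_value" in normalized and normalized["action"] != "REPLACE":
        raise ValueError("replace_value applies only when action is REPLACE")
    return normalized


rule = normalize_rule(
    {"name": "EMAIL", "rule_id": "Pattern", "patterns": ["example"], "shared": False}
)
print(rule["action"], rule["shared"], rule["enabled"])
```

In this sketch an explicit "shared": false overrides the default, while action and enabled fall back to ANONYMIZE and true.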
Rule Definition Example (without extract)
{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])$?"],
  "shared": false
}
The content of the input file version.txt is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
The content of the output file version.txt, with the email address anonymized, is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion ‡qpe@unqfay.mjp‡:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
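To see which part of the file the EMAIL pattern selects, the following Python sketch applies the same regular expression to the Subversion line. It makes several assumptions: the pattern is unescaped from JSON; the trailing $? is dropped because Python's re module does not accept a quantifier directly on an anchor (it is optional, so this should not change what is matched); and mask is a hypothetical stand-in for the real anonymization, which produces different replacement characters.

```python
import re

# Pattern from the EMAIL rule, unescaped from JSON, without the trailing "$?".
EMAIL_PATTERN = re.compile(
    r"(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}(?![a-z0-9._%+-])"
)


def mask(value: str) -> str:
    """Hypothetical masking: swap letters and digits for 'x', wrap in ‡ markers."""
    return "‡" + re.sub(r"[a-z0-9]", "x", value) + "‡"


def anonymize(text: str) -> str:
    """Replace every match of the email pattern with its masked form."""
    return EMAIL_PATTERN.sub(lambda match: mask(match.group(0)), text)


line = ("Subversion git@github.com:hortonworks/hadoop.git "
        "-r cb6e514b14fb60e9995e5ad9543315cd404b4e59")
print(anonymize(line))
# prints: Subversion ‡xxx@xxxxxx.xxx‡:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
```

Only git@github.com is matched; the commit hash and the timestamp contain no @ character and are left unchanged.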
Rule Definition Example (with extract)
{
  "name": "KEYSTORE",
  "rule_id": "Pattern",
  "patterns": ["oozie.https.keystore.pass=([^\\s]*)", "OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"],
  "extract": "=([^\\s]*)",
  "include_files": ["java_process.txt", "pid.txt", "ambari-agent.log", "oozie-env.cmd"],
  "shared": false
}
The content of the input file oozie-env.cmd is:
oozie.https.keystore.pass=abcde
set OOZIE_HTTPS_KEYSTORE_PASS=12345
To anonymize the content of the input file, the two anonymization patterns configured in the rule are used:
"oozie.https.keystore.pass=([^\\s]*)"
"OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"
These patterns match oozie.https.keystore.pass=abcde and OOZIE_HTTPS_KEYSTORE_PASS=12345, respectively.
Next, the extract pattern "=([^\\s]*)" is used to identify abcde and 12345 within those matches. These are the values to be anonymized.
The content of the output file oozie-env.cmd is:
oozie.https.keystore.pass=‡vvdwa‡
set OOZIE_HTTPS_KEYSTORE_PASS=‡zdowg‡
The values of oozie.https.keystore.pass and OOZIE_HTTPS_KEYSTORE_PASS have been anonymized.
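The two-stage matching in this example (patterns first, then extract) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the regular expressions are unescaped from JSON, anonymize_line and mask are hypothetical names, and mask simply substitutes asterisks where the real anonymizer generates replacement characters such as vvdwa.

```python
import re

# Patterns and extract expression from the KEYSTORE rule, unescaped from JSON.
PATTERNS = [
    r"oozie.https.keystore.pass=([^\s]*)",
    r"OOZIE_HTTPS_KEYSTORE_PASS=([^\s]*)",
]
EXTRACT = r"=([^\s]*)"


def mask(value: str) -> str:
    """Hypothetical masking: same-length run of asterisks between ‡ markers."""
    return "‡" + "*" * len(value) + "‡"


def anonymize_line(line: str) -> str:
    """Find each pattern match, then mask only the part captured by EXTRACT."""

    def replace_match(match: re.Match) -> str:
        matched = match.group(0)
        extracted = re.search(EXTRACT, matched)
        if extracted is None:
            return matched  # nothing to extract, leave the match unchanged
        start, end = extracted.span(1)
        return matched[:start] + mask(extracted.group(1)) + matched[end:]

    for pattern in PATTERNS:
        line = re.sub(pattern, replace_match, line)
    return line


for line in ("oozie.https.keystore.pass=abcde",
             "set OOZIE_HTTPS_KEYSTORE_PASS=12345"):
    print(anonymize_line(line))
# prints: oozie.https.keystore.pass=‡*****‡
# prints: set OOZIE_HTTPS_KEYSTORE_PASS=‡*****‡
```

The property names and the equals sign stay in place because only the group captured by the extract expression is replaced.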
For more examples, refer to Examples of Pattern-Based Anonymization Rules.