Pattern-Based Anonymization Rules
Pattern-based rules anonymize data that matches one or more regular-expression patterns. An optional extract pattern narrows each match to the part that should be anonymized.
Required and Optional Fields
name
rule_id (should be set to Pattern)
patterns
extract (optional)
include_files (optional)
exclude_files (optional)
action (optional, default value is ANONYMIZE)
replace_value (optional, applicable only when action=REPLACE)
shared (optional, default value is true)
enabled (optional, default value is true)
For more information on each field, refer to Fields for Defining Anonymization Rules.
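The field list above can be read as a small set of required keys plus defaults for the optional ones. The following is a minimal Python sketch of that reading, not the product's own validation logic. The names DEFAULTS, REQUIRED, and with_defaults are hypothetical, and treating name, rule_id, and patterns as required is an assumption based on their not being marked optional.

```python
# Hypothetical helper: fills in the documented defaults for a pattern rule.
# Default values are taken from the field list; everything else is assumed.
DEFAULTS = {"action": "ANONYMIZE", "shared": True, "enabled": True}
REQUIRED = ("name", "rule_id", "patterns")


def with_defaults(rule):
    """Return a copy of the rule with documented defaults applied."""
    missing = [field for field in REQUIRED if field not in rule]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    action = rule.get("action", DEFAULTS["action"])
    # replace_value is documented as applicable only when action=REPLACE.
    if "replace_value" in rule and action != "REPLACE":
        raise ValueError("replace_value applies only when action is REPLACE")
    # Explicit values in the rule win over the defaults.
    return {**DEFAULTS, **rule}


rule = with_defaults(
    {"name": "EMAIL", "rule_id": "Pattern", "patterns": ["x"], "shared": False}
)
```

In this sketch, the rule keeps its explicit shared value of false and picks up the defaults for action and enabled.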
Rule Definition Example (without extract)
{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])$?"],
  "shared": false
}
The content of the input file version.txt is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
The content of the output file version.txt, with the email address anonymized, is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion ‡qpe@unqfay.mjp‡:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
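To see which part of version.txt the EMAIL pattern selects, the matching step can be tried in Python. This is only a sketch under stated assumptions: the mask function is a hypothetical stand-in (the actual anonymizer produces values such as ‡qpe@unqfay.mjp‡ by its own method), and the trailing $? from the rule is omitted because Python's re module does not accept a quantifier on an anchor.

```python
import re

# The EMAIL pattern from the rule, with the JSON escaping (\\.) undone and
# the trailing "$?" dropped (Python's re rejects a quantified anchor).
EMAIL = re.compile(
    r"(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}(?![a-z0-9._%+-])"
)


def mask(value):
    # Hypothetical placeholder masking: letters and digits become "x".
    # The real tool generates different replacement text.
    return "‡" + re.sub(r"[a-z0-9]", "x", value) + "‡"


def anonymize(text):
    # With no extract pattern, the whole match is anonymized.
    return EMAIL.sub(lambda m: mask(m.group(0)), text)


line = "Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b"
matches = EMAIL.findall(line)
masked = anonymize(line)
```

Only git@github.com matches; the rest of the line, including hortonworks/hadoop.git, passes through unchanged.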
Rule Definition Example (with extract)
{
  "name": "KEYSTORE",
  "rule_id": "Pattern",
  "patterns": ["oozie.https.keystore.pass=([^\\s]*)", "OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"],
  "extract": "=([^\\s]*)",
  "include_files": ["java_process.txt", "pid.txt", "ambari-agent.log", "oozie-env.cmd"],
  "shared": false
}
The content of the input file oozie-env.cmd is:
oozie.https.keystore.pass=abcde
set OOZIE_HTTPS_KEYSTORE_PASS=12345
To anonymize the content of the input file, the patterns configured in the rule are applied first. The patterns
oozie.https.keystore.pass=([^\\s]*)
and
OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)
match
oozie.https.keystore.pass=abcde
and
OOZIE_HTTPS_KEYSTORE_PASS=12345
respectively.
Next, the extract pattern
=([^\\s]*)
is applied to each match to identify abcde and 12345, which are the values to be anonymized.
The content of the output file oozie-env.cmd is:
oozie.https.keystore.pass=‡vvdwa‡
set OOZIE_HTTPS_KEYSTORE_PASS=‡zdowg‡
The values of oozie.https.keystore.pass and OOZIE_HTTPS_KEYSTORE_PASS have been anonymized; the property names and the rest of each line are unchanged.
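The two-step behavior in this example (match with patterns, then narrow with extract) can be sketched in Python as follows. This is an illustration of the behavior as described here, not the product's implementation: anonymize_line and mask are hypothetical names, and mask is a placeholder that does not reproduce the tool's actual replacement values.

```python
import re

# Patterns and extract from the KEYSTORE rule, with JSON escaping undone.
PATTERNS = [
    r"oozie.https.keystore.pass=([^\s]*)",
    r"OOZIE_HTTPS_KEYSTORE_PASS=([^\s]*)",
]
EXTRACT = r"=([^\s]*)"


def mask(value):
    # Hypothetical placeholder: one "*" per character of the original value.
    return "‡" + "*" * len(value) + "‡"


def anonymize_line(line):
    def replace_match(match):
        matched = match.group(0)
        # Step 2: apply the extract pattern inside the matched text and
        # anonymize only its first capture group.
        extracted = re.search(EXTRACT, matched)
        if extracted is None:
            return mask(matched)
        start, end = extracted.span(1)
        return matched[:start] + mask(extracted.group(1)) + matched[end:]

    # Step 1: find every occurrence of each configured pattern.
    for pattern in PATTERNS:
        line = re.sub(pattern, replace_match, line)
    return line


first = anonymize_line("oozie.https.keystore.pass=abcde")
second = anonymize_line("set OOZIE_HTTPS_KEYSTORE_PASS=12345")
```

Because only the extracted group is replaced, the key name and the equals sign survive, which is the same shape as the output file shown above.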
For more examples, refer to Examples of Pattern-Based Anonymization Rules.