Examples of Pattern-Based Anonymization Rules
This section includes examples of commonly used pattern-based anonymization rules.
Example 1: Mask by pattern across all log files, without extract pattern
To mask all email addresses in all log files, use the following rule definition:
{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
  "include_files": ["*.log*"],
  "shared": false
}
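To check what the EMAIL pattern matches before deploying the rule, the regular expression can be tried out on sample lines. The following is a minimal sketch in Python, not the anonymizer's own implementation. It assumes that matching is case-insensitive, which the rule definition does not state, and the helper name find_emails is hypothetical.

```python
import re

# Pattern copied from the EMAIL rule. JSON escaping is removed,
# so "\\." in the rule definition becomes "\." here.
EMAIL_PATTERN = (
    r"(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}"
    r"(?![a-z0-9._%+-])"
)

# Assumption: the pattern is applied case-insensitively.
EMAIL_RE = re.compile(EMAIL_PATTERN, re.IGNORECASE)


def find_emails(line):
    """Return every substring of the line that the EMAIL pattern matches."""
    # The pattern has no capturing groups, so findall returns full matches.
    return EMAIL_RE.findall(line)


if __name__ == "__main__":
    print(find_emails("contact admin@example.com for access"))
    print(find_emails("user@host has no top-level domain"))
```

The lookbehind and lookahead at each end of the pattern keep it from matching a fragment in the middle of a longer token.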
Example 2: Mask by pattern across all log files, with extract pattern
To mask encryption keys that are logged in the format Key=12.., where the value consists of 64 hexadecimal characters, use the following rule definition. The extract expression limits masking to the key value, so the text around it is left unchanged:
{
  "name": "ENC_KEYS",
  "rule_id": "Pattern",
  "patterns": ["Key=[a-f\\d]{64}\\s"],
  "extract": "=([a-f\\d]{64})",
  "include_files": ["*.log*"],
  "shared": false
}
Input data, test.log, is:
encryption key=1234567890adc1234567aaabc1234567890adc1234567aaabc12345678901234 for keystore derby.system.home=null
Output data, test.log, with the encryption keys anonymized, is:
encryption key=‡8697685738fnx1736987qigyx7611731027yds0096404hlsph91727138403654‡ for keystore derby.system.home=null
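The interplay between the pattern and the extract expression can be sketched as follows. This is an illustration of the behavior shown in the example, not the anonymizer's actual code. It assumes case-insensitive matching (the rule says Key= while the log line says key=), and the functions mask_value and apply_rule are hypothetical. The placeholder mask writes asterisks, whereas the real tool produces substitute characters as shown in the output above.

```python
import re

# Expressions copied from the ENC_KEYS rule, with JSON escaping removed.
PATTERN = r"Key=[a-f\d]{64}\s"
EXTRACT = r"=([a-f\d]{64})"


def mask_value(value):
    """Hypothetical placeholder mask: same length, wrapped in marker characters."""
    return "‡" + "*" * len(value) + "‡"


def apply_rule(line):
    """Mask only the extracted group inside each match of the pattern."""
    # Assumption: both expressions are applied case-insensitively.
    pattern_re = re.compile(PATTERN, re.IGNORECASE)
    extract_re = re.compile(EXTRACT, re.IGNORECASE)

    def replace_match(match):
        matched = match.group(0)
        inner = extract_re.search(matched)
        if inner is None:
            # Nothing to extract, so leave the matched text untouched.
            return matched
        start, end = inner.span(1)
        # Keep the text around the extracted group; mask only the group.
        return matched[:start] + mask_value(inner.group(1)) + matched[end:]

    return pattern_re.sub(replace_match, line)


if __name__ == "__main__":
    key = "1234567890adc1234567aaabc1234567890adc1234567aaabc12345678901234"
    line = "encryption key=" + key + " for keystore derby.system.home=null"
    print(apply_rule(line))
```

The pattern decides where a match occurs, and the extract expression decides which part of that match is replaced.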
Example 3: Mask by pattern across all files, except a few files
To mask email addresses in all files, except hdfs-site.xml and .properties files, use the following rule definition:
{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
  "exclude_files": ["*.properties", "hdfs-site.xml"],
  "shared": false
}
Input data, version.txt, is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
Output data, version.txt, with the email address anonymized, is:
Hadoop 2.7.3.2.5.0.0-1245
Subversion ‡qpe@unqfay.mjp‡:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z
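The file selection in the examples can be reasoned about with a small glob check. The following sketch rests on several assumptions: that globs are compared against the file name without its directory, that an exclude match takes precedence over an include match, and that a rule without include_files applies to every file not excluded. The function rule_applies is hypothetical and is not part of the product.

```python
import os
from fnmatch import fnmatchcase


def rule_applies(path, include_files=None, exclude_files=None):
    """Return True if a rule with these globs would apply to the file."""
    # Assumption: globs are matched against the base name only.
    name = os.path.basename(path)
    # Assumption: an exclude match wins over any include match.
    if exclude_files and any(fnmatchcase(name, glob) for glob in exclude_files):
        return False
    if include_files:
        return any(fnmatchcase(name, glob) for glob in include_files)
    # No include list: the rule applies to every file that is not excluded.
    return True


if __name__ == "__main__":
    excludes = ["*.properties", "hdfs-site.xml"]
    for candidate in ["version.txt", "conf/hdfs-site.xml", "log4j.properties"]:
        print(candidate, rule_applies(candidate, exclude_files=excludes))
```

Under these assumptions, version.txt is processed by the rule from Example 3, while hdfs-site.xml and any .properties file are skipped.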