User Guide
Also available as:
PDF
Examples of Pattern-Based Anonymization Rules

This section includes examples of commonly used pattern-based anonymization rules.

Example 1: Mask by pattern across all log files, without extract pattern

To mask all email addresses in all log files, use the following rule definition:

{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
  "include_files": ["*.log*"],
  "shared": false
}

Example 2: Mask by pattern across all log files, with extract pattern

To mask encryption keys, logged in the following format Key=12.. with a value consisting of 64 hexadecimal characters, use the following rule definition:

{
  "name": "ENC_KEYS",
  "rule_id": "Pattern",
  "patterns": ["Key=[a-f\\d]{64}\\s"],
  "extract": "=([a-f\\d]{64})",
  "include_files": ["*.log*"],
  "shared": false
}

Input data, test.log is:

encryption key=1234567890adc1234567aaabc1234567890adc1234567aaabc12345678901234 for keystore
derby.system.home=null

Output data, test.log, with the encryption keys anonymized, is:

encryption key=‡8697685738fnx1736987qigyx7611731027yds0096404hlsph91727138403654‡ for keystore
derby.system.home=null

Example 3: Mask by pattern across all files, except a few files

To mask email addresses in all files, except hdfs-site.xml and .property files, use the following rule definition:

{
  "name": "EMAIL",
  "rule_id": "Pattern",
  "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
  "exclude_files" : ["*.properties", "hdfs-site.xml"],
  "shared": false
}

Input data, version.txt, is:

Hadoop 2.7.3.2.5.0.0-1245
Subversion git@github.com :hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z

Output file version.txt, with an anonymized email address, is:

Hadoop 2.7.3.2.5.0.0-1245
Subversion ‡qpe@unqfay.mjp‡ :hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
Compiled by jenkins on 2016-08-26T00:55Z