Configuring Data Anonymization Rules

Anonymization rules use regular expressions to match and anonymize sensitive data (such as IP addresses and domain names). Each rule is a JSON object that defines what to match and the value to replace it with.

Note

Anonymization rule formats vary between different SmartSense versions. Make sure that you consult the documentation that matches your SmartSense version.

  1. To define regular expression-based rules, refer to the following sample:

      {
        "name":"ip_address",
        "path":null,
        "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?",
        "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?",
        "shared": true
      }

    Key reference:

    • name - The rule name

    • path - An optional regular expression that matches the paths of the files to which this rule applies (the default, null, means all files)

    • pattern - A regular expression that defines the pattern to match within the file

    • extract - An optional regular expression to extract the data from the matched pattern

      Each value to extract is marked as a capture group in the regular expression.

    • shared - A flag that indicates which key (shared or private) is used for masking

      If the shared key is used, the Hortonworks support team can unmask data if needed for diagnostic purposes: for example, host names and IP addresses for resolving issues on specific hosts or communication between hosts. Note that unmasked data is not stored in Hortonworks repositories; it is discarded as soon as the analysis finishes.

    • value - An optional constant value that replaces the matched data

      Note that the value chosen should not be matchable by the pattern specified earlier. For example, if the pattern is .*dfs.datanode.*, the value should not contain dfs.datanode. Also, note that if the value is specified, the shared flag is ignored.
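
The interaction between pattern, extract, and value can be sketched in Python. This is only an illustration under stated assumptions, not SmartSense's implementation: apply_rule and toy_mask are hypothetical names, the hash-based mask stands in for the real key-based masking, and Python's re module may differ in details from the regex engine that SmartSense uses.

```python
import hashlib
import json
import re

# The sample rule from above, parsed from JSON so that each "\\" escape
# decodes to the single backslash the regex engine expects.
RULE = json.loads(r'''
{
  "name": "ip_address",
  "path": null,
  "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?",
  "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?",
  "shared": true
}
''')


def toy_mask(value):
    """Hypothetical stand-in for key-based masking (not SmartSense's algorithm)."""
    return "MASKED_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


def apply_rule(rule, text, mask_fn=toy_mask):
    """Replace the sensitive part of every pattern match in text."""
    pattern = re.compile(rule["pattern"])
    extract = re.compile(rule["extract"]) if rule.get("extract") else None

    def substitute(sensitive):
        # A constant "value" takes precedence over masking.
        if rule.get("value") is not None:
            return rule["value"]
        return mask_fn(sensitive)

    def replace(match):
        matched = match.group(0)
        if extract is None:
            # Assumption: without "extract", the whole match is replaced.
            return substitute(matched)
        inner = extract.search(matched)
        if inner is None or inner.lastindex is None:
            return matched  # Assumption: nothing extracted, leave unchanged.
        start, end = inner.span(1)
        # Keep the surrounding delimiters; replace only the captured group.
        return matched[:start] + substitute(inner.group(1)) + matched[end:]

    return pattern.sub(replace, text)


if __name__ == "__main__":
    line = "connect to 10.0.0.1:8020 failed"
    print(apply_rule(RULE, line))
    print(apply_rule(dict(RULE, value="IP_REDACTED"), line))
```

In this sketch the pattern matches " 10.0.0.1:" including its delimiters, and the extract group narrows the replacement to the address itself, so the leading space and the trailing colon survive. The second call shows a constant value overriding masking, which is why the shared flag has no effect there.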

  2. To define property-based rules, refer to the following sample:

      {
        "name":"delete_oozie_jdbc_password",
        "path":"oozie-site.xml",
        "property": "oozie.service.JPAService.jdbc.password",
        "operation":"DELETE",
        "shared": false
      }

    Key reference:

    • name - The rule name

    • path - A regular expression path of files on which to apply this rule

    • property - The name of a specific property within the matching files

    • operation - Either DELETE or REPLACE (the default)

      If DELETE is specified, the property is removed from the configuration file, and if REPLACE is specified, the property value is replaced by either a constant value or a masked value.

    • value - An optional value for the REPLACE operation.

      If a value is not specified, a private or shared key is used to mask the data to replace.

    • enabled - A flag that enables or disables the rule (the default is true)

    • excludes - A set of path patterns to be excluded by the rule: for example, "excludes": ["oozie-site.xml", "core-site.xml"]

    • shared - A flag that allows anonymized data to be reversed by Hortonworks. If shared is true, anonymized data is reversible by Hortonworks; if shared is false, the data cannot be reversed.

    Note

    Rules configured with shared = false cannot be unmasked by Hortonworks, and in some cases this might become a roadblock for support case analysis.
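
The property-based keys described above can be illustrated with a short Python sketch. It is a hedged approximation rather than SmartSense's implementation: apply_property_rule and its mask_fn parameter are hypothetical, the file is assumed to use the Hadoop-style configuration layout (property elements with name and value children), and path and excludes are assumed to be matched as regular expressions anywhere in the file name.

```python
import re
import xml.etree.ElementTree as ET


def apply_property_rule(rule, file_name, xml_text, mask_fn=lambda v: "MASKED"):
    """Apply one property-based rule to the text of one configuration file."""
    # "enabled" defaults to true; a disabled rule changes nothing.
    if not rule.get("enabled", True):
        return xml_text
    # Assumption: "path" and "excludes" are searched within the file name.
    if not re.search(rule["path"], file_name):
        return xml_text
    if any(re.search(p, file_name) for p in rule.get("excludes", [])):
        return xml_text

    root = ET.fromstring(xml_text)
    for prop in list(root.findall("property")):
        if prop.findtext("name") != rule["property"]:
            continue
        # REPLACE is the default operation.
        if rule.get("operation", "REPLACE") == "DELETE":
            root.remove(prop)
            continue
        value = prop.find("value")
        if value is not None:
            # A constant "value" wins; otherwise mask (hypothetical mask_fn).
            if "value" in rule:
                value.text = rule["value"]
            else:
                value.text = mask_fn(value.text or "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<configuration>"
        "<property><name>oozie.service.JPAService.jdbc.password</name>"
        "<value>secret</value></property>"
        "</configuration>"
    )
    delete_rule = {
        "name": "delete_oozie_jdbc_password",
        "path": "oozie-site.xml",
        "property": "oozie.service.JPAService.jdbc.password",
        "operation": "DELETE",
        "shared": False,
    }
    print(apply_property_rule(delete_rule, "/etc/oozie/conf/oozie-site.xml", sample))
```

With DELETE the whole property element is dropped, whereas a REPLACE rule keeps the property and changes only its value, either to the constant value or to a masked form.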