Defining anonymization rules for Cloudera logs

Cloudera includes a set of default anonymization rules and lets you define custom rules to remove sensitive information from Cloudera logs.

Use PCRE convention for writing custom anonymization rule patterns.

Anonymization rules are applied to the following logs:

Default anonymization rules

Cloudera includes a set of default anonymization rules that anonymize the following:

Anonymization rule (PCRE): \b([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-\._]*[A-Za-z0-9])@(([A-Za-z0-9]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])\.)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])\b
Replacement: email@redacted.host
Description: Email addresses

Anonymization rule (PCRE): \d{4}[^\w]\d{4}[^\w]\d{4}[^\w]\d{4}
Replacement: XXXX-XXXX-XXXX-XXXX
Description: Credit card numbers

Anonymization rule (PCRE): \d{3}[^\w]\d{2}[^\w]\d{4}
Replacement: XXX-XX-XXXX
Description: SSN

Anonymization rule (PCRE): FPW\:\s+[\w|\W].*
Replacement: FPW: [REDACTED]
Description: FreeIPA (workload) password

Anonymization rule (PCRE): cdpHashedPassword=.*[']
Replacement: [CDP PWD ATTRS REDACTED]
Description: Hashed FreeIPA (workload) password
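
To get a feel for what the default rules match, they can be tried locally. The following is a minimal sketch, not Cloudera's implementation: it uses Python's re module as a stand-in for a PCRE engine (these particular patterns use only syntax common to both), and it assumes the rules are applied one after another in the order listed, which the documentation does not state. The anonymize function and DEFAULT_RULES name are illustrative.

```python
import re

# (pattern, replacement) pairs copied from the default rules above.
DEFAULT_RULES = [
    (r"\b([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-\._]*[A-Za-z0-9])@(([A-Za-z0-9]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])\.)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])\b",
     "email@redacted.host"),
    (r"\d{4}[^\w]\d{4}[^\w]\d{4}[^\w]\d{4}", "XXXX-XXXX-XXXX-XXXX"),
    (r"\d{3}[^\w]\d{2}[^\w]\d{4}", "XXX-XX-XXXX"),
    (r"FPW\:\s+[\w|\W].*", "FPW: [REDACTED]"),
    (r"cdpHashedPassword=.*[']", "[CDP PWD ATTRS REDACTED]"),
]


def anonymize(text, rules=DEFAULT_RULES):
    """Apply every rule in list order (the real ordering is an assumption)."""
    for pattern, replacement in rules:
        # A function replacement inserts the string literally, so re.sub does
        # not interpret backslashes or group references inside it.
        text = re.sub(pattern, lambda _m, r=replacement: r, text)
    return text


print(anonymize("Email: myemail@cloudera.com"))
# prints: Email: email@redacted.host
```

A local check like this is only a first filter; the Test rules panel or the test-account-telemetry-rules command remains the authoritative test.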

Creating anonymization rule patterns

Use PCRE convention for writing anonymization rule patterns. For each pattern, define the replacement string that should appear in place of the matched text.
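
As an illustration of pairing a pattern with a replacement string, the sketch below drafts a hypothetical custom rule that masks IPv4-style addresses and checks it against text that should and should not change. The rule itself, the sample log lines, and the helper names are assumptions made for this example, and Python's re module is used only as an approximation of PCRE.

```python
import re

# Hypothetical custom rule (not one of the defaults): mask IPv4-style addresses.
PATTERN = r"\b\d{1,3}(?:\.\d{1,3}){3}\b"
REPLACEMENT = "X.X.X.X"


def apply_rule(pattern, replacement, text):
    """Replace every match of pattern in text with the literal replacement."""
    return re.sub(pattern, lambda _m: replacement, text)


def failing_samples(pattern, replacement, expectations):
    """Return (input, expected, actual) triples where the rule misbehaves."""
    failures = []
    for text, expected in expectations.items():
        actual = apply_rule(pattern, replacement, text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Map each sample input to the output the rule is expected to produce.
EXPECTATIONS = {
    "Connecting to 10.0.12.7:8443": "Connecting to X.X.X.X:8443",  # should be masked
    "Upgraded to version 3.1.4": "Upgraded to version 3.1.4",      # should be untouched
}

print(failing_samples(PATTERN, REPLACEMENT, EXPECTATIONS))
# prints: []
```

Including samples that must stay unchanged helps catch patterns that are too broad before they are saved.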

Define custom anonymization rules

You can define custom anonymization rules in Cloudera. The rules apply only to environments created after the rules were added.

Required role: PowerUser

Steps

  1. Once you have created the rules, navigate to Cloudera web interface > Cloudera Management Console > Global Settings > Telemetry > Anonymization rules.

  2. Default rules are pre-populated.

  3. Click New rule and add a pattern and replacement string for your rule. Repeat for each additional rule.

  4. Test the rules from the same page on the UI under Test rules:
    1. Under Input text, paste example text containing sensitive content that the rules you added should anonymize.
    2. Click Test all rules.
    3. The sensitive content should be removed and replaced in the output shown in the Anonymized result text box.
  5. Click Save Changes.

To define the rules with the CLI instead:

  1. To add new rules, first prepare the patterns and replacement strings, then test them with the following command:
    cdp environments test-account-telemetry-rules --cli-input-json '{
        "testInput": "Email: myemail@cloudera.com",
        "rules": [
            {
                "value": "\\b([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-\\._]*[A-Za-z0-9])@(([A-Za-z0-9]|[A-Za-z][A-Za-z0-9\\-]*[A-Za-z0-9])\\.)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])\\b",
                "replacement": "email@redacted.host"
            }
        ]
    }'
  2. Run the following command to get your current telemetry settings in JSON format:
    cdp environments get-account-telemetry
  3. Copy the JSON from the output of this command and paste it into a text editor.

  4. Edit the JSON to change the settings or add new rules.
  5. Once the JSON is updated, run the cdp environments set-account-telemetry command. For example:
    cdp environments set-account-telemetry --cli-input-json '{
        "workloadAnalytics": true,
        "cloudStorageLogging": true,
        "rules": [
            {
                "value": "\\b([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-\\._]*[A-Za-z0-9])@(([A-Za-z0-9]|[A-Za-z][A-Za-z0-9\\-]*[A-Za-z0-9])\\.)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])\\b",
                "replacement": "email@redacted.host"
            }
        ]
    }'
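
Hand-editing the JSON is error-prone because every backslash in a pattern must be doubled. The sketch below shows one way to generate the payload instead, letting a JSON serializer handle the escaping. It assumes the current settings have the same shape as the set-account-telemetry example above (the actual get-account-telemetry output may differ), and the build_payload helper and the IPv4 rule are hypothetical.

```python
import json


def build_payload(current_settings, new_rules):
    """Return a JSON string with new_rules appended to the existing rules."""
    payload = dict(current_settings)
    payload["rules"] = list(current_settings.get("rules", [])) + [
        {"value": pattern, "replacement": replacement}
        for pattern, replacement in new_rules
    ]
    # json.dumps doubles each backslash, producing the escaped form the
    # examples above use (for example \d becomes \\d).
    return json.dumps(payload, indent=4)


# Assumed shape, mirroring the set-account-telemetry example.
current = {
    "workloadAnalytics": True,
    "cloudStorageLogging": True,
    "rules": [
        {"value": r"\d{3}[^\w]\d{2}[^\w]\d{4}", "replacement": "XXX-XX-XXXX"},
    ],
}

# Hypothetical new rule written as a raw, unescaped pattern.
new_rules = [(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "X.X.X.X")]

print(build_payload(current, new_rules))
```

The printed JSON can then be passed to the --cli-input-json option in place of a hand-edited document.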