Redact Data Before Sending to Workload XM

Telemetry Publisher collects diagnostic data from logs, job configurations, and queries, and then sends this data to Workload XM. Because this diagnostic data might contain sensitive information, redact it before Telemetry Publisher sends it to Workload XM.

Redact Logs and Queries

To redact sensitive data in the CDH cluster, such as data in log files, use Cloudera Manager. See Log and Query Redaction in the Cloudera Manager documentation. Note that this feature redacts only data, not metadata: sensitive data inside files is redacted, but the name, owner, and other metadata of the files are not. The Cloudera Manager documentation explains what is and is not redacted. For additional details about log and query redaction in Workload XM, see Log and Query Redaction for the Telemetry Publisher Service.

Redact Spark Data

The Spark on YARN service in CDH enables the spark.redaction.regex configuration property by default, which redacts sensitive data from event and executor logs. To ensure that Telemetry Publisher sends only redacted information to Workload XM, do not override this setting.
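As a rough illustration of regex-based redaction, the following Python sketch hides the values of properties whose names match a pattern. The pattern, the placeholder text, and the sample property values are assumptions made for this example; they are not taken from CDH, so check the spark.redaction.regex value that your cluster actually uses.

```python
import re

# Assumed pattern for illustration only; your cluster's
# spark.redaction.regex value may differ.
REDACTION_REGEX = re.compile(r"(?i)secret|password")

# Placeholder text chosen for this sketch.
REDACTED = "*********(redacted)"


def redact_properties(props):
    """Return a copy of props with values hidden where the key matches the pattern."""
    return {
        key: REDACTED if REDACTION_REGEX.search(key) else value
        for key, value in props.items()
    }


# Sample properties with made-up values.
sample = {
    "spark.app.name": "nightly-etl",
    "spark.hadoop.fs.s3a.secret.key": "abc123",
    "spark.ssl.keyStorePassword": "hunter2",
}
redacted = redact_properties(sample)
```

In this sketch, only the two properties whose names contain "secret" or "password" lose their values; spark.app.name passes through unchanged. Overriding the pattern with a narrower one would let more values through, which is why the default should stay in place.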

Redact MapReduce Job Properties

Set the mapreduce.job.redacted-properties configuration property for YARN to redact MapReduce job configuration properties before they are stored in HDFS. Telemetry Publisher reads the job configuration file from HDFS, so if you set this property for all the MapReduce jobs you use, only redacted job configuration information is fetched from HDFS.
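The effect of this property can be sketched in Python as follows. This is a simplified model, not the Hadoop implementation: it assumes that the property holds a comma-separated list of exact property names, and the placeholder text, the function name redact_job_conf, and the property my.app.db.password are hypothetical.

```python
# Placeholder text chosen for this sketch.
REDACTED = "*********(redacted)"


def redact_job_conf(job_conf, redacted_properties):
    """Hide the values of the properties named in a comma-separated list."""
    # Split the list and ignore surrounding whitespace and empty entries.
    names = {name.strip() for name in redacted_properties.split(",") if name.strip()}
    return {
        key: REDACTED if key in names else value
        for key, value in job_conf.items()
    }


# Sample job configuration with made-up values.
job_conf = {
    "mapreduce.job.name": "wordcount",
    "fs.s3a.secret.key": "abc123",
    "my.app.db.password": "hunter2",  # hypothetical application property
}

# Assumed value of mapreduce.job.redacted-properties for this example.
setting = "fs.s3a.secret.key, my.app.db.password"
safe_conf = redact_job_conf(job_conf, setting)
```

Under these assumptions, any sensitive property that is missing from the list is written unredacted, so the list needs to cover every such property that your jobs set.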

To set this property in Cloudera Manager:

  1. In the Cloudera Manager Admin Console, select the YARN service, and then click the Configuration tab.
  2. Search for mapreduce.job.redacted-properties to locate this configuration property. Several MapReduce job properties are listed by default. Leave these defaults in place.
  3. Click the plus sign after the last property listed, and add any additional properties that you want redacted for your MapReduce jobs.
  4. Click Save Changes and restart the YARN service.