Redaction Capabilities for Diagnostic Data
The diagnostic data collected by Telemetry Publisher might contain sensitive data in the job configurations or the logs. There are several ways you can redact sensitive data before it is sent to Telemetry Publisher. Cloudera recommends enabling the following redaction features even if you are not sending diagnostic data to Telemetry Publisher:
- Log and query redaction — Redacts information in the logs and queries collected by Telemetry Publisher, based on filters that you define with regular expressions.
- MapReduce job properties redaction — You can redact job configuration properties before they are stored in HDFS. Because Telemetry Publisher reads job configuration files from HDFS, it fetches only the redacted configuration information.
- Spark event and executor log redaction — The Spark2 on YARN service has the spark.redaction.regex configuration property, which you can use to redact sensitive data from event and executor logs. When this configuration property is enabled, Telemetry Publisher sends only redacted data to Workload Manager. This configuration property is enabled by default, but you can override it by using safety valves in Cloudera Manager or in the Spark application itself.
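
The redaction features above all rely on regular expressions. As a rough illustration only, the following Python sketch shows the two general styles: hiding the values of configuration properties whose names match a pattern, and rewriting matching text in log lines. It is not the Cloudera Manager or Spark implementation. The pattern `(?i)secret|password`, the `[REDACTED]` placeholder, the card-number rule, and the function names are all assumptions chosen for the example; check your own deployment for the patterns actually in effect.

```python
import re

# Assumed key pattern for illustration; not necessarily the default in your cluster.
KEY_PATTERN = re.compile(r"(?i)secret|password")
PLACEHOLDER = "[REDACTED]"  # hypothetical replacement text

def redact_properties(props, key_pattern=KEY_PATTERN):
    """Return a copy of props with values hidden for keys matching key_pattern."""
    return {
        key: (PLACEHOLDER if key_pattern.search(key) else value)
        for key, value in props.items()
    }

# Hypothetical log rule: mask anything shaped like a 16-digit card number.
LOG_RULES = [
    (re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"), "XXXX-XXXX-XXXX-XXXX"),
]

def redact_line(line, rules=LOG_RULES):
    """Apply each (pattern, replacement) rule to a single log line, in order."""
    for pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line
```

For example, `redact_properties({"spark.app.name": "etl", "fs.s3a.secret.key": "abc123"})` leaves the application name untouched and replaces the secret key value with the placeholder. Testing patterns against sample data in this way before deploying them can help confirm that they match what you intend and nothing more.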