Diagnostic Metrics Collection Details
The diagnostic metrics collected by Telemetry Publisher and sent to Workload Manager include the following:
- MapReduce Jobs — Telemetry Publisher polls the YARN Job History Server for recently completed MapReduce jobs. For each of these jobs, Telemetry Publisher collects the configuration and the jhist file from HDFS. The jhist file is the job history file, which contains job and task counters. Telemetry Publisher can also be configured to collect MapReduce task logs from HDFS and send them to Workload Manager, but this log collection is turned off by default.
- Spark Applications — Telemetry Publisher polls the Spark History Server for recently completed Spark applications. For each of these applications, Telemetry Publisher collects its event log from HDFS. Telemetry Publisher collects Spark application data only from Spark version 2.2 and later. Telemetry Publisher can also be configured to collect the executor logs of Spark applications from HDFS and send them to Workload Manager, but this log collection is turned off by default.
- Oozie Workflows — Telemetry Publisher polls Oozie servers for recently completed Oozie workflows and sends their details to Workload Manager.
- Hive Queries — A Cloudera Manager agent periodically searches for the query detail files that HiveServer2 generates after a query completes, and then sends the details from those files to Telemetry Publisher.
- Impala Queries — A Cloudera Manager agent periodically looks for query profiles of recently completed queries and sends them to Telemetry Publisher.
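
The polling pattern described for MapReduce jobs can be illustrated with a minimal sketch. It is not Telemetry Publisher's implementation, which is not shown in this document. It assumes the MapReduce History Server REST endpoint `/ws/v1/history/mapreduce/jobs` with its `finishedTimeBegin` query parameter, as documented for Apache Hadoop. The host name, the helper names (`build_jobs_url`, `parse_completed_jobs`, `poll_once`), and the idea of keeping a finish-time watermark between polls are assumptions made for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_jobs_url(base_url, finished_since_ms):
    """Build the history-server URL that lists jobs finished at or after a time."""
    query = urlencode({"finishedTimeBegin": finished_since_ms})
    return f"{base_url.rstrip('/')}/ws/v1/history/mapreduce/jobs?{query}"


def parse_completed_jobs(payload):
    """Return (job id, finish time in ms) pairs from the JSON response body."""
    doc = json.loads(payload)
    # The "jobs" value can be null when no jobs match, so fall back to empty.
    jobs = (doc.get("jobs") or {}).get("job") or []
    return [(job["id"], job["finishTime"]) for job in jobs]


def fetch_text(url):
    """Default fetcher: plain HTTP GET, no authentication (an assumption)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def poll_once(fetch, base_url, since_ms):
    """Run one polling cycle and return (jobs, new watermark).

    `fetch` is any callable that maps a URL to the response text, which
    keeps the cycle testable without a live cluster.
    """
    jobs = parse_completed_jobs(fetch(build_jobs_url(base_url, since_ms)))
    # Advance the watermark to the latest finish time seen so far.
    watermark = max([since_ms] + [finish for _, finish in jobs])
    return jobs, watermark


if __name__ == "__main__":
    # Hypothetical host; 19888 is the usual history server web port.
    found, mark = poll_once(fetch_text, "http://jhs.example.com:19888", 0)
    print(found, mark)
```

A caller would invoke `poll_once` on a timer and pass the returned watermark into the next cycle. Because a job that finishes exactly at the watermark may be returned again, a real collector would also need to remember which job IDs it has already processed.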