Diagnostic metrics collection details

Describes the type of data provided by the Cloudera data services and collected by Telemetry Publisher and Databus WXM Client.

Telemetry Publisher and Databus WXM Client collect and send the following diagnostic metrics and data to Cloudera Observability:

  • Cloudera Manager Metrics — The Telemetry Publisher agent pulls a subset of Cloudera Manager metrics from the Cloudera Manager API endpoint in Data Hub clusters and sends it to Cloudera Observability. For more information, click the Related Information link below.
  • Cloudera Manager Events — The Telemetry Publisher agent pulls the Cloudera Manager Events from the Cloudera Manager API endpoint in Data Hub clusters and sends its diagnostic data to Cloudera Observability. For more information, click the Related Information link below.
  • Hive MetaStore (HMS) data source — The Telemetry Publisher or Databus Producer agent polls Hive and Impala for HMS metadata about your tables and their database and sends the details to Cloudera Observability. This data includes the table's schema, database location, partitions, structure and relationships, columns, column names and their data types, and the table's metadata properties that include user-defined and predefined key-value pairs.
  • Hive Queries — An agent periodically searches for query detail files that are generated by HiveServer2 after a query completes and then sends the details from those files to Telemetry Publisher or Databus Producer.
  • Impala Queries — An agent periodically looks for query profiles of recently completed queries and sends them to Telemetry Publisher and Databus Producer.
  • MapReduce Jobs — Telemetry Publisher and Databus Producer poll the YARN Job History Server for recently completed MapReduce jobs. For each of these jobs, Telemetry Publisher and Databus Producer collect the configuration and the jhist file (the job history file, which contains job and task counters) from HDFS. Telemetry Publisher and Databus Producer can also be configured to collect MapReduce task logs from HDFS and send them to Cloudera Observability. By default, this log collection is turned off.
  • Oozie Workflows — Telemetry Publisher and Databus Producer poll Oozie servers for recently completed Oozie workflows and send the details to Cloudera Observability.
  • Spark Applications — Telemetry Publisher and Databus Producer poll the Spark History Server for recently completed Spark applications. For each of these applications, Telemetry Publisher and Databus Producer collect the event log from HDFS. You can configure Telemetry Publisher to collect the executor logs of Spark applications from HDFS and send them to Cloudera Observability. By default, this data collection is turned off.
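
The "poll for recently completed work" pattern described for MapReduce jobs, Oozie workflows, and Spark applications can be illustrated with a minimal Python sketch. This is not how Telemetry Publisher or Databus Producer is implemented; it only shows the general idea under stated assumptions. It assumes the Spark History Server REST endpoint `/api/v1/applications` with its `status` and `limit` query parameters, and application entries that carry an `attempts` list with `completed` and `endTimeEpoch` fields. The function names, the watermark approach, and the example host are hypothetical.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def build_completed_apps_url(base_url: str, limit: int = 100) -> str:
    """Build a Spark History Server REST URL that lists completed applications.

    Assumption: the server exposes /api/v1/applications and accepts the
    `status` and `limit` query parameters.
    """
    query = urlencode({"status": "completed", "limit": limit})
    return f"{base_url.rstrip('/')}/api/v1/applications?{query}"


def select_new_completions(
    apps: List[Dict[str, Any]], last_seen_end_ms: int
) -> Tuple[List[str], int]:
    """Pick applications that completed after the previous poll.

    `apps` is the parsed JSON list returned by the endpoint above.
    `last_seen_end_ms` is a hypothetical watermark: the latest end time
    (epoch milliseconds) that an earlier poll already handled.
    Returns the new application IDs and the updated watermark.
    """
    new_ids: List[str] = []
    watermark = last_seen_end_ms
    for app in apps:
        for attempt in app.get("attempts", []):
            end_ms = attempt.get("endTimeEpoch", 0)
            if attempt.get("completed") and end_ms > last_seen_end_ms:
                new_ids.append(app["id"])
                watermark = max(watermark, end_ms)
                break  # one qualifying attempt is enough for this application
    return new_ids, watermark


# Hypothetical sample response, used in place of a live server.
sample_apps = [
    {"id": "app-1", "attempts": [{"completed": True, "endTimeEpoch": 1000}]},
    {"id": "app-2", "attempts": [{"completed": False, "endTimeEpoch": 0}]},
    {"id": "app-3", "attempts": [{"completed": True, "endTimeEpoch": 2500}]},
]

new_ids, new_watermark = select_new_completions(sample_apps, last_seen_end_ms=1500)
```

Each polling cycle would fetch the URL, pass the parsed response to `select_new_completions`, and carry the returned watermark into the next cycle so that an application is not reported twice. Collecting the event log from HDFS and sending it onward are outside the scope of this sketch.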