Diagnostic Data Collection

To help troubleshoot problems with Cloudera Manager on your cluster, Cloudera Manager collects diagnostic data on a regular schedule and automatically sends it to Cloudera.

By default, Cloudera Manager collects this data weekly and sends it automatically. Cloudera analyzes this data and uses it to improve the software. If Cloudera discovers a serious issue, it searches this diagnostic data and notifies customers who might encounter problems due to the issue. You can schedule data collection to run daily, weekly, or monthly, or disable scheduled collection entirely. You can also send a collected data set manually.

Sending diagnostic data automatically requires that the Cloudera Manager Server host have Internet access and be configured to send data automatically. If your Cloudera Manager Server does not have Internet access, you can send the diagnostic data manually.

Automatic sending of diagnostic data can occasionally fail with the error message "Could not send data to Cloudera." To work around this issue, manually send the data to Cloudera Support.
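As a rough illustration of the manual path, the sketch below builds (but does not send) a request that asks Cloudera Manager to collect a diagnostic bundle through its REST API. It assumes the `collectDiagnosticData` command under `/cm/commands`, HTTP basic authentication, and an ISO 8601 `endTime` field; the helper name `build_collect_request`, the host, the port, the API version, and the credentials are all placeholders, so check the API reference for your Cloudera Manager version before relying on any of them.

```python
import base64
import json
import urllib.request


def build_collect_request(base_url, api_version, username, password,
                          end_time, ticket_number=None, comments=None):
    """Build a POST request asking Cloudera Manager to collect diagnostics.

    Hypothetical helper for illustration only. The endpoint path and the
    body field names are assumptions to verify against the API
    documentation for your Cloudera Manager release.
    """
    url = (f"{base_url.rstrip('/')}/api/{api_version}"
           "/cm/commands/collectDiagnosticData")

    # Only include the optional fields that the caller actually supplied.
    body = {"endTime": end_time}
    if ticket_number is not None:
        body["ticketNumber"] = ticket_number
    if comments is not None:
        body["comments"] = comments

    # HTTP basic authentication header (assumed authentication scheme).
    token = base64.b64encode(
        f"{username}:{password}".encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_collect_request(
        "http://cm-host.example.com:7180", "v19", "admin", "admin",
        end_time="2024-01-01T00:00:00.000Z",
        comments="Manual collection after automatic send failed")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the command. Building the request separately from sending it makes it easy to inspect the URL and body first, which is useful on hosts without Internet access where the resulting bundle has to be downloaded and forwarded to Cloudera Support by hand.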

What Data Does Cloudera Manager Collect?

Cloudera Manager collects and returns a significant amount of information about the health and performance of the cluster. It includes:
  • Up to 1000 Cloudera Manager audit events: configuration changes, additions and removals of users, roles, and services, and so on.
  • One day's worth of Cloudera Manager events, including critical errors that Cloudera Manager watches for, among others.
  • Data about the cluster structure, including a list of all hosts, roles, and services, along with the configurations that are set through Cloudera Manager. Passwords set in Cloudera Manager are not returned.
  • Cloudera Manager license and version number.
  • Current health information for hosts, services, and roles, including the results of health tests run by Cloudera Manager.
  • Heartbeat information from each host, service, and role. This includes status and some information about memory, disk, and processor usage.
  • The results of running Host Inspector.
  • One day's worth of Cloudera Manager metrics. If you are using a Cloudera trial version, host metrics are not included.
  • A download of the debug pages for Cloudera Manager roles.
  • For each host in the cluster, the result of running a number of system-level commands on that host.
  • Logs from each role on the cluster, as well as the Cloudera Manager server and agent logs.
  • Which parcels are activated for which clusters.
  • Whether there's an active trial, and if so, metadata about the trial.
  • Metadata about the Cloudera Manager Server, such as its JMX metrics, stack traces, and the database or host it's running with.
  • HDFS or Hive replication schedules (including command history) for the deployment.
  • Impala query logs.
  • Cloudera Machine Learning (CML) collects aggregate usage data by sending limited tracking events to Google Analytics and Cloudera servers. No customer data or personal information is sent as part of these bundles.
  • A copy of selected tables and columns from the Cloudera Manager database. This data is not collected by default. To configure inclusion of database tables, see :