Diagnostic Data Collection

To help solve problems when using on your cluster, collects diagnostic data on a regular schedule and sends it automatically to .

By default, is configured to collect this data weekly and to send it automatically. analyzes this data and uses it to improve the software. If discovers a serious issue, searches this diagnostic data and notifies customers who might encounter problems due to the issue. You can set data collection to run daily, weekly, or monthly, or disable scheduled collection entirely. You can also send a collected data set manually.
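The scheduling options above (daily, weekly, or monthly, with weekly as the default, or disabled entirely) can be modeled with a small Python sketch. `Frequency`, `next_collection`, and `add_one_month` are hypothetical names used only for illustration and are not part of any product API. The rule that a monthly run scheduled on a day that does not exist in the next month (for example, January 31) moves to that month's last day is also an assumption.

```python
import calendar
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Frequency(Enum):
    """Hypothetical collection schedules matching the options described above."""
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    DISABLED = "disabled"


DEFAULT_FREQUENCY = Frequency.WEEKLY  # the text states weekly is the default


def add_one_month(ts: datetime) -> datetime:
    """Advance by one calendar month, clamping to the month's last day (assumed rule)."""
    year = ts.year + ts.month // 12   # December rolls over into the next year
    month = ts.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def next_collection(last_run: datetime,
                    frequency: Frequency = DEFAULT_FREQUENCY) -> Optional[datetime]:
    """Return when the next scheduled collection is due, or None if disabled."""
    if frequency is Frequency.DISABLED:
        return None  # scheduled collection is off; data can still be sent manually
    if frequency is Frequency.DAILY:
        return last_run + timedelta(days=1)
    if frequency is Frequency.WEEKLY:
        return last_run + timedelta(weeks=1)
    return add_one_month(last_run)
```

For example, `next_collection(datetime(2024, 1, 1))` uses the weekly default and returns January 8, 2024. A monthly schedule that last ran on January 31, 2024 is next due on February 29, 2024.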

Sending diagnostic data automatically requires the Server host to have Internet access and to be configured to send data automatically. If your Server does not have Internet access, you can send the diagnostic data manually.

Sending diagnostic data automatically can occasionally fail with the error message "Could not send data to ." To work around this issue, send the data to Support manually.

What Data Does Collect?

collects and returns a significant amount of information about the health and performance of the cluster, including:
  • Up to 1000 audit events: configuration changes, addition and removal of users, roles, and services, and so on.
  • One day's worth of events, including critical errors that watches for, and more.
  • Data about the cluster structure, including a list of all hosts, roles, and services, along with the configurations set through . Passwords set in are not returned.
  • license and version number.
  • Current health information for hosts, services, and roles, including the results of health tests run by .
  • Heartbeat information from each host, service, and role. Heartbeats include status and some information about memory, disk, and processor usage.
  • The results of running Host Inspector.
  • One day's worth of metrics. If you are using a trial version, host metrics are not included.
  • A download of the debug pages for roles.
  • For each host in the cluster, the results of running several system-level commands on that host.
  • Logs from each role on the cluster, as well as the Server and Agent logs.
  • Which parcels are activated for which clusters.
  • Whether a trial is active, and if so, metadata about the trial.
  • Metadata about the Server, such as its JMX metrics, stack traces, the database it uses, and the host it runs on.
  • HDFS or Hive replication schedules (including command history) for the deployment.
  • Impala query logs.
  • collects aggregate usage data by sending limited tracking events to Google Analytics and servers. No customer data or personal information is sent as part of these bundles.
  • A copy of selected tables and columns from the database. This data is not collected by default. To configure inclusion of database tables, see :
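Several items in the list above carry explicit inclusion rules: a cap of 1000 audit events, a one-day window for events and metrics, omission of passwords from configurations, no host metrics on trial installations, and database tables only when configured. The following Python sketch applies those rules to in-memory records. It is a hypothetical illustration, not the collector's actual implementation or bundle format: `Record`, `strip_passwords`, and `build_bundle` are illustrative names, and keeping the *most recent* audit events and recognizing passwords by a key name containing "password" are both assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

# Limits taken from the list above; the selection logic below is assumed.
MAX_AUDIT_EVENTS = 1000                # "Up to 1000 audit events"
COLLECTION_WINDOW = timedelta(days=1)  # "One day's worth" of events and metrics


@dataclass
class Record:
    timestamp: datetime
    name: str
    is_host_metric: bool = False  # only meaningful for metric records


def strip_passwords(config: Dict[str, str]) -> Dict[str, str]:
    """Drop password settings (hypothetical rule: key name contains 'password')."""
    return {k: v for k, v in config.items() if "password" not in k.lower()}


def within_window(record: Record, now: datetime) -> bool:
    return now - record.timestamp <= COLLECTION_WINDOW


def build_bundle(now: datetime,
                 audit_events: List[Record],
                 events: List[Record],
                 metrics: List[Record],
                 configs: Dict[str, Dict[str, str]],
                 is_trial: bool,
                 db_tables: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble a hypothetical diagnostic bundle following the rules listed above."""
    # Keep at most MAX_AUDIT_EVENTS, assuming the newest events are preferred.
    newest_first = sorted(audit_events, key=lambda r: r.timestamp, reverse=True)
    bundle: Dict[str, Any] = {
        "audit_events": newest_first[:MAX_AUDIT_EVENTS],
        "events": [e for e in events if within_window(e, now)],
        # Trial installations omit host metrics; other metrics are kept.
        "metrics": [m for m in metrics
                    if within_window(m, now) and not (is_trial and m.is_host_metric)],
        "configs": {name: strip_passwords(cfg) for name, cfg in configs.items()},
    }
    # Database tables are opt-in: included only when explicitly supplied.
    if db_tables is not None:
        bundle["db_tables"] = db_tables
    return bundle
```

Modeling database tables as an optional argument that defaults to `None` mirrors the statement that this data is not collected unless it is configured.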