Sending Usage and Diagnostic Data to Cloudera

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

Cloudera Manager collects anonymous usage information and takes regularly scheduled snapshots of the state of your cluster, and automatically sends both to Cloudera anonymously. This helps Cloudera improve and optimize Cloudera Manager.

If you have a Cloudera Enterprise license, you can also trigger the collection of diagnostic data and send it to Cloudera Support to aid in resolving a problem you may be having.

Configuring a Proxy Server

To configure a proxy server through which usage and diagnostic data is uploaded, follow the instructions in Configuring Network Settings.

Managing Anonymous Usage Data Collection

Cloudera Manager uses Google Analytics to send anonymous usage information to Cloudera. The information helps Cloudera improve Cloudera Manager. By default, anonymous usage data collection is enabled. To change the setting:

  1. Select Administration > Settings.
  2. Under the Other category, set the Allow Usage Data Collection property.
  3. Click Save Changes to commit the changes.

Managing Hue Analytics Data Collection

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Hue tracks anonymized pages and application versions to collect information used to compare each application's usage levels. The data collected does not include hostnames or IDs. For example, the data has the format /2.3.0/pig, /2.5.0/beeswax/execute. You can restrict data collection as follows:
  1. Go to the Hue service.
  2. Click the Configuration tab.
  3. Select Scope > Hue.
  4. Locate the Enable Usage Data Collection property or search for it by typing its name in the Search box.
  5. Deselect the Enable Usage Data Collection checkbox.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  6. Click Save Changes to commit the changes.
  7. Restart the Hue service.
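
To make the format of the collected Hue data concrete, here is a minimal Python sketch that splits a page path such as /2.5.0/beeswax/execute into its parts. Reading the segments as version, application, and an optional action is an assumption based only on the two examples above. The function name parse_hue_page is hypothetical and is not part of Hue or Cloudera Manager.

```python
def parse_hue_page(path):
    """Split an anonymized Hue page path into (version, application, action).

    Assumption: the first segment is the version, the second is the
    application, and anything after that is an action within the application.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"unexpected page path: {path!r}")
    version, app = parts[0], parts[1]
    action = "/".join(parts[2:]) or None  # None when no action segment is present
    return version, app, action


print(parse_hue_page("/2.3.0/pig"))              # ('2.3.0', 'pig', None)
print(parse_hue_page("/2.5.0/beeswax/execute"))  # ('2.5.0', 'beeswax', 'execute')
```

Neither example path contains a hostname or user ID, which matches the description above.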

Diagnostic Data Collection

To help with solving problems when using Cloudera Manager on your cluster, Cloudera Manager collects diagnostic data on a regular schedule and automatically sends it to Cloudera. By default, Cloudera Manager is configured to collect data weekly and to send it automatically. You can schedule data collection to run daily, weekly, or monthly, or disable scheduled collection entirely. You can also send a collected data set manually.

What Data Does Cloudera Manager Collect?

Cloudera Manager collects and returns a significant amount of information about the health and performance of the cluster. It includes:
  • Up to 1000 Cloudera Manager audit events: Configuration changes, add/remove of users, roles, services, and so on.
  • One day's worth of Cloudera Manager events: This includes critical errors that Cloudera Manager watches for, and more.
  • Data about the cluster structure which includes a list of all hosts, roles, and services along with the configurations that are set through Cloudera Manager. Where passwords are set in Cloudera Manager, the passwords are not returned.
  • Cloudera Manager license and version number.
  • Current health information for hosts, services, and roles. Includes results of health tests run by Cloudera Manager.
  • Heartbeat information from each host, service, and role. This includes status and some information about memory, disk, and processor usage.
  • The results of running Host Inspector.
  • One day's worth of Cloudera Manager metrics.
  • A download of the debug pages for Cloudera Manager roles.
  • For each host in the cluster, the result of running a number of system-level commands on that host.
  • Logs from each role on the cluster, as well as the Cloudera Manager server and agent logs.
  • Which parcels are activated for which clusters.
  • Whether there's an active trial, and if so, metadata about the trial.
  • Metadata about the Cloudera Manager server, such as its JMX metrics, stack traces, and the database/host it's running with.
  • HDFS/Hive replication schedules (including command history) for the deployment.
  • Impala query logs.

Configuring the Frequency of Diagnostic Data Collection

By default, Cloudera Manager collects diagnostic data on a weekly basis. You can change the frequency to daily, weekly, monthly, or never. If you are a Cloudera Enterprise customer and you set the schedule to never, you can still collect and send data to Cloudera on demand. If you are a Cloudera Express customer and you set the schedule to never, data is not collected or sent to Cloudera.

  1. Select Administration > Settings.
  2. Under the Support category, click Scheduled Diagnostic Data Collection Frequency and select the frequency.
  3. To set the day and time of day that the collection will be performed, click Scheduled Diagnostic Data Collection Time and specify the date and time in the pop-up control.
  4. Click Save Changes to commit the changes.

You can see the current setting of the data collection frequency by viewing Support > Scheduled Diagnostics: in the main navigation bar.

Specifying the Diagnostic Data Directory

You can configure the directory where collected data is stored.

  1. Select Administration > Settings.
  2. Under the Support category, set the Diagnostic Data Bundle Directory to a directory on the host running Cloudera Manager Server. The directory must exist and be writable by the user cloudera-scm. If this field is left blank, the data is stored in /tmp.
  3. Click Save Changes to commit the changes.
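
Before setting the property, you may want to confirm that the candidate directory meets the two requirements above. The following Python sketch is one way to do that. It is not a Cloudera tool, and the function name check_bundle_dir is hypothetical. It checks permissions for the user running the script, so it is only meaningful when run as cloudera-scm on the Cloudera Manager Server host.

```python
import os
import tempfile


def check_bundle_dir(path):
    """Return a list of problems with path; an empty list means it looks usable.

    The writability check applies to the user running this script, so run it
    as the cloudera-scm user for the result to be relevant.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Creating files in a directory needs both write and search permission.
        problems.append(f"{path} is not writable by the current user")
    return problems


# Example: check the system temporary directory (typically /tmp on Linux).
print(check_bundle_dir(tempfile.gettempdir()))
```

An empty list suggests the directory is usable. Any entries describe what to fix before saving the setting.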

Collecting and Sending Diagnostic Data to Cloudera

Disabling the Automatic Sending of Diagnostic Data from a Manually Triggered Collection

If you do not want data automatically sent to Cloudera after manually triggering data collection, you can disable this feature. The data you collect will be saved and can be downloaded for sending to Cloudera Support at a later time.

  1. Select Administration > Settings.
  2. Under the Support category, uncheck the box for Send Diagnostic Data to Cloudera Automatically.
  3. Click Save Changes to commit the changes.

Manually Triggering Collection and Transfer of Diagnostic Data to Cloudera

  1. Optionally change the System Identifier property:
    1. Select Administration > Settings.
    2. Under the Other category, set the System Identifier property and click Save Changes.
  2. Under the Support menu at the top right of the navigation bar, choose Send Diagnostic Data. The Send Diagnostic Data form displays.
  3. Fill in or change the information here as appropriate:
    • Optionally, you can improve performance by reducing the size of the data bundle that is sent. Click Restrict log and metrics collection to expand this section of the form. The three filters, Host, Service, and Role Type, allow you to restrict the data that will be sent. Cloudera Manager will only collect logs and metrics for roles that match all three filters.
    • Cloudera Manager populates the End Time based on the setting of the Time Range selector. You should change this to be a few minutes after you observed the problem or condition that you are trying to capture. The time range is based on the timezone of the host where Cloudera Manager Server is running.
    • If you have a support ticket open with Cloudera Support, include the support ticket number in the field provided.
  4. Depending on whether you have disabled automatic sending of data, do one of the following:
    • Click Collect and Send Diagnostic Data. A Running Commands window shows you the progress of the data collection steps. When these steps are complete, the collected data is sent to Cloudera.
    • Click Collect Diagnostic Data. A Command Details window shows you the progress of the data collection steps.
      1. In the Command Details window, click Download Result Data to download and save a zip file of the information.
      2. Send the data to Cloudera Support by doing one of the following:
        • Send the bundle using a Python script:
          1. Download the phone_home script.
          2. Copy the script and the downloaded data file to a host that has Internet access.
          3. Run the following command on that host:
            python phone_home.py --file <downloaded data file>
        • Attach the bundle to the SFDC case. Do not rename the bundle, because renaming it can delay processing.
        • Contact Cloudera Support and arrange to send the data file.
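
The Host, Service, and Role Type filters in the Send Diagnostic Data form combine with AND semantics: logs and metrics are collected only for roles that match all three. The Python sketch below illustrates that rule on made-up data. The Role class, the matches function, and the host, service, and role names are all hypothetical. Treating an unset filter as "no restriction" is an assumption and is not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    host: str
    service: str
    role_type: str


def matches(role, host=None, service=None, role_type=None):
    """Return True only if the role satisfies every filter that is set.

    Assumption: a filter left as None places no restriction on the role.
    """
    return (
        (host is None or role.host == host)
        and (service is None or role.service == service)
        and (role_type is None or role.role_type == role_type)
    )


# Illustrative inventory; these names are invented for the example.
roles = [
    Role("node1", "hdfs", "DATANODE"),
    Role("node1", "hive", "HIVESERVER2"),
    Role("node2", "hdfs", "DATANODE"),
]

selected = [r for r in roles if matches(r, host="node1", service="hdfs")]
print(selected)  # only the DATANODE role on node1 remains
```

Because the filters are combined with AND, each additional filter can only shrink the set of roles, and therefore the size of the bundle.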