Generating a VM-based diagnostic bundle
CDP includes a set of CLI commands that allow you to generate an on-demand VM-based diagnostic bundle that gathers logs and command output from your Data Lake, FreeIPA, or Data Hub cluster VMs. The bundle can be used for support case troubleshooting.
This feature is available from the CDP CLI. For each resource type (freeipa, datalake, or datahub), there are two commands that you can use:
- get-<resource-type>-log-descriptors: Allows you to obtain a list of logs that get collected for diagnostics.
- collect-<resource-type>-diagnostics: Triggers diagnostics collection. You need to provide a CRN of a resource (Data Lake, FreeIPA, or Data Hub cluster), a description, and a destination for the bundle.
What gets collected
The collected information includes:
- Command output for some commands (such as ps aux and various others)
- Infrastructure logs (such as daemon logs)
- Service logs (such as Cloudera Manager service logs)
Furthermore, you may specify additional logs that are not part of the default log collection to be collected in the diagnostic bundle.
The information is collected from all running nodes of your cluster (Data Lake, FreeIPA, or Data Hub). It is not collected from nodes that are not running, such as nodes that are still being provisioned, stopped nodes, or deleted nodes.
get-<resource-type>-log-descriptors commands allow you to
obtain a list of logs that get collected. These commands can also be used to obtain available
log label/path pairs that can be provided in the
collect-<resource-type>-diagnostics command via the
--labels parameter in order to limit which logs are collected.
The output is printed in the Event History tab on the UI in Data Lake or Data Hub details. Since there is no FreeIPA Event History on the UI, in case of FreeIPA the output is saved to /var/log/salt/minion, which can be accessed from your logs location in cloud storage.
To gather available log label/path pairs, use one of the following commands:
cdp environments get-freeipa-log-descriptors
cdp datalake get-datalake-log-descriptors
cdp datahub get-datahub-log-descriptors
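The three commands differ only in the CLI service name and the resource type. The following is a minimal shell sketch of that pairing, assuming only the commands listed above. The helper name descriptor_cmd is hypothetical and is not part of the CDP CLI; it prints each command instead of running it, so you can review the command first.

```shell
#!/bin/sh
# descriptor_cmd is a hypothetical helper, not part of the CDP CLI.
# It prints the log-descriptor command for a given resource type.
descriptor_cmd() {
  case "$1" in
    freeipa)  service=environments ;;  # FreeIPA commands live under "cdp environments"
    datalake) service=datalake ;;
    datahub)  service=datahub ;;
    *) echo "unknown resource type: $1" >&2; return 1 ;;
  esac
  echo "cdp $service get-$1-log-descriptors"
}

# Print the command for every supported resource type.
for rt in freeipa datalake datahub; do
  descriptor_cmd "$rt"
done
```

To run a printed command rather than only display it, copy it into your shell once you have confirmed that it targets the right resource type.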
collect-<resource-type>-diagnostics commands trigger
log collection. You need to provide a CRN of a resource (Data Lake, FreeIPA, or Data Hub
cluster), a description, and a destination for the bundle. The generated bundle can be saved
to cloud storage or to a local file. The command progress (including the location where the
bundle is generated) is printed in the Event History tab on the UI.
- If you would like to use cloud storage as a destination, ensure that the Logger and IDBroker instance profiles (on AWS) or managed identities (on Azure) are able to write to the specified cloud storage location.
- If you would like to use Cloudera support as a destination, your network must be able to reach the following destinations: https://dbusapi.us-west-1.altus.cloudera.com (only required if your cloud provider is AWS) and https://cloudera-dbus-prod.s3.amazonaws.com (required even if your cloud provider is not AWS).
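Before choosing Cloudera support as a destination, it can help to confirm that the endpoints are reachable. The sketch below is one possible check, not an official procedure. The helper names required_endpoints and check_endpoints are hypothetical, and the provider values aws and azure are assumed labels. The check is only meaningful when run from a host that shares the network egress of your cluster nodes.

```shell
#!/bin/sh
# required_endpoints is a hypothetical helper; the URLs are the ones named in this section.
# $1: cloud provider label (assumed values: aws, azure, ...)
required_endpoints() {
  if [ "$1" = "aws" ]; then
    # Only required when the cloud provider is AWS.
    echo "https://dbusapi.us-west-1.altus.cloudera.com"
  fi
  # Required for every cloud provider.
  echo "https://cloudera-dbus-prod.s3.amazonaws.com"
}

# check_endpoints is a hypothetical helper that probes each required endpoint.
# Without --fail, curl exits 0 whenever it receives any HTTP response,
# so this tests network reachability rather than authorization.
check_endpoints() {
  for url in $(required_endpoints "$1"); do
    if curl --silent --output /dev/null --max-time 10 "$url"; then
      echo "reachable: $url"
    else
      echo "NOT reachable: $url"
    fi
  done
}
```

For example, `check_endpoints aws` probes both endpoints, while `check_endpoints azure` probes only the second one.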
To run collection with Salt logs included and to update the cdp-telemetry package on the nodes, use one of the following commands, replacing <DESCRIPTION> with an actual description and the name or CRN placeholder (such as <DATAHUB-CRN>) with an actual value.
To send the diagnostics bundle to cloud storage, use:
cdp environments collect-freeipa-diagnostics --environment-name <NAME-OR-CRN> \
  --description <DESCRIPTION> \
  --destination CLOUD_STORAGE \
  --update-package \
  --include-salt-logs

cdp datalake collect-datalake-diagnostics --crn <DATALAKE-CRN> \
  --description <DESCRIPTION> \
  --destination CLOUD_STORAGE \
  --update-package \
  --include-salt-logs

cdp datahub collect-datahub-diagnostics --crn <DATAHUB-CRN> \
  --description <DESCRIPTION> \
  --destination CLOUD_STORAGE \
  --update-package \
  --include-salt-logs
To send the diagnostic bundle directly to Cloudera support, use:
cdp datahub collect-datahub-diagnostics --crn <DATAHUB-CRN> \
  --description <DESCRIPTION> \
  --case-number <SUPPORT_CASE_NUMBER> \
  --destination SUPPORT \
  --update-package \
  --include-salt-logs
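The collection commands share one shape: a service name, an identifier flag (--environment-name for FreeIPA, --crn for Data Lake and Data Hub), a description, and a destination. The sketch below assembles such a command as a dry run. The helper name collect_cmd and its argument order are hypothetical; it uses only flags that appear in this section and prints the command rather than executing it. It also assumes that the identifier and case number contain no spaces.

```shell
#!/bin/sh
# collect_cmd is a hypothetical helper that prints (does not run) a collection command.
# Usage: collect_cmd <resource-type> <name-or-crn> <description> <destination> [case-number]
collect_cmd() {
  rt=$1 target=$2 desc=$3 dest=$4 case_no=${5:-}
  case "$rt" in
    freeipa)  service=environments; idflag=--environment-name ;;
    datalake) service=datalake;     idflag=--crn ;;
    datahub)  service=datahub;      idflag=--crn ;;
    *) echo "unknown resource type: $rt" >&2; return 1 ;;
  esac
  # Reject anything other than the three documented destinations.
  case "$dest" in
    SUPPORT|CLOUD_STORAGE|LOCAL) ;;
    *) echo "unknown destination: $dest" >&2; return 1 ;;
  esac
  cmd="cdp $service collect-$rt-diagnostics $idflag $target --description \"$desc\" --destination $dest"
  # Append the support case number only when one was supplied.
  if [ -n "$case_no" ]; then
    cmd="$cmd --case-number $case_no"
  fi
  echo "$cmd"
}

# Example dry run with placeholder values.
collect_cmd datahub "<DATAHUB-CRN>" "worker nodes slow" SUPPORT "<SUPPORT_CASE_NUMBER>"
```

Printing the command first makes it easier to catch a wrong identifier flag or a misspelled destination before any collection is triggered.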
To collect only specific custom logs (such as /var/log/audit/audit.log used in this example), use:
cdp environments collect-freeipa-diagnostics --environment-name <NAME-OR-CRN> \
  --description <DESCRIPTION> \
  --destination CLOUD_STORAGE \
  --additional-logs path=/var/log/audit/audit.log,label=audit \
  --labels audit
If you run this command:
- Logs specified under --additional-logs will be collected.
- Due to the specified label, all other logs will be filtered out and no other logs will be included in the bundle.
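The filtering effect can be pictured with a small model. The sketch below is only an illustration of the behavior described here, not how the CDP CLI is implemented. The helper name filter_by_labels is hypothetical, and apart from the audit entry, the label/path pairs are made-up sample data.

```shell
#!/bin/sh
# filter_by_labels is a hypothetical model of label filtering, not CDP CLI code.
# stdin: "label path" pairs, one per line; arguments: labels to keep.
filter_by_labels() {
  if [ "$#" -eq 0 ]; then
    cat   # no labels given: every log stays in the bundle
    return
  fi
  while read -r label path; do
    for keep in "$@"; do
      if [ "$label" = "$keep" ]; then
        echo "$label $path"
      fi
    done
  done
}

# Sample pairs: only the audit entry comes from the example in this section;
# example_a and example_b are invented for illustration.
printf '%s\n' \
  'audit /var/log/audit/audit.log' \
  'example_a /var/log/example/a.log' \
  'example_b /var/log/example/b.log' |
  filter_by_labels audit
```

With the label audit, only the audit.log entry is printed, which mirrors how the bundle would contain only the custom log.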
--environment-name (for FreeIPA) or --crn (for Data Lake or Data Hub) Provide the environment name or CRN (for FreeIPA) or the CRN of your Data Lake or Data Hub cluster. You can obtain it by running a list or describe command on your resource.
--description Provide a description for the diagnostics bundle.
--destination Specify a destination of the diagnostics collection: SUPPORT, CLOUD_STORAGE, or LOCAL:
- SUPPORT: Sends diagnostic bundles directly to Cloudera support.
- CLOUD_STORAGE: Saves collected diagnostics into a tar.gz file and uploads it
to your logs cloud storage location specified during environment registration, to a PATH that
The “cluster logs” directory is only created if your cloud storage location is a bucket. If your cloud storage location includes a directory within a bucket, then the “cluster logs” directory is not created.
- LOCAL: Diagnostics will be collected to a compressed file and saved to /var/lib/filecollector on every cluster node.
--labels You can use this to pass a list of labels that filter the collected logs. This is useful if, instead of including all logs, you want to include only certain specific logs in the bundle. The list of available labels can be obtained from the get-<resource-type>-log-descriptors response.
--additional-logs One or more additional VM log objects that will
be included in the diagnostics bundle. A VM log object is a path/label pair. Cloudera support
may ask you to provide specific values here.
--hosts Run diagnostics only on the specified hosts (both IP addresses and FQDNs are accepted).
--host-groups Run diagnostics only on those hosts that are included in the specified host groups.
--include-salt-logs If this is set, Salt minion/master logs will be included in the diagnostic bundle.
--case-number Allows you to provide a support case number.
--update-package If this is set, the required third-party applications will be downloaded and updated on the VM nodes during diagnostics collection. This requires internet access.
information about the command and its options, use the
--storage-validation You can use this if using CLOUD_STORAGE as a destination. If this is set, a cloud storage write validation check is performed during the initial phase of the diagnostics collection.