Data Collection in Cloudera Data Science Workbench
Cloudera Data Science Workbench collects usage and diagnostic data in two ways, both of which can be controlled by system administrators.
Usage Tracking
Cloudera Data Science Workbench collects aggregate usage data by sending limited tracking events to Google Analytics and Cloudera servers. No customer data or personal information is sent as part of these events.
Diagnostic Bundles
Diagnostic bundles can be created by system administrators using the cdsw logs command. They are used to aid in debugging issues filed with Cloudera Support. Two archives result from running the cdsw logs command:
- Redacted: Sensitive information is redacted from this archive. This is the bundle that you should attach to any case opened with Cloudera Support. The file name has the form cdsw-logs-$hostname-$date-$time.redacted.tar.gz.
- Original: The original archive is meant for internal use. Retain it at least for the duration of the support case, in case any critical information was redacted. This archive can be shared with Cloudera at your discretion. The file name has the form cdsw-logs-$hostname-$date-$time.tar.gz.
The contents of these archives are stored as plain text and can easily be inspected by system administrators. The original and redacted archives are designed to be easy to diff against each other.
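Because both archives hold text files, an administrator can check exactly what was redacted before sending a bundle to Cloudera Support. The following is a minimal sketch of one way to do that using only the Python standard library. It is not part of Cloudera Data Science Workbench. The function names read_text_members and diff_bundles are hypothetical, and the sketch assumes that each file has the same path inside both archives, which may not hold for real bundles.

```python
import difflib
import sys
import tarfile


def read_text_members(archive_path):
    """Return {member name: list of text lines} for regular files in a .tar.gz archive."""
    contents = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            data = archive.extractfile(member).read()
            # Bundle contents are described as text; replace any undecodable bytes.
            contents[member.name] = data.decode("utf-8", errors="replace").splitlines()
    return contents


def diff_bundles(original_path, redacted_path):
    """Return unified-diff lines for every file that differs between the two archives.

    Assumption: a file has the same member name in both archives.
    """
    original = read_text_members(original_path)
    redacted = read_text_members(redacted_path)
    report = []
    for name in sorted(set(original) | set(redacted)):
        report.extend(
            difflib.unified_diff(
                original.get(name, []),
                redacted.get(name, []),
                fromfile="original/" + name,
                tofile="redacted/" + name,
                lineterm="",
            )
        )
    return report


if __name__ == "__main__":
    # Usage: python diff_bundles.py <original.tar.gz> <redacted.tar.gz>
    if len(sys.argv) == 3:
        for line in diff_bundles(sys.argv[1], sys.argv[2]):
            print(line)
```

Lines prefixed with "-" in the output appear only in the original archive, and lines prefixed with "+" appear only in the redacted one. Reviewing them is a quick way to judge whether anything needed for the support case was removed.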
Information Collected in Diagnostic Bundles
- System information such as hostnames, operating system, kernel modules and settings, and system logs.
- Cloudera Data Science Workbench version, status information, and the results of install-time validation checks.
- Details about file systems, devices, and mounts in use.
- CDH cluster configuration, including information about Java, Kerberos, installed parcels, and CDH services such as Spark 2.
- Network configuration and status, including interfaces, routing configuration, and reachability.
- Status information for system services such as Docker, Kubernetes, NFS, and NTP.
- Listings for processes, open files, and network sockets.
- Reachability, configuration, and logs for Cloudera Data Science Workbench application components.
- Hashed Cloudera Data Science Workbench user names.