Monitoring a Data Lake
You can monitor the status of your Data Lake from the Cloudera web UI or the CDP CLI.
Required role: EnvironmentAdmin, Data Steward, or Owner of the environment
Monitoring a Data Lake cluster via UI
To access information related to your Data Lake cluster from the Cloudera web UI, navigate to . Each Data Lake cluster is represented by an entry on the Data Lakes
page. To get more information about a specific Data Lake cluster, click on the tile
representing your cluster. When a Data Lake cluster is healthy, its status should be
Running
.
To check health of specific hosts and services, navigate to Cloudera Manager.
Monitoring a Data Lake cluster via CLI
You can view and monitor your available Data Lake clusters via CDP CLI using the following commands:
cdp datalake list-datalakes cdp datalake describe-datalake cdp datalake get-cluster-host-status cdp datalake get-cluster-service-status cdp datalake get-operation
-
List all available clusters:
cdp datalake list-datalakes
Example:cdp environments list-datalakes { "datalakes": [ { "datalakeName": "zookeeper-190920-144828-vg7", "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:4529591f-53ea-4196-90fc-5d780d7063a8", "status": "RUNNING", "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:b1935d68-85d5-4f50-a023-56fa96d01c45", "creationDate": "2019-09-20T12:49:55.669000+00:00", "statusReason": "Datalake is running" }, { "datalakeName": "zookeeper-sqqsx", "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:92d66fed-c5d2-437c-a6eb-a54e40d36287", "status": "RUNNING", "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:1eb291b3-dd23-4bdd-a3e8-09579afdf5a8", "creationDate": "2019-09-25T09:24:08.017000+00:00", "statusReason": "Datalake is running" } ] }
-
Get basic information about a specific cluster:
cdp datalake describe-cluster --cluster-name <value>
Example:cdp datalake describe-datalake --datalake-name test-data-lake { "datalake": { "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:aa2e8e3e-2d6f-410b-bf3c-a3e02112bfc8", "datalakeName": "test-data-lake", "status": "RUNNING", "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:574aa1cb-7a51-45a2-97ae-dead97072145", "credentialCrn": "crn:altus:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:credential:83c861b6-5f62-4b83-a466-06de751a3964", "cloudPlatform": "AWS", "creationDate": "2019-09-20T22:09:22.422000+00:00", "clouderaManager": { "version": "7.x.0", "clouderaManagerRepositoryURL": "http://cloudera-build-us-west-1.vpc.cloudera.com/s3/build/1445641/cm7/7.0.1/redhat7/yum/", "clouderaManagerServerURL": "https://adar-test-data-lake.adar-tes.xcu2-8y8x.workload-dev.cloudera.com:8443/test-data-lake/cdp-proxy/cmf/home/" }, "productVersions": [ { "name": "CDH", "version": "7.0.1-1.cdh7.0.1.p0.1443705" } ], "statusReason": "Datalake is running", "awsConfiguration": { "instanceProfile": "arn:aws:iam::069336058373:instance-profile/idbroker-assume-role" } } }
-
Obtain information about the health of each of your Data Lake hosts:
cdp datalake get-cluster-host-status --cluster-name <value>
Example:cdp datalake get-cluster-host-status --cluster-name test-data-lake { "hosts": [ { "hostid": "5c8fb276620f0aa54bdd111e33ba5f58", "hostname": "idbroker1.cloudera.site", "healthSummary": "GOOD" }, { "hostid": "30f27ab8472c9677985f04efc2b800c4", "hostname": "master0.cloudera.site", "healthSummary": "GOOD" } ] }
-
Obtain information about the health of each service running on the Data Lake cluster:
cdp datalake get-cluster-service-status --cluster-name <value>
Example:cdp datalake get-cluster-service-status --cluster-name test-data-lake { "services": [ { "type": "ZOOKEEPER", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "ZOOKEEPER_SERVERS_HEALTHY", "summary": "GOOD" } ] }, { "type": "HDFS", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "HDFS_DATA_NODES_HEALTHY", "summary": "GOOD" }, { "name": "HDFS_VERIFY_EC_WITH_TOPOLOGY", "summary": "DISABLED" } ] }, { "type": "SOLR", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "SOLR_SOLR_SERVERS_HEALTHY", "summary": "GOOD" } ] }, { "type": "HIVE", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "HIVE_HIVEMETASTORES_HEALTHY", "summary": "GOOD" } ] }, { "type": "RANGER", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "RANGER_RANGER_ADMIN_HEALTHY", "summary": "GOOD" }, { "name": "RANGER_RANGER_RANGER_TAGSYNC_HEALTH", "summary": "GOOD" }, { "name": "RANGER_RANGER_RANGER_USERSYNC_HEALTH", "summary": "GOOD" } ] }, { "type": "HBASE", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "HBASE_REGION_SERVERS_HEALTHY", "summary": "GOOD" } ] }, { "type": "KAFKA", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "KAFKA_KAFKA_BROKER_HEALTHY", "summary": "GOOD" } ] }, { "type": "ATLAS", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "ATLAS_ATLAS_SERVER_HEALTHY", "summary": "GOOD" } ] }, { "type": "KNOX", "state": "STARTED", "healthSummary": "GOOD", "healthChecks": [ { "name": "KNOX_IDBROKER_HEALTHY", "summary": "GOOD" }, { "name": "KNOX_KNOX_GATEWAY_HEALTHY", "summary": "GOOD" } ] } ] }
-
Get status of specified operation:
cdp datalake get-operation --crn <value> [--operation-id <value>]
To use the
get-operation
command to get the status of a specified event, you need to specify the operation id of the operation. Every operation that starts a process running in the background, like creating, starting, stopping, or repairing a cluster, returns anoperationId
field in the response.Example:cdp datalake create-aws-datalake [args] { "datalake": { "datalakeName": "foldikzrvdl", "crn": "crn:cdp:datalake:us-west-1:cloudera:datalake:ce8b73f3-79ac-4a24-b6e9-c671fcdcdb59", "status": "REQUESTED", "environmentCrn": "crn:cdp:environments:us-west-1:cloudera:environment:257c1832-da68-4afd-a84d-212639bbe024", "creationDate": "2025-03-05T13:46:04.154000+00:00", "statusReason": "Datalake requested", "enableRangerRaz": false, "certificateExpirationState": "VALID", "multiAz": false, "security": { "seLinux": "PERMISSIVE" } }, "operationId": "fc951645-eebd-401b-bacb-be3845fbffdd" }
The value of this
operationId
can be used as the value for the--operation-id
option for theget-operation
command.Example:cdp datalake get-operation --crn crn:cdp:datalake:us-west-1:cloudera:datalake:ce8b73f3-79ac-4a24-b6e9-c671fcdcdb59 --operation-id fc951645-eebd-401b-bacb-be3845fbffdd
Output format:{ "operationId": "identifier of the operation", "operationName": "Short name of the operation", "operationStatus": "UNKNOWN | RUNNING | FAILED | FINISHED | CANCELLED", "started": "Start time of the operation" "ended": "End time of the operation if it is completed" }
Output example:{ "operationId": "fc951645-eebd-401b-bacb-be3845fbffdd", "operationName": "DataLakeCreate", "operationStatus": "RUNNING", "started": "2025-03-05T13:46:04+00:00" }
Unsuccessful operation statuses are stored for 2 weeks, while successful status operations are stored for 1 day.
The operation id is optional, and if it is omitted, the status of the last operation is returned.
-
Get status of latest operation:
cdp datalake get-operation --crn <value>
Example:cdp datalake get-operation --crn crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:3dd6eed4-e327-4aa4-817f-05a43d21db80 { "operationId": "cf898a54-00ed-406b-9bc6-525401eb5ac5", "operationName": "DataLakeCreate", "operationStatus": "RUNNING", "started": "2025-01-15T09:46:53+00:00" }