Monitoring a Data Lake

You can monitor the status of your Data Lake from the Cloudera web UI or the CDP CLI.

Required role: EnvironmentAdmin, Data Steward, or Owner of the environment

To access information related to your Data Lake cluster from the Cloudera web UI, navigate to Management Console > Data Lakes. Each Data Lake cluster is represented by an entry on the Data Lakes page. To get more information about a specific Data Lake cluster, click on the tile representing your cluster. When a Data Lake cluster is healthy, its status should be Running.

To check the health of specific hosts and services, navigate to Cloudera Manager.

You can view and monitor your available Data Lake clusters from the CDP CLI using the following commands:

cdp datalake list-datalakes
cdp datalake describe-datalake
cdp datalake get-cluster-host-status
cdp datalake get-cluster-service-status
cdp datalake get-operation
  • List all available clusters: cdp datalake list-datalakes

    Example:
    cdp datalake list-datalakes
    {
        "datalakes": [
            {
                "datalakeName": "zookeeper-190920-144828-vg7",
                "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:4529591f-53ea-4196-90fc-5d780d7063a8",
                "status": "RUNNING",
                "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:b1935d68-85d5-4f50-a023-56fa96d01c45",
                "creationDate": "2019-09-20T12:49:55.669000+00:00",
                "statusReason": "Datalake is running"
            },
            {
                "datalakeName": "zookeeper-sqqsx",
                "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:92d66fed-c5d2-437c-a6eb-a54e40d36287",
                "status": "RUNNING",
                "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:1eb291b3-dd23-4bdd-a3e8-09579afdf5a8",
                "creationDate": "2019-09-25T09:24:08.017000+00:00",
                "statusReason": "Datalake is running"
            }
        ]
    }
  • Get basic information about a specific cluster: cdp datalake describe-datalake --datalake-name <value>

    Example:
    cdp datalake describe-datalake --datalake-name test-data-lake
    {
        "datalake": {
            "crn": "crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:aa2e8e3e-2d6f-410b-bf3c-a3e02112bfc8",
            "datalakeName": "test-data-lake",
            "status": "RUNNING",
            "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:574aa1cb-7a51-45a2-97ae-dead97072145",
            "credentialCrn": "crn:altus:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:credential:83c861b6-5f62-4b83-a466-06de751a3964",
            "cloudPlatform": "AWS",
            "creationDate": "2019-09-20T22:09:22.422000+00:00",
            "clouderaManager": {
                "version": "7.x.0",
                "clouderaManagerRepositoryURL": "http://cloudera-build-us-west-1.vpc.cloudera.com/s3/build/1445641/cm7/7.0.1/redhat7/yum/",
                "clouderaManagerServerURL": "https://adar-test-data-lake.adar-tes.xcu2-8y8x.workload-dev.cloudera.com:8443/test-data-lake/cdp-proxy/cmf/home/"
            },
            "productVersions": [
                {
                    "name": "CDH",
                    "version": "7.0.1-1.cdh7.0.1.p0.1443705"
                }
            ],
            "statusReason": "Datalake is running",
            "awsConfiguration": {
                "instanceProfile": "arn:aws:iam::069336058373:instance-profile/idbroker-assume-role"
            }
        }
    }
  • Obtain information about the health of each of your Data Lake hosts: cdp datalake get-cluster-host-status --cluster-name <value>

    Example:
    cdp datalake get-cluster-host-status --cluster-name test-data-lake
    {
        "hosts": [
            {
                "hostid": "5c8fb276620f0aa54bdd111e33ba5f58",
                "hostname": "idbroker1.cloudera.site",
                "healthSummary": "GOOD"
            },
            {
                "hostid": "30f27ab8472c9677985f04efc2b800c4",
                "hostname": "master0.cloudera.site",
                "healthSummary": "GOOD"
            }
        ]
    }
  • Obtain information about the health of each service running on the Data Lake cluster: cdp datalake get-cluster-service-status --cluster-name <value>

    Example:
    cdp datalake get-cluster-service-status --cluster-name test-data-lake
    {
        "services": [
            {
                "type": "ZOOKEEPER",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "ZOOKEEPER_SERVERS_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "HDFS",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "HDFS_DATA_NODES_HEALTHY",
                        "summary": "GOOD"
                    },
                    {
                        "name": "HDFS_VERIFY_EC_WITH_TOPOLOGY",
                        "summary": "DISABLED"
                    }
                ]
            },
            {
                "type": "SOLR",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "SOLR_SOLR_SERVERS_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "HIVE",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "HIVE_HIVEMETASTORES_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "RANGER",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "RANGER_RANGER_ADMIN_HEALTHY",
                        "summary": "GOOD"
                    },
                    {
                        "name": "RANGER_RANGER_RANGER_TAGSYNC_HEALTH",
                        "summary": "GOOD"
                    },
                    {
                        "name": "RANGER_RANGER_RANGER_USERSYNC_HEALTH",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "HBASE",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "HBASE_REGION_SERVERS_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "KAFKA",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "KAFKA_KAFKA_BROKER_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "ATLAS",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "ATLAS_ATLAS_SERVER_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            },
            {
                "type": "KNOX",
                "state": "STARTED",
                "healthSummary": "GOOD",
                "healthChecks": [
                    {
                        "name": "KNOX_IDBROKER_HEALTHY",
                        "summary": "GOOD"
                    },
                    {
                        "name": "KNOX_KNOX_GATEWAY_HEALTHY",
                        "summary": "GOOD"
                    }
                ]
            }
        ]
    }
  • Get the status of a specific operation: cdp datalake get-operation --crn <value> [--operation-id <value>]

    To get the status of a specific operation with the get-operation command, specify the ID of that operation. Every operation that starts a background process, such as creating, starting, stopping, or repairing a cluster, returns an operationId field in its response.

    Example:
    cdp datalake create-aws-datalake [args]
    
    {
        "datalake": {
            "datalakeName": "foldikzrvdl",
            "crn": "crn:cdp:datalake:us-west-1:cloudera:datalake:ce8b73f3-79ac-4a24-b6e9-c671fcdcdb59",
            "status": "REQUESTED",
            "environmentCrn": "crn:cdp:environments:us-west-1:cloudera:environment:257c1832-da68-4afd-a84d-212639bbe024",
            "creationDate": "2025-03-05T13:46:04.154000+00:00",
            "statusReason": "Datalake requested",
            "enableRangerRaz": false,
            "certificateExpirationState": "VALID",
            "multiAz": false,
            "security": {
                "seLinux": "PERMISSIVE"
            }
        },
        "operationId": "fc951645-eebd-401b-bacb-be3845fbffdd"
    }

    Pass this operationId value to the --operation-id option of the get-operation command.

    Example:
    cdp datalake get-operation --crn crn:cdp:datalake:us-west-1:cloudera:datalake:ce8b73f3-79ac-4a24-b6e9-c671fcdcdb59 --operation-id fc951645-eebd-401b-bacb-be3845fbffdd
    Output format:
    {
        "operationId": "Identifier of the operation",
        "operationName": "Short name of the operation",
        "operationStatus": "UNKNOWN | RUNNING | FAILED | FINISHED | CANCELLED",
        "started": "Start time of the operation",
        "ended": "End time of the operation if it is completed"
    }
    Output example:
    {
        "operationId": "fc951645-eebd-401b-bacb-be3845fbffdd",
        "operationName": "DataLakeCreate",
        "operationStatus": "RUNNING",
        "started": "2025-03-05T13:46:04+00:00"
    }

    Statuses of unsuccessful operations are stored for 2 weeks, while statuses of successful operations are stored for 1 day.

    The --operation-id option is optional. If it is omitted, the status of the latest operation is returned.

  • Get the status of the latest operation: cdp datalake get-operation --crn <value>

    Example:
    cdp datalake get-operation --crn crn:cdp:datalake:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:datalake:3dd6eed4-e327-4aa4-817f-05a43d21db80
    {
        "operationId": "cf898a54-00ed-406b-9bc6-525401eb5ac5",
        "operationName": "DataLakeCreate",
        "operationStatus": "RUNNING",
        "started": "2025-01-15T09:46:53+00:00"
    }
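
The get-operation calls above lend themselves to a simple polling loop. The following Python sketch shows one possible way to wait for a background operation to finish. It assumes that the cdp executable is on the PATH and prints JSON as in the examples above. The names run_cdp and wait_for_operation, the polling interval, and the choice to treat FINISHED, FAILED, and CANCELLED as terminal statuses are assumptions of this sketch, not part of the CDP CLI.

```python
import json
import subprocess
import time
from typing import Callable, List, Optional

# Status names come from the "Output format" of get-operation. Treating these
# three as terminal (and UNKNOWN as "keep polling") is an assumption.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "CANCELLED"}


def run_cdp(args: List[str]) -> dict:
    """Run a cdp CLI command and parse its JSON output (hypothetical helper)."""
    result = subprocess.run(
        ["cdp", *args], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


def wait_for_operation(
    crn: str,
    operation_id: Optional[str] = None,
    interval_seconds: float = 30.0,
    max_polls: int = 120,
    runner: Callable[[List[str]], dict] = run_cdp,
) -> dict:
    """Poll get-operation until the operation reaches a terminal status."""
    args = ["datalake", "get-operation", "--crn", crn]
    if operation_id is not None:
        # Without --operation-id, the latest operation is reported.
        args += ["--operation-id", operation_id]
    for attempt in range(max_polls):
        operation = runner(args)
        if operation.get("operationStatus") in TERMINAL_STATUSES:
            return operation
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"operation did not finish after {max_polls} polls")
```

A call such as wait_for_operation(crn, operation_id) could take the crn and operationId values returned by create-aws-datalake. The runner parameter exists only so that the loop can be exercised without a real cdp installation.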

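The JSON returned by get-cluster-host-status and get-cluster-service-status can also be filtered programmatically. The Python sketch below is one way to list only the hosts and health checks that need attention. The function names are hypothetical, and treating DISABLED checks as acceptable is an assumption based on the example output above, where HDFS reports GOOD overall while one of its checks is DISABLED. The BAD values in the sample input are made up for illustration.

```python
import json
from typing import Dict, List, Tuple

# Assumption: a DISABLED check is switched off rather than failing,
# so it is not reported as a problem.
OK_CHECK_SUMMARIES = {"GOOD", "DISABLED"}


def unhealthy_hosts(host_status: Dict) -> List[str]:
    """Return hostnames whose healthSummary is anything other than GOOD."""
    return [
        host["hostname"]
        for host in host_status.get("hosts", [])
        if host.get("healthSummary") != "GOOD"
    ]


def failing_checks(service_status: Dict) -> List[Tuple[str, str, str]]:
    """Return (service type, check name, summary) for each unhealthy check."""
    problems = []
    for service in service_status.get("services", []):
        for check in service.get("healthChecks", []):
            if check.get("summary") not in OK_CHECK_SUMMARIES:
                problems.append(
                    (service["type"], check["name"], check["summary"])
                )
    return problems


# Illustrative input in the shape of the CLI output; the BAD values are invented.
hosts = json.loads("""
{"hosts": [
  {"hostid": "a", "hostname": "idbroker1.cloudera.site", "healthSummary": "GOOD"},
  {"hostid": "b", "hostname": "master0.cloudera.site", "healthSummary": "BAD"}
]}
""")
services = json.loads("""
{"services": [
  {"type": "HDFS", "state": "STARTED", "healthSummary": "BAD",
   "healthChecks": [
     {"name": "HDFS_DATA_NODES_HEALTHY", "summary": "BAD"},
     {"name": "HDFS_VERIFY_EC_WITH_TOPOLOGY", "summary": "DISABLED"}
   ]},
  {"type": "ZOOKEEPER", "state": "STARTED", "healthSummary": "GOOD",
   "healthChecks": [
     {"name": "ZOOKEEPER_SERVERS_HEALTHY", "summary": "GOOD"}
   ]}
]}
""")

print(unhealthy_hosts(hosts))
print(failing_checks(services))
```

In practice the two dictionaries would come from parsing the output of the corresponding cdp datalake commands instead of the inline samples.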