Launching profilers using the command line

Cloudera Data Catalog supports launching profilers from the command-line interface (CLI).

The CLI is a single executable with no external dependencies. You can run some Cloudera Data Catalog operations with Cloudera CLI commands.

Users must have valid permissions to launch profilers on a data lake.

For more information about access requirements, see Prerequisites to access Cloudera Data Catalog.

Prerequisites

You must have the following entitlement granted to use this feature:

DATA_CATALOG_ENABLE_API_SERVICE

For more information about the Cloudera command-line interface and how to set it up, see Cloudera CLI.

The Cloudera Data Catalog CLI

In your Cloudera CLI environment, enter the following command to get started in the CLI mode.

cdp datacatalog --help

This command provides information about the available commands in Cloudera Data Catalog for Cloudera on cloud 7.2.18 and earlier versions.

The output is displayed as:
NAME
       datacatalog

DESCRIPTION
       Cloudera Data Catalog Service is a web service, using this service user can execute operations like launching profilers in Data Catalog.

AVAILABLE SUBCOMMANDS
       launch-profilers

Parameters for profiler launch command

To get additional information about this command, run:

cdp datacatalog launch-profilers --help

NAME
       launch-profilers - Launches DataCatalog profilers in a given datalake.

DESCRIPTION
       Launches DataCatalog profilers in a given datalake.

SYNOPSIS

            launch-profilers
          --datalake <value>
          [--enable-ha | --no-enable-ha]
          [--profilers <value>]
          [--instance-types <value>]
          [--max-nodes <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --datalake (string)
          The CRN of the Datalake.

       --enable-ha | --no-enable-ha (boolean)
          Enables High Availability (HA) for datacatalog profilers (default
          value is false). The High Availability (HA) Profiler cluster
          provides failure resilience and scalability but incurs additional
          cost.

       --profilers (array)
          List of profiler names that need to be launched. (Applicable only
          for compute cluster enabled environments).

       Syntax:

          "string" "string" ...

       --instance-types (array)
          List of instance types to be used for the auto-scaling node group
          setup (Applicable only for compute cluster enabled environments).

       Syntax:

          "string" "string" ...

       --max-nodes (integer)
          Maximum number of nodes that can be spawned inside the auto-scaling
          node group, in the range of 30 to 100 (both inclusive). (Applicable
          only for compute cluster enabled environments).
        
       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample input
          can be used as an argument for --cli-input-json.

OUTPUT
       success -> (boolean)
          Status of the profiler launch operation.

       

FORM FACTORS
       public
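
The help text above states that --max-nodes accepts values from 30 to 100, inclusive, and applies only to compute cluster enabled environments. A minimal shell sketch of checking that range before calling the CLI could look like the following. The function name valid_max_nodes and the variables DATALAKE_CRN and MAX_NODES are illustrative assumptions, not part of the Cloudera CLI.

```shell
#!/bin/sh
# Sketch: validate --max-nodes against the documented range (30 to 100,
# inclusive) before invoking the CLI. valid_max_nodes is a hypothetical helper.
valid_max_nodes() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 30 ] && [ "$1" -le 100 ]
}

# DATALAKE_CRN and MAX_NODES are placeholders supplied by the caller.
# Nothing is launched unless DATALAKE_CRN is set.
if [ -n "${DATALAKE_CRN:-}" ]; then
    if valid_max_nodes "${MAX_NODES:-}"; then
        cdp datacatalog launch-profilers \
            --datalake "$DATALAKE_CRN" \
            --max-nodes "$MAX_NODES"
    else
        echo "max-nodes must be an integer from 30 to 100" >&2
    fi
fi
```

Checking the value locally only saves a round trip; the service remains the authority on which values it accepts.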

Parameters for profiler delete command

To get additional information about this command, run:

cdp datacatalog delete-profiler --help

NAME
       delete-profiler - Deletes DataCatalog profiler in a given datalake.

DESCRIPTION
       Deletes DataCatalog profiler in a given datalake.

SYNOPSIS

            delete-profiler
          --datalake <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --datalake (string)
          The CRN of the Datalake.

       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample input
          can be used as an argument for --cli-input-json.

OUTPUT

FORM FACTORS
       public
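
Because --datalake is the only required option of delete-profiler, a small wrapper can refuse to call the CLI when the CRN is missing. The following is a sketch under that assumption. The function name delete_profiler is hypothetical, and the CRN passed to it is a placeholder for the data lake CRN in your environment.

```shell
#!/bin/sh
# Sketch: wrap delete-profiler so that an empty CRN is caught before the call.
# delete_profiler is a hypothetical helper, not a Cloudera CLI command.
delete_profiler() {
    if [ -z "${1:-}" ]; then
        echo "usage: delete_profiler <datalake-crn>" >&2
        return 2
    fi
    # Uses only the documented required option.
    cdp datacatalog delete-profiler --datalake "$1"
}
```

A call such as delete_profiler "$DATALAKE_CRN" then behaves like the plain command, except that an unset variable yields a usage message instead of a CLI error.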

Launching the profiler

You can use the following CLI command to launch the data profiler:

cdp datacatalog launch-profilers --datalake [***DATALAKE CRN***]

Example:

cdp datacatalog launch-profilers --datalake crn:cdp:datalake:datacentername:c*****b-ccce-4**d-a**1-8********8:datalake:4*****5e-c**1-4**2-8**e-1********2
{
    "success": true
}
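
In a script, the "success" field of the response shown above can be checked before continuing. The following is a minimal sketch that uses pattern matching rather than a full JSON parser, so it assumes the response keeps the simple shape shown in the example. The function name launch_succeeded and the DATALAKE_CRN variable are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: report whether the launch response contains "success": true.
# launch_succeeded is a hypothetical helper; it matches text and does not
# parse JSON in general.
launch_succeeded() {
    printf '%s\n' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# DATALAKE_CRN is a placeholder supplied by the caller.
# Nothing is launched unless it is set.
if [ -n "${DATALAKE_CRN:-}" ]; then
    output=$(cdp datacatalog launch-profilers --datalake "$DATALAKE_CRN")
    if launch_succeeded "$output"; then
        echo "Profiler launch request succeeded"
    else
        echo "Profiler launch did not report success" >&2
        exit 1
    fi
fi
```

If a JSON tool is available in your environment, parsing the response with it is more robust than matching text.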