CDP CLI for Cloudera Lakehouse Optimizer

Use the CDP CLI commands to create, update, list, delete, and perform other maintenance activities on Cloudera Lakehouse Optimizer Data Hub, policies, and associations. The CDP CLI commands for Cloudera Lakehouse Optimizer are under the "lakehouseopt" CDP CLI option.

Prerequisites for using the CDP CLI

  1. You must install a CDP CLI client.

    For instructions about installing a CDP CLI client, see Installing CDP CLI client.

  2. You must log in to CDP CLI.
    Choose one of the following methods to log in to CDP CLI:
    • Interactive method. This login method grants a 12-hour access key to the CLI. For more information, see Logging into CLI/SDK.
    • Traditional method. In this method, you generate access credentials and configure the ~/.cdp/credentials file with the key pair. To withdraw the access permission, remove the access credentials from the ~/.cdp/credentials file. For more information, see Generating an API access key and Configuring CDP client with the API access key.
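
    For the traditional method, a quick local check can confirm that the credentials file holds a complete key pair before you run any command. The following is a minimal sketch, not part of the CDP CLI. It assumes the file is INI-style with a profile section named default and keys named cdp_access_key_id and cdp_private_key; verify these names against the linked configuration topic before relying on them. The function name missing_credential_keys is hypothetical.

```python
import configparser
import os

# Assumed key names for the key pair; confirm them in the CDP client documentation.
REQUIRED_KEYS = ("cdp_access_key_id", "cdp_private_key")


def missing_credential_keys(path, profile="default"):
    """Return the required keys that are absent or empty for the given profile."""
    # Interpolation is disabled so that a '%' inside a key value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is ignored and yields no sections
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    credentials_path = os.path.expanduser("~/.cdp/credentials")
    missing = missing_credential_keys(credentials_path)
    if missing:
        print("Missing or empty keys:", ", ".join(missing))
    else:
        print("The credentials file contains a complete key pair.")
```

    An empty result means both keys are present; it does not prove that the key pair is still valid on the server side.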

CLI help

CDP CLI includes help that can be accessed by using the cdp help command. For information about a specific CDP CLI command, use the cdp [***MODULE-NAME***] [***COMMAND-NAME***] help command.

You can also find all of the CDP CLI commands in the CDP CLI Reference documentation.

CDP CLI options for Cloudera Lakehouse Optimizer

Use the CDP CLI commands to create, update, list, delete, and perform other maintenance activities on Cloudera Lakehouse Optimizer Data Hub, policies, and associations.

CDP CLI options

You can use the following CDP CLI options to perform tasks in Cloudera Lakehouse Optimizer:

CDP CLI option Description
change-table-policy Updates the maintenance policies associated with a table in the specified namespace according to the specified operation. You can perform the following operations on the table:
  • APPEND — Adds the specified policies to a set of existing associated policies.
  • REMOVEALL — Deletes all the existing associations for the specified table.
  • REPLACEALL — Replaces the existing associations with the specified set of policies.
create-associations Creates associations between a set of tables in a namespace and a policy.
create-policy Creates a policy with the specified name using the specified resources.
create-table-policies-associations Creates associations between the specified table and one or more policies.
delete-policy Permanently deletes the policy definition applicable to the specified resource scope. The resource scope refers to the hierarchical level at which the policy is defined. You can provide one of the following scopes:
  • * – Deletes the specified policy at the catalog level.
  • [***CATALOG***].[***NAMESPACE***].* – Deletes the policy at the namespace level.
  • * for a namespace – Deletes all the policies across all the namespaces.
  • Specify a namespace and table – Deletes the policy associated with the specified table.
download-policy Downloads the policy scripts, arguments, or both for the specified policy version to your local machine. The policy version is identified by its unique ID.
execute-policy Runs the specified policy on the specified table. Enter one of the following options as necessary:
  • dry-run — Generates table maintenance actions without initiating them.
  • no-dry-run — Generates and initiates the table maintenance actions.
get-associated-namespaces Lists all the available namespaces for the specified policy, and displays the isAssociated flag as true for the namespaces associated with the policy.
get-associated-tables Lists the tables in every namespace for the specified policy, and displays the isAssociated flag as true for the tables associated with the policy.
get-association-details Lists comprehensive details about all the direct (table-level) and star (namespace-level) policy associations. The output includes the following details:
  • directAssociation — Displays the direct table-to-policy associations for each namespace.
  • starAssociation — Displays the association details of the policy to the entire namespace in each catalog.
get-associations Lists the tables associated in each namespace for the specified policy.
get-clo-datahub Displays the Cloudera Lakehouse Optimizer Data Hub details in the specified environment.
get-datahub-crn Displays the Cloudera Lakehouse Optimizer Data Hub Cloudera Resource Name (CRN) in the specified environment.
get-health Displays the health status of the components in the Cloudera Lakehouse Optimizer service.

To display complete health check details, set the scope to Full. By default, the scope is Partial.

get-namespaces Displays all the available namespaces in the catalog.
get-policy Displays the details of the specified policy for all the available policy version IDs.
get-policy-names Lists the policy names associated with the specified namespace. Enter * to display all the policies across all the namespaces.
get-scripts Displays the default policy scripts and the scripts defined at the catalog level and their URI in the specified Data Hub.
get-table-names Lists the tables in the specified namespace.
get-table-policies Lists all the policies and their details associated with the specified table in the specified namespace.
get-table-status-details Displays the table execution details and the current status of the specified policy for the specified table in the specified namespace.

For paused tables, the system displays the pause message, paused time, and the reasons for the paused state.

For active tables, the system displays the list of actions that ran with a policy on the table, the policy name, and the last execution time.

The output can display the following policy statuses:
  • UNAVAILABLE – Indicates that no tasks ran recently.
  • SUCCESS — Indicates that the most recent task completed successfully.
  • ERROR — Indicates that the most recent task failed.
  • PAUSED — Indicates that the table is in a paused state.
get-timezones Lists the supported time zones and their details to help schedule policies.
list-environments Displays the total number of environments and environment details. The environment details include the environment name and CRN, whether the Cloudera Lakehouse Optimizer service is enabled for the environment, and the Cloudera Lakehouse Optimizer Data Hub details if the service is enabled.
list-policies Displays the policies and policy details across all the namespaces or the specified namespace. The policy details include the policy name, the number of tasks run for the policy, the number of tables associated with the policy, the policy creation time, the last task execution time, and whether the policy is a default policy.
list-tables Displays all the Iceberg tables in the specified namespace, the associated policies for each table, and the last task execution time.
list-tasks Displays the list of tasks and task details for the specified namespace. The task details include the current task status, task ID, policy name, catalog name, namespace name, table name, and task creation time. The output can display the following task statuses:
  • INIT — Indicates that the task is initiated.
  • SUBMITTED — Indicates that the task is submitted to the Spark engine.
  • COMPLETED — Indicates that the task completed successfully.
  • FAILED — Indicates that the task failed to complete.
pause-table Pauses the maintenance tasks for the specified table in the specified namespace.
unpause-table Resumes the maintenance tasks for the specified table in the specified namespace.
unsubscribe-policy Removes the policy association of the specified policy from the specified table in the specified namespace.
update-policy Updates the specified policy based on the specified resource values. The resources include the JEXL script and the base64-encoded version of the policy arguments in the JSON file.
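
The task statuses that list-tasks reports can be grouped into tasks that are still in progress (INIT, SUBMITTED) and tasks that have finished (COMPLETED, FAILED). The following is a minimal sketch of such a summary. It assumes the task records are already parsed into Python dictionaries with a "status" key; that key name and the function name summarize_tasks are hypothetical, because the exact shape of the command output is not described here.

```python
from collections import Counter

# Status names are taken from the list-tasks description above.
IN_PROGRESS = {"INIT", "SUBMITTED"}
FINISHED = {"COMPLETED", "FAILED"}


def summarize_tasks(tasks):
    """Count tasks by outcome; 'tasks' is a list of dicts with a 'status' key (assumed shape)."""
    counts = Counter(task["status"] for task in tasks)
    unknown = set(counts) - IN_PROGRESS - FINISHED
    if unknown:
        raise ValueError(f"Unrecognized task statuses: {sorted(unknown)}")
    return {
        "in_progress": sum(counts[status] for status in IN_PROGRESS),
        "completed": counts["COMPLETED"],
        "failed": counts["FAILED"],
    }


if __name__ == "__main__":
    sample = [
        {"status": "INIT"},
        {"status": "SUBMITTED"},
        {"status": "COMPLETED"},
        {"status": "FAILED"},
    ]
    print(summarize_tasks(sample))
```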

Creating and associating Cloudera Lakehouse Optimizer policies

Create a Cloudera Lakehouse Optimizer policy, and then associate tables and namespaces with the policy.

  1. Log in to the Cloudera Lakehouse Optimizer CDP CLI.
    cdp lakehouseopt
  2. Fetch the Cloudera Lakehouse Optimizer Data Hub CRN for the environment using one of the following commands:
    cdp lakehouseopt get-datahub-crn --environment-crn [***ENVIRONMENT CRN***]
    cdp lakehouseopt get-clo-datahub --environment-crn [***ENVIRONMENT CRN***]
  3. Perform a health check, and then verify whether the ClouderaAdaptive default policy is available.
    cdp lakehouseopt get-health --datahub-crn [***DATA HUB CRN***] --scope [***ENTER full OR partial***]
    cdp lakehouseopt get-scripts --datahub-crn [***DATA HUB CRN***]
  4. Create a Cloudera Lakehouse Optimizer policy.
    cdp lakehouseopt create-policy --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --resources [***ENTER base64 ENCODED POLICY ARGUMENT AND CONSTANTS JSON***]
    For example, the following CDP CLI command creates a policy:
    cdp lakehouseopt create-policy --datahub-crn crn:cdp:datahub:abcd --policy-name cli_policy_1 --resources '{  "arguments": "eyJzY3JpcHQiOiJkbG06Ly90cHM6ZGVmYXVsdC9DbG91ZGVyYUFkYXB0aXZlIiwiY3JvbiI6IjAgMCA0ICogKiA/ICoiLCJleHBpcmVTbmFwc2hvdCI6eyJlbmFibGVkIjp0cnVlLCJleHBpcmVPbGRlclRoYW4iOjQzMjAwMDAwMCwicmV0YWluTGFzdCI6NSwiY2xlYW5FeHBpcmVkRmlsZXMiOnRydWUsImV4cGlyZVNuYXBzaG90SWQiOm51bGx9LCJyZXdyaXRlRGF0YUZpbGVzIjp7ImVuYWJsZWQiOnRydWUsInRhcmdldEZpbGVTaXplIjo1MzY4NzA5MTIsIm1heENvbmN1cnJlbnRSZXdyaXRlRmlsZUdyb3VwcyI6NSwibWluSW5wdXRGaWxlcyI6NSwicGFydGlhbFByb2dyZXNzTWF4Q29tbWl0cyI6MTAsImRlbGV0ZUZpbGVUaHJlc2hvbGQiOjIwMDAwMDAsInBhcnRpYWxQcm9ncmVzc0VuYWJsZWQiOmZhbHNlLCJ1c2VTdGFydGluZ1NlcXVlbmNlTnVtYmVyIjpmYWxzZSwicmV3cml0ZUFsbCI6ZmFsc2V9LCJyZXdyaXRlTWFuaWZlc3QiOnsiZW5hYmxlZCI6dHJ1ZSwidGFyZ2V0RmlsZVNpemUiOjgzODg2MDgsInVzZUNhY2hpbmciOnRydWV9LCJkZWxldGVPcnBoYW5GaWxlcyI6eyJlbmFibGVkIjp0cnVlLCJvbGRlclRoYW4iOjI1OTIwMDAwMH0sInJld3JpdGVQb3NpdGlvbkRlbGV0ZSI6eyJlbmFibGVkIjp0cnVlLCJ0YXJnZXRGaWxlU2l6ZSI6NjcxMDg4NjQsIm1heENvbmN1cnJlbnRHcm91cFJld3JpdGUiOjUsIm1pbklucHV0RmlsZXMiOjUsInBhcnRpYWxQcm9ncmVzc01heENvbW1pdHMiOjEwLCJwYXJ0aWFsUHJvZ3Jlc3NFbmFibGVkIjp0cnVlfSwidGltZXpvbmUiOiJBc2lhL0NhbGN1dHRhIiwiZGVzY3JpcHRpb24iOiIifQ=="}'
    The base64-encoded value of the arguments key decodes into the following JSON structure:
    {"script":"dlm://tps:default/ClouderaAdaptive","cron":"0 0 4 * * ? *","expireSnapshot":{"enabled":true,"expireOlderThan":432000000,"retainLast":5,"cleanExpiredFiles":true,"expireSnapshotId":null},"rewriteDataFiles":{"enabled":true,"targetFileSize":536870912,"maxConcurrentRewriteFileGroups":5,"minInputFiles":5,"partialProgressMaxCommits":10,"deleteFileThreshold":2000000,"partialProgressEnabled":false,"useStartingSequenceNumber":false,"rewriteAll":false},"rewriteManifest":{"enabled":true,"targetFileSize":8388608,"useCaching":true},"deleteOrphanFiles":{"enabled":true,"olderThan":259200000},"rewritePositionDelete":{"enabled":true,"targetFileSize":67108864,"maxConcurrentGroupRewrite":5,"minInputFiles":5,"partialProgressMaxCommits":10,"partialProgressEnabled":true},"timezone":"Asia/Calcutta","description":""}
  5. Create associations to a namespace or tables.
    cdp lakehouseopt create-associations --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --associations [***ENTER ASSOCIATION DETAILS***]
    Examples:
    • cdp lakehouseopt create-associations --datahub-crn crn:cdp:datahub:abcd --policy-name cli_policy_1 --associations '[{"namespace": "finance","tables": ["*"]}]'
    • cdp lakehouseopt create-associations --datahub-crn crn:cdp:datahub:abcd --policy-name cli_policy_1 --associations file://association.json
      
  6. Run the policy to initiate the maintenance activity.
    cdp lakehouseopt execute-policy --datahub-crn [***DATA HUB CRN***] --namespace [***NAMESPACE NAME***] --table-name [***TABLE NAME***] --policy-name [***POLICY NAME***]

    Append --dry-run to the command to verify the generated maintenance actions before you run the policy.
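
The --resources value used when creating a policy and the --associations value used when creating associations are both JSON strings, and the policy arguments inside --resources are base64 encoded. The following is a minimal sketch of how these values could be prepared and checked locally. The helper names are hypothetical, and the arguments dictionary is trimmed to a few keys from the example above for brevity; whether the service accepts a partial arguments structure is not confirmed here.

```python
import base64
import json
import shlex


def encode_policy_arguments(arguments):
    """Serialize the policy arguments as compact JSON and base64-encode the result."""
    compact = json.dumps(arguments, separators=(",", ":"))
    return base64.b64encode(compact.encode("utf-8")).decode("ascii")


def build_resources(arguments):
    """Build the JSON string for --resources, following the example above."""
    return json.dumps({"arguments": encode_policy_arguments(arguments)})


def decode_policy_arguments(resources_json):
    """Reverse build_resources so that the encoded arguments can be inspected."""
    encoded = json.loads(resources_json)["arguments"]
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def build_associations(namespace, tables=("*",)):
    """Build the JSON string for --associations; '*' selects every table in the namespace."""
    return json.dumps([{"namespace": namespace, "tables": list(tables)}])


if __name__ == "__main__":
    # Trimmed, illustrative arguments; the CRN and policy name reuse the example values.
    arguments = {
        "script": "dlm://tps:default/ClouderaAdaptive",
        "cron": "0 0 4 * * ? *",
        "timezone": "Asia/Calcutta",
        "description": "",
    }
    resources = build_resources(arguments)
    print(
        "cdp lakehouseopt create-policy --datahub-crn crn:cdp:datahub:abcd "
        "--policy-name cli_policy_1 --resources " + shlex.quote(resources)
    )
    print(
        "cdp lakehouseopt create-associations --datahub-crn crn:cdp:datahub:abcd "
        "--policy-name cli_policy_1 --associations "
        + shlex.quote(build_associations("finance"))
    )
    # Round trip: the decoded value matches the original arguments.
    print(decode_policy_arguments(resources) == arguments)
```

The sketch only prints the commands; decoding the value again before submitting it is a cheap way to catch a malformed arguments structure.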

Managing and monitoring Cloudera Lakehouse Optimizer table maintenance tasks

Perform several monitoring tasks using CDP CLI commands, including fetching the maintenance tasks for the specified namespace, and pausing and resuming table maintenance.

  • Get the maintenance tasks for a given namespace.
    cdp lakehouseopt list-tasks --datahub-crn [***DATA HUB CRN***] --namespace [***NAMESPACE NAME***]
  • Pause table maintenance for the specified table.
    cdp lakehouseopt pause-table --datahub-crn [***DATA HUB CRN***] --namespace [***NAMESPACE NAME***] --table-name [***TABLE NAME***]
  • Resume table maintenance for the specified table.
    cdp lakehouseopt unpause-table --datahub-crn [***DATA HUB CRN***] --namespace [***NAMESPACE NAME***] --table-name [***TABLE NAME***]
  • Update an existing policy.
    cdp lakehouseopt update-policy --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --resources [***ENTER base64 ENCODED POLICY ARGUMENT/CONSTANTS JSON***]
  • Remove a policy associated with a table.
    cdp lakehouseopt unsubscribe-policy --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --table-name [***TABLE NAME***]
  • Add policies to a table, remove all the policies from a table, or replace the existing policies of a table.
    cdp lakehouseopt change-table-policy --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --table-name [***TABLE NAME***] --operation [***ENTER APPEND, REMOVEALL, OR REPLACEALL***] --policies [***COMMA SEPARATED POLICY NAMES***]
  • Download the policy scripts, arguments, or both for the specified policy version.
    cdp lakehouseopt download-policy --policy-name [***POLICY NAME***] --policy-version [***POLICY VERSION ID***] --datahub-crn [***DATA HUB CRN***]
  • Delete the specified policy.
    cdp lakehouseopt delete-policy --datahub-crn [***DATA HUB CRN***] --policy-name [***POLICY NAME***] --resource-scope [***ENTER * FOR CATALOG LEVEL, catalog.namespace.* FOR NAMESPACE LEVEL***]
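
When these maintenance commands are scripted, assembling the argument list in one place helps avoid typos in option names. The following is a minimal sketch that only builds the argument lists and does not run anything. The helper names build_command and resource_scope are hypothetical, and the catalog name in the usage example is made up; the subcommands and options are the ones listed in this section.

```python
import shlex


def build_command(subcommand, **options):
    """Return the argv list for a 'cdp lakehouseopt' subcommand.

    Keyword names map to options by replacing '_' with '-', for example
    table_name becomes --table-name. A value of True adds a bare flag;
    None or False omits the option.
    """
    argv = ["cdp", "lakehouseopt", subcommand]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is None or value is False:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


def resource_scope(catalog=None, namespace=None):
    """Format the delete-policy scope: '*' for catalog level, catalog.namespace.* for namespace level."""
    if catalog is None and namespace is None:
        return "*"
    if catalog and namespace:
        return f"{catalog}.{namespace}.*"
    raise ValueError("Provide both catalog and namespace, or neither.")


if __name__ == "__main__":
    crn = "crn:cdp:datahub:abcd"  # example CRN reused from this document
    pause = build_command(
        "pause-table", datahub_crn=crn, namespace="finance", table_name="orders"
    )
    delete = build_command(
        "delete-policy",
        datahub_crn=crn,
        policy_name="cli_policy_1",
        resource_scope=resource_scope("my_catalog", "finance"),  # hypothetical catalog name
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(pause))
    print(shlex.join(delete))
```

To run a command from Python, the same list can be passed to subprocess.run, which avoids shell quoting issues with values such as *.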