cdswctl Command Line Interface Client

Cloudera Data Science Workbench 1.6 and later ships with a CLI client that you can download from the Cloudera Data Science Workbench web UI. The cdswctl client can perform the following tasks:
  • Logging in
  • Creating an SSH endpoint
  • Listing sessions that are starting or running
  • Starting or stopping a session
Other actions, such as creating a project, require you to use the Cloudera Data Science Workbench web UI. For information about the available commands, run the following command:
cdswctl --help

Download and Configure cdswctl

Before you begin, ensure that the following prerequisites are met:
  • You have an SSH public/private key pair for your local machine.
  • You have Contributor permissions for an existing Cloudera Data Science Workbench project. Alternatively, create a new project that you have access to.
  • The Site Administrator has not disabled remote editing for Cloudera Data Science Workbench.

(Optional) Generate an SSH Public/Private Key

This task is optional. If you already have an SSH public/private key pair, skip this task. The steps to create an SSH public/private key pair differ based on your operating system. The following instructions are meant to be an example and are written for macOS using ssh-keygen.
  1. Open Terminal.
  2. Run the following command and complete the fields:
    ssh-keygen -t rsa -f ~/.ssh/id_rsa
    Keep the following guidelines in mind:
    • Make sure that the SSH key you generate meets the requirements for the local IDE you want to use. For example, PyCharm requires the -m PEM option because PyCharm does not support modern (RFC 4716) OpenSSH keys.
    • Provide a passphrase when you generate the key pair. Use this passphrase when prompted for the SSH key passphrase.
    • Save the SSH key to the default ~/.ssh location.
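
To check whether an existing private key already meets an IDE requirement such as the PEM format mentioned above, you can inspect the first line of the key file. The following is a minimal sketch, not part of cdswctl: the function name private_key_format is hypothetical, and it recognizes only the two headers that ssh-keygen writes for RSA keys (traditional PEM and the newer OpenSSH format).

```shell
# Print "pem", "openssh", or "unknown" based on the header line of a
# private key file. Only the first line is read; key material is never printed.
private_key_format() {
  # Succeed even if the file has no trailing newline; fail if it cannot be read.
  IFS= read -r header < "$1" || [ -n "$header" ] || return 1
  case "$header" in
    "-----BEGIN RSA PRIVATE KEY-----")     echo pem ;;
    "-----BEGIN OPENSSH PRIVATE KEY-----") echo openssh ;;
    *)                                     echo unknown ;;
  esac
}
```

For example, private_key_format ~/.ssh/id_rsa prints pem for an RSA key that was generated with the -m PEM option. If it prints openssh and your IDE requires PEM, generate the key again with that option.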

Download cdswctl and Add an SSH Key

  1. Open the Cloudera Data Science Workbench web UI and go to Settings > Remote Editing for your user account.
  2. Download the cdswctl client for your operating system.
    If you are using the macOS executable, cdswctl is unsigned and therefore cannot be launched on recent versions of macOS unless you perform the following additional steps:
    1. In the Finder on your Mac, locate the app you want to open.
      Don’t use Launchpad to do this. Launchpad doesn’t allow you to access the shortcut menu.
    2. Control-click the app icon, then choose Open from the shortcut menu.
    3. Click Open.
      The app is saved as an exception to your security settings, and you can open it in the future by double-clicking it just as you can any registered app.
  3. In the terminal, run cat ~/.ssh/id_rsa.pub. If you used a different filename above when generating the key, use that filename instead. This command prints the key as a string.
  4. Copy the key. It should resemble the following: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCha2J5mW3i3BgtZ25/FOsxywpLVkx1RgmZunI
  5. In SSH public keys for session access, paste the key.
Cloudera Data Science Workbench uses the SSH public key to authenticate your CLI client session, including the SSH endpoint connection to the Cloudera Data Science Workbench deployment. Any SSH endpoints that are running when you add an SSH public key must also be restarted.
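
Before you paste the key, it can help to confirm that you copied the public key and not the private key. The following sketch is an illustration only: the function name looks_like_public_key is hypothetical, and the accepted key types are the common OpenSSH ones. This document shows only an ssh-rsa key, so treat support for the other types as an assumption.

```shell
# Return 0 if the argument looks like "<key-type> <base64-blob> [comment]".
# OpenSSH public key blobs start with "AAAA" because they begin with a
# 4-byte length field whose leading bytes are zero.
looks_like_public_key() {
  case "$1" in
    ssh-rsa\ AAAA*|ssh-ed25519\ AAAA*|ecdsa-sha2-*\ AAAA*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, looks_like_public_key "$(cat ~/.ssh/id_rsa.pub)" succeeds for a typical public key file and fails for text that starts with -----BEGIN, which indicates a private key.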

Initialize an SSH Connection to Cloudera Data Science Workbench

The following task describes how to establish an SSH endpoint for Cloudera Data Science Workbench. Creating an SSH endpoint is the first step to configuring a remote editor for Cloudera Data Science Workbench.

  1. Log in to Cloudera Data Science Workbench with the CLI client. Depending on your deployment, make sure you add http or https to the URL as shown below:
    cdswctl login -n <username> -u http(s)://cdsw.your_domain.com
    For example, the following command logs the user sample_user into the https://cdsw.your_domain.com deployment:
    cdswctl login -n sample_user -u https://cdsw.your_domain.com
  2. Create a local SSH endpoint to Cloudera Data Science Workbench. Run the following command:
    cdswctl ssh-endpoint -p <username>/<project_name> [-c <CPU_cores>] [-m <memory_in_GB>] [-g <number_of_GPUs>]
    The command uses the following defaults for optional parameters:
    • CPU cores: 1
    • Memory: 1 GB
    • GPUs: 0
    For example, the following command starts a session for the logged-in user sample_user under the customerchurn project with 0.5 cores, 0.75 GB of memory, 0 GPUs, and the Python3 kernel:
    cdswctl ssh-endpoint -p customerchurn -c 0.5 -m 0.75

    If you saved your private key to a non-default location, or if you have multiple private keys, you can specify the one that you created above for cdswctl with:

    cdswctl ssh-endpoint -i /path/to/id_rsa -p <username>/<project_name> [-c <CPU_cores>] [-m <memory_in_GB>] [-g <number_of_GPUs>]

    To create an SSH endpoint in a project owned by another user or a team, for example finance, prepend the user or team name to the project name and separate them with a forward slash:

    cdswctl ssh-endpoint -p finance/customerchurn -c 0.5 -m 0.75
    This command creates a session in the project customerchurn that belongs to the team finance.
    Information for the SSH endpoint appears in the output:
    ...
    You can SSH to it using
    
        ssh -p <some_port> cdsw@localhost
    ...
  3. Open a new command prompt and run the command shown in the output of the previous step:
    ssh -p <some_port> cdsw@localhost
    For example:
    ssh -p 9750 cdsw@localhost
    You are prompted for the passphrase of the SSH key that you added in the Cloudera Data Science Workbench web UI.
    Once you are connected to the endpoint, you are logged in as the cdsw user and can perform actions as though you are accessing the terminal through the Cloudera Data Science Workbench web UI.
  4. Test the connection.
    If you run ls, the project files associated with the session you created are shown. If you run whoami, the command returns the cdsw user.
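
If you want to reuse the port in a script or an editor configuration, you can extract it from the endpoint output shown in step 2. The following is a sketch under the assumption that the output contains a line of the form ssh -p <some_port> cdsw@localhost, as in the example above. The function name endpoint_port is hypothetical.

```shell
# Read saved ssh-endpoint output on stdin and print the first port found
# in a line of the form "ssh -p <port> cdsw@localhost".
endpoint_port() {
  sed -n 's/.*ssh -p \([0-9][0-9]*\) cdsw@localhost.*/\1/p' | head -n 1
}
```

For example, if you copy the endpoint output into a file named endpoint.log (a hypothetical name), endpoint_port < endpoint.log prints only the port number, which you can then pass to ssh -p.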

Log in to cdswctl

  1. Open the Model CLI client.
  2. Run the following command while specifying the actual values for the variables:
    cdswctl login -u <workspace_url> -n <username> -y <api_key>

    where

    • workspace_url is the workspace URL including the protocol (http(s)://domain.com)
    • username is your user name on the workspace
    • api_key is the API key that you can obtain from the CDSW web UI. Go to Settings > API Keys and copy the API Key (and not the Model API Key).

    A Login succeeded message is displayed.

    To see more information about the login command parameters, run
    cdswctl login --help
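
If you log in often, you can wrap the login command so that the API key is read from the environment instead of being typed on the command line. The following is a minimal sketch: the variable names CDSW_URL, CDSW_USER, and CDSW_API_KEY and the function name cdsw_login are hypothetical. Only the -u, -n, and -y options come from the command above.

```shell
# Log in with values taken from environment variables (hypothetical names).
cdsw_login() {
  # The workspace URL must include the protocol.
  case "${CDSW_URL:-}" in
    http://*|https://*) ;;
    *) echo "CDSW_URL must start with http:// or https://" >&2; return 2 ;;
  esac
  if [ -z "${CDSW_USER:-}" ] || [ -z "${CDSW_API_KEY:-}" ]; then
    echo "Set CDSW_USER and CDSW_API_KEY first" >&2
    return 2
  fi
  cdswctl login -u "$CDSW_URL" -n "$CDSW_USER" -y "$CDSW_API_KEY"
}
```

After you export the three variables, run cdsw_login. The function stops with an error message before calling cdswctl if the URL has no protocol or if a value is missing.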

Prepare to manage models using the model CLI

Before you can start using the model CLI to automate model deployment or to perform any other tasks, you must install the scikit-learn machine learning library for Python through the CDSW web UI.

  1. Create a new project with Python through the web UI.
    The Python project includes sample files that you can use to create models with the CLI.
  2. To start a new session, go to the Sessions page from the left navigation panel and click New Session.
    The Start A New Session page is displayed.
  3. On the Start A New Session page, select Python 3 from the Engine Kernel drop-down menu, and click Start Session.
    A new “Untitled Session” is created.
  4. From the input prompt, install the scikit-learn machine learning library for Python by running the following command:
    !pip3 install scikit-learn
  5. Open the fit.py file available within your project from the left navigation panel.
    Running fit.py creates a fitted model and saves it as a model.pkl file, which you can use to deploy the actual model.
  6. Run the fit.py file by clicking Run > Run all.
    The model.pkl file is created, and you can see it within your project in the left navigation panel.
  7. Close the session by clicking Stop.

Create a model using the CLI

  1. Open a terminal window and log in to cdswctl.
  2. Obtain the project ID as described in the following steps:
    1. Run the following command:
      cdswctl projects list
      The project ID, your username, and the project name are displayed. For example:
      1: john-smith/petal-length-predictor
    2. Note the project ID, which is a number in front of your project name.
      In this case, it is "1".
  3. Run the following command while specifying the project name and note the engine image ID:
    cdswctl engine-images list -p <project-name>
    For example,
    cdswctl engine-images list -p john-smith/petal-length-predictor
  4. Create a model by using the following command:
    cdswctl models create \
      --kernel="python3" \
      --targetFilePath="predict.py" \
      --targetFunctionName="predict" \
      --name="Petal Length Predictor" \
      --cpuMillicores=1000 \
      --memoryMb=2000 \
      --description="Model of the Iris dataset" \
      --replicationType=fixed \
      --numReplicas=1 \
      --visibility="private" \
      --autoBuildModel \
      --autoDeployModel \
      --projectId=<project ID> \
      --examples='{"request":{"petal_length":1}}' \
      --engineImageId=<engine image ID from before>
    If the command runs successfully, the system displays the model details in a JSON format.
    For more information about the models create command parameters, run the following command:
    cdswctl models create --help
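
When you script model creation, you can look up the project ID from the projects list output instead of reading it by eye. The following sketch assumes that every line of the output has the form <id>: <username>/<project_name>, as in the example in step 2. The function name project_id is hypothetical.

```shell
# Read "cdswctl projects list" output on stdin and print the ID of the
# project whose "<username>/<project_name>" matches the first argument.
project_id() {
  awk -v name="$1" -F': ' '
    { id = $1; gsub(/[ \t]/, "", id) }   # tolerate leading whitespace
    $2 == name { print id; exit }
  '
}
```

For example, cdswctl projects list | project_id john-smith/petal-length-predictor prints 1 for the sample output shown above. You can then pass the result to the --projectId parameter.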

View replica logs for a model using the CLI

When a model is deployed, CDSW enables you to specify the number of replicas that must be deployed to serve requests. If a replica crashes or fails to come up, you can diagnose it by viewing the logs for every replica using the model CLI.

  1. Obtain the modelReplicaId by using the following command:
    cdswctl models listReplicas --modelDeploymentId=<model_deployment_ID>
    where the model_deployment_ID is the ID of a successfully deployed model.
  2. To view the replica logs, run the following command:
    cdswctl models getReplicaLogs --modelDeploymentId=<model_deployment_ID> --modelReplicaId="<replica_ID>" --streams=stdout
    For example:
    cdswctl models getReplicaLogs --modelDeploymentId=2 --modelReplicaId="petal-length-predictor-1-2-6d6496b467-hp6tz" --streams=stdout
    The valid values for the streams parameter are stdout and stderr.
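
To collect both streams for a replica in one pass, you can run the getReplicaLogs command once for each stream. The following is a minimal sketch that uses only the parameters shown above. The function name replica_logs and the "== stream ==" separator lines are illustrative choices and are not part of cdswctl.

```shell
# Print the stdout and stderr logs of one model replica, one stream after the other.
replica_logs() {
  deployment_id=$1
  replica_id=$2
  for stream in stdout stderr; do
    echo "== $stream =="
    cdswctl models getReplicaLogs --modelDeploymentId="$deployment_id" \
      --modelReplicaId="$replica_id" --streams="$stream"
  done
}
```

For example, replica_logs 2 petal-length-predictor-1-2-6d6496b467-hp6tz prints the stdout log followed by the stderr log of the replica from the example above.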