cdswctl Command Line Interface Client

Cloudera Data Science Workbench 1.6 and later ships with a CLI client that you can download from the Cloudera Data Science Workbench web UI. The cdswctl client can perform the following tasks:
  • Logging in
  • Creating an SSH endpoint
  • Listing sessions that are starting or running
  • Starting or stopping a session
Other actions, such as creating a project, require you to use the Cloudera Data Science Workbench web UI. For information about the available commands, run the following command:
cdswctl help

Download and Configure cdswctl

Before you begin, ensure that the following prerequisites are met:
  • You have an SSH public/private key pair for your local machine.
  • You have Contributor permissions for an existing Cloudera Data Science Workbench project. Alternatively, create a new project that you have access to.
  • The Site Administrator has not disabled remote editing for Cloudera Data Science Workbench.

(Optional) Generate an SSH Public/Private Key

This task is optional. If you already have an SSH public/private key pair, skip this task. The steps to create an SSH public/private key pair differ based on your operating system. The following instructions are meant to be an example and are written for macOS using ssh-keygen.
  1. Open Terminal.
  2. Run the following command and complete the fields:
    ssh-keygen -t rsa -f ~/.ssh/id_rsa
    Keep the following guidelines in mind:
    • Make sure that the SSH key you generate meets the requirements for the local IDE you want to use. For example, PyCharm requires the -m PEM option because PyCharm does not support modern (RFC 4716) OpenSSH keys.
    • Provide a passphrase when you generate the key pair. Use this passphrase when prompted for the SSH key passphrase.
    • Save the SSH key to the default ~/.ssh location.
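The guidelines above can be combined into a small helper. The following is a minimal sketch, not an official script: generate_pem_key is a hypothetical function name, and the -m PEM option is included on the assumption that your IDE (for example, PyCharm) needs a PEM-format key. The helper refuses to overwrite an existing key and leaves the passphrase prompt to ssh-keygen.

```shell
#!/usr/bin/env bash
# Sketch: create an RSA key pair in PEM format without overwriting an existing key.
# generate_pem_key is a hypothetical helper name, not part of cdswctl or OpenSSH.
generate_pem_key() {
  local key_file="${1:-$HOME/.ssh/id_rsa}"   # default path used in this task

  # Refuse to overwrite an existing private key.
  if [ -e "$key_file" ]; then
    echo "Key already exists at $key_file; skipping generation." >&2
    return 1
  fi

  # Make sure the target directory exists before writing the key.
  mkdir -p "$(dirname "$key_file")"

  # -t rsa: key type; -m PEM: PEM private key format; -f: output file.
  # ssh-keygen prompts for the passphrase interactively.
  ssh-keygen -t rsa -m PEM -f "$key_file"
}

# Example call (commented out so that sourcing this file has no side effects):
# generate_pem_key "$HOME/.ssh/id_rsa"
```

If your IDE accepts OpenSSH-format keys, the -m PEM option can be dropped, which leaves the same command shown in step 2.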

Download cdswctl and Add an SSH Key

  1. Open the Cloudera Data Science Workbench web UI and go to Settings > Remote Editing for your user account.
  2. Download the cdswctl client for your operating system.
    If you are using the macOS executable, cdswctl is unsigned and therefore cannot be launched on recent versions of macOS without performing the following additional steps:
    1. In the Finder on your Mac, locate the app you want to open.
      Don’t use Launchpad to do this. Launchpad doesn’t allow you to access the shortcut menu.
    2. Control-click the app icon, then choose Open from the shortcut menu.
    3. Click Open.
      The app is saved as an exception to your security settings, and you can open it in the future by double-clicking it just as you can any registered app.
  3. In the terminal, run cat ~/.ssh/id_rsa.pub. If you used a different filename above when generating the key, use that filename instead. This command prints the key as a string.
  4. Copy the key. It should resemble the following: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCha2J5mW3i3BgtZ25/FOsxywpLVkx1RgmZunI
  5. In SSH public keys for session access, paste the key.
Cloudera Data Science Workbench uses the SSH public key to authenticate your CLI client session, including the SSH endpoint connection to the Cloudera Data Science Workbench deployment. Any SSH endpoints that are already running when you add an SSH public key must be restarted.
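Before pasting the key, it can help to confirm that the file you are copying is the public key and not the private one. The following is a rough sketch under the assumption that you generated an RSA key as described earlier; looks_like_rsa_pubkey is a hypothetical helper name, and the check only looks at the general shape of the line, not at whether the key is valid.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a public key file before pasting it into the web UI.
# looks_like_rsa_pubkey is a hypothetical helper name used only for illustration.
looks_like_rsa_pubkey() {
  local pub_file="${1:-$HOME/.ssh/id_rsa.pub}"

  # The file must exist and be readable.
  [ -r "$pub_file" ] || { echo "Cannot read $pub_file" >&2; return 1; }

  # Succeeds if the file contains a line of the form
  # "ssh-rsa <base64 data> [comment]".
  grep -Eq '^ssh-rsa [A-Za-z0-9+/]+=*( .*)?$' "$pub_file"
}
```

For example, `looks_like_rsa_pubkey ~/.ssh/id_rsa.pub && cat ~/.ssh/id_rsa.pub` prints the key only when the file has the expected shape.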

Initialize an SSH Connection to Cloudera Data Science Workbench

The following task describes how to establish an SSH endpoint for Cloudera Data Science Workbench. Creating an SSH endpoint is the first step in configuring a remote editor for Cloudera Data Science Workbench.

  1. Log in to Cloudera Data Science Workbench with the CLI client. Depending on your deployment, make sure you add http or https to the URL as shown below:
    cdswctl login -n <username> -u http(s)://cdsw.your_domain.com
    For example, the following command logs the user sample_user into the https://cdsw.your_domain.com deployment:
    cdswctl login -n sample_user -u https://cdsw.your_domain.com
  2. Create a local SSH endpoint to Cloudera Data Science Workbench. Run the following command:
    cdswctl ssh-endpoint -p <username>/<project_name> [-c <CPU_cores>] [-m <memory_in_GB>] [-g <number_of_GPUs>]
    The command uses the following defaults for optional parameters:
    • CPU cores: 1
    • Memory: 1 GB
    • GPUs: 0
    For example, the following command starts a session for the logged-in user sample_user under the customerchurn project with 0.5 cores, 0.75 GB of memory, 0 GPUs, and the Python3 kernel:
    cdswctl ssh-endpoint -p customerchurn -c 0.5 -m 0.75

    To create an SSH endpoint in a project owned by another user or a team, for example finance, prepend the user or team name to the project name and separate them with a forward slash:

    cdswctl ssh-endpoint -p finance/customerchurn -c 0.5 -m 0.75
    This command creates a session in the project customerchurn that belongs to the team finance.
    Information for the SSH endpoint appears in the output:
    ...
    You can SSH to it using
    
        ssh -p <some_port> cdsw@localhost
    ...
  3. Open a new command prompt and run the command shown in the output of the previous step:
    ssh -p <some_port> cdsw@localhost
    For example:
    ssh -p 9750 cdsw@localhost
    You are prompted for the passphrase of the SSH key that you added in the Cloudera Data Science Workbench web UI.
    Once you are connected to the endpoint, you are logged in as the cdsw user and can perform actions as though you are accessing the terminal through the Cloudera Data Science Workbench web UI.
  4. Test the connection.
    If you run ls, the project files associated with the session you created are shown. If you run whoami, the command returns the cdsw user.
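The endpoint steps above can be sketched as two small shell helpers. This is an illustration only: build_endpoint_cmd and extract_endpoint_port are hypothetical names, the first helper only prints the command rather than running cdswctl, and the second assumes the endpoint output contains a line of the form `ssh -p <some_port> cdsw@localhost` as shown in step 2, which may differ between versions.

```shell
#!/usr/bin/env bash
# Sketch: assemble the ssh-endpoint command and read the port from its output.
# build_endpoint_cmd and extract_endpoint_port are hypothetical helper names.

# Print (do not run) the cdswctl ssh-endpoint command for a project, using the
# documented defaults (1 CPU core, 1 GB memory, 0 GPUs) for omitted arguments.
build_endpoint_cmd() {
  local project="${1:-}"
  local cpu="${2:-1}"
  local mem="${3:-1}"
  local gpu="${4:-0}"

  if [ -z "$project" ]; then
    echo "usage: build_endpoint_cmd <project> [cpu] [mem_gb] [gpus]" >&2
    return 1
  fi

  echo "cdswctl ssh-endpoint -p $project -c $cpu -m $mem -g $gpu"
}

# Read endpoint output on stdin and print the port from the first
# "ssh -p <port> cdsw@localhost" line (output format assumed from step 2).
extract_endpoint_port() {
  sed -n 's/^[[:space:]]*ssh -p \([0-9][0-9]*\) cdsw@localhost.*$/\1/p' | head -n 1
}
```

For example, `build_endpoint_cmd finance/customerchurn 0.5 0.75` prints the team-project command from step 2 with an explicit `-g 0`, so you can review it before running it. If you save the endpoint output to a file, piping that file into extract_endpoint_port gives the port to use with `ssh -p`.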