Initialize an SSH Connection to Cloudera Data Science Workbench for VS Code

This task describes how to establish an SSH endpoint for Cloudera Data Science Workbench. Creating an SSH endpoint is the first step in configuring a remote editor for Cloudera Data Science Workbench.

  1. Log in to Cloudera Data Science Workbench with the CLI client.
    cdswctl login -n <username> -u http(s)://cdsw.your_domain.com
    For example, the following command logs the user sample_user into the https://cdsw.your_domain.com deployment:
    cdswctl login -n sample_user -u https://cdsw.your_domain.com
  2. Create a local SSH endpoint to Cloudera Data Science Workbench.
    Run the following command:
    cdswctl ssh-endpoint -p <username>/<project_name> [-c <CPU_cores>] [-m <memory_in_GB>] [-g <number_of_GPUs>] [-r <runtime_ID>]

    If the project is configured to use ML runtimes, you must specify the -r parameter; otherwise, you must omit it. To retrieve the Runtime ID, run the following command:

    cdswctl runtimes list

    See the Using ML runtimes with cdswctl documentation page for more information.

    The command uses the following defaults for optional parameters:
    • CPU cores: 1
    • Memory: 1 GB
    • GPUs: 0
    For example, the following command starts a session for the logged-in user sample_user under the customerchurn project with 0.5 CPU cores, 0.75 GB of memory, 0 GPUs, and the Python3 kernel:
    cdswctl ssh-endpoint -p customerchurn -c 0.5 -m 0.75

    To create an SSH endpoint in a project owned by another user or a team, for example finance, prepend the user or team name to the project name and separate them with a forward slash:

    cdswctl ssh-endpoint -p finance/customerchurn -c 0.5 -m 0.75
    This command creates a session in the project customerchurn, which belongs to the team finance.
    Information for the SSH endpoint appears in the output:
    ...
    You can SSH to it using
    
        ssh -p <some_port> cdsw@localhost
    ...
  3. Open a new command prompt and run the command shown in the output of the previous step:
    ssh -p <some_port> cdsw@localhost
    For example:
    ssh -p 7847 cdsw@localhost
    You will be prompted for the passphrase for the SSH key you entered in the Cloudera Data Science Workbench web UI.
    The public key might be rejected if the new SSH key pair was generated with a non-default name such as id_rsa_systest. If the public key is rejected, add the following information to the ~/.ssh/config file:
    Host *
        AddKeysToAgent yes
        StrictHostKeyChecking no
        IdentityFile ~/.ssh/id_rsa_cdswctl
    Once you are connected to the endpoint, you are logged in as the cdsw user and can perform actions as though you are accessing the terminal through the Cloudera Data Science Workbench web UI.
  4. Test the connection.
    If you run ls, the project files associated with the session you created are shown. If you run whoami, the command returns the cdsw user.
    For reference, the terminal where you ran the ssh-endpoint command in Step 2 shows output similar to the following:
    $ cdswctl ssh-endpoint -p finance/customerchurn -m 4 -c 2
    Forwarding local port 7847 to port 2222 on session bhsb7k4eqmonap62 in project finance/customerchurn.
    You can SSH to the session using
    
        ssh -p 7847 cdsw@localhost
  5. Add an entry to your SSH config file.
    For example:
    $ cat ~/.ssh/config
    Host cdsw-public
        HostName localhost
        IdentityFile ~/.ssh/id_rsa
        User cdsw
        Port 7847
    HostName is always localhost and User is always cdsw. For Port, use the port number shown in the output from Step 2.
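
Because the port can differ each time an endpoint is created, it may be convenient to derive the config entry from the endpoint output rather than typing it by hand. The following is a minimal sketch, not part of cdswctl: the function names extract_port and make_cdsw_entry are hypothetical, the sample text is copied from the output shown in Step 4, and the script assumes the output keeps the "ssh -p <port> cdsw@localhost" line format.

```shell
#!/bin/sh
# Sketch only: extract_port and make_cdsw_entry are illustrative helpers,
# not cdswctl features.

# Read text on stdin and print the first port found in a line of the form
# "ssh -p <port> cdsw@localhost" (format assumed from the output above).
extract_port() {
    sed -n 's/.*ssh -p \([0-9][0-9]*\) cdsw@localhost.*/\1/p' | head -n 1
}

# Print an SSH config entry for the given port.
# Arguments: port [host_alias] [identity_file]
make_cdsw_entry() {
    port=$1
    alias_name=${2:-cdsw-public}
    key_file=$3
    # Keep the tilde literal so that ssh expands it, not this script.
    [ -n "$key_file" ] || key_file='~/.ssh/id_rsa'

    # Reject an empty or non-numeric port.
    case $port in
        ''|*[!0-9]*)
            echo "port must be a number" >&2
            return 1
            ;;
    esac

    printf 'Host %s\n' "$alias_name"
    printf '    HostName localhost\n'
    printf '    IdentityFile %s\n' "$key_file"
    printf '    User cdsw\n'
    printf '    Port %s\n' "$port"
}

# Sample text shaped like the ssh-endpoint output in Step 4.
sample_output='Forwarding local port 7847 to port 2222 on session bhsb7k4eqmonap62 in project finance/customerchurn.
You can SSH to the session using

    ssh -p 7847 cdsw@localhost'

port=$(printf '%s\n' "$sample_output" | extract_port)
make_cdsw_entry "$port"
```

After reviewing the printed entry, you could append it to ~/.ssh/config yourself and then connect with ssh cdsw-public. The sketch deliberately prints the entry instead of modifying the config file, so existing entries are never overwritten.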