Configure PyCharm as a Local IDE
- You have an edition of PyCharm that supports SSH, such as the Professional Edition.
- You have an SSH public/private key pair for your local machine that is compatible with PyCharm. If you use OpenSSH to generate the key, include the -m PEM option because PyCharm does not support modern (RFC 4716) OpenSSH keys.
- You have Contributor permissions for an existing Cloudera Data Science Workbench project. Alternatively, create a new project that you have access to.
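The key requirement above can be sketched in shell. This is a minimal sketch, assuming OpenSSH's ssh-keygen is installed locally. The function names generate_pem_key and is_pem_private_key and the default path ~/.ssh/id_rsa are illustrative assumptions, not part of any Cloudera or PyCharm tooling.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Generate an RSA key pair whose private key is written in PEM format.
# -t rsa selects the key type, -m PEM selects the PEM key format,
# and -f sets the output file. ssh-keygen prompts for a passphrase.
generate_pem_key() {
    key_path="${1:-$HOME/.ssh/id_rsa}"
    ssh-keygen -t rsa -m PEM -f "$key_path"
}

# Return 0 if the private key file starts with the PEM RSA header,
# and 1 otherwise (for example, for a key in the newer OpenSSH format,
# which starts with "-----BEGIN OPENSSH PRIVATE KEY-----").
is_pem_private_key() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        "-----BEGIN RSA PRIVATE KEY-----") return 0 ;;
        *) return 1 ;;
    esac
}
```

For an existing key, a check such as `is_pem_private_key ~/.ssh/id_rsa && echo "PEM key"` may help confirm the format before you configure PyCharm.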
Download cdswctl and Add an SSH Key
- Open the Cloudera Data Science Workbench web UI and go to the Remote Editing tab for your user account.
- Download the cdswctl client for your operating system.
If you are using the macOS executable, cdswctl is unsigned and therefore cannot be launched on recent versions of macOS without performing the following additional steps:
- In the Finder on your Mac, locate the app you want to open.
Don’t use Launchpad to do this. Launchpad doesn’t allow you to access the shortcut menu.
- Control-click the app icon, then choose Open from the shortcut menu.
- Click Open.
The app is saved as an exception to your security settings, and you can open it in the future by double-clicking it just as you can any registered app.
- Add your SSH public key to SSH public keys for session access.
Cloudera Data Science Workbench uses the SSH public key to authenticate your CLI client session, including the SSH endpoint connection to the Cloudera Data Science Workbench deployment.
Any SSH endpoints that are running when you add an SSH public key must also be restarted.
- In the terminal, run cat ~/.ssh/id_rsa.pub. If you used a different filename above when generating the key, use that filename instead. This command prints the key as a string.
- Copy the key. It should resemble the following: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCha2J5mW3i3BgtZ25/FOsxywpLVkx1RgmZunI
- In SSH public keys for session access, paste the key.
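Before pasting, it can help to confirm that the copied string is the public key and not the private key. The following is a minimal sketch under the assumption that the key is an RSA key in the one-line OpenSSH public key format shown above. The helper name looks_like_rsa_public_key is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that a line looks like an RSA public key
# in the one-line format "ssh-rsa <base64 data> [comment]".
looks_like_rsa_public_key() {
    key_line="$1"
    case "$key_line" in
        "ssh-rsa "[A-Za-z0-9+/]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: validate the default public key file before pasting it into the web UI.
# Adjust the path if your key has a different filename.
pub_key_file="$HOME/.ssh/id_rsa.pub"
if [ -f "$pub_key_file" ]; then
    if looks_like_rsa_public_key "$(cat "$pub_key_file")"; then
        echo "Public key format looks correct"
    else
        echo "Unexpected public key format in $pub_key_file" >&2
    fi
fi
```

The check only inspects the prefix of the line; it does not verify that the key data itself is valid.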
Initialize an SSH Connection to Cloudera Data Science Workbench
The following task describes how to establish an SSH endpoint for Cloudera Data Science Workbench. Creating an SSH endpoint is the first step to configuring a remote editor for Cloudera Data Science Workbench.
- Log in to Cloudera Data Science Workbench with the CLI client. Depending on your deployment, make sure you add http or https to the URL as shown below:
cdswctl login -n <username> -u http(s)://cdsw.your_domain.com
For example, the following command logs the user sample_user into the https://cdsw.your_domain.com deployment:
cdswctl login -n sample_user -u https://cdsw.your_domain.com
- Create a local SSH endpoint to Cloudera Data Science Workbench. Run the following command:
cdswctl ssh-endpoint -p <username>/<project_name> [-c <CPU_cores>] [-m <memory_in_GB>] [-g <number_of_GPUs>]
The command uses the following defaults for optional parameters:
- CPU cores: 1
- Memory: 1 GB
- GPUs: 0
For example, the following command starts a session for the logged-in user sample_user under the customerchurn project with 0.5 cores, 0.75 GB of memory, 0 GPUs, and the Python3 kernel:
cdswctl ssh-endpoint -p customerchurn -c 0.5 -m 0.75
To create an SSH endpoint in a project owned by another user or a team, for example finance, prepend the user or team name to the project name and separate them with a forward slash:
cdswctl ssh-endpoint -p finance/customerchurn -c 0.5 -m 0.75
This command creates a session in the project customerchurn that belongs to the team finance.
Information for the SSH endpoint appears in the output:
... You can SSH to it using ssh -p <some_port> cdsw@localhost ...
- Open a new command prompt and run the command shown in the output of the previous step:
ssh -p <some_port> cdsw@localhost
For example:
ssh -p 9750 cdsw@localhost
You will be prompted for the passphrase for the SSH key you entered in the Cloudera Data Science Workbench web UI.
Once you are connected to the endpoint, you are logged in as the cdsw user and can perform actions as though you are accessing the terminal through the Cloudera Data Science Workbench web UI.
- Test the connection.
If you run ls, the project files associated with the session you created are shown. If you run whoami, the command returns the cdsw user.
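The port reported by cdswctl is needed again when you configure PyCharm, so it can be convenient to capture it in a script. The following is a minimal sketch that assumes the output contains a line shaped like the excerpt above ("ssh -p <some_port> cdsw@localhost"). The surrounding output is elided in this guide, so the exact format is an assumption, and the helper name extract_endpoint_port is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read cdswctl output on stdin and print the port number
# from the first line containing "ssh -p <port> cdsw@localhost".
extract_endpoint_port() {
    sed -n 's/.*ssh -p \([0-9][0-9]*\) cdsw@localhost.*/\1/p' | head -n 1
}

# Example using a sample line shaped like the output excerpt above.
# The port 9750 is the example port used in this guide, not a fixed value.
sample_output="... You can SSH to it using ssh -p 9750 cdsw@localhost ..."
port=$(printf '%s\n' "$sample_output" | extract_endpoint_port)
echo "ssh -p $port cdsw@localhost"
```

If no matching line is found, the helper prints nothing, so a script can test for an empty result before attempting to connect.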
Add Cloudera Data Science Workbench as an Interpreter for PyCharm
- Verify that the SSH endpoint for Cloudera Data Science Workbench is running with cdswctl. If the endpoint is not running, start it.
- Open PyCharm.
- Create a new project.
- Expand Project Interpreter and select Existing interpreter.
- Click ... and select SSH Interpreter.
- Select New server configuration and complete the fields:
- Host: localhost
- Port: <port_number>
This is the port number provided by cdswctl.
- Username: cdsw
- Select Key pair and complete the fields using the RSA private key that corresponds to the public key you added to the Remote Editing tab in the Cloudera Data Science Workbench web UI.
For macOS users, you must add your RSA private key to your keychain. In a terminal window, run the following command:
ssh-add -K <path to your private key>/<private_key>
- Complete the wizard. Based on the Python version you want to use, enter one of the following parameters:
- For Python 2: /usr/local/bin/python
- For Python 3: /usr/local/bin/python3
You are returned to the New Project window. Existing interpreter is selected, and you should see the connection to Cloudera Data Science Workbench in the Interpreter field.
- In the Remote project location field, specify the following directory:
- Create the project.
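If PyCharm cannot find the interpreter, it may help to confirm the interpreter path over the same SSH endpoint. The following is a minimal sketch that only prints the command to run. It uses the interpreter paths listed above and the example port 9750 from earlier. The function names interpreter_path and interpreter_check_command are hypothetical.

```shell
#!/bin/sh
# Map a Python major version to the interpreter path listed in the wizard step.
interpreter_path() {
    case "$1" in
        2) echo "/usr/local/bin/python" ;;
        3) echo "/usr/local/bin/python3" ;;
        *) echo "unsupported Python version: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) an ssh command that asks the remote interpreter for its version.
# Arguments: <port> <python_major_version>.
interpreter_check_command() {
    py_path=$(interpreter_path "$2") || return 1
    echo "ssh -p $1 cdsw@localhost $py_path --version"
}

# 9750 is only the example port; use the port provided by cdswctl.
interpreter_check_command 9750 3
```

Running the printed command against a live endpoint should report the Python version if the path is correct for your deployment.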
(Optional) Configure the Sync Between Cloudera Data Science Workbench and PyCharm
- In your project, go to Preferences.
Depending on your operating system, Preferences may be called Settings.
- Go to Build, Execution, Deployment and select Deployment.
- On the Connection tab, add the following path to the Root path field:
- On the Excluded Paths tab, add any paths you want to exclude.
Cloudera recommends excluding the following paths at a minimum:
- Optionally, add a Deployment path on the Mappings tab if the code for your Cloudera Data Science Workbench project lives in a subdirectory of the root path.
- Expand Deployment in the left navigation and go to and set the behavior to adhere to the policies set forth by IT and the Site Administrator.
Cloudera recommends setting the behavior to Automatic upload because the data remains on the cluster while your changes get uploaded.
- Sync the project files to your machine and begin editing.