(Optional) Configure the Sync Between Cloudera Data Science Workbench and PyCharm

Configuring which files PyCharm ignores can help you adhere to IT policies.

Before you configure syncing behavior between the remote editor and Cloudera Data Science Workbench, ensure that you understand the policies set forth by IT and the Site Administrator. For example, a policy might require that data remains within the Cloudera Data Science Workbench deployment but allow you to download and edit code.
  1. In your project, go to Preferences.
    Depending on your operating system, Preferences may be called Settings.
  2. Go to Build, Execution, Deployment and select Deployment.
  3. On the Connection tab, add the following path to the Root path field:
    /home/cdsw
  4. On the Excluded Paths tab, add any paths you want to exclude.
    Cloudera recommends excluding the following paths at a minimum:
    • /home/cdsw/.local
    • /home/cdsw/.cache
    • /home/cdsw/.ipython
    • /home/cdsw/.oracle_jre_usage
    • /home/cdsw/.pip
    • /home/cdsw/.pycharm_helpers
  5. Optionally, add a Deployment path on the Mappings tab if the code for your Cloudera Data Science Workbench project lives in a subdirectory of the root path.
  6. Expand Deployment in the left navigation and go to Options. Set Upload changed files automatically to the default server to a behavior that adheres to the policies set forth by IT and the Site Administrator.

    Cloudera recommends setting the behavior to Automatic upload because the data remains on the cluster while your changes get uploaded.

  7. Sync the project file(s) to your machine and begin editing.
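
To sanity-check an exclusion list before entering it on the Excluded Paths tab, the following minimal sketch models the effect of the root path and the recommended exclusions. It is an illustration only and does not reflect how PyCharm implements deployment internally. The names ROOT_PATH, EXCLUDED_PATHS, and is_synced are hypothetical, and the sketch assumes that excluding a directory also excludes everything beneath it.

```python
from pathlib import PurePosixPath

# Root path from step 3 and the recommended exclusions from step 4.
ROOT_PATH = PurePosixPath("/home/cdsw")
EXCLUDED_PATHS = [
    PurePosixPath(p)
    for p in (
        "/home/cdsw/.local",
        "/home/cdsw/.cache",
        "/home/cdsw/.ipython",
        "/home/cdsw/.oracle_jre_usage",
        "/home/cdsw/.pip",
        "/home/cdsw/.pycharm_helpers",
    )
]


def is_synced(remote_path: str) -> bool:
    """Return True if remote_path is under the root and not excluded.

    Hypothetical helper: assumes an excluded directory also excludes
    all of its contents.
    """
    path = PurePosixPath(remote_path)
    # Anything outside the root path is never part of the sync.
    if path != ROOT_PATH and ROOT_PATH not in path.parents:
        return False
    # Compare whole path components, so ".localrc" is not caught by ".local".
    return not any(
        path == excluded or excluded in path.parents
        for excluded in EXCLUDED_PATHS
    )


if __name__ == "__main__":
    for candidate in (
        "/home/cdsw/analysis.py",
        "/home/cdsw/.cache/pip/wheel.whl",
        "/home/cdsw/.localrc",
    ):
        print(candidate, is_synced(candidate))
```

Comparing path components rather than raw string prefixes keeps a file such as /home/cdsw/.localrc from being treated as part of the excluded /home/cdsw/.local directory.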