Creating a Project with Legacy Engine Variants

A project is an independent working environment that holds the code, configuration, and libraries for your analysis. This topic describes how to create a project with Legacy Engine variants in Cloudera Machine Learning.

  1. Go to Cloudera Machine Learning and on the left sidebar, click Projects.
  2. Click New Project.
  3. If you are a member of a team, from the drop-down menu, select the Account under which you want to create this project. If there is only one account on the deployment, you will not see this option.
  4. Enter a Project Name.
  5. Select one of the following Project Visibility options.
    • Private - Only project collaborators can view or edit the project.
    • Team - If the project is created under a team account, all members of the team can view the project. Only explicitly added collaborators can edit the project.
    • Public - All authenticated users of Cloudera Machine Learning can view the project. Collaborators can edit the project.
  6. Under Initial Setup, you can either create a blank project, or select one of the following sources for your project files.
    • Built-in Templates - Template projects contain example code that can help you get started with Cloudera Machine Learning. They are available in R, Python, PySpark, and Scala. Using a template project is not required, but it helps you start using Cloudera Machine Learning right away.

    • Custom Templates - Site administrators can add template projects that are customized for their organization's use cases. For details, see Custom Template Projects.

    • Local - If you have an existing project on your local disk, use this option to upload compressed files or folders to Cloudera Machine Learning.

    • Git - If you already use Git for version control and collaboration, you can continue to do so with Cloudera Machine Learning. Specifying a Git URL clones the project into Cloudera Machine Learning. If you use a Git SSH URL, a private SSH key associated with your Cloudera Machine Learning user account is used to clone the repository. This is the recommended approach if you plan to contribute back to the cloned repository. However, you must add the public SSH key from your personal Cloudera Machine Learning account to the remote Git hosting service before you can clone the project.

      Select the correct format for the GitHub URL for the repository, and paste it into the text box.

      In GitHub, the “Code” button in a repository shows you how to clone the repository.

      If authentication is not required, use the format shown for Clone with HTTPS:
      https://github.com/<example-org>/<example-project>.git
      If you are cloning a Git repository that requires authentication, use this format for SSH:
      git@github.com:<example-org>/<example-project>.git

      With SSH, you must first upload your public key to GitHub. Also, if you plan to write to the remote repository, for example with git push, always use the SSH URL.

  7. Click Create Project. After the project is created, you can see your project files and the list of jobs defined in your project.
    As part of the project filesystem, Cloudera Machine Learning also creates a .gitignore file with the following contents:
    R
    node_modules
    *.pyc
    .*
    !.gitignore
  8. (Optional) To work with team members on a project, add them as collaborators to the project.
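
The two GitHub URL formats described under the Git option in step 6 can be told apart programmatically. The following is a minimal sketch, not part of Cloudera Machine Learning or any GitHub tooling: the function names classify_github_url and to_ssh are hypothetical, and the patterns assume URLs hosted on github.com that end in .git, as in the examples above.

```python
import re

# Hypothetical helpers for illustration only. The patterns assume the
# github.com host and a trailing ".git", matching the formats in step 6.
HTTPS_RE = re.compile(r"^https://github\.com/(?P<org>[^/]+)/(?P<project>[^/]+?)\.git$")
SSH_RE = re.compile(r"^git@github\.com:(?P<org>[^/]+)/(?P<project>[^/]+?)\.git$")


def classify_github_url(url):
    """Return (scheme, org, project), where scheme is "https" or "ssh"."""
    for scheme, pattern in (("https", HTTPS_RE), ("ssh", SSH_RE)):
        match = pattern.match(url)
        if match:
            return scheme, match.group("org"), match.group("project")
    raise ValueError(f"Unrecognized GitHub clone URL: {url!r}")


def to_ssh(url):
    """Rewrite either accepted form to the SSH form."""
    _, org, project = classify_github_url(url)
    return f"git@github.com:{org}/{project}.git"


if __name__ == "__main__":
    https_url = "https://github.com/example-org/example-project.git"
    print(classify_github_url(https_url))
    print(to_ssh(https_url))
```

A check like this could be run before pasting a URL into the New Project form, for example to confirm that a repository you intend to push to uses the SSH form. It only validates the shape of the URL; it does not verify that the repository exists or that your public key has been uploaded.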