Using Spark 2 with Cloudera Director

The Spark 2 service is distributed in its own parcel and is not part of CDH. (CDH includes Spark 1, but Spark 2 may be installed alongside Spark 1 in the same cluster.) To add Spark 2 to a cluster bootstrapped by Cloudera Director, perform the following steps.

1. List "SPARK2" as a product in the cluster template, providing its version number.
2. Include the URL for a Spark 2 parcel repository in the list of parcel repositories for the cluster template. Be sure to also include the URL for the CDH parcel repository, even if it is the default repository that Cloudera Director uses when no parcel repositories are listed.
3. Manually assign roles for Spark 2, as well as for the other services, to instances in the cluster template.
4. Provide the URL for the corresponding Spark 2 CSD in the list of CSDs in the deployment template.

```
cloudera-manager {
  ...
  csds: [
    "https://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.0.0.cloudera2.jar",
    "https://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar"
  ]
}

cluster {
  products {
    CDH: 5.11.0,
    SPARK2: 2.0.0.cloudera2
  }
  parcelRepositories: ["https://archive.cloudera.com/cdh5/parcels/5.11.0/",
                       "https://archive.cloudera.com/spark2/parcels/2.0.0.cloudera2/"]

  services: [HDFS, YARN, SPARK2_ON_YARN]

  masters {
    count: 1
    instance: {
      type: m4.xlarge
      image: ami-12345678
    }

    roles {
      HDFS: [NAMENODE, SECONDARYNAMENODE]
      YARN: [RESOURCEMANAGER, JOBHISTORY]
      SPARK2_ON_YARN: [SPARK2_YARN_HISTORY_SERVER]
    }
  }

  workers {
    count: 3
    minCount: 3
    instance: {
      type: m4.xlarge
      image: ami-12345678
    }
    roles {
      HDFS: [DATANODE]
      YARN: [NODEMANAGER]
    }
  }
}
```
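
The steps above depend on the product versions, parcel repository URLs, and CSD URL agreeing with one another. As a rough illustration only, the following Python sketch checks that agreement on values copied by hand from the example template. The helper names `missing_repositories` and `has_matching_csd` are hypothetical and are not part of Cloudera Director. The sketch also assumes that a repository URL contains the product version string and that the "corresponding" CSD file name embeds the same version as the parcel, which holds for this example but may not hold for every repository layout.

```python
# Hypothetical pre-flight check; not part of Cloudera Director.
# Values are copied by hand from the example template above.
products = {"CDH": "5.11.0", "SPARK2": "2.0.0.cloudera2"}
parcel_repositories = [
    "https://archive.cloudera.com/cdh5/parcels/5.11.0/",
    "https://archive.cloudera.com/spark2/parcels/2.0.0.cloudera2/",
]
csds = [
    "https://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.0.0.cloudera2.jar",
]


def missing_repositories(products, repositories):
    """Return product names whose version string appears in no repository URL.

    Assumption: the repository URL embeds the version, as in the example.
    """
    return [
        name
        for name, version in products.items()
        if not any(version in url for url in repositories)
    ]


def has_matching_csd(version, csds, prefix="SPARK2_ON_YARN-"):
    """Return True if some CSD file name is exactly prefix + version + '.jar'."""
    expected = f"{prefix}{version}.jar"
    return any(url.rsplit("/", 1)[-1] == expected for url in csds)


if __name__ == "__main__":
    print("Products without a repository:",
          missing_repositories(products, parcel_repositories))
    print("Spark 2 CSD matches parcel version:",
          has_matching_csd(products["SPARK2"], csds))
```

A check like this is optional; it only catches copy-and-paste mismatches between the `products`, `parcelRepositories`, and `csds` entries before a bootstrap is attempted.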

To learn more about using Spark 2 alongside CDH, see Cloudera Distribution of Apache Spark 2 Overview.

Using Kudu with CDH 5.12 or Earlier

Using Cloudera Data Science Workbench with Cloudera Director