Using Spark 2 with Cloudera Director
The Spark 2 service is distributed in its own parcel and is not part of CDH. (CDH includes Spark 1, and Spark 2 can be installed alongside Spark 1 in the same cluster.)
To add Spark 2 to a cluster bootstrapped by Cloudera Director, perform the following steps.
- List "SPARK2" as a product in the cluster template, providing its version number.
- Include the URL for a Spark 2 parcel repository in the list of parcel repositories for the cluster template. Also include the URL for the CDH parcel repository, even if it is the one Cloudera Director uses by default; the default applies only when no parcel repositories are listed.
- Manually assign roles for Spark 2, as well as other services, to instances in the cluster template.
- Provide the URL for the corresponding Spark 2 CSD in the list of CSDs in the deployment template.
```
cloudera-manager {
    ...
    # CSDs go in the deployment template; other CSDs (here, Kudu) may be listed too.
    csds: [
        "https://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.0.0.cloudera2.jar",
        "https://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar",
    ]
}

cluster {
    # SPARK2 is listed as a product alongside CDH, with its version number.
    products {
        CDH: 5.11.0,
        SPARK2: 2.0.0.cloudera2
    }

    # Both the CDH and the Spark 2 parcel repositories are listed explicitly.
    parcelRepositories: ["https://archive.cloudera.com/cdh5/parcels/5.11.0/",
                         "https://archive.cloudera.com/spark2/parcels/2.0.0.cloudera2/"]

    services: [HDFS, YARN, SPARK2_ON_YARN]

    masters {
        count: 1

        instance: {
            type: m4.xlarge
            image: ami-12345678
        }

        # Roles are assigned manually for every service, including Spark 2.
        roles {
            HDFS: [NAMENODE, SECONDARYNAMENODE]
            YARN: [RESOURCEMANAGER, JOBHISTORY]
            SPARK2_ON_YARN: [SPARK2_YARN_HISTORY_SERVER]
        }
    }

    workers {
        count: 3
        minCount: 3

        instance: {
            type: m4.xlarge
            image: ami-12345678
        }

        roles {
            HDFS: [DATANODE]
            YARN: [NODEMANAGER]
        }
    }
}
```
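The four steps listed above can also be checked mechanically before bootstrapping. The following is a minimal sketch, not part of Cloudera Director: `check_spark2_template` is a hypothetical helper, and it assumes the `cloudera-manager` and `cluster` sections have already been loaded into plain Python dicts by some HOCON parser of your choice. The URL substring tests (`/spark2/parcels/`, `/cdh`) are assumptions based on the archive.cloudera.com layout used in the example and would need adjusting for a custom repository.

```python
def check_spark2_template(deployment, cluster):
    """Return a list of problems found in a dict-form template (hypothetical helper)."""
    problems = []
    products = cluster.get("products", {})
    repos = cluster.get("parcelRepositories", [])
    csds = deployment.get("csds", [])

    # Step 1: SPARK2 must be listed as a product with a version number.
    if not products.get("SPARK2"):
        problems.append("SPARK2 is not listed in products")

    # Step 2: both a Spark 2 and a CDH parcel repository must be listed.
    # Assumption: repository URLs follow the archive.cloudera.com path layout.
    if not any("/spark2/parcels/" in url for url in repos):
        problems.append("no Spark 2 parcel repository listed")
    if not any("/cdh" in url for url in repos):
        problems.append("no CDH parcel repository listed")

    # Step 3: some instance group must carry manually assigned Spark 2 roles.
    # Any dict-valued entry with a "roles" key is treated as an instance group.
    groups = [g for g in cluster.values() if isinstance(g, dict) and "roles" in g]
    if not any(g["roles"].get("SPARK2_ON_YARN") for g in groups):
        problems.append("no roles assigned for SPARK2_ON_YARN")

    # Step 4: the Spark 2 CSD must be listed in the deployment template.
    if not any("SPARK2_ON_YARN" in url for url in csds):
        problems.append("no Spark 2 CSD listed in csds")

    return problems


# Dict form of the example template shown in this section.
deployment = {
    "csds": [
        "https://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.0.0.cloudera2.jar",
        "https://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar",
    ]
}
cluster = {
    "products": {"CDH": "5.11.0", "SPARK2": "2.0.0.cloudera2"},
    "parcelRepositories": [
        "https://archive.cloudera.com/cdh5/parcels/5.11.0/",
        "https://archive.cloudera.com/spark2/parcels/2.0.0.cloudera2/",
    ],
    "services": ["HDFS", "YARN", "SPARK2_ON_YARN"],
    "masters": {
        "roles": {
            "HDFS": ["NAMENODE", "SECONDARYNAMENODE"],
            "YARN": ["RESOURCEMANAGER", "JOBHISTORY"],
            "SPARK2_ON_YARN": ["SPARK2_YARN_HISTORY_SERVER"],
        }
    },
    "workers": {"roles": {"HDFS": ["DATANODE"], "YARN": ["NODEMANAGER"]}},
}

print(check_spark2_template(deployment, cluster))
```

An empty list means all four steps are covered. The check only looks at the shape of the template; it does not confirm that the URLs resolve or that the parcel and CSD versions match.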
To learn more about using Spark 2 alongside CDH, see Cloudera Distribution of Apache Spark 2 Overview.