Installing Apache Spark

Install Spark using Ambari

Use the following steps to install Apache Spark on an Ambari-managed cluster.

The following diagram shows the Spark installation process using Ambari. Before you install Spark using Ambari, refer to "Adding a Service" in the Ambari Managing and Monitoring a Cluster guide for background information about how to install Hortonworks Data Platform (HDP) components using Ambari.



Caution

During the installation process, Ambari creates and edits several configuration files. If you configure and manage your cluster using Ambari, do not edit these files during or after installation. Instead, use the Ambari web UI to revise configuration settings.

  1. Click the ellipsis symbol next to Services on the Ambari dashboard, then click Add Service.
  2. In the Add Service Wizard, select Spark2, then click Next.
  3. On the Assign Masters page, review the node assignment for the Spark2 History Server, then click Next.
  4. On the Assign Slaves and Clients page:
    1. Scroll to the right and select the client nodes where you want to run Spark clients. These are the nodes from which Spark jobs can be submitted to YARN.
    2. To install the optional Livy server for security and user impersonation features, select the Livy for Spark2 Server check box for the node where you want the Livy server to run.
    3. To install the optional Spark Thrift server for ODBC or JDBC access, review the Spark2 Thrift Server node assignments and assign one or two nodes to the Thrift Server.

      Deploying the Thrift server on more than one node increases its scalability. When choosing the number of nodes, consider the cluster capacity allocated to Spark.

  5. Click Next to continue.
  6. On the Customize Services page, set the following configuration property for the Thrift Server:
    1. Click Advanced spark-thrift-sparkconf.
    2. Set the spark.yarn.queue property value to the name of the YARN queue that you want to use.
  7. Click Next to continue.
  8. If Kerberos is enabled on the cluster, review the principal and keytab settings on the Configure Identities page, modify the settings if desired, then click Next.
  9. Review the configuration on the Review page, then click Deploy to begin the installation.
  10. The Install, Start, and Test page displays the installation status.
  11. When the progress bar reaches 100% and a "Success" message appears, click Next.
  12. On the Summary page, click Complete to finish installing Spark.
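
After the wizard completes, you may also want to confirm the result outside the web UI. The following Python sketch reads the Spark2 service resource from the Ambari REST API (v1) and extracts its state. It is a minimal sketch under several assumptions: the Ambari server listens on its default port 8080, basic authentication is enabled, the service is registered under the name SPARK2, and the response body carries a ServiceInfo.state field. The host name, cluster name, and credentials below are hypothetical placeholders, and the field names should be checked against the API documentation for your Ambari version.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: replace with your Ambari server URL and cluster name.
AMBARI_URL = "http://ambari.example.com:8080"
CLUSTER = "mycluster"


def build_service_request(base_url, cluster, service, user, password):
    """Build a GET request for one service resource, e.g. SPARK2.

    Assumes the Ambari REST API v1 path /api/v1/clusters/<cluster>/services/<service>
    and HTTP basic authentication.
    """
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def service_state(payload):
    """Return the service state (assumed to be under ServiceInfo.state) from a parsed JSON body."""
    return payload["ServiceInfo"]["state"]


def fetch_service_state(base_url, cluster, service, user, password, timeout=10):
    """Send the request to the Ambari server and return the reported service state."""
    req = build_service_request(base_url, cluster, service, user, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return service_state(json.load(resp))
```

For example, calling fetch_service_state(AMBARI_URL, CLUSTER, "SPARK2", "admin", "admin") against a live server would return a state string, which is typically STARTED once the Spark2 components are running. The request building and response parsing are kept in separate functions so they can be checked without contacting a server.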