Installing Spark 2
When you install Spark 2, the following directories are created:
/usr/hdp/current/spark2-client, for submitting Spark 2 jobs
/usr/hdp/current/spark2-history, for launching Spark 2 master processes, such as the Spark 2 History Server
/usr/hdp/current/spark2-thriftserver, for the Spark 2 Thrift Server
To install Spark 2:
Search for Spark 2 in the HDP repo:
For RHEL or CentOS:
yum search spark2
For SLES:
zypper search spark2
For Ubuntu and Debian:
apt-cache search spark2
This shows all of the available Spark 2 versions. For example:
spark2_<version>_<build>-master.noarch : Server for Spark 2 master
spark2_<version>_<build>-python.noarch : Python client for Spark 2
spark2_<version>_<build>-worker.noarch : Server for Spark 2 worker
spark2_<version>_<build>.noarch : Lightning-Fast Cluster Computing
Install the version corresponding to the HDP version you currently have installed.
For RHEL or CentOS:
yum install spark2_<version>-master spark2_<version>-python
For SLES:
zypper install spark2_<version>-master spark2_<version>-python
For Ubuntu and Debian:
apt-get install spark2_<version>-master spark2_<version>-python
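The spark2 package version you install must match the HDP stack version already on the node. As a hedged sketch (assuming the hdp-select utility that ships with HDP; the fallback path check is an example), you can list installed stack versions before choosing a package:

```shell
# Sketch: discover the installed HDP stack version so the spark2 package
# version matches it. Assumes hdp-select, which ships with HDP; falls back
# to inspecting /usr/hdp directly if hdp-select is absent.
if command -v hdp-select >/dev/null 2>&1; then
  hdp-select versions
else
  ls /usr/hdp 2>/dev/null || echo "no HDP stack found under /usr/hdp"
fi
```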
Before you launch the Spark 2 Shell or Thrift Server, make sure that you set
$JAVA_HOME:
export JAVA_HOME=<path to JDK 1.8>
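A defensive version of the same step can probe for a JDK 1.8 before exporting. This is only a sketch; the candidate paths below are examples, not locations the guide prescribes:

```shell
# Sketch: set JAVA_HOME to a JDK 1.8 before launching the Spark 2 Shell or
# Thrift Server. The candidate paths are examples; adjust for your host.
for candidate in /usr/lib/jvm/java-1.8.0-openjdk /usr/java/default; do
  if [ -x "$candidate/bin/java" ]; then
    export JAVA_HOME="$candidate"
    break
  fi
done
echo "JAVA_HOME=${JAVA_HOME:-unset}"
```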
Change the owner of /var/log/spark2 to spark2:hadoop:
chown spark2:hadoop /var/log/spark2
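The ownership change can be made idempotent so it is safe to re-run. A minimal sketch (assuming GNU coreutils stat, as on RHEL/CentOS, and root privileges):

```shell
# Sketch: change /var/log/spark2 ownership to spark2:hadoop only when needed.
# Assumes GNU coreutils stat; run as root. No-op if the directory is absent.
LOG_DIR=/var/log/spark2
if [ -d "$LOG_DIR" ]; then
  owner=$(stat -c '%U:%G' "$LOG_DIR")
  [ "$owner" = "spark2:hadoop" ] || chown spark2:hadoop "$LOG_DIR"
fi
```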