Command Line Installation

Hortonworks Data Platform

Copyright © 2012-2017 Hortonworks, Inc.

Except where otherwise noted, this document is licensed under the Creative Commons Attribution-ShareAlike 4.0 License:

http://creativecommons.org/licenses/by-sa/4.0/legalcode

The Hortonworks Data Platform, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing, processing, and analyzing large volumes of data. It is designed to handle data from many sources and in many formats quickly, easily, and cost-effectively. The Hortonworks Data Platform consists of the essential set of Apache Software Foundation projects that focus on the storage and processing of Big Data, along with operations, security, and governance for the resulting system. This set includes Apache Hadoop, whose components are MapReduce, the Hadoop Distributed File System (HDFS), and Yet Another Resource Negotiator (YARN). It also includes Ambari, Falcon, Flume, HBase, Hive, Kafka, Knox, Oozie, Phoenix, Pig, Ranger, Slider, Spark, Sqoop, Storm, Tez, and ZooKeeper. Hortonworks is the major contributor of code and patches to many of these projects. These projects have been integrated and tested as part of the Hortonworks Data Platform release process, and installation and configuration tools are also included.

Unlike other providers of platforms built using Apache Hadoop, Hortonworks contributes 100% of our code back to the Apache Software Foundation. The Hortonworks Data Platform is Apache-licensed and completely open source. We sell only expert technical support, training, and partner-enablement services. All of our technology is, and will remain, free and open source.

Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to contact us directly to discuss your specific needs.

Contents

1. Preparing to Manually Install HDP

Meeting Minimum System Requirements

Hardware Recommendations
Operating System Requirements
Software Requirements
JDK Requirements
Metastore Database Requirements

Virtualization and Cloud Platforms

Configuring Remote Repositories

Deciding on a Deployment Type

Collect Information

Prepare the Environment

Enable NTP on Your Cluster
Disable SELinux
Disable IPTables

Download Companion Files

Define Environment Parameters

Creating System Users and Groups

Determining HDP Memory Configuration Settings

Running the YARN Utility Script
Calculating YARN and MapReduce Memory Requirements

Configuring NameNode Heap Size

Allocating Adequate Log Space for HDP

Downloading the HDP Maven Artifacts

2. Installing Apache ZooKeeper

Install the ZooKeeper Package

Securing ZooKeeper with Kerberos (optional)

Securing ZooKeeper Access

ZooKeeper Configuration
YARN Configuration
HDFS Configuration

Set Directories and Permissions

Set Up the Configuration Files

Start ZooKeeper

3. Installing HDFS, YARN, and MapReduce

Set Default File and Directory Permissions

Install the Hadoop Packages

Install Compression Libraries

Install Snappy
Install LZO

Create Directories

Create the NameNode Directories
Create the SecondaryNameNode Directories
Create DataNode and YARN NodeManager Local Directories
Create the Log and PID Directories
Symlink Directories with hdp-select

4. Setting Up the Hadoop Configuration

5. Validating the Core Hadoop Installation

Format and Start HDFS
Smoke Test HDFS
Configure YARN and MapReduce
Start YARN
Start MapReduce JobHistory Server
Smoke Test MapReduce

6. Installing Apache HBase

Install the HBase Package
Set Directories and Permissions
Set Up the Configuration Files
Add Configuration Parameters for Bulk Load Support
Validate the Installation
Starting the HBase Thrift and REST Servers

7. Installing Apache Phoenix

Installing the Phoenix Package
Configuring HBase for Phoenix
Configuring Phoenix to Run in a Secure Cluster
Validating the Phoenix Installation
Troubleshooting Phoenix

8. Installing and Configuring Apache Tez

Prerequisites

Installing the Tez Package

Configuring Tez

Setting Up Tez for the Tez UI

Setting Up Tez for the Tez UI
Deploying the Tez UI
Additional Steps for the Application Timeline Server

Creating a New Tez View Instance

Validating the Tez Installation

Troubleshooting

9. Installing Apache Hive and Apache HCatalog

Installing the Hive-HCatalog Package

Setting Up the Hive/HCatalog Configuration Files

HDP-Utility script
Configure Hive and HiveServer2 for Tez

Setting Up the Database for the Hive Metastore

Setting up RDBMS for use with Hive Metastore

Enabling Tez for Hive Queries

Disabling Tez for Hive Queries

Configuring Tez with the Capacity Scheduler

Validating Hive-on-Tez Installation

Installing Apache Hive LLAP

LLAP Prerequisites

Preparing to Install LLAP

Installing LLAP on an Unsecured Cluster

Installing LLAP on a Secured Cluster

Prerequisites
Installing LLAP on a Secured Cluster
Validating the Installation on a Secured Cluster

Stopping the LLAP Service

Tuning LLAP for Performance

10. Installing Apache Pig

Install the Pig Package
Validate the Installation

11. Installing Apache WebHCat

Install the WebHCat Package
Upload the Pig, Hive and Sqoop tarballs to HDFS
Set Directories and Permissions
Modify WebHCat Configuration Files
Set Up HDFS User and Prepare WebHCat Directories
Validate the Installation

12. Installing Apache Oozie

Install the Oozie Package

Set Directories and Permissions

Set Up the Oozie Configuration Files

For Derby
For MySQL
For PostgreSQL
For Oracle

Configure Your Database for Oozie

Set up the Sharelib

Validate the Installation

Stop and Start Oozie

13. Installing Apache Ranger

Installation Prerequisites

Installing Policy Manager

Install the Ranger Policy Manager
Install the Ranger Policy Administration Service
Start the Ranger Policy Administration Service
Configuring the Ranger Policy Administration Authentication Mode
Configuring Ranger Policy Administration High Availability

Installing UserSync

Using the LDAP Connection Check Tool
Install UserSync and Start the Service

Installing Ranger Plug-ins

Installing the Ranger HDFS Plug-in
Installing the Ranger YARN Plug-in
Installing the Ranger Kafka Plug-in
Installing the Ranger HBase Plug-in
Installing the Ranger Hive Plug-in
Installing the Ranger Knox Plug-in
Installing the Ranger Storm Plug-in

Installing Ranger in a Kerberized Environment

Creating Keytab and Principals
Installing Ranger Services
Manually Installing and Enabling the Ranger Plug-ins

Verifying the Installation

14. Installing Hue

Before You Begin

Configure HDP to Support Hue

Install the Hue Packages

Configure Hue to Communicate with the Hadoop Components

Configure the Web Server
Configure Hadoop

Configure Hue for Databases

Using Hue with Oracle
Using Hue with MySQL
Using Hue with PostgreSQL

Start, Stop, and Restart Hue

Validate the Hue Installation

15. Installing Apache Sqoop

Install the Sqoop Package
Set Up the Sqoop Configuration
Validate the Sqoop Installation

16. Installing Apache Mahout

Install Mahout
Validate Mahout

17. Installing and Configuring Apache Flume

Installing Flume
Configuring Flume
Starting Flume

18. Installing and Configuring Apache Storm

Install the Storm Package
Configure Storm
Configure a Process Controller
(Optional) Configure Kerberos Authentication for Storm
(Optional) Configuring Authorization for Storm
Validate the Installation

19. Installing and Configuring Apache Spark

Spark Prerequisites

Installing Spark

Configuring Spark

(Optional) Starting the Spark Thrift Server

(Optional) Configuring Dynamic Resource Allocation

(Optional) Installing and Configuring Livy

Installing Livy
Configuring Livy
Starting, Stopping, and Restarting Livy
Granting Livy the Ability to Impersonate
(Optional) Configuring Zeppelin to Interact with Livy

Validating Spark

20. Installing and Configuring Apache Spark 2

Spark 2 Prerequisites

Installing Spark 2

Configuring Spark 2

(Optional) Starting the Spark 2 Thrift Server

(Optional) Configuring Dynamic Resource Allocation

(Optional) Installing and Configuring Livy

Installing Livy
Configuring Livy
Starting, Stopping, and Restarting Livy
Granting Livy the Ability to Impersonate
(Optional) Configuring Zeppelin to Interact with Livy

Validating Spark 2

21. Installing and Configuring Apache Kafka

Install Kafka
Configure Kafka
Validate Kafka

22. Installing and Configuring Zeppelin

Installation Prerequisites
Installing the Zeppelin Package
Configuring Zeppelin
Starting, Stopping, and Restarting Zeppelin
Validating Zeppelin
Accessing the Zeppelin UI

23. Installing Apache Accumulo

Installing the Accumulo Package
Configuring Accumulo
Configuring the "Hosts" Files
Validating Accumulo
Smoke Testing Accumulo

24. Installing Apache Falcon

Installing the Falcon Package
Setting Directories and Permissions
Configuring Proxy Settings
Configuring Falcon Entities
Configuring Oozie for Falcon
Configuring Hive for Falcon
Configuring for Secure Clusters
Validate Falcon

25. Installing Apache Knox

Install the Knox Package on the Knox Server
Set up and Validate the Knox Gateway Installation
Configuring Knox Single Sign-on (SSO)

26. Installing Apache Slider

27. Setting Up Kerberos Security for Manual Installs

28. Uninstalling HDP

List of Tables

1.1. Directories Needed to Install Core Hadoop
1.2. Directories Needed to Install Ecosystem Components
1.3. Define Users and Groups for Systems
1.4. Typical System Users and Groups
1.5. yarn-utils.py Options
1.6. Reserved Memory Recommendations
1.7. Recommended Container Size Values
1.8. YARN and MapReduce Configuration Values
1.9. Example Value Calculations Without HBase
1.10. Example Value Calculations with HBase
1.11. Recommended NameNode Heap Size Settings
8.1. Tez Configuration Parameters
9.1. Hive Configuration Parameters
9.2.
9.3.
9.4. LLAP Properties to Set in hive-site.xml
9.5. HiveServer2 Properties to Set in hive-site.xml to Enable Concurrent Queries with LLAP
9.6. Properties to Set in hive-site.xml for Secured Clusters
9.7. Properties to Set in ssl-server.xml for LLAP on Secured Clusters
9.8. LLAP Package Parameters
11.1. Hadoop core-site.xml File Properties
13.1. install.properties Entries
13.2. Properties to Update in the install.properties File
13.3. Properties to Edit in the install.properties File
13.4. Properties to Edit in the install.properties File
13.5. Properties to Edit in the install.properties File
13.6. HBase Properties to Edit in the install.properties file
13.7. Hive-Related Properties to Edit in the install.properties File
13.8. Knox-Related Properties to Edit in the install.properties File
13.9. Storm-Related Properties to Edit in the install.properties file
13.10. install.properties Property Values
13.11. install.properties Property Values
13.12. install.properties Property Values
13.13. install.properties Property Values
13.14. install.properties Property Values
13.15. install.properties Property Values
13.16. install.properties Property Values
13.17. install.properties Property Values
13.18. install.properties Property Values
13.19. install.properties Property Values
13.20. install.properties Property Values
17.1. Flume 1.5.2 Dependencies
18.1. Required jaas.conf Sections for Cluster Nodes
18.2. Supported Authorizers
18.3. storm.yaml Configuration File Properties
18.4. worker-launcher.cfg File Configuration Properties
18.5. multitenant-scheduler.yaml Configuration File Properties
19.1. Prerequisites for running Spark 1.6
20.1. Prerequisites for running Spark 2
21.1. Kafka Configuration Properties
22.1. Installation Prerequisites