Non-Ambari Cluster Installation Guide

Hortonworks Data Platform

Copyright © 2012-2015 Hortonworks, Inc.

Except where otherwise noted, this document is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.

http://creativecommons.org/licenses/by-sa/4.0/legalcode

The Hortonworks Data Platform, powered by Apache Hadoop, is a massively scalable, 100% open source platform for storing, processing, and analyzing large volumes of data. It is designed to handle data from many sources and in many formats quickly, easily, and cost-effectively. The Hortonworks Data Platform consists of the essential set of Apache Software Foundation projects that focus on the storage and processing of Big Data, along with operations, security, and governance for the resulting system. This includes Apache Hadoop -- which includes MapReduce, Hadoop Distributed File System (HDFS), and Yet Another Resource Negotiator (YARN) -- along with Ambari, Falcon, Flume, HBase, Hive, Kafka, Knox, Oozie, Phoenix, Pig, Ranger, Slider, Spark, Sqoop, Storm, Tez, and ZooKeeper. Hortonworks is the major contributor of code and patches to many of these projects. These projects have been integrated and tested as part of the Hortonworks Data Platform release process, and installation and configuration tools are also included.

Unlike other providers of platforms built using Apache Hadoop, Hortonworks contributes 100% of our code back to the Apache Software Foundation. The Hortonworks Data Platform is Apache-licensed and completely open source. We sell only expert technical support, training and partner-enablement services. All of our technology is, and will remain, free and open source.

Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to contact us directly to discuss your specific needs.

Contents

1. Getting Ready to Install

Meet Minimum System Requirements

Hardware Recommendations
Operating System Requirements
Software Requirements
JDK Requirements
Metastore Database Requirements

Virtualization and Cloud Platforms

Configure the Remote Repositories

Decide on Deployment Type

Collect Information

Prepare the Environment

Enable NTP on the Cluster
Disable SELinux
Disable IPTables

Download Companion Files

Define Environment Parameters

[Optional] Create System Users and Groups

Determine HDP Memory Configuration Settings

Running the YARN Utility Script
Manually Calculating YARN and MapReduce Memory Configuration Settings

Configuring NameNode Heap Size

Allocate Adequate Log Space for HDP

Download the HDP Maven Artifacts

2. Installing Apache ZooKeeper

Install the ZooKeeper Package
Securing ZooKeeper with Kerberos (optional)
Set Directories and Permissions
Set Up the Configuration Files
Start ZooKeeper

3. Installing HDFS, YARN, and MapReduce

Set Default File and Directory Permissions

Install the Hadoop Packages

Install Compression Libraries

Install Snappy
Install LZO

Create Directories

Create the NameNode Directories
Create the SecondaryNameNode Directories
Create DataNode and YARN NodeManager Local Directories
Create the Log and PID Directories
Symlink Directories with hdp-select

4. Setting Up the Hadoop Configuration

5. Validating the Core Hadoop Installation

Format and Start HDFS
Smoke Test HDFS
Configure YARN and MapReduce
Start YARN
Start MapReduce JobHistory Server
Smoke Test MapReduce

6. Installing Apache HBase

Install the HBase Package
Set Directories and Permissions
Set Up the Configuration Files
Validate the Installation
Start the HBase Thrift and REST Servers

7. Installing Apache Phoenix

Installing the Phoenix Package
Configuring HBase for Phoenix
Configuring Phoenix to Run in a Secure Cluster
Validating the Phoenix Installation
Troubleshooting Phoenix

8. Installing and Configuring Apache Tez

Prerequisites
Installing the Tez Package
Configuring Tez
Creating a New Tez View Instance
Validating the Tez Installation
Troubleshooting

9. Installing Apache Hive and Apache HCatalog

Installing the Hive-HCatalog Package

Setting Directories and Permissions

Setting Up the Hive/HCatalog Configuration Files

HDP-Utility script
Configure Hive and HiveServer2 for Tez

Setting Up the Database for the Hive Metastore

Setting up RDBMS for use with Hive Metastore

Creating Directories on HDFS

Enabling Tez for Hive Queries

Disabling Tez for Hive Queries

Configuring Tez with the Capacity Scheduler

Validating Hive-on-Tez Installation

10. Installing Apache Pig

Install the Pig Package
Validate the Installation

11. Installing Apache WebHCat

Install the WebHCat Package
Upload the Pig, Hive and Sqoop tarballs to HDFS
Set Directories and Permissions
Modify WebHCat Configuration Files
Set Up HDFS User and Prepare WebHCat Directories
Validate the Installation

12. Installing Apache Oozie

Install the Oozie Package

Set Directories and Permissions

Set Up the Oozie Configuration Files

For Derby
For MySQL
For PostgreSQL
For Oracle

Configure Your Database for Oozie

Setting up the Sharelib

Validate the Installation

13. Installing Apache Ranger

Installation Prerequisites

Installing Policy Manager

Install the Ranger Policy Manager
Install the Ranger Policy Administration Service
Start the Ranger Policy Administration Service
Configuring the Ranger Policy Administration Authentication Mode
Configuring Ranger Policy Administration High Availability

Installing UserSync

Installing Ranger Plug-ins

Installing the Ranger HDFS Plug-in
Installing the Ranger YARN Plug-in
Installing the Ranger Kafka Plug-in
Installing the Ranger HBase Plug-in
Installing the Ranger Hive Plug-in
Installing the Ranger Knox Plug-in
Installing the Ranger Storm Plug-in

Enabling Audit Logging for HDFS and Solr

Verifying the Installation

14. Installing Hue

Prerequisites

Configure HDP

Install Hue

Configure Hue

Configuring Hue for an External Database

Using Hue with Oracle
Using Hue with MySQL
Using Hue with PostgreSQL

15. Installing Apache Sqoop

Install the Sqoop Package
Set Up the Sqoop Configuration
Validate the Sqoop Installation

16. Installing Apache Mahout

Install Mahout
Validate Mahout

17. Installing and Configuring Apache Flume

Understanding Flume
Installing Flume
Configuring Flume
Starting Flume
HDP and Flume
A Simple Example

18. Installing and Configuring Apache Storm

Install the Storm Package
Configure Storm
Configure a Process Controller
(Optional) Configure Kerberos Authentication for Storm
(Optional) Configuring Authorization for Storm
Validate the Installation

19. Installing and Configuring Apache Spark

Spark Prerequisites
Installing Spark
Configuring Spark
Validating Spark

20. Installing and Configuring Apache Kafka

Install Kafka
Configure Kafka
Validate Kafka

21. Installing Apache Accumulo

Installing the Accumulo Package
Configuring Accumulo
Configuring the "Hosts" Files
Validating Accumulo
Smoke Testing Accumulo

22. Installing Apache Falcon

Installing the Falcon Package
Setting Directories and Permissions
Configuring Proxy Settings
Configuring Falcon Entities
Configuring Oozie for Falcon
Configuring Hive for Falcon
Configuring for Secure Clusters
Validate Falcon

23. Installing Apache Knox

Install the Knox Package on the Knox server
Set up and Validate the Knox Gateway Installation

24. Installing Apache Slider

25. Installing and Configuring Apache Atlas

Atlas Prerequisites
Installing Atlas
Installing Atlas Metadata Hive Plugin
Configuring Hive Hook
Configuring for Secure Clusters
Validating Atlas

26. Setting Up Security for Manual Installs

Preparing Kerberos

Kerberos Overview
Installing and Configuring the KDC
Creating the Database and Setting Up the First Administrator
Creating Service Principals and Keytab Files for HDP

Configuring HDP

Configuration Overview
Creating Mappings Between Principals and UNIX Usernames
Examples
Adding Security Information to Configuration Files

Configuring Hue

Setting up One-Way Trust with Active Directory

Configure Kerberos Hadoop Realm on the AD DC
Configure the AD Domain on the KDC and Hadoop Cluster Hosts

27. Uninstalling HDP

List of Tables

1.1. Define Directories for Core Hadoop
1.2. Define Directories for Ecosystem Components
1.3. Define Users and Groups for Systems
1.4. Typical System Users and Groups
1.5. hdp-configuration-utils.py Options
1.6. Reserved Memory Recommendations
1.7. Recommended Values
1.8. YARN and MapReduce Configuration Setting Value Calculations
1.9. Example Value Calculations
1.10. Example Value Calculations
1.11. NameNode Heap Size Settings
8.1. Tez Configuration Parameters
9.1. Hive Configuration Parameters
11.1. Hadoop core-site.xml File Properties
13.1. install.properties Entries
13.2. Properties to Update in the install.properties File
13.3. Properties to Edit in the install.properties File
13.4. Properties to Edit in the install.properties File
13.5. Properties to Edit in the install.properties File
13.6. HBase Properties to Edit in the install.properties File
13.7. Hive-Related Properties to Edit in the install.properties File
13.8. Knox-Related Properties to Edit in the install.properties File
13.9. Storm-Related Properties to Edit in the install.properties File
14.1. Hue-Supported Browsers
14.2. Hue Dependencies on HDP Components
14.3. Variables to Configure HDFS Cluster
14.4. Variables to Configure the YARN Cluster
14.5. Beeswax Configuration Values
17.1. Flume 1.5.2 Dependencies
18.1. Required jaas.conf Sections for Cluster Nodes
18.2. Supported Authorizers
18.3. storm.yaml Configuration File Properties
18.4. worker-launcher.cfg File Configuration Properties
18.5. multitenant-scheduler.yaml Configuration File Properties
19.1. Spark Cluster Prerequisites
20.1. Kafka Configuration Properties
25.1. Atlas Cluster Prerequisites
26.1. Service Principals
26.2. Service Keytab File Names
26.3. General core-site.xml, Knox, and Hue
26.4. core-site.xml Master Node Settings -- Knox Gateway
26.5. core-site.xml Master Node Settings -- Hue
26.6. hdfs-site.xml File Property Settings
26.7. yarn-site.xml Property Settings
26.8. mapred-site.xml Property Settings
26.9. hbase-site.xml Property Settings -- HBase Server
26.10. hive-site.xml Property Settings
26.11. oozie-site.xml Property Settings
26.12. webhcat-site.xml Property Settings