Non-Ambari Cluster Installation Guide

Abstract

The Hortonworks Data Platform, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing, processing, and analyzing large volumes of data. It is designed to handle data from many sources and in many formats quickly, easily, and cost-effectively. The Hortonworks Data Platform consists of the essential set of Apache Hadoop projects, including MapReduce, Hadoop Distributed File System (HDFS), HCatalog, Pig, Hive, HBase, ZooKeeper, and Ambari. Hortonworks is the major contributor of code and patches to many of these projects. These projects have been integrated and tested as part of the Hortonworks Data Platform release process, and installation and configuration tools are also included.

Unlike other providers of platforms built using Apache Hadoop, Hortonworks contributes 100% of our code back to the Apache Software Foundation. The Hortonworks Data Platform is Apache-licensed and completely open source. We sell only expert technical support, training and partner-enablement services. All of our technology is, and will remain, free and open source.

Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to contact us directly to discuss your specific needs.


Contents

1. Getting Ready to Install
Meet Minimum System Requirements
Hardware Recommendations
Operating System Requirements
Software Requirements
JDK Requirements
Metastore Database Requirements
Virtualization and Cloud Platforms
Configure the Remote Repositories
Decide on Deployment Type
Collect Information
Prepare the Environment
Enable NTP on the Cluster
Disable SELinux
Disable IPTables
Download Companion Files
Define Environment Parameters
[Optional] Create System Users and Groups
Determine HDP Memory Configuration Settings
Running the YARN Utility Script
Manually Calculating YARN and MapReduce Memory Configuration Settings
Configuring NameNode Heap Size
Allocate Adequate Log Space for HDP
Download the HDP Maven Artifacts
2. Installing HDFS, YARN, and MapReduce
Set Default File and Directory Permissions
Install the Hadoop Packages
Install Compression Libraries
Install Snappy
Install LZO
Create Directories
Create the NameNode Directories
Create the SecondaryNameNode Directories
Create DataNode and YARN NodeManager Local Directories
Create the Log and PID Directories
Symlink Directories with hdp-select
3. Installing Apache ZooKeeper
Install the ZooKeeper Package
(Optional) Securing ZooKeeper with Kerberos
Securing ZooKeeper Access
ZooKeeper Configuration
YARN Configuration
HDFS Configuration
Set Directories and Permissions
Set Up the Configuration Files
Start ZooKeeper
4. Setting Up the Hadoop Configuration
5. Validating the Core Hadoop Installation
Format and Start HDFS
Smoke Test HDFS
Configure YARN and MapReduce
Start YARN
Start MapReduce JobHistory Server
Smoke Test MapReduce
6. Installing Apache HBase
Install the HBase Package
Set Directories and Permissions
Set Up the Configuration Files
Validate the Installation
Start the HBase Thrift and REST Servers
7. Installing Apache Phoenix
Installing the Phoenix Package
Configuring HBase for Phoenix
Configuring Phoenix to Run in a Secure Cluster
Validating the Phoenix Installation
Best Practices for Setting Client-side Timeouts
Troubleshooting Phoenix
8. Installing and Configuring Apache Tez
Prerequisites
Installing the Tez Package
Configuring Tez
Creating a New Tez View Instance
Validating the Tez Installation
Troubleshooting
9. Installing Apache Hive and Apache HCatalog
Installing the Hive-HCatalog Package
Setting Directories and Permissions
Setting Up the Hive/HCatalog Configuration Files
HDP-Utility script
Configure Hive and HiveServer2 for Tez
Setting Up the Database for the Hive Metastore
Setting up RDBMS for use with Hive Metastore
Creating Directories on HDFS
Enabling Tez for Hive Queries
Disabling Tez for Hive Queries
Configuring Tez with the Capacity Scheduler
Validating Hive-on-Tez Installation
10. Installing Apache Pig
Install the Pig Package
Validate the Installation
11. Installing Apache WebHCat
Install the WebHCat Package
Upload the Pig, Hive and Sqoop Tarballs to HDFS
Set Directories and Permissions
Modify WebHCat Configuration Files
Set Up HDFS User and Prepare WebHCat Directories
Validate the Installation
12. Installing Apache Oozie
Install the Oozie Package
Set Directories and Permissions
Set Up the Oozie Configuration Files
For Derby
For MySQL
For PostgreSQL
For Oracle
Configure Your Database for Oozie
Set Up the sharelib
Validate the Installation
13. Installing Apache Ranger
Installation Prerequisites
Installing Policy Manager
Install the Ranger Policy Manager
Install the Ranger Policy Administration Service
Start the Ranger Policy Administration Service
Configuring the Ranger Policy Administration Authentication Mode
Configuring Ranger Policy Administration High Availability
Installing UserSync
Using the LDAP Connection Check Tool
Installing UserSync and Starting the Service
Installing Ranger Plug-ins
Installing the Ranger HDFS Plug-in
Installing the Ranger YARN Plug-in
Installing the Ranger Kafka Plug-in
Installing the Ranger HBase Plug-in
Installing the Ranger Hive Plug-in
Installing the Ranger Knox Plug-in
Installing the Ranger Storm Plug-in
Enabling Audit Logging for HDFS and Solr
Verifying the Installation
14. Installing Hue
Before You Begin
Configure HDP to Support Hue
Install the Hue Packages
Configure Hue to Communicate with the Hadoop Components
Configure the Web Server
Configure Hadoop
Configure Hue for Databases
Using Hue with Oracle
Using Hue with MySQL
Using Hue with PostgreSQL
Start, Stop, and Restart Hue
Validate the Hue Installation
15. Installing Apache Sqoop
Install the Sqoop Package
Set Up the Sqoop Configuration
Validate the Sqoop Installation
16. Installing Apache Mahout
Install Mahout
Validate Mahout
17. Installing and Configuring Apache Flume
Understanding Flume
Installing Flume
Configuring Flume
Starting Flume
HDP and Flume
A Simple Example
18. Installing and Configuring Apache Storm
Install the Storm Package
Configure Storm
Configure a Process Controller
(Optional) Configure Kerberos Authentication for Storm
(Optional) Configuring Authorization for Storm
Validate the Installation
19. Installing and Configuring Apache Spark
Spark Prerequisites
Installing Spark
Configuring Spark
(Optional) Starting the Spark Thrift Server
(Optional) Configuring Dynamic Resource Allocation
Validating Spark
20. Installing and Configuring Apache Kafka
Install Kafka
Configure Kafka
Validate Kafka
21. Installing Apache Accumulo
Installing the Accumulo Package
Configuring Accumulo
Configuring the "Hosts" Files
Validating Accumulo
Smoke Testing Accumulo
22. Installing Apache Falcon
Installing the Falcon Package
Setting Directories and Permissions
Configuring Proxy Settings
Configuring Falcon Entities
Configuring Oozie for Falcon
Configuring Hive for Falcon
Configuring for Secure Clusters
Validating Falcon
23. Installing Apache Knox
Install the Knox Package on the Knox Server
Set up and Validate the Knox Gateway Installation
24. Installing Apache Slider
25. Installing and Configuring Apache Atlas
Atlas Prerequisites
Installing Atlas
Installing Atlas Metadata Hive Plugin
Configuring Hive Hook
Configuring the Graph Database
Choosing Between Storage Backends
Choosing Between Indexing Backends
Configure Atlas to Use HBase
Configure Atlas to Use SolrCloud
Configuring for Secure Clusters
Configuring Atlas in a Kerberized Cluster
Validating Atlas
26. Setting Up Kerberos Security for Manual Installs
27. Uninstalling HDP