Validating Hive-on-Tez Installation - Hortonworks Data Platform

Non-Ambari Cluster Installation Guide

Validating Hive-on-Tez Installation

Use the following procedure to validate your configuration of Hive-on-Tez:

Create a sample student.txt file:
echo -e "alice miller\t49\t3.15" > student.txt
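
The table created later in this procedure expects three tab-separated fields per row, so it can help to confirm the file's shape before uploading it. The following is a minimal sketch, not part of the HDP tooling: `check_fields` is a hypothetical helper name, and `printf` is used here as a portable stand-in for `echo -e`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every line of the file has exactly
# the expected number of tab-separated fields.
check_fields() {
    file=$1
    expected=$2
    awk -F '\t' -v n="$expected" 'NF != n { bad = 1 } END { exit bad + 0 }' "$file"
}

# Recreate the one-row sample file, then check that it has three fields.
printf 'alice miller\t49\t3.15\n' > student.txt
if check_fields student.txt 3; then
    echo "student.txt: OK"
else
    echo "student.txt: unexpected field count" >&2
fi
```

A failed check usually means the tab characters were not written literally, for example because the shell's `echo` does not interpret `\t`.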

Upload the new data file to HDFS:

su - $HDFS_USER 
hadoop fs -mkdir -p /user/test/student 
hadoop fs -copyFromLocal student.txt /user/test/student 
hadoop fs -chown hive:hdfs /user/test/student/student.txt 
hadoop fs -chmod 775 /user/test/student/student.txt
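
Before moving on to Hive, the upload can be checked from the same shell. This sketch assumes the `hadoop` client is on the PATH of the current user; `hdfs_path_exists` is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given HDFS path exists.
# `hadoop fs -test -e` exits with status 0 when the path exists.
hdfs_path_exists() {
    hadoop fs -test -e "$1"
}

# Only run the check on a host where the hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
    if hdfs_path_exists /user/test/student/student.txt; then
        # List the directory to review the owner, group, and mode set above.
        hadoop fs -ls /user/test/student
    else
        echo "student.txt not found in HDFS" >&2
    fi
fi
```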

Open the Hive command-line shell:

su - $HDFS_USER
hive

Create a table named "student" in Hive:

hive> CREATE EXTERNAL TABLE student(name string, age int, gpa double) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/user/test/student';

Execute the following query in Hive:

hive> SELECT COUNT(*) FROM student;

If Hive-on-Tez is configured properly, the query runs as a Tez job and prints a progress report similar to the following. The task counts and timings depend on your cluster and data:

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED    117        117        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 445.02 s
--------------------------------------------------------------------------------
Status: DAG finished successfully in 445.02 seconds
Time taken: 909.882 seconds
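
The same check can be scripted rather than read by eye. The sketch below assumes that the `hive` client is on the PATH and that a non-interactive `hive -e` run prints the same `Status: DAG finished successfully` line as the interactive shell; `dag_succeeded` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read Hive console output on stdin and succeed only
# if it contains the DAG success line shown in the sample output.
dag_succeeded() {
    grep -q 'Status: DAG finished successfully'
}

# Only attempt the query on a host where the hive client is installed.
if command -v hive >/dev/null 2>&1; then
    # Capture stdout and stderr together, since the progress report may be
    # written to stderr rather than stdout.
    output=$(hive -e 'SELECT COUNT(*) FROM student;' 2>&1)
    if printf '%s\n' "$output" | dag_succeeded; then
        echo "Hive-on-Tez validation passed"
    else
        echo "Hive-on-Tez validation failed" >&2
    fi
fi
```

If the status line is missing, inspect the captured output to see whether the query failed or ran on a different execution engine.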