Hortonworks Data Platform deploys Apache Hive for your Hadoop cluster.
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for extracting, transforming, and loading (ETL) data, a mechanism for imposing structure on that data, and the ability to query and analyze large data sets stored in Hadoop files.
Hive defines a simple SQL-like query language, HiveQL (also called QL), that lets users familiar with SQL query the data. The language also lets programmers familiar with the MapReduce framework plug in custom mappers and reducers to perform more sophisticated analysis that the language's built-in capabilities may not support.
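As a rough illustration of the custom mapper idea, the sketch below shows a small Python script of the kind Hive can stream rows through. It relies on Hive's streaming convention of passing each row to the script as a tab-separated line on standard input and reading tab-separated lines back from standard output. The script name (extract_host.py), the function names, and the two-column layout (user_id, url) are assumptions made for this example and do not come from the Hive documentation.

```python
import sys


def transform_line(line):
    """Map one tab-separated input row (user_id, url) to (user_id, host).

    The column layout is hypothetical; adjust it to match the columns
    passed to the script by the query.
    """
    fields = line.rstrip("\n").split("\t")
    user_id, url = fields[0], fields[1]
    # Drop an optional scheme ("http://") and anything after the host name.
    host = url.split("//", 1)[-1].split("/", 1)[0].lower()
    return "\t".join([user_id, host])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive streams one row per line; emit one transformed row per input row.
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

A query could then ship the script to the cluster with ADD FILE extract_host.py and call it with a statement along the lines of SELECT TRANSFORM (user_id, url) USING 'python extract_host.py' AS (user_id, host) FROM page_views, where page_views and its columns are placeholder names.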
Hive Documentation
Documentation for Hive release 0.10.0 is available in several places.
The Hive wiki contains documentation organized in these sections:
General Information about Hive
User Documentation
Administrator Documentation
Resources for Contributors
Supplementary documentation describes new features and bug fixes, including:
HiveServer2 JDBC
Decimal data type
Metastore server security
Secure cluster configuration (JDBC client setup)
Javadocs describe the Hive API. The supplementary documentation includes a complete set of Javadocs for this release.
Hive indexing was added in version 0.7.0; documentation and examples can be found here:
Indexes -- design document (lists the indexing JIRAs with current status, starting with HIVE-417)
Create/Drop Index -- HiveQL language manual
Bitmap indexes -- added in Hive version 0.8.0 (JIRA HIVE-1803)
Indexed Hive -- overview and examples by Prafulla Tekawade and Nikhil Deshpande, October 2010
Tutorial: SQL-like join and index with MapReduce using Hadoop and Hive -- blog by Ashish Garg, April 2012
Hive JIRAs
Issue tracking for Hive bugs and improvements can be found here: Hive JIRAs.
Hive ODBC Driver
Hortonworks provides a Hive ODBC driver that lets you connect popular Business Intelligence (BI) tools to the Hortonworks Data Platform to query, analyze, and visualize the data stored there.