As of CDH 5, HCatalog is part of Apache Hive.
HCatalog is a table and storage management layer for Hadoop that makes the same table information available to Hive, Pig, MapReduce, and Sqoop. Table definitions are maintained in the Hive metastore, which HCatalog requires. WebHCat lets you access HCatalog through an HTTP (REST-style) interface.
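As a minimal sketch of that REST interface, the Python below lists the tables in one database through WebHCat's DDL resource (`ddl/database/<db>/table`). It assumes WebHCat listens on its default port, 50111, under the `/templeton/v1` prefix, and that the cluster uses simple (non-Kerberos) authentication, where the caller is identified by the `user.name` query parameter. The helper names `webhcat_url` and `list_tables` are illustrative, not part of any Hive or Cloudera library.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumptions: WebHCat's default port (templeton.port) and API prefix.
WEBHCAT_PORT = 50111
API_PREFIX = "templeton/v1"


def webhcat_url(host: str, path: str, user: str, port: int = WEBHCAT_PORT) -> str:
    """Build a WebHCat URL; with simple auth, user.name names the caller."""
    query = urlencode({"user.name": user})
    return f"http://{host}:{port}/{API_PREFIX}/{path.lstrip('/')}?{query}"


def list_tables(host: str, database: str, user: str, port: int = WEBHCAT_PORT) -> list:
    """Return the table names HCatalog reports for one database."""
    path = f"ddl/database/{quote(database, safe='')}/table"
    with urllib.request.urlopen(webhcat_url(host, path, user, port), timeout=10) as resp:
        # The body is JSON such as {"tables": [...], "database": "default"}.
        return json.load(resp)["tables"]
```

For example, `list_tables("webhcat-host.example.com", "default", "hive")` would return the table names in the `default` database; the host and user here are placeholders. The same URL shape works with `curl` for quick manual checks.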
This page explains how to install and configure HCatalog and WebHCat. For Sqoop, see Sqoop-HCatalog Integration in the Sqoop User Guide.
Configuring HCatalog Using Cloudera Manager
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- Go to the Hive service by clicking .
- Select the Hive Instances tab.
- Add a WebHCat server role:
  - Click Add Role Instances.
  - Click Select hosts under WebHCat Server.
  - Select the host on which to run the WebHCat server; this adds a WHCS icon.
  - Click OK.
  - Click Continue.
- Start the new role type:
  - Select the new role type, WebHCat Server.
  - Select .
  - Click Start and Close.
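Once the role has started, one way to confirm that the server actually answers HTTP requests is to poll WebHCat's `status` resource, which returns a small JSON document such as `{"status": "ok", "version": "v1"}` when the server is healthy. The sketch below assumes the default port 50111 and a reachable host; the function names, retry count, and delay are illustrative choices, not Cloudera recommendations.

```python
import json
import time
import urllib.request

# Assumption: default WebHCat port and API prefix.
STATUS_PATH = "templeton/v1/status"


def status_url(host: str, port: int = 50111) -> str:
    return f"http://{host}:{port}/{STATUS_PATH}"


def is_healthy(body: bytes) -> bool:
    """A healthy server reports a JSON object whose "status" is "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_for_webhcat(host: str, port: int = 50111, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll the status resource until it reports ok or the attempts run out."""
    url = status_url(host, port)
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except OSError:
            pass  # URLError, HTTPError, and timeouts are OSError subclasses; retry.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Polling, rather than a single request, allows for the few seconds the server may need to bind its port after Cloudera Manager reports the role as started.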
Configuring HCatalog Using the Command Line
This section applies to unmanaged deployments without Cloudera Manager. Use the following sections to install, configure, and use HCatalog:
- Installing and Upgrading the HCatalog RPM or Debian Packages
- Host Configuration Changes
- Starting and Stopping the WebHCat REST Server
- Accessing Table Data with the Command-line API
- Accessing Table Data with MapReduce
- Accessing Table Data with Pig
- Accessing Table Data with REST
- Apache HCatalog Documentation
For more information, see the HCatalog documentation.