Data Access
Also available as:
PDF
loading table of contents...

Chapter 4. Managing Metadata Services with Apache HCatalog

Hortonworks Data Platform (HDP) deploys Apache HCatalog to manage the metadata services for your Hadoop cluster.

Apache HCatalog is a table and storage management service for data created using Apache Hadoop. This includes:

  • Providing a shared schema and data type mechanism.

  • Providing a table abstraction so that users need not be concerned with where or how their data is stored.

  • Providing interoperability across data processing tools such as Pig, MapReduce, and Hive.

Start the HCatalog CLI with the following command:

<hadoop-install-dir>\hcatalog-0.5.0\bin\hcat.cmd 
[Note]Note

HCatalog 0.5.0 was the final version released from the Apache Incubator. In March 2013, HCatalog graduated from the Apache Incubator and became part of the Apache Hive project. New releases of Hive include HCatalog, starting with Hive 0.11.0.

HCatalog includes two documentation sets:

  1. General information about HCatalog

    This documentation covers installation and user features. The next section, Using HCatalog, provides links to individual documents in the HCatalog documentation set.

  2. WebHCat information

    WebHCat is a web API for HCatalog and related Hadoop components. The section Using WebHCat provides links to user and reference documents, and includes a technical update about standard WebHCat parameters.

For more details on the Apache Hive project, including HCatalog and WebHCat, see the Using Apache Hive chapter and the following resources: