Hive catalog

Hive can be integrated as one of the catalogs in Flink SQL.

The Hive catalog serves two purposes:
  • It provides persistent storage for pure Flink metadata.
  • It provides an interface for reading and writing existing Hive tables.
Maven Dependency
<dependency>
   <groupId>org.apache.flink</groupId>
   <artifactId>flink-connector-hive_2.11</artifactId>
   <version>1.10.0-csa1.2.0.0</version>
</dependency>
The following example shows how to register and use the Hive catalog from Java:
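import org.apache.flink.table.catalog.hive.HiveCatalog;

// tableEnv is an existing TableEnvironment.
String HIVE = "hive";                    // name under which the catalog is registered
String DB = "default";                   // default database of the catalog
String HIVE_CONF_DIR = "/etc/hive/conf"; // directory that contains hive-site.xml
String HIVE_VERSION = "3.1.3000";

// Register the catalog, then make it the current catalog of the session.
HiveCatalog catalog = new HiveCatalog(HIVE, DB, HIVE_CONF_DIR, HIVE_VERSION);
tableEnv.registerCatalog(HIVE, catalog);
tableEnv.useCatalog(HIVE);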

To use the Hive catalog from the SQL client, either enable it globally in Cloudera Manager or enable it per user through a custom environment file in YAML format.

To enable the Hive catalog in Cloudera Manager:
  1. Log in to Cloudera Manager.
  2. Go to Flink > Configuration > Flink (Service-Wide).
  3. Enable Hive Service.
  4. Enable Hive Catalog for SQL Client.
To enable the Hive catalog per user, add the following snippet to the catalogs section of the custom environment file:
...
catalogs:
  - name: hive
    type: hive
    hive-conf-dir: /etc/hive/conf
    hive-version: 3.1.3000
...
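If the per-user environment file is provisioned by a tool rather than edited by hand, the catalog entry can be rendered programmatically. The following is a minimal sketch that uses only the Java standard library. The class name HiveCatalogEnv, its methods, and the default file name sql-env.yaml are hypothetical; only the keys and values come from the snippet above. The sketch overwrites the target file, so it is not suitable for an environment file that already contains other sections.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: renders the "catalogs" entry shown above for one user.
final class HiveCatalogEnv {

    // Builds the YAML text; the keys mirror the snippet above.
    static String render(String name, String hiveConfDir, String hiveVersion) {
        return String.join("\n",
            "catalogs:",
            "  - name: " + name,
            "    type: hive",
            "    hive-conf-dir: " + hiveConfDir,
            "    hive-version: " + hiveVersion,
            "");
    }

    // Writes the rendered entry to the target file, replacing any existing content.
    static void write(Path target, String name, String hiveConfDir, String hiveVersion)
            throws IOException {
        byte[] bytes = render(name, hiveConfDir, hiveVersion).getBytes(StandardCharsets.UTF_8);
        Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: "sql-env.yaml" is only a placeholder file name.
        Path target = Paths.get(args.length > 0 ? args[0] : "sql-env.yaml");
        write(target, "hive", "/etc/hive/conf", "3.1.3000");
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

Running the class without arguments writes the entry to sql-env.yaml in the working directory. Pass a path as the first argument to write it elsewhere.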
Launch flink-sql-client and test the Hive catalog with the following commands:
Flink SQL> show catalogs;
default_catalog
hive
Flink SQL> use catalog hive;
Flink SQL> show tables;