Understanding Apache HBase Hive integration
With Hortonworks Data Platform (HDP), you can use Hive HBase integration to perform
READ and WRITE operations on HBase tables. HBase is integrated with Hive through a Hive
StorageHandler, so you can access the same data through both Hive and HBase.
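
As an illustration of what a StorageHandler-backed table looks like, the following is a minimal sketch of a Hive table mapped onto HBase. The table name `hbase_hive_table`, the column family `cf1`, and the qualifier `val` are hypothetical; the storage handler class and the `hbase.columns.mapping` and `hbase.table.name` properties come from the Apache Hive HBase integration and should be checked against your Hive version.

```sql
-- Hypothetical names: hbase_hive_table, cf1, val.
-- EXTERNAL is used because HDP 3.0 requires it for StorageHandler tables.
CREATE EXTERNAL TABLE hbase_hive_table (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_hive_table");

-- Once defined, the table can be queried like any other Hive table.
SELECT key, value FROM hbase_hive_table LIMIT 10;
```

In this mapping, `:key` binds the first Hive column to the HBase row key, and `cf1:val` binds the second column to a single HBase cell.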
Prerequisites
You must complete the following steps before configuring Hive and HBase.

Configuring HBase and Hive
Follow this step to complete the configuration:

Using HBase Hive integration
Before you begin to use the Hive HBase integration, complete the following steps:

HBase Hive integration example
A change to Hive in HDP 3.0 is that every table created with a StorageHandler must be marked as “external”. There is no such thing as a non-external table created by a StorageHandler. If the corresponding HBase table exists when the Hive table is created, the Hive table mimics the HDP 2.x semantics of an “external” table. If the corresponding HBase table does not exist when the Hive table is created, the Hive table mimics the HDP 2.x semantics of a non-external table (for example, the HBase table is dropped when the Hive table is dropped).

Using Hive to access an existing HBase table example
Use the following steps to access the existing HBase table through Hive.

Understanding Bulk Loading
A common pattern in HBase for obtaining high data throughput on the write path is “bulk loading”. Instead of shipping edits to the HBase RegionServers, bulk loading generates HBase files (HFiles) in the specific format that HBase stores on disk. The Hive integration can generate HFiles; enable this by setting the property “hive.hbase.generatehfiles” to true, for example, `set hive.hbase.generatehfiles=true`. You must also provide the path to a directory in which to write the HFiles, for example, `set hfile.family.path=/tmp/hfiles`.

Understanding HBase Snapshots
When an HBase snapshot exists for an HBase table that a Hive table references, you can choose to execute queries over the “offline” snapshot for that table instead of the table itself.
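
The bulk loading and snapshot settings described above can be sketched as a single Hive session. This is a minimal sketch, not a definitive procedure: the table names `hbase_hive_table` and `source_table`, the snapshot name `my_snapshot`, and the restore directory are hypothetical. The properties `hive.hbase.snapshot.name` and `hive.hbase.snapshot.restoredir` do not appear in the text above; they are the Hive settings for snapshot reads, so confirm them and the exact INSERT form against your Hive version.

```sql
-- Bulk loading: write HFiles instead of sending edits to the RegionServers.
set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/hfiles;

-- Hypothetical tables: source_table feeds the HBase-backed hbase_hive_table.
INSERT INTO TABLE hbase_hive_table
SELECT key, value FROM source_table;

-- Snapshots: switch HFile generation off, then point queries at a snapshot.
set hive.hbase.generatehfiles=false;
-- Hypothetical snapshot name; the snapshot must already exist in HBase.
set hive.hbase.snapshot.name=my_snapshot;
-- Assumed scratch directory where the snapshot is restored for reading.
set hive.hbase.snapshot.restoredir=/tmp/restore;

-- This query now reads the "offline" snapshot rather than the live table.
SELECT COUNT(*) FROM hbase_hive_table;
```

The HFiles written during the bulk loading step still have to be handed to HBase separately, for example with the HBase LoadIncrementalHFiles tool, before the rows become visible in the live table.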