Understanding Apache Phoenix-Hive Connector
This connector enables you to access Phoenix data from Hive without any data transfer, so Business Intelligence (BI) logic in Hive can work directly with the operational data available in Phoenix.
Using this connector, you can run certain types of queries more efficiently than you can in Hive or other applications; however, it is not a universal tool that can run all types of queries. In some cases, Phoenix alone can run queries faster than the Phoenix-Hive integration, and vice versa. In others, you can use this integration to perform operations such as many-to-many joins and aggregations, which Phoenix would otherwise struggle to run effectively on its own. This integration is better suited than Phoenix alone for online analytical processing (OLAP) operations.
Another use case for this connector is transferring data between the two systems. The connector simplifies data movement between Hive and Phoenix because an intermediate form of the data (for example, a .CSV file) is not required. This automatic movement of structured data between the two systems is the major advantage of using the tool. Be aware, however, that when moving a large amount of data from Hive to Phoenix, the Phoenix CSV bulk load tool is preferable for performance reasons.
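For reference, the following is a minimal sketch of such a bulk load using Phoenix's CsvBulkLoadTool; the JAR path, table name, and input path shown here are assumptions and will differ on your cluster:

# Load a CSV file exported from Hive into a Phoenix table
# (the JAR path, table name, and input path are hypothetical):
HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar \
  /opt/cloudera/parcels/PHOENIX/lib/phoenix/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXT_TABLE \
  --input /tmp/ext_table.csv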
Configure Phoenix-Hive Connector using Cloudera Manager
You must configure the Phoenix-Hive connector before you can access Phoenix data from Hive. To configure the Phoenix-Hive connector using Cloudera Manager:
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the Hive service.
- Click the Configuration tab.
- Select Scope > Hive (Service-Wide).
- Select Category > Advanced.
- Locate the Hive Auxiliary JARs Directory property or search for it by typing its name in the Search box.
- Add the following auxiliary path directory: /usr/local/phoenix-hive.
- Locate the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml property or search for it by typing its name in the Search box.
- Add the following property values:
Name: hive.aux.jars.path
Value: file:///opt/cloudera/parcels/PHOENIX/lib/phoenix/phoenix-version-hive.jar
- Select Scope > Gateway.
- Select Category > Advanced.
- Locate the Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml property or search for it by typing its name in the Search box.
- Add the following property values:
Name: hive.aux.jars.path
Value: file:///opt/cloudera/parcels/PHOENIX/lib/phoenix/phoenix-version-hive.jar
- Enter a Reason for change, and then click Save Changes to commit the changes.
- Restart the role and service when Cloudera Manager prompts you to restart.
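For reference, the safety valve entries above correspond to a hive-site.xml snippet like the following (a sketch; the exact parcel path depends on the Phoenix version installed on your cluster):

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/cloudera/parcels/PHOENIX/lib/phoenix/phoenix-version-hive.jar</value>
</property>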
Apache Phoenix-Hive Usage Examples
You can refer to the following Phoenix-Hive connector examples:
- Creating a table
- Loading data
- Querying data
Creating a table
Creating an external Hive table requires an existing table in Phoenix; Hive manages only the Hive metadata. Dropping an external table from Hive deletes only the Hive metadata, but the Phoenix table itself is not deleted.
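For example, the Phoenix table referenced below could be created beforehand in the Phoenix CLI with a statement like the following (a sketch; the column types are assumptions chosen to match the Hive definition that follows):

-- Create the Phoenix table that the Hive external table will map to:
CREATE TABLE ext_table (
  i1 INTEGER PRIMARY KEY,  -- row key; maps to the Hive INT column
  s1 VARCHAR,              -- maps to the Hive STRING column
  f1 FLOAT,                -- maps to the Hive FLOAT column
  d1 DECIMAL               -- maps to the Hive DECIMAL column
);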
Use the create external table command to create an EXTERNAL Hive table.
create external table ext_table (
  i1 int,
  s1 string,
  f1 float,
  d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
  "phoenix.table.name" = "ext_table",
  "phoenix.zookeeper.quorum" = "localhost",
  "phoenix.zookeeper.znode.parent" = "/hbase",
  "phoenix.zookeeper.client.port" = "2181",
  "phoenix.rowkeys" = "i1",
  "phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
The following table describes these properties:
Parameter | Default Value | Description |
---|---|---|
phoenix.table.name | The same name as the Hive table | Name of the existing Phoenix table |
phoenix.zookeeper.quorum | localhost | Specifies the ZooKeeper quorum for HBase |
phoenix.zookeeper.znode.parent | /hbase | Specifies the ZooKeeper parent node for HBase |
phoenix.zookeeper.client.port | 2181 | Specifies the ZooKeeper port |
phoenix.rowkeys | N/A | The list of columns to be the primary key in a Phoenix table |
phoenix.column.mapping | N/A | Mappings between column names for Hive and Phoenix |
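After creating the table, one way to verify that these properties were recorded is to inspect the table definition from Hive:

-- Show the full table definition, including the TBLPROPERTIES set above:
describe formatted ext_table;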
Loading data
Use the INSERT statement to load data into the Phoenix table through Hive.
insert into table T values (....);
insert into table T select c1, c2, c3 from source_table;
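For instance, with the ext_table definition above, a minimal sketch could look like this (the literal values and the staging table name are assumptions):

-- Insert a single row into the Phoenix-backed table (values are hypothetical):
insert into table ext_table values (1, 'sample', 1.5, 10.25);
-- Copy rows from an existing Hive table (staging_table is hypothetical):
insert into table ext_table select i1, s1, f1, d1 from staging_table;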
Querying data
You can use HiveQL to query data in a Phoenix table. A Hive query on a single table can be as fast as running the query in the Phoenix CLI with the following property settings:
hive.fetch.task.conversion=more and hive.exec.parallel=true
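For example, a minimal sketch of applying these settings in a Hive session before querying the table defined earlier:

-- Enable fetch-task conversion and parallel query execution for this session:
set hive.fetch.task.conversion=more;
set hive.exec.parallel=true;
-- Query the Phoenix-backed table (the predicate value is hypothetical):
select s1, f1 from ext_table where i1 = 1;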
The following are some of the parameters that you can use when querying the data:
Parameter | Default Value | Description |
---|---|---|
hbase.scan.cache | 100 | Number of rows fetched per scan request |
hbase.scan.cacheblock | false | Whether or not to cache blocks during scans |
split.by.stats | false | If true, mappers use table statistics. One mapper per guide post. |
[hive-table-name].reducer.count | 1 | Number of reducers. In Tez mode, this affects only single-table queries. See Limitations. |
[phoenix-table-name].query.hint | N/A | Hint for Phoenix query (for example, NO_INDEX) |
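These parameters can be set per session before running a query; for example, a sketch using the hypothetical ext_table name from earlier:

-- Use four reducers for queries against ext_table (the value is an assumption):
set ext_table.reducer.count=4;
-- Pass a query hint through to Phoenix, for example to skip secondary indexes:
set ext_table.query.hint=NO_INDEX;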
Limitations of Phoenix-Hive Connector
The following are some of the limitations of the Phoenix-Hive connector:
- Only a 4K character specification is allowed for specifying a full table. For tables with very large specifications, there is a possibility of losing metadata information.
- There is a difference in the way timestamps are saved in Phoenix and Hive: Phoenix uses a binary format, whereas Hive uses a text format.
- Hive LLAP is not supported in this integration.