Using Apache Phoenix to store and access data
Also available as:
PDF

Apache Phoenix-Hive usage examples

You can view the examples of creating a table, loading data, and querying data using Hive. You can use the Apache Phoenix StorageHandler plugin to enable Apache Hive access to Phoenix tables from the Apache Hive command line using HiveQL.

Creating a table

In HDP 3.0, all the Hive tables that are backed by a StorageHandler must use the EXTERNAL keyword. Creating an external Hive table requires an existing table in Phoenix. Hive manages only the Hive metadata. Dropping an external table from Hive deletes only the Hive metadata, but the Phoenix table is not deleted.

Use the create external table command to craete an EXTERNAL Hive table.


create external table ext_table (
  i1 int,
  s1 string,
  f1 float,
  d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
  "phoenix.table.name" = "ext_table",
  "phoenix.zookeeper.quorum" = "localhost",
  "phoenix.zookeeper.znode.parent" = "/hbase",
  "phoenix.zookeeper.client.port" = "2181",
  "phoenix.rowkeys" = "i1",
  "phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);

Following are the parameters that you could use when creating an external table.

Parameter Default Value Description
phoenix.table.name The same name as the Hive table Name of the existing Phoenix table
phoenix.zookeeper.quorum localhost Specifies the ZooKeeper quorum for HBase
phoenix.zookeeper.znode.parent /hbase Specifies the ZooKeeper parent node for HBase
phoenix.zookeeper.client.port 2181 Specifies the ZooKeeper port
phoenix.rowkeys N/A The list of columns to be the primary key in a Phoenix table
phoenix.column.mapping N/A Mappings between column names for Hive and Phoenix

Load data

Use insert statement to load data to the Phoenix table through Hive.


insert into table T values (....);
insert into table T select c1,c2,c3 from source_table;

Query data

You can use HiveQL for querying data in a Phoenix table. A Hive query on a single table can be as fast as running the query in the Phoenix CLI with the following property settings:
hive.fetch.task.conversion=more and hive.exec.parallel=true

Following are some of the parameters that you could use when querying the data.

Parameter Default Value Description
hbase.scan.cache 100 Read row size for a unit request
hbase.scan.cacheblock false Whether or not cache block
split.by.stats false If true, mappers use table statistics. One mapper per guide post.
[hive-table-name].reducer.count 1 Number of reducers. In Tez mode, this affects only single-table queries. See Limitations.
[phoenix-table-name].query.hint Hint for Phoenix query (for example, NO_INDEX)