Chapter 6. Orchestrating SQL and APIs with Apache Phoenix
Apache Phoenix is a SQL abstraction layer for interacting with Apache HBase and other Hadoop components. Phoenix lets you create and interact with tables using typical DDL/DML statements through its standard JDBC API. Under the covers, the driver translates SQL into native HBase API calls. Consequently, Phoenix provides a SQL skin for working with data and objects stored in the NoSQL schema of HBase.
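For example, here is a minimal sketch of the typical JDBC flow; the ZooKeeper quorum host (zk1.example.com) and the items table are placeholders for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // Connect through the thick client; the URL names the ZooKeeper quorum.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181")) {
            try (Statement stmt = conn.createStatement()) {
                // Standard DDL/DML; Phoenix maps the table onto HBase storage.
                stmt.execute("CREATE TABLE IF NOT EXISTS items "
                    + "(id BIGINT NOT NULL PRIMARY KEY, name VARCHAR)");
                stmt.executeUpdate("UPSERT INTO items VALUES (1, 'widget')");
                conn.commit(); // Phoenix buffers mutations until commit
            }
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name FROM items")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getString(2));
                }
            }
        }
    }
}
```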
This Phoenix documentation focuses on interoperability with HBase. For more information about Phoenix capabilities, see the Apache Phoenix website.
Enabling Phoenix and Interdependent Components
If you have a Hortonworks Data Platform installation with Ambari, then no separate installation is required for Phoenix.
To enable Phoenix with Ambari:
Open Ambari.
Select Services tab > HBase > Configs tab.
Scroll down to the Phoenix SQL settings.
(Optional) Reset the Phoenix Query Timeout.
Click the Enable Phoenix slider button.
If you installed Hortonworks Data Platform manually and did not include the Phoenix component, see Installing Apache Phoenix.
Important: Your Phoenix installation must be the same version as the one that is packaged with the distribution of the HDP stack version that is deployed across your cluster.
Thin Client Connectivity with Phoenix Query Server
The Phoenix Query Server (PQS) is a component of the Apache Phoenix distribution. PQS provides an alternative to connecting through the thick JDBC client: it is a stand-alone server that accepts requests from "thin clients" over HTTP and executes them using Phoenix capabilities. This topology offloads most computation to PQS and requires a smaller client-side footprint. The PQS client protocol is based on the Avatica component of Apache Calcite.
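For illustration, a thin client connects through the Avatica-based JDBC driver that ships with Phoenix (the phoenix-thin-client JAR) instead of the thick client; the PQS hostname below is a placeholder, and 8765 is the default PQS port:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThinClientExample {
    public static void main(String[] args) throws Exception {
        // The thin-client URL points at PQS over HTTP, not at ZooKeeper.
        String url = "jdbc:phoenix:thin:url=http://pqs-host.example.com:8765;"
                   + "serialization=PROTOBUF";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```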
Securing Authentication on the Phoenix Query Server
You can enable Kerberos-based authentication on PQS with Ambari. If you chose to install HDP manually instead, see Configuring Phoenix Query Server to enable the Kerberos protocol.
Selecting and Obtaining a Client Driver
You have two options to develop an application that works with Phoenix, depending on the client-server architecture:
Without Phoenix Query Server: If your environment does not have a PQS layer, applications that connect to Phoenix must use the Phoenix JDBC client driver.
With Phoenix Query Server: PQS is an abstraction layer that enables other languages such as Python and Go to work with Phoenix. The layer wraps Phoenix JDBC in an HTTP interface that uses protocol buffers for serialization. You might prefer a non-Java client driver for various reasons, such as to avoid the JVM footprint on the client or to develop with a different application framework.
To obtain the appropriate driver for application development:
- JDBC Driver
Use the /usr/hdp/current/phoenix-client/phoenix-client.jar file in the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version. With Ambari, you can determine the HDP version by using the Versions tab. Alternatively, run the hadoop version command to print information displaying the HDP version.
- JDBC Driver as a Maven dependency
See Download the HDP Maven Artifacts for Maven artifact repositories that are available for HDP. (An illustrative dependency declaration appears after this list.)
- Microsoft .NET Driver
Download and install a NuGet package for the Microsoft .NET Driver for Apache Phoenix and Phoenix Query Server. Note: Operability with this driver is a Hortonworks Technical Preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.
- Other non-Java drivers
Other non-JDBC drivers for Phoenix are available as HDP add-ons and on other websites, but they are not currently supported by Hortonworks. You can find compatible client drivers by constructing a web search string consisting of "avatica" and the name of the application programming language that you want to use, for example, avatica python.
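As an illustrative sketch of the Maven route mentioned above, the coordinates below are the ones Apache Phoenix publishes; the version is a placeholder that you should resolve against the Phoenix version matching your HDP stack:

```xml
<!-- Illustrative only: substitute the Phoenix version that matches your HDP stack. -->
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-core</artifactId>
    <version>${phoenix.version}</version>
</dependency>
```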
Creating and Using User-Defined Functions (UDFs) in Phoenix
With a user-defined function (UDF), you can extend the functionality of your SQL statements by creating scalar functions that operate on a specific tenant. For details about creating, dropping, and using UDFs in Phoenix, see User-defined functions on the Apache website.
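As a sketch of the registration step, a UDF is declared with CREATE FUNCTION once its implementation is packaged in a JAR; the function name, implementing class, and JAR path below are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RegisterUdfExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181");
             Statement stmt = conn.createStatement()) {
            // Hypothetical UDF: the class and JAR location are placeholders.
            stmt.execute("CREATE FUNCTION url_trim(varchar) RETURNS varchar "
                + "AS 'com.example.udf.UrlTrimFunction' "
                + "USING JAR 'hdfs:///apps/phoenix/lib/example-udfs.jar'");
        }
    }
}
```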
Mapping Phoenix Schemas to HBase Namespaces
You can map a Phoenix schema to an HBase namespace to gain multitenancy features in Phoenix.
HBase, which is often the underlying storage engine for Phoenix, has namespaces to support multitenancy features. Multitenancy helps an HBase user or administrator perform access control and quota management tasks. Also, namespaces enable tighter control of where a particular data set is stored on RegionServers. See Enabling Multitenancy with Namespaces for further information.
Prior to HDP 2.5, Phoenix tables could not be associated with a namespace other than the default namespace.
Enabling Namespace Mapping
Important: After you set the properties to enable the mapping of Phoenix schemas to HBase namespaces, reverting the property settings renders the Phoenix database unusable. Test or carefully plan the Phoenix-to-HBase namespace mappings before implementing them.
To enable Phoenix schema mapping to a non-default HBase namespace:
Set the phoenix.schema.isNamespaceMappingEnabled property to true in the hbase-site.xml file of both the client and the server.
Restart the HBase Master and RegionServer processes.
Note: You might not want to map Phoenix system tables to namespaces because of compatibility issues with your current applications. In this case, set the phoenix.schema.mapSystemTablesToNamespace property in the hbase-site.xml file to false.
Namespace Mapping Properties in the hbase-site.xml File
- phoenix.schema.isNamespaceMappingEnabled
Enables mapping of tables of a Phoenix schema to a non-default HBase namespace. To enable mapping of a schema to a non-default namespace, set the value of this property to true. The default setting for this property is false.
- phoenix.schema.mapSystemTablesToNamespace
With a true setting (default): after namespace mapping is enabled with the other property, all system tables, if any, are migrated to a namespace called system.
With a false setting: system tables are associated with the default namespace.
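Taken together, an illustrative hbase-site.xml excerpt that enables namespace mapping and keeps the default system-table migration looks like this (apply it to both client and server configurations, then restart the HBase Master and RegionServer processes):

```xml
<!-- Illustrative excerpt: both properties are described above. -->
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
<property>
  <name>phoenix.schema.mapSystemTablesToNamespace</name>
  <value>true</value>
</property>
```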
Creating New Schemas and Tables with Namespace Mapping
You can use the following DDL statements for managing schemas:
CREATE SCHEMA
USE SCHEMA
DROP SCHEMA
You must have admin privileges in HBase to run CREATE SCHEMA or DROP SCHEMA.
See the Apache Phoenix Grammar reference page for how you can use these DDL statements.
As you create physical tables, views, and indexes, you can associate them with a schema.
If the schema already has namespace mapping enabled, the newly created objects automatically
become part of the HBase namespace. The directory of the HBase namespace that maps to the
Phoenix schema inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.
See the "F.A.Q." section of Apache Phoenix Namespace
Mapping for more information.
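A minimal sketch of this workflow over JDBC, assuming namespace mapping is already enabled; the ZooKeeper host, schema, and table names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SchemaMappingExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE SCHEMA IF NOT EXISTS store1");
            stmt.execute("USE store1");
            // With namespace mapping enabled, this table lands in the
            // HBase namespace "store1" rather than "default".
            stmt.execute("CREATE TABLE IF NOT EXISTS items "
                + "(id BIGINT NOT NULL PRIMARY KEY, name VARCHAR)");
        }
    }
}
```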
Associating Tables of a Schema to a Namespace
After you enable namespace mapping on a Phoenix schema that already has tables, you
can migrate the tables to an HBase namespace. The namespace directory that contains the
migrated tables inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.
System tables are migrated to the namespace automatically during the first connection after enabling namespace properties.
Associating in a Noncustomized Environment without Kerberos
Run the following command to associate a table:
$ bin/psql.py ZooKeeper_hostname -m schema_name.table_name
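For example, with a placeholder ZooKeeper host and a table items in schema store1:
$ bin/psql.py zk1.example.com -m store1.items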
Associating in a Customized Kerberos Environment
Prerequisite: In a Kerberos-secured environment, you must have admin privileges (user hbase) to complete the following task.
Navigate to the Phoenix home directory. The default location is /usr/hdp/current/phoenix-client/.
Run a command to migrate a table of a schema to a namespace, using the following command syntax for the options that apply to your environment:
$ bin/psql.py {{ZooKeeper_hostnames:2181} | [:zookeeper.znode.parent] | [:HBase_headless_keytab_location] | [:principal_name] | [;TenantId=tenant_Id] | [;CurrentSCN=current_SCN]} -m {schema_name.table_name}
Additional information for valid command parameters:
- ZooKeeper_hostnames
Enter the ZooKeeper hostname or hostnames that compose the ZooKeeper quorum. If you enter multiple hostnames, enter them as comma-separated values. This parameter is required. You must append the colon and ZooKeeper port number if you invoke the other security parameters in the command. The default port number is 2181.
- zookeeper.znode.parent
This setting is defined in the hbase-site.xml file.
- -m schema_name.table_name
The -m argument is required. There is a space before and after the -m option.
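For example, an illustrative invocation (all values are placeholders) that passes a three-host quorum, a secure znode parent, a headless keytab, and a principal:
$ bin/psql.py zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase-secure:/etc/security/keytabs/hbase.headless.keytab:hbase@EXAMPLE.COM -m store1.items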
Phoenix Repair Tool
Important: The Phoenix repair tool of HDP is a technical preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.
Apache Phoenix depends on the SYSTEM.CATALOG table for metadata information, such as table structure and index location, to function correctly. Use the Phoenix repair tool to validate the data integrity of the SYSTEM.CATALOG table. If a Phoenix client is not functioning as expected and throwing exceptions such as ArrayIndexOutOfBound or TableNotFound, this tool can help identify the problem and fix it.
The repair tool is designed to flag flagrant trouble spots and to fix SYSTEM.CATALOG problems in a way that does not radically affect your Phoenix system. The tool prompts you to confirm changes before the SYSTEM.CATALOG table is modified.
Do not use the Phoenix repair tool for an upgrade. The tool is designed to function only with the current version of the system catalog and to use the HBase API directly.
Running the Phoenix Repair Tool
Tip: Run the HDFS fsck and HBase hbck tools before running the Phoenix repair tool. Checking the condition of HDFS and HBase is highly recommended because the Phoenix repair tool does not check or repair them, and both must be in working order for the repair tool to fix Phoenix problems.
About this Task
The Phoenix repair tool looks for table records in the system catalog and collects all corresponding information about columns and indexes. If certain inconsistencies are detected, then the tool prompts you to verify that it should proceed with fixing the problems. The tool can fix the following problems:
Missing or disabled physical table
Incorrect number of columns in table metadata information
Table record has columns with an internal index that is out of range
The tool performs a cross-reference check between user tables and indexes. If a user table has an index that is missing its physical table, the tool offers to delete the link to this index and to delete the index table record from the system catalog. If the physical table is disabled, the tool asks whether it needs to be enabled.
If you allow the Phoenix repair tool to fix an issue, the tool creates a snapshot of the SYSTEM.CATALOG table. The snapshot is created in case you want to roll back the repair operation.
Prerequisites
Ensure that no other instance of the Phoenix repair tool launches or runs while you run the tool. Also, ensure that no other clients modify the system catalog data while the tool runs.
Steps
Run the psql.py utility with the -r option:
/usr/hdp/current/phoenix-client/psql.py -r
If the tool detects previously stored snapshots on the system, respond to the Restore dialog prompt:
Respond whether the tool should delete or retain the previously recorded snapshots.
Indicate whether the tool should proceed with the integrity check or restore a table from one of the snapshots.
Result and Next Step
After the tool completes the check, you can consider the SYSTEM.CATALOG table as validated. You can proceed with SQL operations in the Phoenix CLI.