Chapter 6. Orchestrating SQL and APIs with Apache Phoenix
Apache Phoenix is a SQL abstraction layer for interacting with Apache HBase and other Hadoop components. Phoenix lets you create and interact with tables using standard DDL/DML statements through its standard JDBC API. Through the driver, Phoenix translates SQL statements into native HBase API calls. In effect, Phoenix provides a SQL skin for working with data and objects stored in the NoSQL schema of HBase.
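As a minimal sketch of what this looks like from an application, assuming the Phoenix client JAR is on the classpath and a placeholder ZooKeeper host zk1.example.com, a Java program issues ordinary JDBC calls and Phoenix handles the HBase interaction:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        // The Phoenix "thick" JDBC URL points at the ZooKeeper quorum used by HBase.
        // zk1.example.com is a placeholder; substitute your own quorum host(s).
        String url = "jdbc:phoenix:zk1.example.com:2181";

        try (Connection conn = DriverManager.getConnection(url)) {
            // DDL: Phoenix creates the backing HBase table and its metadata.
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS WEB_STAT (HOST VARCHAR NOT NULL PRIMARY KEY, HITS BIGINT)");

            // DML: UPSERT is Phoenix's combined insert/update statement.
            try (PreparedStatement ps =
                     conn.prepareStatement("UPSERT INTO WEB_STAT (HOST, HITS) VALUES (?, ?)")) {
                ps.setString(1, "host1");
                ps.setLong(2, 42L);
                ps.executeUpdate();
            }
            conn.commit();   // Phoenix connections are not auto-commit by default.

            // Query the data back through the standard JDBC ResultSet API.
            try (ResultSet rs = conn.createStatement()
                     .executeQuery("SELECT HOST, HITS FROM WEB_STAT")) {
                while (rs.next()) {
                    System.out.println(rs.getString("HOST") + " -> " + rs.getLong("HITS"));
                }
            }
        }
    }
}
```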
This Phoenix documentation focuses on interoperability with HBase. For more information about Phoenix capabilities, see the Apache Phoenix website.
Enabling Phoenix and Interdependent Components
If you have a Hortonworks Data Platform installation with Ambari, then no separate installation is required for Phoenix.
To enable Phoenix with Ambari:
Open Ambari.
Select Services tab > HBase > Configs tab.
Scroll down to the Phoenix SQL settings.
(Optional) Reset the Phoenix Query Timeout.
Click the Enable Phoenix slider button.
If you installed Hortonworks Data Platform manually and did not include the Phoenix component, see Installing Apache Phoenix.
Important: Your Phoenix installation must be the same version as the one that is packaged with the distribution of the HDP stack version that is deployed across your cluster.
Thin Client Connectivity with Phoenix Query Server
The Phoenix Query Server (PQS) is a component of the Apache Phoenix distribution. PQS provides an alternative to connecting through the JDBC client driver: it is a stand-alone server that accepts requests from "thin clients" over HTTP and executes them using Phoenix capabilities. This topology offloads most computation to PQS and requires a smaller client-side footprint. The PQS client protocol is based on the Avatica component of Apache Calcite.
Securing Authentication on the Phoenix Query Server
You can enable Kerberos-based authentication on PQS with Ambari. If you chose to install HDP manually instead, see Configuring Phoenix Query Server to enable the Kerberos protocol.
Selecting and Obtaining a Client Driver
You have two options to develop an application that works with Phoenix, depending on the client-server architecture:
Without Phoenix Query Server: If your environment does not have a PQS layer, applications that connect to Phoenix must use the Phoenix JDBC client driver.
With Phoenix Query Server: PQS is an abstraction layer that enables other languages such as Python and Go to work with Phoenix. The layer provides a protocol buffer-based HTTP wrapper around the Phoenix JDBC driver. You might prefer to use a non-Java client driver for one of various reasons, such as to avoid the JVM footprint on the client or to develop with a different application framework.
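Even from Java, you can go through PQS with the thin JDBC driver. The following sketch assumes the Phoenix thin-client JAR is on the classpath and uses a placeholder PQS hostname; 8765 is the default PQS port:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixThinClientExample {
    public static void main(String[] args) throws Exception {
        // The thin-client URL targets PQS over HTTP instead of the ZooKeeper quorum.
        // pqs.example.com is a placeholder hostname.
        String url = "jdbc:phoenix:thin:url=http://pqs.example.com:8765;serialization=PROTOBUF";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // List a few table names from the Phoenix catalog as a connectivity check.
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```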
To obtain the appropriate driver for application development:
- JDBC Driver
Use the /usr/hdp/current/phoenix-client/phoenix-client.jar file in the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version. With Ambari, you can determine the HDP version by using the Versions tab. Alternatively, run the hadoop version command to print information that displays the HDP version.
- JDBC Driver as a Maven dependency
See Download the HDP Maven Artifacts for the Maven artifact repositories that are available for HDP. A sketch of a dependency declaration appears after this list.
- Microsoft .NET Driver
Download and install a NuGet package for the Microsoft .NET Driver for Apache Phoenix and Phoenix Query Server. Note: Operability with this driver is a Hortonworks Technical Preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.
- Other non-Java drivers
Other non-JDBC drivers for Phoenix are available as HDP add-ons and on other websites, but they are not currently supported by Hortonworks. You can find compatible client drivers by constructing a web search string that consists of "avatica" and the name of the programming language that you want to use, for example: avatica python.
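For the Maven option, a dependency declaration generally resembles the following sketch. The version string is a placeholder that must match the Phoenix build shipped with your HDP stack, and the HDP repository is assumed to be configured as described in Download the HDP Maven Artifacts:

```xml
<!-- Sketch of a pom.xml dependency; the version below is a placeholder and must
     match the Phoenix build packaged with your HDP stack. -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>4.7.0.2.6.0.0-XXXX</version> <!-- placeholder HDP build version -->
</dependency>
```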
Mapping Phoenix Schemas to HBase Namespaces
You can map a Phoenix schema to an HBase namespace to gain multitenancy features in Phoenix.
HBase, which is often the underlying storage engine for Phoenix, has namespaces to support multitenancy features. Multitenancy helps an HBase user or administrator perform access control and quota management tasks. Also, namespaces enable tighter control of where a particular dataset is stored on RegionServers. See Enabling Multitenancy with Namespaces for further information.
Prior to HDP 2.5, Phoenix tables could not be associated with a namespace other than the default namespace.
Enabling Namespace Mapping
Important: After you set the properties to enable the mapping of Phoenix schemas to HBase namespaces, reverting the property settings renders the Phoenix database unusable. Test or carefully plan the Phoenix to HBase namespace mappings before implementing them.
To enable Phoenix schema mapping to a non-default HBase namespace:
Set the phoenix.schema.isNamespaceMappingEnabled property to true in the hbase-site.xml file of both the client and the server.
Restart the HBase Master and RegionServer processes.
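The corresponding hbase-site.xml entries might look like the following sketch; the second property is optional and is discussed in the note and property descriptions below:

```xml
<!-- Sketch of hbase-site.xml entries, applied on both client and server. -->
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
<!-- Optional: leave this at its default of true to migrate Phoenix SYSTEM tables
     into the SYSTEM namespace, or set it to false to keep them in the default
     namespace (see the note and property descriptions below). -->
<property>
  <name>phoenix.schema.mapSystemTablesToNamespace</name>
  <value>true</value>
</property>
```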
Note: You might not want to map Phoenix system tables to namespaces because of compatibility issues with your current applications. In this case, set the phoenix.schema.mapSystemTablesToNamespace property in the hbase-site.xml file to false.
Namespace Mapping Properties in the hbase-site.xml File
- phoenix.schema.isNamespaceMappingEnabled
Enables mapping of tables of a Phoenix schema to a non-default HBase namespace. To enable mapping of a schema to a non-default namespace, set the value of this property to true. The default setting for this property is false.
- phoenix.schema.mapSystemTablesToNamespace
With the true setting (default): after namespace mapping is enabled with the other property, all system tables, if any, are migrated to a namespace called system.
With the false setting: system tables are associated with the default namespace.
Creating New Schemas and Tables with Namespace Mapping
You can use the following DDL statements for managing schemas:
CREATE SCHEMA
USE SCHEMA
DROP SCHEMA
You must have admin privileges in HBase to run CREATE SCHEMA or DROP SCHEMA.
See the Apache Phoenix Grammar reference page for how you can use these DDL statements.
As you create physical tables, views, and indexes, you can associate them with a schema. If the schema already has namespace mapping enabled, the newly created objects automatically become part of the HBase namespace. The directory of the HBase namespace that maps to the Phoenix schema inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.
See the "F.A.Q." section of Apache Phoenix Namespace Mapping for more information.
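As a sketch, assuming namespace mapping is already enabled, the DDL flow looks like the following; the schema and table names are illustrative only, and note that Phoenix upper-cases unquoted identifiers, so this schema maps to an HBase namespace named STORE1:

```sql
-- Create a schema that maps to the HBase namespace STORE1 (illustrative name).
CREATE SCHEMA IF NOT EXISTS STORE1;

-- Make the schema the default for subsequent statements on this connection.
USE STORE1;

-- The table (and any views or indexes created on it) becomes part of the STORE1 namespace.
CREATE TABLE SALES (
    SALE_ID BIGINT NOT NULL PRIMARY KEY,
    ITEM    VARCHAR,
    AMOUNT  DECIMAL(10, 2)
);
```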
Associating Tables of a Schema to a Namespace
After you enable namespace mapping on a Phoenix schema that already has tables, you can migrate the tables to an HBase namespace. The namespace directory that contains the migrated tables inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.
System tables are migrated to the namespace automatically during the first connection after enabling namespace properties.
Associating in a Noncustomized Environment without Kerberos
Run the following command to associate a table:
$ bin/psql.py ZooKeeper_hostname -m schema_name.table_name
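For example, with a hypothetical ZooKeeper host and the illustrative schema and table names used earlier:

```
$ bin/psql.py zk1.example.com -m STORE1.SALES
```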
Associating in a Customized Kerberos Environment
Prerequisite: In a Kerberos-secured environment, you must have admin privileges (user hbase) to complete the following task.
Navigate to the Phoenix home directory. The default location is /usr/hdp/current/phoenix-client/.
Run a command to migrate a table of a schema to a namespace, using the following command syntax for the options that apply to your environment:
$ bin/psql.py {{ZooKeeper_hostnames:2181} | [:zookeeper.znode.parent] | [:HBase_headless_keytab_location] | [:principal_name] | [;TenantId=tenant_Id] | [;CurrentSCN=current_SCN]} -m {schema_name.table_name}
Additional information for valid command parameters:
- ZooKeeper_hostnames
Enter the ZooKeeper hostname or hostnames that compose the ZooKeeper quorum. If you enter multiple hostnames, enter them as comma-separated values. This parameter is required. You must append the colon and ZooKeeper port number if you invoke the other security parameters in the command. The default port number is 2181.
- zookeeper.znode.parent
This setting is defined in the hbase-site.xml file.
- -m schema_name.table_name
The -m argument is required. There is a space before and after the -m option.
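As an illustrative sketch only, following the syntax and parameter order shown above, a secured invocation might look like the following; every hostname, znode, keytab path, principal, and table name is a placeholder that you must replace with values from your environment:

```
$ bin/psql.py zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase-secure:/etc/security/keytabs/hbase.headless.keytab:hbase@EXAMPLE.COM -m STORE1.SALES
```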