Data Access

Chapter 6. Orchestrating SQL and APIs with Apache Phoenix

Apache Phoenix is a SQL abstraction layer for interacting with Apache HBase and other Hadoop components. Phoenix lets you create and interact with tables through typical DDL and DML statements issued via its standard JDBC API. Using the driver APIs, Phoenix translates SQL into native HBase API calls. Consequently, Phoenix provides a SQL skin for working with data and objects stored in the NoSQL schema of HBase.
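
For illustration, the following minimal Java sketch uses the standard JDBC API with the thick Phoenix client driver (described later in this chapter). The ZooKeeper host zk1.example.com and the WEB_STAT table are placeholders, not part of your cluster:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class PhoenixJdbcExample {
      public static void main(String[] args) throws Exception {
          // The thick driver connects through the ZooKeeper quorum; the host below is a placeholder.
          try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181")) {
              try (Statement stmt = conn.createStatement()) {
                  stmt.execute("CREATE TABLE IF NOT EXISTS WEB_STAT (HOST VARCHAR NOT NULL PRIMARY KEY, HITS BIGINT)");
              }
              // Phoenix uses UPSERT instead of separate INSERT and UPDATE statements.
              try (PreparedStatement ps = conn.prepareStatement("UPSERT INTO WEB_STAT VALUES (?, ?)")) {
                  ps.setString(1, "host1.internal");
                  ps.setLong(2, 42L);
                  ps.executeUpdate();
              }
              conn.commit(); // Phoenix connections do not auto-commit by default.
              try (Statement stmt = conn.createStatement();
                   ResultSet rs = stmt.executeQuery("SELECT HOST, HITS FROM WEB_STAT")) {
                  while (rs.next()) {
                      System.out.println(rs.getString("HOST") + " -> " + rs.getLong("HITS"));
                  }
              }
          }
      }
  }

The JDBC URL takes the form jdbc:phoenix:<ZooKeeper quorum>[:<port>[:<znode parent>]].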

This Phoenix documentation focuses on interoperability with HBase. For more information about Phoenix capabilities, see the Apache Phoenix website.

Enabling Phoenix and Interdependent Components

If you have a Hortonworks Data Platform installation with Ambari, then no separate installation is required for Phoenix.

To enable Phoenix with Ambari:

  1. Open Ambari.

  2. Select Services tab > HBase > Configs tab.

  3. Scroll down to the Phoenix SQL settings.

  4. (Optional) Reset the Phoenix Query Timeout.

  5. Click the Enable Phoenix slider button.

If you installed Hortonworks Data Platform manually and did not include the Phoenix component, see Installing Apache Phoenix.

Important

Your Phoenix installation must be the same version as the one packaged with the HDP stack version that is deployed across your cluster.

Thin Client Connectivity with Phoenix Query Server

The Phoenix Query Server (PQS) is a component of the Apache Phoenix distribution. PQS provides an alternative to connecting directly with the Phoenix JDBC client driver. PQS is a stand-alone server that receives HTTP requests from "thin clients" and translates them into Phoenix API calls. This topology offloads most computation to PQS and requires a smaller client-side footprint. The PQS client protocol is based on the Avatica component of Apache Calcite.
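
As a sketch of the thin-client path, the following Java example connects through PQS instead of ZooKeeper by using the thin JDBC driver that ships with the Phoenix Query Server client library. The host pqs.example.com is a placeholder, and 8765 is the default PQS port:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class PhoenixThinClientExample {
      public static void main(String[] args) throws Exception {
          // The thin driver speaks the Avatica protocol to PQS over HTTP (protobuf serialization).
          // Driver class name as shipped with the Phoenix Query Server client JAR; it may vary by version.
          Class.forName("org.apache.phoenix.queryserver.client.Driver");
          String url = "jdbc:phoenix:thin:url=http://pqs.example.com:8765;serialization=PROTOBUF";
          try (Connection conn = DriverManager.getConnection(url);
               Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 10")) {
              while (rs.next()) {
                  System.out.println(rs.getString(1));
              }
          }
      }
  }

Because all communication with HBase happens inside PQS, only the small thin-client JAR is needed on the application host.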

Securing Authentication on the Phoenix Query Server

You can enable Kerberos-based authentication on PQS with Ambari. If you chose to install HDP manually instead, see Configuring Phoenix Query Server to enable the Kerberos protocol.

Selecting and Obtaining a Client Driver

You have two options to develop an application that works with Phoenix, depending on the client-server architecture:

Without Phoenix Query Server: If your environment does not have a PQS layer, applications that connect to Phoenix must use the Phoenix JDBC client driver.

With Phoenix Query Server: PQS is an abstraction layer that enables other languages, such as Python and Go, to work with Phoenix. The layer exposes a protocol buffer-based API over HTTP that wraps Phoenix JDBC. You might prefer a non-Java client driver for various reasons, such as to avoid the JVM footprint on the client or to develop with a different application framework.

To obtain the appropriate driver for application development:

JDBC Driver

Use the /usr/hdp/current/phoenix-client/phoenix-client.jar file in the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version. With Ambari, you can determine the HDP version by using the Versions tab. Alternatively, run the hadoop version command to print the HDP version information.

JDBC Driver as a Maven dependency

See Download the HDP Maven Artifacts for Maven artifact repositories that are available for HDP.

Microsoft .NET Driver

Download and install a NuGet package for the Microsoft .NET Driver for Apache Phoenix and Phoenix Query Server. Note: Operability with this driver is a Hortonworks Technical Preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.

Other non-Java drivers

Other non-JDBC drivers for Phoenix are available as HDP add-ons and on other websites, but they are not currently supported by Hortonworks. You can find compatible client drivers by constructing a web search string consisting of "avatica" and the name of the application programming language that you want to use, for example: avatica python.

Mapping Phoenix Schemas to HBase Namespaces

You can map a Phoenix schema to an HBase namespace to gain multitenancy features in Phoenix.

HBase, the underlying storage engine for Phoenix, provides namespaces to support multitenancy features. Multitenancy helps an HBase user or administrator perform access control and quota management tasks. Namespaces also enable tighter control of where a particular dataset is stored on RegionServers. See Enabling Multitenancy with Namespaces for further information.

Prior to HDP 2.5, Phoenix tables could not be associated with a namespace other than the default namespace.

Enabling Namespace Mapping

Important

After you set the properties to enable the mapping of Phoenix schemas to HBase namespaces, reverting the property settings renders the Phoenix database unusable. Test or carefully plan the Phoenix to HBase namespace mappings before implementing them.

To enable Phoenix schema mapping to a non-default HBase namespace:

  1. Set the phoenix.schema.isNamespaceMappingEnabled property to true in the hbase-site.xml file of both the client and the server.

  2. Restart the HBase Master and RegionServer processes.

Note

You might not want to map Phoenix system tables to namespaces because of compatibility issues with your current applications. In this case, set the phoenix.schema.mapSystemTablesToNamespace property in the hbase-site.xml file to false.

Namespace Mapping Properties in the hbase-site.xml File

phoenix.schema.isNamespaceMappingEnabled

Enables mapping of the tables of a Phoenix schema to a non-default HBase namespace. To enable this mapping, set the value of this property to true. The default setting is false.

phoenix.schema.mapSystemTablesToNamespace

With true setting (default): After namespace mapping is enabled with the other property, all system tables, if any, are migrated to a namespace called system.

With false setting: System tables are associated with the default namespace.
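
For reference, the corresponding entries in hbase-site.xml (on both the client and the server, followed by a restart of the HBase Master and RegionServer processes) look like the following:

  <property>
    <name>phoenix.schema.isNamespaceMappingEnabled</name>
    <value>true</value>
  </property>
  <property>
    <name>phoenix.schema.mapSystemTablesToNamespace</name>
    <value>true</value>
  </property>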

Creating New Schemas and Tables with Namespace Mapping

You can use the following DDL statements for managing schemas:

  • CREATE SCHEMA

  • USE SCHEMA

  • DROP SCHEMA

You must have admin privileges in HBase to run CREATE SCHEMA or DROP SCHEMA.

See the Apache Phoenix Grammar reference page for how you can use these DDL statements.

As you create physical tables, views, and indexes, you can associate them with a schema. If the schema already has namespace mapping enabled, the newly created objects automatically become part of the HBase namespace. The directory of the HBase namespace that maps to the Phoenix schema inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1. See the "F.A.Q." section of Apache Phoenix Namespace Mapping for more information.
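
As a sketch of this workflow, the following Java snippet issues the schema and table DDL over JDBC. It assumes that namespace mapping is already enabled and uses a placeholder ZooKeeper host, schema name, and table name:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;

  public class NamespaceMappingExample {
      public static void main(String[] args) throws Exception {
          try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181");
               Statement stmt = conn.createStatement()) {
              // CREATE SCHEMA requires HBase admin privileges, as noted above.
              stmt.execute("CREATE SCHEMA IF NOT EXISTS STORE1");
              stmt.execute("USE STORE1");
              // With namespace mapping enabled, this table is created in the HBase
              // namespace STORE1 rather than in the default namespace.
              stmt.execute("CREATE TABLE IF NOT EXISTS SALES (ID BIGINT NOT NULL PRIMARY KEY, AMOUNT DECIMAL)");
          }
      }
  }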

Associating Tables of a Schema to a Namespace

After you enable namespace mapping on a Phoenix schema that already has tables, you can migrate the tables to an HBase namespace. The namespace directory that contains the migrated tables inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.

System tables are migrated to the namespace automatically during the first connection after enabling namespace properties.

Associating in a Noncustomized Environment without Kerberos

Run the following command to associate a table:

$ bin/psql.py ZooKeeper_hostname -m schema_name.table_name

Associating in a Customized Kerberos Environment

Prerequisite: In a Kerberos-secured environment, you must have admin privileges (user hbase) to complete the following task.

  1. Navigate to the Phoenix home directory. The default location is /usr/hdp/current/phoenix-client/.

  2. Run a command to migrate a table of a schema to a namespace, using the following command syntax for the options that apply to your environment:

    $ bin/psql.py {{ZooKeeper_hostnames:2181} | [:zookeeper.znode.parent] | [:HBase_headless_keytab_location] | [:principal_name] | [;TenantId=tenant_Id] | [;CurrentSCN=current_SCN]}
    -m {schema_name.table_name}

Additional information for valid command parameters:

ZooKeeper_hostnames

Enter the ZooKeeper hostname or hostnames that compose the ZooKeeper quorum. If you enter multiple hostnames, enter them as comma-separated values. This parameter is required. You must append the colon and ZooKeeper port number if you invoke the other security parameters in the command. The default port number is 2181.

zookeeper.znode.parent

This setting is defined in the hbase-site.xml file.

-m schema_name.table_name

The -m argument is required. There is a space before and after the -m option.
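
For illustration only, following the parameter order shown in the syntax above, an invocation for a hypothetical secure cluster with ZooKeeper hosts zk1, zk2, and zk3, a znode parent of /hbase-secure, and a table sales in schema store1 might look like the following (substitute the values for your environment):

    $ bin/psql.py zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase-secure:/etc/security/keytabs/hbase.headless.keytab:hbase@EXAMPLE.COM -m store1.sales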