Data Access
Also available as:
PDF
loading table of contents...

Query Capability on Remote Clusters

[Important]Important

Hortonworks certification of the ability to query remote clusters as described in this section applies only to environments where the data lake is stored in a HDP 2.4.2 Hive data warehouse and the local cluster for computation has HDP 2.6.3 Hive.

The DDL statements to create an infrastructure where the local, compute-only cluster can query data on a remote cluster are documented on the LanguageManual DDL page of the Apache Hive wiki. The only additional requirement is that the LOCATION clause must point to the IP address of the active NameNode in the remote cluster.

Required Setup of Database Objects

[Important]Important

The SQL statements in this section are examples to illustrate possible ways of creating the objects with properties that support queries on remote clusters. Do not copy these SQL statements and paste them into your production system.

  • An external table on the local cluster that exposes the Hive data tracked by an active NameNode and that specifies the IP address of the NameNode in the LOCATION clause of the SQL statement.

    Example of how to create an external table pointing to the active NameNode:

    CREATE EXTERNAL TABLE orders_ext
    (
      O_ORDERKEY int,
      O_CUSTKEY int,
      O_ORDERSTATUS char(1),
      O_TOTALPRICE decimal(15,2),
      O_ORDERDATE date,
      O_ORDERPRIORITY char(15),
      O_CLERK char(15),
      O_SHIPPRIORITY int,
      O_COMMENT string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    LOCATION "hdfs://172.27.24.198:8020/tmp/tpch/data/orders";
  • A Hive table on the remote cluster into which the data of the active NameNode is loaded.

    Example of how to create a Hive table and load it with data of a NameNode IP address:

    CREATE TABLE orders_managed
    (
      O_ORDERKEY int,
      O_CUSTKEY int,
      O_ORDERSTATUS char(1),
      O_TOTALPRICE decimal(15,2),
      O_ORDERDATE date,
      O_ORDERPRIORITY char(15),
      O_CLERK char(15),
      O_SHIPPRIORITY int,
      O_COMMENT string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
    
    load data inpath "hdfs://172.27.24.198:8020/tmp/orders" into table orders_managed;
    
  • Some environments also require temporary tables or views to manipulate a subset of the data or cache the results of a large query.

Next Steps

You can query the data on the remote cluster, including the ability to perform WRITE operations from the local cluster.

Examples of Supported Queries

CREATE TABLE orders_ctas AS SELECT * FROM orders_ext;
INSERT INTO orders_ctas SELECT * FROM orders_ext;
INSERT OVERWRITE TABLE orders_ctas SELECT * FROM orders_ext;