Replicating Phoenix Index Tables

The secondary indexing implementation in Phoenix changed drastically in Phoenix 4.14: versions before and after that release maintain indexes in fundamentally different ways. The manner in which HBase replication ships Phoenix data is fundamentally incompatible with the new style of Phoenix secondary index maintenance. Architecturally, the most desirable approach is to replicate only the Phoenix data tables and let the "local" HBase cluster maintain the secondary indexes. However, no "Phoenix-aware" replication strategy currently exists that can automatically maintain Phoenix secondary indexes for replicated Phoenix data tables.

Follow the recommended steps in Replicating Phoenix Data Tables to replicate all Phoenix data tables before proceeding. For each index table that you want to replicate, perform the following steps:
  1. Create the corresponding index in COD. Be sure to use the exact same CREATE INDEX command that was used in the legacy product (both the indexed columns and the INCLUDE columns). If the corresponding data table contains a significant amount of data, add the ASYNC keyword to the CREATE INDEX command. The ASYNC keyword creates the index table but does not populate it with any data.
    jdbc:phoenix> CREATE INDEX SMALL_INDEX ON SMALL_TABLE(col1) INCLUDE (col2);
    jdbc:phoenix> CREATE INDEX BIG_INDEX ON BIG_TABLE(col1) INCLUDE (col2) ASYNC;
  2. Enable replication on the Phoenix index table in the source cluster by using the alter command in the HBase shell, just as you enabled replication for the HBase and Phoenix data tables.
    hbase> alter "BIG_INDEX", {NAME=>"0", REPLICATION_SCOPE=>1}
  3. If you used the ASYNC keyword when creating the index in COD, you must now use IndexTool to build the index. IndexTool launches a MapReduce job that builds the index in parallel, rather than synchronously from a single client as is normally done. An example IndexTool invocation, run on the command line against the COD cluster (using the HBase client tarball), looks like the following:
    $ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table BIG_TABLE \
        --index-table BIG_INDEX --output-path /tmp/BIG_INDEX-files
  4. Ensure the MapReduce job launched by this command completes before proceeding.
  5. Validate the contents of the index table, since queries will be served from it. For example, read a few rows from the index table directly:
    jdbc:phoenix> SELECT * FROM BIG_INDEX LIMIT 10;
    Alternatively, you can look at the explain plan for a query that uses the index (note that the plan reports BIG_INDEX, not BIG_TABLE, in this example):
    jdbc:phoenix> EXPLAIN SELECT col1, col2 FROM BIG_TABLE;
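One way to carry out step 4, confirming that the IndexTool MapReduce job has finished, is to poll YARN for the application ID that IndexTool prints when it submits the job. The sketch below is an illustration under a few assumptions: the parse_state and wait_for_app helper functions and the 30-second poll interval are inventions of this example, not part of IndexTool or YARN, and the application ID shown is a placeholder.

```shell
# Pull the "State" field out of a `yarn application -status` report,
# which YARN prints as "Key : Value" lines (skipping "Final-State").
parse_state() {
  awk -F' : ' '$1 ~ /State$/ && $1 !~ /Final/ { print $2; exit }'
}

# Poll YARN every 30 seconds until the application reaches a terminal state.
wait_for_app() {
  app_id="$1"
  while :; do
    state=$(yarn application -status "$app_id" | parse_state)
    case "$state" in
      FINISHED|FAILED|KILLED) echo "IndexTool job ended: $state"; return ;;
      *) sleep 30 ;;
    esac
  done
}

# Usage (the application ID below is a placeholder; use the one that
# IndexTool logs when it submits the job):
#   wait_for_app application_1234567890123_0042
```

Note that a FINISHED state alone does not guarantee the index build succeeded; also check IndexTool's exit status before validating the index contents in step 5.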