Table-level replication

To enable table-level replication, you must specify the list of tables to be replicated in a given replication policy. Table-level replication enables you to replicate just the critical tables. It also helps you to speed-up the replication process and also reduces network bandwidth utilization.

You can define table-level replication policy using regular expressions, for example, db.marketing_*. You can dynamically add or remove tables to the list by manually changing the replication policy during run time. Hive also automatically validates the rename table operation to check if the new table name is included or excluded as per the defined replication policy and act accordingly.

Hive supports database level replication policy of the format <db_name>.*. In the real-time world, the policy format is similar to <db_name>.(t1, t3, …). The tables list can be specified using Java supported regular expressions in the replication policy of format: <db_name>.<include_regex>.<exclude_regex>.

The replication policy has three parts separated with a DOT (.). First part is the DB name, second part is single regex to represent the included tables list, and third part is single regex to represent the tables that needs to be excluded from the list even if it matches the include_regex format.

The following examples illustrate the different ways to specify the tables list in the policy:

  1. <db_name> -- Full DB replication which is currently supported.
  2. <db_name>.'.*?' -- Full DB replication.
  3. <db_name>.'t1|t3' -- DB replication with static list of tables t1 and t3 included.
  4. <db_name>.'(t1*)|t2'.'t100' -- DB replication with all tables having prefix t1 and also include table t2 which does not have prefix t1 and exclude t100 which has the prefix t1.
  • If any table is dynamically added for replication due to changes in regular expression or added to the include list, the tables data may not be point-in-time consistent with other tables which are already replicated incrementally. However, this inconsistency is seen for a very small duration of completing next incremental replication after tables are added in the bootstrapped manner.
  • Hive does not support overlapping replication policies such as db.,, db.[t1], and *. to same target database but works fine if the target database is different.