Table Level Replication

To enable table level replication, specify the list of tables to be replicated in the replication policy.

Table level replication enables you to replicate only the critical tables instead of all the tables in a database. Enabling a table level replication policy helps speed up the replication process and reduces network bandwidth utilization.

You can define a table level replication policy using regular expressions, for example, db.marketing_*. You can dynamically add tables to the list or remove them by manually changing the replication policy at run time. Hive automatically bootstraps a table if it is dynamically added to the policy and automatically drops a table if it is dynamically excluded. Hive also validates each rename table operation to check whether the new table name is included or excluded by the defined replication policy, and acts accordingly.
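The effect of a policy change or a table rename can be sketched as a comparison of the old and new scope. This is a minimal illustration, not Hive's implementation: the function names are hypothetical, Python's re.fullmatch stands in for Java's regular expression matching, full-match semantics are assumed, and the action labels returned for a rename are assumptions about what "acts accordingly" means. The prefix pattern is written as marketing_.* so that it matches as a prefix under full-match semantics.

```python
import re


def in_scope(table, include_regex, exclude_regex=None):
    """Return True if the table fully matches include_regex and not exclude_regex."""
    if re.fullmatch(include_regex, table) is None:
        return False
    return exclude_regex is None or re.fullmatch(exclude_regex, table) is None


def plan_policy_change(tables, old, new):
    """Compare two (include_regex, exclude_regex) pairs over the known table names.

    Returns (tables to bootstrap, tables to drop), in input order.
    """
    to_bootstrap = [t for t in tables if not in_scope(t, *old) and in_scope(t, *new)]
    to_drop = [t for t in tables if in_scope(t, *old) and not in_scope(t, *new)]
    return to_bootstrap, to_drop


def plan_rename(old_name, new_name, policy):
    """Decide how a rename is handled; the action labels are illustrative assumptions."""
    was = in_scope(old_name, *policy)
    now = in_scope(new_name, *policy)
    if was and now:
        return "rename on target"
    if was:
        return "drop old table on target"
    if now:
        return "bootstrap new table"
    return "ignore"


if __name__ == "__main__":
    tables = ["marketing_q1", "marketing_q2", "sales"]
    old_policy = ("marketing_.*", None)
    new_policy = ("marketing_.*|sales", "marketing_q1")
    print(plan_policy_change(tables, old_policy, new_policy))
    print(plan_rename("sales", "marketing_q3", old_policy))
```

In this sketch, a table that enters the scope is bootstrapped and a table that leaves the scope is dropped, which mirrors the behavior described for dynamically added and excluded tables.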

Hive supports database level replication policies of the format <db_name>.*. In practice, a table level policy resembles <db_name>.(t1, t3, …). You can specify the list of tables using Java-supported regular expressions in a replication policy of the format <db_name>.<include_regex>.<exclude_regex>.

The replication policy has three parts separated by a dot (.). The first part is the database name, the second part is a single regex that represents the list of included tables, and the third part is a single regex that represents the tables to be excluded from the list even if they match the include_regex.

For example:

  1. <db_name> -- Full DB replication, which is currently supported.
  2. <db_name>.'.*?' -- Full DB replication.
  3. <db_name>.'t1|t3' -- DB replication with static list of tables t1 and t3 included.
  4. <db_name>.'(t1*)|t2'.'t100' -- DB replication that includes all tables with the prefix t1, also includes table t2 (which does not have the prefix t1), and excludes t100 (which has the prefix t1).
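
The three-part format and the example policies can be sketched as a small parser and matcher. This is an illustration under stated assumptions, not Hive's parser: parse_policy and is_replicated are hypothetical names, the database name sales is made up, Python's re module stands in for Java regular expressions, and table names are assumed to be matched in full. Under full-match semantics t1* matches only t followed by zero or more 1 characters, so the prefix pattern is written as t1.* in this sketch.

```python
import re

# <db_name>, then an optional quoted include regex, then an optional quoted exclude regex.
POLICY = re.compile(r"(\w+)(?:\.'([^']*)')?(?:\.'([^']*)')?")


def parse_policy(policy):
    """Split a policy string into (db_name, include_regex, exclude_regex)."""
    match = POLICY.fullmatch(policy)
    if match is None:
        raise ValueError(f"unrecognized replication policy: {policy!r}")
    db_name, include_regex, exclude_regex = match.groups()
    # A policy without an include regex is treated as full DB replication.
    return db_name, include_regex or ".*", exclude_regex


def is_replicated(policy, table):
    """Return True if the table is included and not excluded by the policy."""
    _, include_regex, exclude_regex = parse_policy(policy)
    if re.fullmatch(include_regex, table) is None:
        return False
    return exclude_regex is None or re.fullmatch(exclude_regex, table) is None


if __name__ == "__main__":
    policy = "sales.'(t1.*)|t2'.'t100'"
    for table in ["t1", "t15", "t2", "t3", "t100"]:
        print(table, is_replicated(policy, table))
```

The exclude regex is checked only after the include regex matches, which reflects the rule that a table is excluded even if it matches the include_regex.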

Limitations using Table Level Replication
  • If a table is dynamically added for replication, either through a change to the regular expression or by being added to the include list, the table's data might not be point-in-time consistent with other tables that are already replicated incrementally. However, this inconsistency lasts only for a short period, until the next incremental replication completes after the table is bootstrapped.
  • Hive does not support a single replication policy with tables from different databases. Each database requires its own independent policy.
  • Hive does not support overlapping replication policies such as db.,, db.[t1], and *. to the same target database. However, overlapping policies work if the target databases are different.
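
A pre-flight check for the overlap limitation might look like the following sketch. It is not part of Hive: find_overlaps, the dictionary keys, and the policy and database names are all hypothetical, and the check assumes that two policies overlap when they read from the same source database, write to the same target database, and select at least one common table under full-match semantics.

```python
import re
from itertools import combinations


def scope(policy, tables):
    """Return the set of table names selected by the policy's include and exclude regexes."""
    selected = set()
    for table in tables:
        if re.fullmatch(policy["include"], table) is None:
            continue
        if policy["exclude"] is not None and re.fullmatch(policy["exclude"], table):
            continue
        selected.add(table)
    return selected


def find_overlaps(policies, tables_by_db):
    """List (name_a, name_b, shared_tables) for policies that overlap on one target."""
    overlaps = []
    for a, b in combinations(policies, 2):
        # Policies with different sources or different targets do not conflict.
        if a["source_db"] != b["source_db"] or a["target_db"] != b["target_db"]:
            continue
        tables = tables_by_db.get(a["source_db"], [])
        shared = scope(a, tables) & scope(b, tables)
        if shared:
            overlaps.append((a["name"], b["name"], sorted(shared)))
    return overlaps


if __name__ == "__main__":
    policies = [
        {"name": "full", "source_db": "db", "target_db": "replica", "include": ".*", "exclude": None},
        {"name": "t1_only", "source_db": "db", "target_db": "replica", "include": "t1", "exclude": None},
        {"name": "t1_other", "source_db": "db", "target_db": "replica2", "include": "t1", "exclude": None},
    ]
    print(find_overlaps(policies, {"db": ["t1", "t2"]}))
```

In this sketch the pair that targets replica is reported, while the policy that targets replica2 is not, which matches the statement that overlapping policies work when the target databases differ.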