Table-level replication

To enable table-level replication, you must specify the list of tables to be replicated in a given replication policy. Table-level replication enables you to replicate just the critical tables. It also helps you to speed-up the replication process and also reduces network bandwidth utilization.

You can define table level replication policy using regular expressions, for example, db.marketing_*. You can dynamically add or remove tables to the list by manually changing the replication policy during run time.

Hive supports database level replication policy of the format <db_name>.*. In the real-time world, the policy format is similar to <db_name>.(t1, t3, …). The tables list can be specified using Java supported regular expressions in the replication policy of format: <db_name>.<include_regex>.<exclude_regex>.

The replication policy has three parts separated with a DOT (.). The first part is the database name, the second part is a single regex to represent the included tables list, and the third part is a single regex to represent the tables that need to be excluded from the list even if it matches the include_regex format.

The following examples illustrate the different ways to specify the tables list in the policy:

  1. <db_name> - Full database replication which is currently supported.
  2. <db_name>.'.*?' -Full database replication.
  3. <db_name>.'t1|t3' - Database replication with static list of tables t1 and t3 included.
  4. <db_name>.'(t1*)|t2'.'t100' - Database replication with all tables having prefix t1, includes table t2 which does not have prefix t1, and excludes t100 which has the prefix t1.
Limitations
  • If a table is dynamically added for replication due to changes in regular expression or added to the include list, the tables' data may not be point-in-time consistent with other tables which are already replicated incrementally. However, this inconsistency is seen for a very small duration until the completion of the next incremental replication after tables are added in the bootstrapped manner.
  • Hive does not support overlapping replication policies such as db., db.[t1], and *. to same target database but works as expected if the target databases are different.