To enable table-level replication, you must specify the list of tables to be replicated in a given replication policy. Table-level replication enables you to replicate just the critical tables. It also helps you to speed-up the replication process and also reduces network bandwidth utilization.
You can define table-level replication policy using regular expressions, for example, db.marketing_*. You can dynamically add or remove tables to the list by manually changing the replication policy during run time. Hive automatically bootstrap the table if it is dynamically added to the policy and automatically drop the table if it is dynamically excluded. Hive also automatically validates the rename table operation to check if the new table name is included or excluded as per the defined replication policy and act accordingly.
Hive supports database level replication policy of the format
<db_name>.*. In the real-time world, the policy format is
<db_name>.(t1, t3, …). The tables list can be specified
using Java supported regular expressions in the replication policy of format:
The replication policy has three parts separated with a DOT (.). First part is
the DB name, second part is single regex to represent the included tables list, and
third part is single regex to represent the tables that needs to be excluded from the
list even if it matches the
<db_name>-- Full DB replication which is currently supported.
<db_name>.'.*?'-- Full DB replication.
<db_name>.'t1|t3'-- DB replication with static list of tables t1 and t3 included.
<db_name>.'(t1*)|t2'.'t100'-- DB replication with all tables having prefix t1 and also include table t2 which does not have prefix t1 and exclude t100 which has the prefix t1.
- If any table is dynamically added for replication due to changes in regular expression or added to the include list, the tables data may not be point-in-time consistent with other tables which are already replicated incrementally. However, this inconsistency is seen for a very small duration of completing next incremental replication after tables are added in the bootstrapped manner.
- Hive does not support single replication policy with tables from different databases. Each DB makes independent policies.
- Hive does not support overlapping replication policies such as db.,, db.[t1], and *. to same target database. However, it works fine if the target database is different.