Hive/Impala replication using snapshots

Before you create Hive external table replication policies, ensure that you enable snapshots for the databases and directories that contain the required external tables. Before you replicate Impala tables, ensure that the storage locations for the tables and associated databases are also snapshottable.

For example, if the databases reside in a custom location, such as /apps/folder1/folder2/[sales.db, marketing.db, hr.db, etc.], you can enable snapshots at the following database directory levels, depending on your requirements:
  • /apps/folder1/folder2/sales.db
  • /apps/folder1/folder2/marketing.db
  • /apps/folder1/folder2/hr.db

You can also isolate the database-level snapshots from each other so that the Hive external table replication policy replicates only the specified database.
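
As a rough illustration of per-database isolation, the following sketch builds one snapshot-enable command for each database directory in the example above. It assumes the directories are on HDFS and that snapshots are enabled from the command line with the standard hdfs dfsadmin -allowSnapshot command; your deployment might enable them through the cluster management UI instead. The function name allow_snapshot_commands is hypothetical, and the script only prints the commands rather than running them.

```python
import posixpath
import shlex


def allow_snapshot_commands(base_dir, databases):
    """Build one `hdfs dfsadmin -allowSnapshot` command per database directory.

    base_dir and databases are illustrative inputs; nothing is executed here.
    """
    commands = []
    for db in databases:
        # Each database directory gets its own command, so its snapshots
        # stay separate from those of the other databases.
        db_dir = posixpath.join(base_dir, db)
        commands.append(["hdfs", "dfsadmin", "-allowSnapshot", db_dir])
    return commands


if __name__ == "__main__":
    example = allow_snapshot_commands(
        "/apps/folder1/folder2", ["sales.db", "marketing.db", "hr.db"]
    )
    for cmd in example:
        # Print the commands for review instead of running them.
        print(shlex.join(cmd))
```

Because each command targets a single database directory rather than the shared parent /apps/folder1/folder2, a replication policy for one database does not depend on snapshots of the others.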

The following table shows sample custom locations that contain external tables and, for each location, the recommended directory level at which to enable snapshots so that the database-level snapshots remain isolated:
Sample custom location of external tables                           | Recommended directory level to enable snapshots
/data/folder1/folder2/sales/[table1, table2, table3 ... tablen]     | /data/folder1/folder2/sales
/data/folder1/folder2/marketing/[table1, table2, table3 ... tablen] | /data/folder1/folder2/marketing
/data/folder1/folder2/hr/[table1, table2, table3 ... tablen]        | /data/folder1/folder2/hr
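
The pattern in the table above, where the recommended level is the directory that directly contains a database's external tables, can be sketched as follows. This is a minimal illustration that assumes every table of a database sits directly under one shared parent directory. The function name snapshot_dir_for_tables is hypothetical and is not part of any Hive, Impala, or HDFS API.

```python
import posixpath


def snapshot_dir_for_tables(table_locations):
    """Return the single parent directory shared by all table locations.

    Raises ValueError if the list is empty or the tables do not share one
    immediate parent, because the recommendation then needs a manual decision.
    """
    # Strip trailing slashes so dirname returns the parent, not the path itself.
    parents = {posixpath.dirname(loc.rstrip("/")) for loc in table_locations}
    if len(parents) != 1:
        raise ValueError("expected exactly one shared parent directory")
    return parents.pop()


if __name__ == "__main__":
    sales_tables = [
        "/data/folder1/folder2/sales/table1",
        "/data/folder1/folder2/sales/table2",
        "/data/folder1/folder2/sales/table3",
    ]
    print(snapshot_dir_for_tables(sales_tables))
```

For the sample sales tables, the function returns /data/folder1/folder2/sales, which matches the first row of the table.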