You set up the HDP cluster after replicating one or more databases before you can
verify replication. Set up requires stopping jobs.
Of course, before you can verify a replication, you must have completed one.
The first cycle in replicating data is called the bootstrap, and the following cycles
are called incremental replications. For each database, a completed replication consists
of one bootstrap and at least one incremental replication cycle.
Run at least one incremental replication for a databases before attempting
verification.
In the CDP cluster, find the dump directory path using the following query:
select * from sys.replication_metrics
where policy_name=‘<policy name>’
order by scheduled_execution_id desc limit 1;
Find and copy the external table paths listed in the CDP dump directory path in
_file_list_external file. You will use these paths to set up Ranger
policies in Ambari.
On the HDP source cluster, stop all ETL jobs.
In Ambari > Ranger Admin > Service Manager > Hive policies, add a Deny policy (no writes) for all users including ‘hive’ on all databases:
Database *, Table *, Hive column *
You need only one policy to deny any writes to managed tables or any access to any
external tables
In Ambari > Ranger Admin > Service Manager > HDFS policies, add a Ranger Deny policy for all external table paths.
In Resource Path, paste the external table paths you copied from in the CDP dump directory
path in the _file_list_external file.
You can add single or multiple policies for all the external table paths in all the
databases.
For example:
Disable the StatsUpdaterThread background thread by configuring the
hive.metastore.stats.auto.analyze property to none.
Disable the PartitionManagementTask background thread by configuring the metastore.partition.management.database.pattern property to ^*.