Managing Hive ACID table replication policies

After you create a replication policy, you can run the replication job, disable or delete the job, edit the policy configuration, or view the replication job history in Cloudera Manager.

  1. Go to the Cloudera Manager > Replication > Replication Policies page.
    The following replication policy details are displayed on the page:
    • ID – Automatically generated replication policy ID.
    • Name – Name of the replication policy.
    • Type – Displays Hive ACID for Hive ACID replication policies.
    • Source – Source cluster used in the replication policy.
    • Destination – Target cluster used in the replication policy.
    • Progress – Displays a spinner when the replication policy job is running.
    • Completed – Timestamp when the replication job is submitted to the Hive service.
    • Next Run* – Displays the Managed by Hive message. Hover over the message to see more information about the next scheduled run.
    • Message – Displays the status of the replication job.

    The Message column displays one of the following statuses, depending on the state of the replication job run:

    • Waiting for Update – The replication policy creation is complete and is waiting for the job status confirmation by the Hive service.
    • Running – The replication job is in progress.
    • Failed – The replication policy job has failed.
    • Skipped – The replication job was skipped.
    • Success – The replication job completed successfully.
    *When you schedule and submit a Hive ACID replication policy, the Next Run field displays the None scheduled message on the Replication Policies page. Even when the next run is scheduled, its date and time are not displayed. You can ignore the None scheduled message because Hive runs the replication job as scheduled, according to the schedule clause. The schedules are managed by Hive; Cloudera Manager does not trigger any scheduled runs.
  2. Select the required replication policy.
  3. Click Actions to view the following action items:
    1. Show History opens the Replication History page where you can view the replication policy job history.
      On this page, you can view the replication policy name, the replication policy type, the chosen source and destination clusters for the policy, and the next scheduled run. Expand each job run to view the Started At timestamp, Duration of the job run, and Status of the job, including any error messages.

      The page also displays the following details for each replication policy job:

      • Start Time – Date and time the replication policy job started.
      • Duration – The time taken to complete the job.
      • Outcome – The current job status.
      • Origin – The origin of the collected Hive metrics. Click SOURCE to view DUMP operation details or TARGET to view LOAD operation details.
      • Tables – The number of successfully replicated tables compared to the number of tables to be replicated.
      • Functions – The number of functions processed during the DUMP and LOAD operations.
      • Events – The number of processed events, incremented for every dumped or loaded event. The counts for the DUMP and LOAD operations might not match because they are distinct operations.
    2. Edit Configuration allows you to edit the schedule of the replication policy.
    3. Run Now runs the replication job.
    4. Disable disables the selected replication job.
    5. Delete deletes the selected replication job.
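
If you track policy runs from a script instead of watching the Replication Policies page, the Message column statuses listed in step 1 can be modeled as a small state check. The following is a minimal Python sketch, not a Cloudera Manager API: JobStatus, is_finished, and needs_attention are hypothetical names, the status strings are copied from the list of statuses, and treating Waiting for Update and Running as in-progress states is an assumption based on their descriptions. How you obtain the Message text is out of scope here.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical model of the statuses shown in the Message column."""
    WAITING_FOR_UPDATE = "Waiting for Update"
    RUNNING = "Running"
    FAILED = "Failed"
    SKIPPED = "Skipped"
    SUCCESS = "Success"


# Assumption: the run is still in progress while the policy waits for the
# Hive service to confirm the job status, or while the job is running.
IN_PROGRESS = frozenset({JobStatus.WAITING_FOR_UPDATE, JobStatus.RUNNING})


def is_finished(message: str) -> bool:
    """Return True if the Message text is a final status.

    Raises ValueError if the text is not one of the known statuses.
    """
    return JobStatus(message) not in IN_PROGRESS


def needs_attention(message: str) -> bool:
    """Return True if the run failed; use Show History to see the error."""
    return JobStatus(message) is JobStatus.FAILED


if __name__ == "__main__":
    for text in ("Waiting for Update", "Running", "Skipped", "Failed"):
        print(text, is_finished(text), needs_attention(text))
```

Looking statuses up by their displayed text means an unexpected message fails loudly with a ValueError instead of being silently treated as finished.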

The following example illustrates how Replication Manager determines the Tables count for a Hive ACID replication policy:

  1. You create a database that contains one managed table, two external tables, three virtual views, and four materialized views:
    -- Create the sample3 database and switch to it.
    CREATE DATABASE sample3;
    USE sample3;
    
    -- One managed table (a Hive ACID table).
    CREATE TABLE table_04 (id int, name string);
    
    -- Two external tables.
    CREATE EXTERNAL TABLE table_02 (id int, name string);
    CREATE EXTERNAL TABLE table_03 (id int, name string);
    
    -- Three virtual views (these cause the mismatch).
    CREATE VIEW view_01 AS SELECT * FROM table_04;
    CREATE VIEW view_02 AS SELECT * FROM table_04;
    CREATE VIEW view_03 AS SELECT * FROM table_04;
    
    -- Four materialized views.
    CREATE MATERIALIZED VIEW mv1 AS SELECT * FROM table_04;
    CREATE MATERIALIZED VIEW mv2 AS SELECT * FROM table_04;
    CREATE MATERIALIZED VIEW mv3 AS SELECT * FROM table_04;
    CREATE MATERIALIZED VIEW mv4 AS SELECT * FROM table_04;
  2. You create and run a Hive ACID replication policy on the sample3 database.
  3. After the replication policy run is complete, the Tables column on the Replication History page displays the following statistics for the replication policy:
    • SOURCE displays the DUMP operation details as follows:
      • The numerator is the sum total of all the managed tables, external tables, and virtual views. In this example, the numerator is 1 + 2 + 3 = 6.

        This statistic does not include the materialized views because the hive.repl.dump.include.materialized.views property is false by default.

      • The denominator is the sum total of all the managed tables, external tables, virtual views, and materialized views. In this example, the denominator is 1 + 2 + 3 + 4 = 10.
    • TARGET displays the LOAD operation details as follows:
      • The numerator is the number of Hive ACID tables (managed tables). In this example, the numerator is 1.
      • The denominator is the number of subdirectories created in the staging location. Subdirectories are created for Hive ACID tables (managed tables) and virtual views. Therefore, in this example, the denominator is 1 + 3 = 4.
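
The counting rules in step 3 can be expressed as a small calculation that reproduces the sample3 numbers. The following Python sketch is illustrative only: DatabaseObjects, source_tables, and target_tables are hypothetical names, not a Replication Manager API. The include_materialized_views parameter stands in for hive.repl.dump.include.materialized.views, and it is an assumption that setting it to true adds the materialized views to the SOURCE numerator.

```python
from dataclasses import dataclass


@dataclass
class DatabaseObjects:
    """Object counts in the replicated database (hypothetical model)."""
    managed_tables: int  # Hive ACID tables
    external_tables: int
    virtual_views: int
    materialized_views: int


def source_tables(db: DatabaseObjects,
                  include_materialized_views: bool = False) -> tuple[int, int]:
    """Return the SOURCE (DUMP) Tables count as (numerator, denominator)."""
    # Numerator: managed tables, external tables, and virtual views.
    numerator = db.managed_tables + db.external_tables + db.virtual_views
    # Assumption: materialized views are counted only when
    # hive.repl.dump.include.materialized.views is true (false by default).
    if include_materialized_views:
        numerator += db.materialized_views
    # Denominator: every object in the database, materialized views included.
    denominator = (db.managed_tables + db.external_tables
                   + db.virtual_views + db.materialized_views)
    return numerator, denominator


def target_tables(db: DatabaseObjects) -> tuple[int, int]:
    """Return the TARGET (LOAD) Tables count as (numerator, denominator)."""
    # Numerator: Hive ACID (managed) tables only.
    numerator = db.managed_tables
    # Denominator: staging subdirectories, created for managed tables and virtual views.
    denominator = db.managed_tables + db.virtual_views
    return numerator, denominator


# The sample3 database: 1 managed table, 2 external tables,
# 3 virtual views, and 4 materialized views.
sample3 = DatabaseObjects(managed_tables=1, external_tables=2,
                          virtual_views=3, materialized_views=4)

if __name__ == "__main__":
    print("SOURCE %d/%d" % source_tables(sample3))  # SOURCE 6/10
    print("TARGET %d/%d" % target_tables(sample3))  # TARGET 1/4
```

With the default setting, a SOURCE count such as 6/10 appears even after the run completes, because the materialized views are counted only in the denominator.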