Replicating data to Impala clusters
Impala metadata replication is performed as a part of Hive replication. Impala replication is only supported between two CDH clusters. The Impala and Hive services must be running on both clusters.
Replicating Impala metadata
To enable Impala metadata replication, you must set the Replicate Impala Metadata option to Yes in the tab.
When you set the Replicate Impala Metadata option to Yes, the Hive replication policy replicates the Impala UDFs (user-defined functions). As part of replicating the UDFs, the binaries in which they are defined are also replicated.
Invalidating Impala metadata
For Impala clusters that do not use LDAP authentication, you can
configure Hive/Impala replication jobs to automatically invalidate
Impala metadata after replication completes. If the clusters use Sentry,
the Impala user should have permissions to run
This configuration causes the Hive/Impala replication job to run the
INVALIDATE METADATA statement for each replicated table on the
destination cluster after replication completes. The statement
purges the metadata of the replicated tables and views in the
destination cluster's Impala, so that other Impala clients at the
destination can query these tables and get accurate results.
However, this operation is
potentially unsafe if DDL operations are performed on any of the
replicated tables or views while the replication is running. In general,
directly modifying replicated data or metadata on the destination is not
recommended; doing so can lead to unexpected or incorrect behavior
of applications and queries that use these tables or views.
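As a rough illustration of what "per table" means here, the following Python sketch builds one INVALIDATE METADATA statement for each replicated table. It is not the replication job's actual implementation. The function name, the identifier check, and the database and table names (sales_db, orders, customers) are hypothetical and chosen only for the example.

```python
import re

# Hypothetical safety check: accept only simple, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def invalidate_statements(tables):
    """Return one INVALIDATE METADATA statement per (database, table) pair."""
    statements = []
    for db, table in tables:
        for name in (db, table):
            if not _IDENTIFIER.fullmatch(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        statements.append(f"INVALIDATE METADATA {db}.{table};")
    return statements


if __name__ == "__main__":
    # Hypothetical replicated tables on the destination cluster.
    replicated = [("sales_db", "orders"), ("sales_db", "customers")]
    for stmt in invalidate_statements(replicated):
        print(stmt)
```

The generated statements could then be run on the destination cluster with any Impala client. Scoping each statement to a single table avoids invalidating metadata for tables that were not replicated.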
To invalidate Impala metadata, you must select the Invalidate Impala Metadata on Destination option on the tab.
Alternatively, you can run the
INVALIDATE METADATA statement manually for