Synchronization between Impala Clusters

Describes the event that notifies other Impala clusters about the changes of LOAD data statement.

When a LOAD statement is run from an Impala in a cluster, an INSERT event is generated. This INSERT event generated notifies other Impala clusters or any other systems that consume HMS events (e.g. Hive Replication) about the changes of LOAD DATA. The other Impala clusters will refresh the table or partitions based on the event. This allows the metadata to be always consistent locally and remotely. However, there can be delays till the event is processed but eventually, it will be consistent. Also, an auto-refresh on large tables takes time. If you want the metadata to be synced up immediately, manual REFRESH/INVALIDATE is a better choice and has a better guarantee.

When the LOAD DATA statement operates on a partitioned table, it always operates on one partition at a time. So for unpartitioned tables, the granularity of the refresh is the whole table. For partitioned tables, only the partition will be reloaded.