Impala metadata collection

Atlas can collect metadata for Impala queries. Metadata about the queries themselves comes from Impala; metadata about the data assets that the queries affect comes from Hive Metastore (HMS).

An Atlas hook runs in each Impalad instance. This hook sends metadata to Atlas for Impala operations, which are represented by process and process execution entities in Atlas.

In addition, an Atlas hook runs in Hive Metastore (HMS). Before sending metadata to Atlas, Impala synchronizes its metadata with HMS. This synchronization makes sure that Impala uses the same names and IDs as HMS. It also ensures that the metadata that Atlas collects from HMS for data assets already exists in Atlas by the time Impala sends metadata for operations.



  1. When an action occurs in the Impala instance...
  2. Impala updates HMS with information about the assets affected by the action.
  3. The Atlas hook corresponding to HMS collects information for the changed and new assets and forms it into metadata entities. It publishes the metadata to a Kafka topic.
  4. The Atlas hook corresponding to the Impala instance collects information for the action and forms it into metadata entities. It publishes the metadata to a Kafka topic.
  5. Atlas reads the messages from the topic and determines which information creates new entities and which information updates existing entities.
  6. Atlas creates the appropriate entities and determines lineage from existing entities to the new entities.
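
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the ordering and of the create-versus-update decision, not the Atlas implementation: a Python deque stands in for the Kafka topic, and every name in it (Message, ToyAtlas, the "table" and "process" type labels, the db.src and db.dst assets) is hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """One hook notification (hypothetical shape, not the Atlas wire format)."""
    source: str                 # which hook published it: "hms" or "impala"
    type_name: str              # illustrative label: "table" or "process"
    qualified_name: str         # unique name Atlas would key the entity on
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


class ToyAtlas:
    """Stand-in for Atlas: consumes messages, tracks entities and lineage."""

    def __init__(self):
        self.entities = {}      # qualified_name -> type_name
        self.lineage = []       # (input asset, process, output asset)
        self.log = []           # ("create" | "update", qualified_name)

    def consume(self, topic):
        # Step 5: read each message and decide between create and update.
        while topic:
            msg = topic.popleft()
            known = msg.qualified_name in self.entities
            self.log.append(("update" if known else "create", msg.qualified_name))
            self.entities[msg.qualified_name] = msg.type_name
            # Step 6: a process entity links its input assets to its outputs.
            for src in msg.inputs:
                for dst in msg.outputs:
                    self.lineage.append((src, msg.qualified_name, dst))


def demo():
    topic = deque()             # stands in for the Kafka topic
    atlas = ToyAtlas()

    # db.src already exists in Atlas from an earlier HMS notification.
    topic.append(Message("hms", "table", "db.src"))
    atlas.consume(topic)

    # Steps 1-3: an Impala action creates db.dst from db.src; after Impala
    # updates HMS, the HMS hook publishes the changed and new assets.
    topic.append(Message("hms", "table", "db.src"))
    topic.append(Message("hms", "table", "db.dst"))
    # Step 4: the Impala hook publishes the operation itself.
    topic.append(Message("impala", "process", "ctas:db.dst",
                         inputs=["db.src"], outputs=["db.dst"]))

    atlas.consume(topic)
    return atlas


if __name__ == "__main__":
    result = demo()
    print(result.log)
    print(result.lineage)
```

Because the HMS messages are queued before the Impala message, both assets already exist in the toy store when the process entity arrives, which is the ordering that the synchronization with HMS is meant to ensure.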