Impala metadata collection
Atlas can collect metadata for queries from Impala. It collects metadata for affected data assets from Hive Metastore (HMS).
An Atlas hook runs in each
Impalad instance. This hook sends metadata to Atlas for Impala
operations, which are represented by process and process execution
entities in Atlas.
In addition, an Atlas hook runs in Hive Metastore (HMS). Before sending metadata to Atlas, Impala synchronizes its metadata with HMS. This synchronization makes sure that Impala uses the same names and IDs as HMS.
- When an action occurs in the Impala instance...
- Impala updates HMS with information about the assets affected by the action.
- The Atlas hook for HMS collects information for the changed and new assets and forms it into metadata entities. It publishes the metadata to a Kafka topic.
- The Atlas hook for the Impala instance collects information for the action and forms it into metadata entities. It publishes the metadata to a Kafka topic.
- Atlas reads the messages from the topic and determines what information will create new entities and what information updates existing entities. Atlas is able to determine the correct entities regardless of the order in which Atlas receives messages from the Kafka topic.
- Atlas creates the appropriate entities and determines lineage from existing entities to the new entities.