Impala entities created in Atlas

Each Impala entity in Atlas includes detailed metadata for Impala queries.

The following diagrams show a summary of the entities created in Atlas for Impala operations. The supertypes that contribute attributes to the entity types are shaded.

Figure 1. Atlas Entity Types for Impala Operations


The metadata collected for each entity type is as follows:

Impala Process

Identifier Example Content
typeName impala_process
guid System-generated ID. This value is used to identify the entity in the Atlas Dashboard URL.
qualifiedName <database>.<target table>@<clustername>:<generated ID> The generated ID is distinct from the GUID.
name Text of the query.
inputs List of the input tables or views, including each entity’s type name and the qualified name.
outputs List of the output objects, including each entity’s type name and the qualified name.
recentQueries Last query executed (duplicated in process_execution).
operationType One of the operations that triggers metadata collection.
queryPlan Reserved for future use.
startTime Most recent query start time.
endTime Most recent query end time.
Relationship: Process Execution One process to one or more process executions. impala_process_process_execution
Relationship: Column Lineage One process to one or more column lineages. impala_process_column_lineage
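
The qualifiedName pattern listed above, <database>.<target table>@<clustername>:<generated ID>, can be split into its parts with plain string operations. The following is a minimal sketch, not part of Atlas itself: the function name, the ProcessQualifiedName type, and the sample value are hypothetical, and the code assumes that the database name contains no "." and that the cluster name contains no ":".

```python
from typing import NamedTuple


class ProcessQualifiedName(NamedTuple):
    """Parts of an impala_process qualifiedName (hypothetical helper type)."""
    database: str
    table: str
    cluster: str
    generated_id: str


def parse_process_qualified_name(qualified_name: str) -> ProcessQualifiedName:
    """Split '<database>.<target table>@<clustername>:<generated ID>'.

    Assumption: the database name has no '.' and the cluster name has no ':'.
    """
    # Everything before '@' names the target object; the rest is cluster and ID.
    object_part, at, rest = qualified_name.partition("@")
    # The first '.' separates the database from the target table.
    database, dot, table = object_part.partition(".")
    # The first ':' after '@' separates the cluster name from the generated ID.
    cluster, colon, generated_id = rest.partition(":")
    parts = (database, table, cluster, generated_id)
    if not (at and dot and colon and all(parts)):
        raise ValueError(f"unexpected qualifiedName format: {qualified_name!r}")
    return ProcessQualifiedName(*parts)


# Made-up example value; real generated IDs may look different.
example = parse_process_qualified_name("sales.orders_summary@cluster1:1571234567890")
```

The result exposes the parts by name, for example example.table and example.cluster. Note that the generated ID recovered here is not the entity's guid.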

Impala Process Execution

Identifier Example Content
typeName impala_process_execution
guid System-generated ID. This value is used to identify the entity in the Atlas Dashboard URL.
qualifiedName <database>.<target table>@<clustername>:<ID from process qualified name>:<ID from the process execution name>:<generated ID for this process execution>
name Text of the query with a system-generated ID added to the end.
queryText Text of the query.
queryPlan Reserved for future use.
queryId impala_<date as yyyymmddhhmmss>_<generated id>
startTime Query start time.
endTime Query end time.
userName The user who ran the query.
Relationship: Process One process to one or more process executions. impala_process_process_execution
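
The queryId pattern listed above, impala_<date as yyyymmddhhmmss>_<generated id>, embeds a timestamp that can be recovered with the standard library. The following is a minimal sketch under stated assumptions: the function name and the sample value are hypothetical, the timestamp is read as year, month, day, hour, minute, second, and the time zone is unknown, so the result is a naive datetime.

```python
from datetime import datetime


def parse_query_id(query_id: str) -> tuple[datetime, str]:
    """Split 'impala_<yyyymmddhhmmss>_<generated id>' into (timestamp, id).

    Assumption: the generated id may itself contain '_', so only the
    first two underscores are treated as separators.
    """
    # Raises ValueError if there are fewer than three underscore-separated parts.
    prefix, date_part, generated_id = query_id.split("_", 2)
    if prefix != "impala" or not generated_id:
        raise ValueError(f"unexpected queryId format: {query_id!r}")
    # strptime raises ValueError if the date part is not a valid timestamp.
    timestamp = datetime.strptime(date_part, "%Y%m%d%H%M%S")
    return timestamp, generated_id


# Made-up example value; real generated ids may look different.
timestamp, generated_id = parse_query_id("impala_20191016153045_abc123")
```

A parsed timestamp of this kind can be compared with the startTime attribute when matching process executions to queries, but whether the two values always agree is not stated in the table above.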

Impala Column Lineage

Identifier Example Content
typeName impala_column_lineage
dependencyType The type of relationship between the input and output columns; one of SIMPLE, EXPRESSION, or SCRIPT.
name <database>.<table>@<clustername>:<generated ID>:<output_column>
inputs List of 0 or more hive_column entities that contributed to the output columns. This is a legacy model component: the more current model uses a relationship attribute.
outputs This is a legacy model component: the more current model uses a relationship attribute.
qualifiedName Same as name.
query Name of the impala_process entity that produced this lineage. This is a legacy model component: the more current model uses a relationship attribute.
Relationship: Process Name of the impala_process entity that produced this lineage. impala_process_column_lineage
Relationship: inputToProcesses List of 0 or more hive_column entities that contributed to the output columns.
Relationship: outputFromProcesses List of 0 or more hive_column entities that were produced in the process.
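
The column lineage name pattern, <database>.<table>@<clustername>:<generated ID>:<output_column>, and the three dependencyType values can be modeled as follows. This is a minimal sketch rather than Atlas code: the DependencyType enum, the function name, and the sample value are hypothetical, and the code assumes that the output column name contains no ":".

```python
from enum import Enum


class DependencyType(Enum):
    """The dependencyType values listed for impala_column_lineage."""
    SIMPLE = "SIMPLE"
    EXPRESSION = "EXPRESSION"
    SCRIPT = "SCRIPT"


def split_lineage_name(name: str) -> tuple[str, str]:
    """Split a column lineage name into (prefix, output_column).

    The prefix is everything before the last ':' and has the same shape
    as an impala_process qualifiedName.
    Assumption: the output column name has no ':'.
    """
    # The last ':' separates the output column from the rest of the name.
    prefix, sep, output_column = name.rpartition(":")
    if not (sep and prefix and output_column):
        raise ValueError(f"unexpected column lineage name: {name!r}")
    return prefix, output_column


# Made-up example value; real generated IDs may look different.
prefix, column = split_lineage_name("sales.orders_summary@cluster1:1571234567890:total")
dependency = DependencyType("EXPRESSION")
```

Looking up a value that is not one of the three listed types, such as DependencyType("OTHER"), raises ValueError, which makes unexpected metadata easy to spot.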