Impala entities created in Atlas
Each Impala entity in Atlas includes detailed metadata for Impala queries.
The following diagrams show a summary of the entities created in Atlas for Impala operations. The supertypes that contribute attributes to the entity types are shaded.
![](../images/atlas-model-impala.png)
The metadata collected for each entity type is as follows:
Impala Process
Identifier | Example content |
---|---|
typeName | impala_process |
guid | System generated ID. This value is used to identify the entity in the Atlas Dashboard URL. |
qualifiedName |
The generated ID is distinct from the GUID. |
name | Text of the query. |
inputs | List of the input tables or views, including each entity’s type name and the qualified name. |
outputs | List of the output objects, including each entity’s type name and the qualified name. |
recentQueries | Last query executed (duplicated in
process_execution ). |
operationType | One of the operations that triggers metadata collection. |
queryPlan | Reserved for future use. |
startTime | Most recent query start time. |
endTime | Most recent query end time. |
Relationship: Process Execution | One process to one or more process executions.
impala_process_process_execution |
Relationship: Column Lineage | One process to one or more column lineages.
impala_process_column_lineage |
Impala Process Execution
Identifier | Example Content |
---|---|
typeName | impala_process_execution |
guid | System generated ID. This value is used to identify the entity in the Atlas Dashboard URL. |
qualifiedName | <database>.<target
table>@<clustername>:<ID from process qualified
name>:<ID from the process execution name>:<generated
ID for this process execution> |
name | Text of the query with a system-generated ID added to the end. |
queryText | Text of the query. |
queryPlan | Reserved for future use. |
queryId | impala_<date as
yyyymmddhhmmss>_<generated id> |
startTime | Query start time. |
endTime | Query end time. |
userName | The user who ran the query. |
Relationship: Process | One process to one or more process executions.
impala_process_process_execution |
Impala Column Lineage
Identifier | Example Content |
---|---|
typeName | impala_column_lineage |
dependencyType | The type of relationship between the input and output columns; one of SIMPLE, EXPRESSION, or SCRIPT. |
name | <database>.<table>@<clustername>:<generated
ID>:<output_column> |
inputs |
List of 0 or more |
outputs | This is a legacy model component: the more current model uses a relationship attribute. |
qualifiedName | Same as name. |
query | Name of the impala_process entity that
produced this lineage. This is a legacy model component: the
more current model uses a relationship attribute. |
Relationship: Process | Name of the impala_process entity that
produced this lineage.
impala_process_column_lineage |
Relationship: inputToProcesses | List of 0 or more hive_column entities
that contributed to the output columns. |
Relationship: outputFromProcesses | List of 0 or more hive_column entities
that were produced in the process. |