Impala entities created in Atlas
Each Impala entity in Atlas includes detailed metadata for Impala queries.
The following diagrams show a summary of the entities created in Atlas for Impala operations. The supertypes that contribute attributes to the entity types are shaded.
The metadata collected for each entity type is as follows:
Impala Process
Identifier | Example content |
---|---|
typeName | impala_process |
guid | System generated ID. This value is used to identify the entity in the Atlas Dashboard URL. |
qualifiedName |
The generated ID is distinct from the GUID. |
name | Text of the query. |
inputs | List of the input tables or views, including each entity’s type name and the qualified name. |
outputs | List of the output objects, including each entity’s type name and the qualified name. |
recentQueries | Last query executed (duplicated in
process_execution ). |
operationType | One of the operations that triggers metadata collection. |
queryPlan | Reserved for future use. |
startTime | Most recent query start time. |
endTime | Most recent query end time. |
Relationship: Process Execution | One process to one or more process executions.
impala_process_process_execution |
Relationship: Column Lineage | One process to one or more column lineages.
impala_process_column_lineage |
Impala Process Execution
Identifier | Example Content |
---|---|
typeName | impala_process_execution |
guid | System generated ID. This value is used to identify the entity in the Atlas Dashboard URL. |
qualifiedName | <database>.<target
table>@<clustername>:<ID from process qualified
name>:<ID from the process execution name>:<generated
ID for this process execution> |
name | Text of the query with a system-generated ID added to the end. |
queryText | Text of the query. |
queryPlan | Reserved for future use. |
queryId | impala_<date as
yyyymmddhhmmss>_<generated id> |
startTime | Query start time. |
endTime | Query end time. |
userName | The user who ran the query. |
Relationship: Process | One process to one or more process executions.
impala_process_process_execution |
Impala Column Lineage
Identifier | Example Content |
---|---|
typeName | impala_column_lineage |
dependencyType | The type of relationship between the input and output columns; one of SIMPLE, EXPRESSION, or SCRIPT. |
name | <database>.<table>@<clustername>:<generated
ID>:<output_column> |
inputs |
List of 0 or more |
outputs | This is a legacy model component: the more current model uses a relationship attribute. |
qualifiedName | Same as name. |
query | Name of the impala_process entity that
produced this lineage. This is a legacy model component: the
more current model uses a relationship attribute. |
Relationship: Process | Name of the impala_process entity that
produced this lineage.
impala_process_column_lineage |
Relationship: inputToProcesses | List of 0 or more hive_column entities
that contributed to the output columns. |
Relationship: outputFromProcesses | List of 0 or more hive_column entities
that were produced in the process. |