Spark entities created in Apache Atlas
Each Spark entity in Atlas includes detailed metadata collected from Spark.
The following diagrams summarize the entities created in Atlas for Spark operations. The data assets that Spark operations act upon are collected through the Hive Metastore (HMS). The supertypes that contribute attributes to the entity types are shaded.
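To browse the Spark entities that Atlas has created, you can query Atlas by entity type. The following is a minimal sketch that only builds (and does not send) a request to the Atlas REST API v2 basic-search endpoint, `/api/atlas/v2/search/basic`, for the `spark_process` type. The server address `http://atlas.example.com:21000` and the helper name `build_basic_search_request` are hypothetical, and authentication (for example, Kerberos/SPNEGO or Knox) is deliberately left out because it depends on your deployment.

```python
import json
import urllib.request


def build_basic_search_request(
    atlas_url: str, type_name: str = "spark_process", limit: int = 25
) -> urllib.request.Request:
    """Build, but do not send, an Atlas v2 basic-search request for one entity type.

    Hypothetical helper: authentication headers are not added here.
    """
    payload = {
        "typeName": type_name,          # Atlas entity type to search for
        "excludeDeletedEntities": True, # skip entities marked as deleted
        "limit": limit,                 # page size
        "offset": 0,                    # first page
    }
    return urllib.request.Request(
        url=atlas_url.rstrip("/") + "/api/atlas/v2/search/basic",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Atlas server address, for illustration only.
req = build_basic_search_request("http://atlas.example.com:21000")
print(req.get_method(), req.full_url)
```

Sending the request (for example, with `urllib.request.urlopen(req)` plus whatever authentication your cluster requires) returns matching entity headers, each carrying a `guid` you can use to fetch the full entity.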
The metadata collected for each entity type is as follows:
Spark Process
Identifier | Example content |
---|---|
typeName | spark_process |
guid | System-generated ID. This value identifies the entity in the Atlas Dashboard URL. |
qualifiedName | Generated ID. The generated ID is distinct from the GUID. |
name | process_<generated ID> |
description | Metadata from Spark. |
owner | Metadata from Spark. |
ownerType | Metadata from Spark. |
inputs | List of the input tables or views, including each entity’s type name and qualified name. |
outputs | List of the output objects, including each entity’s type name and qualified name. |
executionId | Metadata from Spark. |
currUser | Metadata from Spark. In a Kerberized environment, this value contains the principal name. |
remoteUser | Metadata from Spark. In a Kerberized environment, this value contains the principal name. |
executionTime | Metadata from Spark. |
details | Query plan text, including parsed logical plan, analyzed logical plan, optimized logical plan, and physical plan. |
sparkPlanDescription | Physical plan text. |
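The attributes above can be read programmatically from an entity fetched through the Atlas REST API v2 (for example, `GET /api/atlas/v2/entity/guid/<guid>`). The following is a minimal sketch that pulls a few `spark_process` attributes and the type name and qualified name of each input and output into a plain record. It assumes the response has the usual `{"entity": {"typeName", "guid", "attributes"}}` shape and that `inputs` and `outputs` are Atlas object IDs carrying `typeName` and `uniqueAttributes.qualifiedName`; depending on the Atlas version, these references may appear under `relationshipAttributes` instead. The class and function names and the sample payload are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class SparkProcessSummary:
    """Hypothetical record holding a subset of spark_process attributes."""
    guid: str
    qualified_name: str
    name: str
    execution_id: Any
    curr_user: Optional[str]
    inputs: List[Tuple[str, str]] = field(default_factory=list)
    outputs: List[Tuple[str, str]] = field(default_factory=list)


def _object_ids(refs: Optional[list]) -> List[Tuple[str, str]]:
    """Turn a list of Atlas object IDs into (typeName, qualifiedName) pairs."""
    return [
        (ref.get("typeName", ""), ref.get("uniqueAttributes", {}).get("qualifiedName", ""))
        for ref in refs or []
    ]


def summarize_spark_process(payload: dict) -> SparkProcessSummary:
    """Extract selected attributes from an Atlas entity response (assumed shape)."""
    entity = payload["entity"]
    if entity.get("typeName") != "spark_process":
        raise ValueError(f"expected spark_process, got {entity.get('typeName')!r}")
    attrs = entity.get("attributes", {})
    return SparkProcessSummary(
        guid=entity["guid"],
        qualified_name=attrs.get("qualifiedName", ""),
        name=attrs.get("name", ""),
        execution_id=attrs.get("executionId"),
        curr_user=attrs.get("currUser"),
        inputs=_object_ids(attrs.get("inputs")),
        outputs=_object_ids(attrs.get("outputs")),
    )


# Hypothetical sample payload; real values come from your Atlas server.
sample = {
    "entity": {
        "typeName": "spark_process",
        "guid": "example-guid-0001",
        "attributes": {
            "qualifiedName": "example-generated-id",
            "name": "process_example-generated-id",
            "executionId": 0,
            "currUser": "alice",
            "inputs": [{"typeName": "hive_table",
                        "uniqueAttributes": {"qualifiedName": "default.src@cm"}}],
            "outputs": [{"typeName": "hive_table",
                         "uniqueAttributes": {"qualifiedName": "default.dst@cm"}}],
        },
    }
}

summary = summarize_spark_process(sample)
print(summary.name, summary.inputs, summary.outputs)
```

Keeping `guid` and `qualified_name` as separate fields mirrors the table: the GUID is system generated for the Atlas Dashboard, while the qualified name is the distinct generated ID that also appears in `name` as `process_<generated ID>`.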