Spark entity metadata migration

Spark metadata entities are fully migrated from Navigator to Atlas.

The following sections describe how metadata is mapped from Navigator to Atlas; if Atlas requires metadata that wasn't available in Navigator, the migration notes describe how the Atlas metadata values are generated.

Migrated entities include:

For entity metadata that is common to all entities, see System metadata migration.

Spark Process

Navigator spark_operation entities are migrated to Atlas spark_process entities.
Navigator Metadata Atlas Metadata Migration Notes
inputs relationshipAttributes.inputs Points to the input directories as relationship attributes.
outputs relationshipAttributes.outputs Points to the output directory as a relationship attribute.
principal Not mapped.
attributes.currUser Defaults to null.
attributes.details Defaults to null.
attributes.executionId Defaults to null.
attributes.remoteUser Defaults to null.
attributes.sparkPlanDescription
attributes.qualifiedName Generated as a string in the format process_id@clustername. For example, application_1589303388872_0001@clustercolor.20324.

Spark Process Execution

Navigator spark_operation_execution entities are migrated to Atlas spark_process_execution entities.

Navigator Metadata Atlas Metadata Migration Notes
inputs relationshipAttributes.inputs Points to the input files as relationship attributes.
outputs relationshipAttributes.outputs Points to the output files as relationship attributes.
ended attributes.endTime
operation attributes.process Points to the parent spark_process entity as a relationship attribute.
originalName attributes.queryText
principal attributes.userName
started attributes.startTime
attributes.hostName Defaults to null.
attributes.qualifiedName Generated as a string in the format process_id-exec.timestamp@clustername. For example, application_1589303388872_0001-exec.1589331800580@clustercolor.