Spark actions that produce Atlas entities

Spark jobs create Spark application and process entities and create, update, or delete the data assets affected by those operations will affect Atlas entities; operations that only affect data do not show up in Atlas.

The following table lists the Spark actions that produce or update metadata in Atlas.

This Action in Spark... ...Produces metadata for these Atlas entities

CREATE TABLE USING
CREATE TABLE AS SELECT,
CREATE TABLE USING ... AS SELECT

spark_application, spark_column_lineage, spark_process, hive_table, hive_column, hive_storagedesc
CREATE VIEW AS SELECT, spark_application, spark_process, hive_table, hive_column, hive_storagedesc

INSERT INTO (SELECT),
LOAD DATA [LOCAL] INPATH

spark_application, spark_process

Notable actions in Spark that do NOT produce process entities in Atlas, meaning that no lineage is produced for these operations:

  • LOAD DATA INPATH (when not coming from a local file source)
  • CREATE TABLE (hive_table metadata produced by HMS)
  • ALTER VIEW (hive_table metadata produced by HMS)
  • SELECT or other queries that don’t change table metadata