Spark actions that produce Atlas entities
Spark jobs create Spark application and process entities in Atlas. Operations that create, update, or delete data assets also affect the corresponding Atlas entities; operations that only affect data do not show up in Atlas.
The following table lists the Spark actions that produce or update metadata in Atlas.
|This Action in Spark...||...Produces metadata for these Atlas entities|
|CREATE TABLE USING||spark_application, spark_column_lineage, spark_process, hive_table, hive_column, hive_storagedesc|
|CREATE VIEW AS SELECT,||spark_application, spark_process, hive_table, hive_column, hive_storagedesc|
INSERT INTO (SELECT),
The following notable Spark actions do NOT produce process entities in Atlas, so no lineage is produced for these operations:
- LOAD DATA INPATH (when not coming from a local file source)
- CREATE TABLE (hive_table metadata produced by HMS)
- ALTER VIEW (hive_table metadata produced by HMS)
- SELECT or other queries that don’t change table metadata
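
As an illustration only, the table and list above can be encoded as a small lookup that reports which Atlas entity types a given Spark SQL statement is expected to produce. This is a minimal sketch, not part of Atlas or Spark: the names `RULES`, `classify`, and `produces_lineage` are hypothetical, the regular expressions are a rough approximation of the statement forms, and only actions whose entity types are listed in full in this section are included. Anything else returns `None` rather than a guess.

```python
import re
from typing import Optional, Tuple

# Entity types copied from the table in this section.
TABLE_ENTITIES = (
    "spark_application", "spark_column_lineage", "spark_process",
    "hive_table", "hive_column", "hive_storagedesc",
)
VIEW_ENTITIES = (
    "spark_application", "spark_process",
    "hive_table", "hive_column", "hive_storagedesc",
)

# (pattern, action label, entity types). An empty tuple marks an action that
# this section lists as producing no process entity. Patterns are hypothetical
# approximations; order matters because the first match wins.
RULES = [
    (r"CREATE\s+TABLE\b.*\bUSING\b", "CREATE TABLE USING", TABLE_ENTITIES),
    (r"CREATE\s+VIEW\b.*\bAS\s+SELECT\b", "CREATE VIEW AS SELECT", VIEW_ENTITIES),
    # Only the non-local form is listed as producing no process entity.
    (r"LOAD\s+DATA\s+INPATH\b", "LOAD DATA INPATH", ()),
    # Plain CREATE TABLE only; forms with AS SELECT are left unclassified.
    (r"CREATE\s+TABLE\b(?!.*\bAS\s+SELECT\b)", "CREATE TABLE", ()),
    (r"ALTER\s+VIEW\b", "ALTER VIEW", ()),
    # Covers SELECT only, not every query that leaves table metadata unchanged.
    (r"SELECT\b", "SELECT", ()),
]


def classify(statement: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (action label, entity types), or None if the statement is not covered."""
    text = statement.strip()
    for pattern, action, entities in RULES:
        if re.match(pattern, text, flags=re.IGNORECASE | re.DOTALL):
            return action, entities
    return None


def produces_lineage(statement: str) -> Optional[bool]:
    """True if a spark_process entity is expected, False if not, None if unknown."""
    result = classify(statement)
    if result is None:
        return None
    return "spark_process" in result[1]
```

For example, `produces_lineage("CREATE VIEW v AS SELECT id FROM t")` evaluates to `True`, while `produces_lineage("SELECT * FROM t")` evaluates to `False`. Returning `None` for uncovered statements keeps the sketch from asserting behavior that this section does not document.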