Spark actions that produce Atlas entities
Spark jobs create Spark application and process entities in Atlas. Operations that create, update, or delete data assets also affect the Atlas entities for those assets; operations that only affect data do not show up in Atlas.
The following table lists the Spark actions that produce or update metadata in Atlas.
This Action in Spark... | ...Produces metadata for these Atlas entities |
---|---|
CREATE TABLE USING | spark_application, spark_column_lineage, spark_process, hive_table, hive_column, hive_storagedesc |
CREATE VIEW AS SELECT | spark_application, spark_process, hive_table, hive_column, hive_storagedesc |
INSERT INTO (SELECT) | spark_application, spark_process |
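
When checking Atlas after a Spark job, it can help to have the table in a form a script can consult. The following is a minimal sketch that restates the table as a Python lookup. The names `ATLAS_ENTITIES_BY_ACTION` and `expected_entities` are hypothetical and are not part of Spark or Atlas; the entity type names are copied from the table as-is.

```python
# Entity types copied from the table above, keyed by the Spark action label.
# ATLAS_ENTITIES_BY_ACTION and expected_entities are hypothetical helper names,
# not part of any Spark or Atlas API.
ATLAS_ENTITIES_BY_ACTION = {
    "CREATE TABLE USING": [
        "spark_application", "spark_column_lineage", "spark_process",
        "hive_table", "hive_column", "hive_storagedesc",
    ],
    "CREATE VIEW AS SELECT": [
        "spark_application", "spark_process",
        "hive_table", "hive_column", "hive_storagedesc",
    ],
    "INSERT INTO (SELECT)": ["spark_application", "spark_process"],
}


def expected_entities(action):
    """Return the entity types the table lists for an action label.

    Returns an empty list for any label that is not in the table.
    """
    return ATLAS_ENTITIES_BY_ACTION.get(action.strip().upper(), [])
```

For example, `expected_entities("INSERT INTO (SELECT)")` returns only the application and process types, which matches the last row of the table.
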
The following notable Spark actions do NOT produce process entities in Atlas, so no lineage is produced for these operations:
- LOAD DATA INPATH (when not coming from a local file source)
- CREATE TABLE (hive_table metadata produced by HMS)
- ALTER VIEW (hive_table metadata produced by HMS)
- SELECT or other queries that don’t change table metadata
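
The list above can be turned into a rough pre-check that flags statements for which no lineage should be expected. This is only an illustrative sketch: the function name and the regular expressions are assumptions, the matching is a simple keyword heuristic, and it does not reflect how Atlas itself classifies statements. A return value of `False` means only that the statement is not on the list above.

```python
import re

# Statement prefixes taken from the list above. The patterns are a rough,
# hypothetical heuristic, not the logic Atlas uses.
NO_LINEAGE_PATTERNS = [
    # LOAD DATA INPATH without LOCAL, i.e. not from a local file source
    re.compile(r"^\s*LOAD\s+DATA\s+INPATH\b", re.IGNORECASE),
    re.compile(r"^\s*ALTER\s+VIEW\b", re.IGNORECASE),
    re.compile(r"^\s*SELECT\b", re.IGNORECASE),
]
CREATE_TABLE = re.compile(r"^\s*CREATE\s+TABLE\b", re.IGNORECASE)
USING_KEYWORD = re.compile(r"\bUSING\b", re.IGNORECASE)


def is_listed_no_lineage_action(statement):
    """Return True if the statement matches one of the listed no-lineage actions."""
    if any(p.search(statement) for p in NO_LINEAGE_PATTERNS):
        return True
    # Plain CREATE TABLE is on the list; CREATE TABLE ... USING is not.
    # Assumption: any USING keyword in the statement marks the USING form.
    is_create = CREATE_TABLE.search(statement) is not None
    has_using = USING_KEYWORD.search(statement) is not None
    return is_create and not has_using
```

Note that the `USING` check is deliberately naive: a column or table named `using` would be misread, so treat the result as a hint rather than a guarantee.
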