Coding in Spark for automatic metadata management
The Spark API, which saves data to a specified location, does not generate events in the Hive metastore so it is not supported by automatic metadata management.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
For example here is an example of a Spark Scala API code, which does not generate events in the Hive metastore:
Seq((1, 2)).toDF("i",
"j").write.save("/user/hive/warehouse/spark_etl.db/customers/date=01012019")
Instead, use the below Spark SQL code to ensure that Hive metastore events are generated and sent to the metastore:
Spark.sql(" INSERT OVERWRITE TABLE xxx PARTITION (date = , …) as select * from spark_dataframe“ )