Replication of Hive Managed tables written by Spark applications is not supported.
DLM Hive replication for Managed tables relies on Hive publishing a replication event to the Hive Metastore for every change that Hive makes.
For External tables, DLM replication does not rely on published events. Instead, it checks every table and partition directory for any new files that might have been added.
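The difference between the two approaches can be illustrated with a toy model. This is a minimal sketch, not DLM code: the `Table` class, both `replicate_by_*` functions, and the file names are hypothetical and exist only to show why a write that publishes no event is missed by event-based replication but found by a directory scan.

```python
# Toy model only: this is not DLM code, and every name here is hypothetical.

class Table:
    def __init__(self):
        self.files = set()   # data files present in the table directory
        self.events = []     # change events published to the metastore

    def write(self, filename, publish_event):
        """Add a data file and, optionally, publish a matching event."""
        self.files.add(filename)
        if publish_event:
            self.events.append(filename)


def replicate_by_events(table):
    """Managed-table style: copy only the files named in published events."""
    return set(table.events)


def replicate_by_scan(table):
    """External-table style: list the directory and copy whatever is there."""
    return set(table.files)


table = Table()
table.write("part-0000.orc", publish_event=True)    # writer that publishes events
table.write("part-0001.orc", publish_event=False)   # writer that publishes no event

missed = table.files - replicate_by_events(table)
print("missed by event-based replication:", sorted(missed))
print("missed by directory scan:", sorted(table.files - replicate_by_scan(table)))
```

In this model the second file never reaches the target under event-based replication, while the scan picks it up at the cost of listing the whole directory.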
Important: Applications other than Hive, including Spark, do not always publish events when they add new data files to Managed tables. This can result in data loss if such applications write to a Managed table in HDP 2.6.5. Use External tables for data written by these applications. Although External table replication has some overhead, it also captures files that were added without any event being generated.
Note: With Spark, the use of hive.metastore.dml.events is not supported in HDP. Treat Spark as an application that does not reliably publish events for its changes.
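As one way to follow the recommendation to use External tables, the sketch below builds a Hive `CREATE EXTERNAL TABLE` statement that a Spark application could run before writing data. The helper name `external_table_ddl`, the table name, the columns, the ORC storage format, and the location path are all assumptions made for illustration. Adapt them to your cluster.

```python
def external_table_ddl(table, columns, location):
    """Build a Hive CREATE EXTERNAL TABLE statement.

    table    -- qualified table name (hypothetical example below)
    columns  -- list of (name, hive_type) pairs
    location -- directory that holds the table data
    """
    cols = ", ".join("{} {}".format(name, hive_type) for name, hive_type in columns)
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {} ({}) "
        "STORED AS ORC LOCATION '{}'".format(table, cols, location)
    )


# Hypothetical table, columns, and path, used only for illustration.
ddl = external_table_ddl(
    "sales_db.events",
    [("id", "BIGINT"), ("payload", "STRING")],
    "/apps/hive/external/sales_db/events",
)
print(ddl)

# With a Hive-enabled SparkSession named `spark` and a DataFrame `df`
# (both assumed, not created here), the statement could be applied as:
#   spark.sql(ddl)
#   df.write.insertInto("sales_db.events")
```

Because the table is External, files that Spark adds under the table location can be picked up by the directory check even when no event is published.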