Fixed Issues in Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.1.4.

CDPD-12541: If an insert statement specified partitions both statically and dynamically, there was a chance of data loss. To prevent data loss, Spark threw an exception for such statements.
This issue is now resolved. Spark allows you to specify a mix of static and dynamic partitions in an insert statement.
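As an illustration of what a mixed partition specification looks like, the following is a minimal sketch that only builds the statement text. The table names (sales, staging_sales), the column names, and the build_insert helper are hypothetical and are not part of Spark.

```python
def build_insert(table, static_parts, dynamic_parts, select_sql):
    """Build an INSERT statement whose PARTITION clause mixes
    static (column = value) and dynamic (column only) partitions."""
    # Static partitions carry a literal value and are listed first.
    static_spec = [f"{col} = '{val}'" for col, val in static_parts.items()]
    # Dynamic partitions list only the column name; values come from the SELECT.
    spec = ", ".join(static_spec + list(dynamic_parts))
    return f"INSERT INTO TABLE {table} PARTITION ({spec}) {select_sql}"


# Hypothetical example: country is static, year is dynamic.
stmt = build_insert(
    "sales",
    {"country": "US"},
    ["year"],
    "SELECT amount, year FROM staging_sales",
)
print(stmt)
```

In a PySpark application, a statement like this would typically be passed to spark.sql(stmt). The dynamic partition column is selected last so that its values line up with the partition specification.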
CDPD-14203: Introduced the spark.executor.rpc.bindToAll property to support multihoming. When this property is set to true, the executor binds to 0.0.0.0. The property defaults to false.
This issue is now resolved.
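One way to enable the property for a single job is to pass it with the --conf option of spark-submit. The sketch below only assembles the command line. The bind_to_all_args helper and the app.py application name are hypothetical.

```python
def bind_to_all_args(enabled):
    """Return spark-submit arguments that set spark.executor.rpc.bindToAll."""
    value = "true" if enabled else "false"
    return ["--conf", f"spark.executor.rpc.bindToAll={value}"]


# Hypothetical application name; replace with your own job.
cmd = ["spark-submit"] + bind_to_all_args(True) + ["app.py"]
print(" ".join(cmd))
```

Because the property defaults to false, jobs that do not need multihoming can omit the option entirely.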
CDPD-10532: Update log4j to address CVE-2019-17571
Replaced log4j with an internal version to fix CVE-2019-17571.
CDPD-10515: Incorrect version of jackson-mapper-asl
Spark now uses an internal version of jackson-mapper-asl to address CVE-2017-7525.
CDPD-7882: If an insert statement specifies partitions both statically and dynamically, there is a potential for data loss
To prevent data loss, this fix throws an exception if partitions are specified both statically and dynamically. You can follow the workarounds provided in the error message.
CDPD-15773: Spark/Hive interaction causes deadlock.
In previous versions, applications that shared a Spark session across multiple threads could deadlock when accessing the Hive Metastore (HMS). This issue is now resolved.
CDPD-14906: Spark reads or writes TIMESTAMP data for values before the start of the Gregorian calendar. This happens when Spark is:
  • Using dynamic partition inserts.
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
  • Reading or writing from a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).
This issue is now resolved.
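To check whether a job on an earlier release fell under one of the conditions listed for CDPD-14906, the list can be encoded as a small predicate. This is a sketch only: the matches_listed_condition function and its arguments are hypothetical, while the property names and defaults are the ones given in the list.

```python
# Defaults as stated in the list above.
DEFAULTS = {
    "spark.sql.hive.convertMetastoreOrc": "true",
    "spark.sql.orc.impl": "native",
    "spark.sql.hive.convertMetastoreParquet": "true",
}


def matches_listed_condition(table_format, conf=None, dynamic_partition_insert=False):
    """Return True if the job matches one of the conditions listed for CDPD-14906."""
    # Explicit settings override the defaults.
    settings = {**DEFAULTS, **(conf or {})}
    if dynamic_partition_insert:
        return True
    fmt = table_format.lower()
    if fmt == "orc":
        if settings["spark.sql.hive.convertMetastoreOrc"] == "false":
            return True
        # convertMetastoreOrc is true here, so only the hive implementation matches.
        return settings["spark.sql.orc.impl"] == "hive"
    if fmt == "parquet":
        return settings["spark.sql.hive.convertMetastoreParquet"] == "false"
    return False


print(matches_listed_condition("orc"))
print(matches_listed_condition("orc", {"spark.sql.orc.impl": "hive"}))
```

With the default settings and no dynamic partition insert, neither an ORC nor a Parquet table matches any of the listed conditions.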