Behavioral Changes in Spark

Behavioral changes denote a marked change in behavior from the previously released version to this version of Spark.

Cloudera Runtime 7.3.1.706 SP3 CHF 2

There are no behavioral changes in this release.

Cloudera Runtime 7.3.1.600 SP3 CHF 1

Summary:
Added the spark_shuffle_aes_enabled configuration property
Previous behavior:
None.
New behavior:
The new spark_shuffle_aes_enabled configuration property enables AES-based encryption for the Spark shuffle service.
Summary:
Added the spark3.network.crypto.enabled configuration property
Previous behavior:
None.
New behavior:
The new spark3.network.crypto.enabled configuration property enables AES-based encryption.
Summary:
Changes in the generation of the spark.yarn.historyServer.address value
Previous behavior:
Configuration files listed the Spark History Server URL as an HTTP address.
New behavior:
When SSL/TLS is enabled, the Spark CSD generates the spark.yarn.historyServer.address value with the HTTPS address of the Spark History Server.

Cloudera Runtime 7.3.1.500 SP3

Summary:
Added the super-interface java.nio.file.PathMatcher to the IOFileFilter interface in Apache Commons IO.
Previous behavior:
None.
New behavior:
Client classes that implement IOFileFilter may fail to recompile unless they are updated for the new super-interface. The compiler reports a message such as: C is not abstract and does not override abstract method matches(java.nio.file.Path) in java.nio.file.PathMatcher, where C is the client class.
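A minimal sketch of the kind of update the compiler message asks for, assuming a client filter that previously implemented only the File-based accept method. To keep the example compilable with the JDK alone, the hypothetical SuffixFilter class implements java.io.FileFilter and java.nio.file.PathMatcher directly; in real client code it would implement IOFileFilter instead.

```java
import java.io.File;
import java.io.FileFilter;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical client filter. Real client code would implement
// org.apache.commons.io.filefilter.IOFileFilter; the JDK interfaces
// FileFilter and PathMatcher stand in for it so this compiles without Commons IO.
public class SuffixFilter implements FileFilter, PathMatcher {
    private final String suffix;

    public SuffixFilter(String suffix) {
        this.suffix = suffix;
    }

    // Existing File-based check that the client class already had.
    @Override
    public boolean accept(File file) {
        return file.getName().endsWith(suffix);
    }

    // Method required by the java.nio.file.PathMatcher super-interface.
    // Delegates to accept(File) so both entry points apply the same rule.
    @Override
    public boolean matches(Path path) {
        return accept(path.toFile());
    }

    public static void main(String[] args) {
        SuffixFilter filter = new SuffixFilter(".txt");
        System.out.println(filter.matches(Paths.get("logs", "app.txt"))); // true
        System.out.println(filter.accept(new File("logs/app.log")));      // false
    }
}
```

Delegating matches(Path) to the existing accept(File) logic keeps the two entry points consistent. Note that Path.toFile() works only for paths on the default file system.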
Summary:
The CountingInputStream.afterRead method in Apache Commons IO now declares java.io.IOException.
Previous behavior:
None.
New behavior:
Client code that calls afterRead, for example a subclass override that calls super.afterRead, may fail to recompile unless it catches or declares the new exception. The compiler reports the following message: unreported exception java.io.IOException; must be caught or declared to be thrown.
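A minimal sketch of the usual fix, assuming a client subclass that overrides afterRead and calls super.afterRead. Because Commons IO is not on the JDK classpath, CountingBase is a hypothetical stand-in whose afterRead hook declares IOException in the same way; CallbackCounter plays the client subclass, where adding throws IOException to the override resolves the compiler error.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class AfterReadOverride {

    // Hypothetical stand-in for a counting stream whose afterRead hook
    // declares IOException, mirroring the signature change described above.
    static class CountingBase extends FilterInputStream {
        private long count;

        CountingBase(InputStream in) {
            super(in);
        }

        // Called after every read with the number of bytes read, or -1 at end of stream.
        protected void afterRead(int n) throws IOException {
            if (n > 0) {
                count += n;
            }
        }

        long getCount() {
            return count;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            afterRead(b == -1 ? -1 : 1);
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            afterRead(n);
            return n;
        }
    }

    // Client subclass. Without "throws IOException" on this override, the call to
    // super.afterRead(n) triggers the "unreported exception" compiler error.
    static class CallbackCounter extends CountingBase {
        int callbacks;

        CallbackCounter(InputStream in) {
            super(in);
        }

        @Override
        protected void afterRead(int n) throws IOException {
            super.afterRead(n);
            callbacks++;
        }
    }

    // Reads all of data through the client subclass and returns the byte count.
    static long countBytes(byte[] data) {
        try (CallbackCounter counter = new CallbackCounter(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4];
            while (counter.read(buf) != -1) {
                // keep reading until end of stream
            }
            return counter.getCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBytes(new byte[10])); // 10
    }
}
```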
Summary:
The FileUtils.isFileOlder method in Apache Commons IO now declares java.io.FileNotFoundException.
Previous behavior:
None.
New behavior:
Client code that calls FileUtils.isFileOlder may fail to recompile unless it catches or declares the new exception. The compiler reports the following message: unreported exception java.io.FileNotFoundException; must be caught or declared to be thrown.
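A minimal sketch of handling the exception at the call site. The isFileOlder method here is a hypothetical stand-in that declares FileNotFoundException, like the updated FileUtils method, so the sketch runs with the JDK alone. The condition under which it throws and the "missing reference means not stale" policy are assumptions chosen only for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class FileAgeCheck {

    // Hypothetical stand-in for a library call that declares FileNotFoundException.
    // Throwing when the reference file is missing is an assumption for this sketch.
    static boolean isFileOlder(File file, File reference) throws FileNotFoundException {
        if (!reference.exists()) {
            throw new FileNotFoundException("Reference file not found: " + reference);
        }
        return file.lastModified() < reference.lastModified();
    }

    // Client code updated to handle the checked exception at the call site.
    // Treating a missing reference file as "not stale" is an illustrative policy.
    static boolean isStale(File file, File reference) {
        try {
            return isFileOlder(file, reference);
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        File reference = new File("no-such-reference-file.marker");
        System.out.println(isStale(new File("data.csv"), reference));
    }
}
```

Alternatively, declare throws FileNotFoundException (or IOException) on the calling method and let a caller further up decide how to react.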
Summary:
The SwappedDataInputStream.skipBytes method in Apache Commons IO no longer declares java.io.EOFException.
Previous behavior:
Application code could handle the java.io.EOFException declared by skipBytes, including in subclasses that override the method.
New behavior:
Client code that still declares the removed exception, such as a subclass that overrides skipBytes with a throws java.io.EOFException clause, may fail to recompile with the following message: Cannot override skipBytes(int) in org.apache.commons.io.input.SwappedDataInputStream; overridden method does not throw java.io.EOFException.
Summary:
The return type of the StreamIterator.iterator method in Apache Commons IO has changed.
Previous behavior:
The return type was java.util.Iterator<T>.
New behavior:
The return type is org.apache.commons.io.StreamIterator<T>.
Because compiled code links to a method by its full descriptor, which includes the return type, the method with the previous return type no longer exists in the library. A client program compiled against the earlier version may fail at run time with java.lang.NoSuchMethodError until it is recompiled against the new version.
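The NoSuchMethodError follows from how the JVM links methods: compiled code refers to a method by name and full descriptor, including the return type. The sketch below illustrates this with the JDK alone. Library and ConcreteIterator are hypothetical stand-ins for the changed factory method, and MethodHandles.Lookup.findStatic is used because, like bytecode linkage, it matches the exact method type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Iterator;
import java.util.stream.Stream;

public class ReturnTypeLinkageDemo {

    // Hypothetical concrete iterator type, standing in for StreamIterator<T>.
    static final class ConcreteIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;

        ConcreteIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }
    }

    // Hypothetical library class after the change: the static factory now
    // returns ConcreteIterator<T> instead of Iterator<T>.
    static final class Library {
        static <T> ConcreteIterator<T> iterator(Stream<T> stream) {
            return new ConcreteIterator<>(stream.iterator());
        }
    }

    // True if Library has a static iterator(Stream) method with exactly this return type.
    static boolean linksAs(Class<?> returnType) {
        try {
            MethodHandles.lookup().findStatic(Library.class, "iterator",
                    MethodType.methodType(returnType, Stream.class));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Descriptor an old client was compiled against: (Stream)Iterator
        System.out.println(linksAs(Iterator.class));         // false
        // Descriptor a recompiled client uses: (Stream)ConcreteIterator
        System.out.println(linksAs(ConcreteIterator.class)); // true
    }
}
```

The lookup with the old descriptor fails while the lookup with the new one succeeds, which mirrors why an old binary fails at run time even though recompiled source links normally.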

Cloudera Runtime 7.3.1.400 SP2

There are no behavioral changes in this release.

Cloudera Runtime 7.3.1.300 SP1 CHF 1

There are no behavioral changes in this release.

Cloudera Runtime 7.3.1.200 SP1

Summary:
Spark 3 in Cloudera Runtime is rebased to Apache Spark 3.5.4.
Previous behavior:
Spark 3.4.1 was the default version in Cloudera Runtime.
New behavior:

Spark 3.5.4 is the default Spark version in Cloudera Runtime.

Cloudera Runtime 7.3.1.100 CHF 1

There are no behavioral changes in this release.

Cloudera Runtime 7.3.1

Summary:
Spark 2 has been removed from Cloudera Runtime.
Previous behavior:

Spark 2 was the default version in Cloudera Runtime, and Spark 3 was available as an add-on parcel.

New behavior:

Spark 3 is the default Spark version in Cloudera Runtime. Spark 2 has been removed and is no longer available in 7.3.1.0.

Summary:
Third-party JDBC drivers used with the Spark JDBC DataFrame data source can require explicit JAR distribution and driver class configuration.
Previous behavior:
Some deployments relied on JDBC driver JARs that were already on a shared cluster classpath or the Spark classpath, so jobs ran without explicitly passing the driver JAR or setting the JDBC driver class in the Spark options.
New behavior:

In Spark 3.x, classpath handling for external JDBC drivers is stricter. Reads and writes that use spark.read.format("jdbc"), DataFrame.write.format("jdbc"), or equivalent APIs can fail with java.sql.SQLException: No suitable driver when the JDBC driver is no longer on the classpath of the Spark driver and executors, as it was before the upgrade. This failure is most common in YARN cluster mode.

To resolve this, supply the JDBC driver JAR explicitly for the job, for example with --jars pointing to a path on shared storage that all nodes can read. Set the JDBC driver class with the driver option, for example .option("driver", "fully.qualified.jdbc.Driver"). If your environment requires it, also set spark.driver.extraClassPath and spark.executor.extraClassPath.
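The "No suitable driver" text is the message java.sql.DriverManager produces when no registered driver accepts a JDBC URL, so a small preflight check can confirm whether a driver is visible to a given JVM. The sketch below is a hypothetical helper, not part of Spark or Cloudera Runtime: run it with the same classpath you give the Spark driver or executors. It inspects only the JVM it runs in, and the default class name org.postgresql.Driver and the URL are placeholder assumptions.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcDriverPreflight {

    // True if the named driver class can be loaded from this JVM's classpath.
    // Loading a JDBC 4 driver class also registers it with DriverManager.
    static boolean driverClassLoadable(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns the registered driver that accepts the URL, or null if none does.
    static Driver driverFor(String jdbcUrl) {
        try {
            return DriverManager.getDriver(jdbcUrl);
        } catch (SQLException e) {
            // DriverManager reports "No suitable driver" when no registered driver accepts the URL.
            System.out.println(e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real driver class and URL as arguments.
        String driverClass = args.length > 0 ? args[0] : "org.postgresql.Driver";
        String url = args.length > 1 ? args[1] : "jdbc:postgresql://db.example.com:5432/sales";
        System.out.println("driver class loadable: " + driverClassLoadable(driverClass));
        System.out.println("driver for URL: " + driverFor(url));
    }
}
```

If the class loads but no driver accepts the URL, check the URL prefix as well as the JAR.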