Known issues with Apache Spark.
Scala sessions can fail if dependencies take longer than 15 minutes
If the dependencies listed in spark-defaults.conf (spark.jars, spark.jars.packages, etc.) take longer than 15 minutes to resolve, Scala sessions will fail on the first attempt.
Workarounds:
- Restart the session.
- Mount the Spark dependency directory from the CDSW host machines.
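To judge how much work dependency resolution involves, it can help to list the dependency entries a project's spark-defaults.conf declares. The following is a minimal sketch, not part of CDSW: the function name `dependency_entries` and the sample file contents are hypothetical, and it only looks at the standard Spark properties spark.jars and spark.jars.packages.

```python
import re

# Standard Spark properties whose values must be fetched or resolved at startup.
DEPENDENCY_KEYS = ("spark.jars", "spark.jars.packages")


def dependency_entries(conf_text):
    """Map each dependency property found in conf_text to its list of values."""
    found = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # Keys are usually separated from values by whitespace; '=' is handled too.
        parts = re.split(r"[\s=:]+", stripped, maxsplit=1)
        if len(parts) == 2 and parts[0] in DEPENDENCY_KEYS:
            found[parts[0]] = [v.strip() for v in parts[1].split(",") if v.strip()]
    return found


# Hypothetical file contents, for illustration only.
sample = """\
# project-level Spark settings
spark.executor.memory 2g
spark.jars.packages org.apache.spark:spark-avro_2.11:2.4.0,com.example:lib:1.0
"""
print(dependency_entries(sample))
```

A long list of packages here suggests that the first session launch may approach the 15-minute limit.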
Spark UI does not work on HDP and CDP
The Spark UI in CDSW does not work on HDP and CDP clusters.
On TLS-enabled CDSW deployments, the embedded Spark UI does not work
If you have a TLS-enabled CDSW deployment, the embedded Spark UI tab does not render as expected.
Workaround: Launch the Spark UI in a separate tab and append '/jobs' to the URL. For example, if your engineID is tb0z9ydiua5q9v2d and the DOMAIN is example.com, then view the Spark UI at:
Alternative workaround: To view running Spark jobs, navigate to
Affected Versions:
- CDH 5: CDS 2.4 release 2 (and lower)
- CDH 6: Versions of Spark that ship with CDH 6.0.x, CDH 6.1.x, CDH 6.2.1 (and lower), CDH 6.3.2 (and lower)
Fixed Versions:
- CDH 6: versions 6.2.2, 6.3.3, 6.4.0 (or higher)
- CDH 5 with Spark 2.4 release 3
Spark lineage collection is not supported with Cloudera Data Science Workbench
Lineage collection is enabled by default in Spark 2.3. This feature does not work with Cloudera Data Science Workbench because the lineage log directory is not automatically mounted into CDSW engines when a session/job is started.
Affected Versions: CDS 2.3 release 2 (and higher) Powered By Apache Spark
With Spark 2.3 release 3 (or higher), if Spark cannot find the lineage log directory, it will automatically disable lineage collection for that application. Spark jobs will continue to run in Cloudera Data Science Workbench, but lineage information will not be collected.
With Spark 2.3 release 2, Spark jobs will fail in Cloudera Data Science Workbench. Either upgrade to Spark 2.3 release 3, which includes a partial fix (as described above), or use one of the following workarounds to disable Spark lineage:
Workaround 1: Disable Spark Lineage Per-Project in Cloudera Data Science Workbench
To do this, set the Spark lineage collection property to false in a spark-defaults.conf file in your Cloudera Data Science Workbench project. This must be done individually for each project that requires it.
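The per-project change can be scripted. The sketch below is illustrative only: the helper `set_conf_property` is hypothetical, and the property name `spark.lineage.enabled` is an assumption, since the text here does not name the property; check it against the Cloudera documentation for your Spark release before relying on it.

```python
import re


def set_conf_property(conf_text, key, value):
    """Return conf_text with key set to value, replacing any existing entry."""
    lines = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_setting = bool(stripped) and not stripped.startswith("#")
        if is_setting and re.split(r"[\s=:]+", stripped, maxsplit=1)[0] == key:
            if not replaced:
                lines.append(f"{key} {value}")
                replaced = True
            continue  # drop duplicate entries for the same key
        lines.append(line)
    if not replaced:
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


# ASSUMPTION: this property name is not given in the text above; verify it first.
LINEAGE_KEY = "spark.lineage.enabled"

# Hypothetical existing contents of a project's spark-defaults.conf.
existing = "spark.executor.memory 2g\nspark.lineage.enabled true\n"
print(set_conf_property(existing, LINEAGE_KEY, "false"))
```

The function works on text rather than a file path, so the result can be reviewed before it is written back to the project's spark-defaults.conf.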
Workaround 2: Disable Spark Lineage for the Cluster
- Log in to Cloudera Manager and go to the Spark 2 service.
- Click Configuration.
- Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection.
- Click Save Changes.
- Go back to the Cloudera Manager homepage and restart the CDSW service for this change to go into effect.
Cloudera Bug: DSE-3720, CDH-67643