Unsupported Apache Spark Features
The following Apache Spark features are not supported in Cloudera Data Platform.
- Apache Spark experimental features/APIs are not supported unless stated otherwise.
- Spark Streaming (DStreams) reading from Kafka topics that contain transactions, such as when an idempotent producer is used to publish records.
- Using the JDBC Datasource API to access Hive or Impala.
- Spark with Kudu is not supported for ADLS data.
- IPython / Jupyter notebooks. The IPython notebook system (renamed to Jupyter as of IPython 4.0) is not supported.
- Certain Spark Streaming features, such as the mapWithState method, are not supported.
- Thrift JDBC/ODBC server (also known as Spark Thrift Server or STS)
- Spark SQL CLI
- GraphX
- SparkR
- GraphFrames
- Structured Streaming is supported, except for the following features:
  - Continuous processing, which is still experimental, is not supported.
  - Stream-static joins with HBase have not been tested and are therefore not supported.
- Hudi
- Push-based shuffle
- ZSTD compression in ORC data source (SPARK-33978)
- spark.hadoopRDD.ignoreEmptySplits (SPARK-34809)
- LDAP authentication for livy-server (LIVY-356)
- Thrift LDAP authentication, based on ldapurl, basedn, domain (LIVY-678)
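
Some of the items above correspond to Spark configuration properties, so a job's settings can be checked for them before submission. The following is a minimal sketch, not a Cloudera tool: the helper name find_unsupported_settings and the rule table are hypothetical. It assumes that push-based shuffle is enabled through spark.shuffle.push.enabled, that ORC ZSTD compression is selected through spark.sql.orc.compression.codec, and that a value of true is what should be flagged for spark.hadoopRDD.ignoreEmptySplits.

```python
from typing import Callable, Dict, List, Tuple


def _is_true(value: str) -> bool:
    """Return True when a Spark boolean property is set to 'true' (any case)."""
    return value.strip().lower() == "true"


def _is_zstd(value: str) -> bool:
    """Return True when a codec property selects zstd (any case)."""
    return value.strip().lower() == "zstd"


# Hypothetical rule table (an assumption for illustration): each entry maps a
# Spark property to a predicate on its value and the matching item in the
# unsupported-features list.
UNSUPPORTED_RULES: Dict[str, Tuple[Callable[[str], bool], str]] = {
    "spark.shuffle.push.enabled": (_is_true, "Push-based shuffle"),
    "spark.sql.orc.compression.codec": (
        _is_zstd,
        "ZSTD compression in ORC data source (SPARK-33978)",
    ),
    "spark.hadoopRDD.ignoreEmptySplits": (
        _is_true,
        "spark.hadoopRDD.ignoreEmptySplits (SPARK-34809)",
    ),
}


def find_unsupported_settings(conf: Dict[str, str]) -> List[str]:
    """Return one message per property in conf that matches a rule."""
    findings = []
    for key, (is_unsupported, label) in UNSUPPORTED_RULES.items():
        value = conf.get(key)
        if value is not None and is_unsupported(value):
            findings.append(f"{key}={value}: {label}")
    return findings
```

With PySpark, the input dictionary could be built from a running session with dict(spark.sparkContext.getConf().getAll()). The check covers only items that are expressed as configuration properties; features such as GraphX or SparkR are used through code and would not be detected this way.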