Unsupported Apache Spark Features

The following Apache Spark features are not supported in Cloudera Data Platform.

  • Apache Spark experimental features/APIs are not supported unless stated otherwise.
  • Spark Streaming (DStreams) reading from Kafka topics that contain transactions, for example when an idempotent producer is used to publish records.
  • Using the JDBC Datasource API to access Hive or Impala.
  • Spark with Kudu is not supported for ADLS data.
  • IPython / Jupyter notebooks. The IPython notebook system was renamed to Jupyter as of IPython 4.0; neither is supported.
  • Certain Spark Streaming features, such as the mapWithState method, are not supported.
  • Thrift JDBC/ODBC server (also known as Spark Thrift Server or STS)
  • Spark SQL CLI
  • GraphX
  • SparkR
  • GraphFrames
  • Structured Streaming is supported, but the following Structured Streaming features are not:
    • Continuous processing, which is still experimental, is not supported.
    • Stream-static joins with HBase have not been tested and therefore are not supported.
  • Hudi
  • Push-based shuffle
  • ZSTD compression in ORC data source (SPARK-33978)
  • spark.hadoopRDD.ignoreEmptySplits (SPARK-34809)
  • LDAP authentication for livy-server (LIVY-356)
  • Thrift LDAP authentication based on ldapurl, basedn, and domain (LIVY-678)
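Several items in this list are switched on through Spark configuration properties, so they can be caught before a job is submitted. The following Python snippet is a minimal, hypothetical pre-flight check, not a Cloudera or Spark tool. It assumes the job configuration is available as a plain dictionary of string values, and that the relevant properties are spark.shuffle.push.enabled (push-based shuffle), spark.sql.orc.compression.codec (ORC compression codec), and spark.hadoopRDD.ignoreEmptySplits. Treating the value "true" for the last property as the unsupported setting is also an assumption.

```python
from typing import Dict, List

# Hypothetical mapping: Spark configuration key -> (value that would enable an
# unsupported feature, feature name as it appears in the list above).
# The exact "bad" values are assumptions made for illustration.
UNSUPPORTED_SETTINGS = {
    "spark.shuffle.push.enabled": ("true", "Push-based shuffle"),
    "spark.sql.orc.compression.codec": (
        "zstd",
        "ZSTD compression in ORC data source (SPARK-33978)",
    ),
    "spark.hadoopRDD.ignoreEmptySplits": (
        "true",
        "spark.hadoopRDD.ignoreEmptySplits (SPARK-34809)",
    ),
}


def find_unsupported(conf: Dict[str, str]) -> List[str]:
    """Return the unsupported features that the given configuration would enable."""
    problems = []
    for key, (bad_value, feature) in UNSUPPORTED_SETTINGS.items():
        value = conf.get(key)
        # Spark configuration values are strings; compare case-insensitively.
        if value is not None and value.strip().lower() == bad_value:
            problems.append(feature)
    return problems


if __name__ == "__main__":
    job_conf = {
        "spark.sql.orc.compression.codec": "ZSTD",
        "spark.shuffle.push.enabled": "false",
    }
    for feature in find_unsupported(job_conf):
        print("Unsupported in CDP:", feature)
```

Keeping the rules in a single table keeps the check easy to extend; features that are not controlled by a configuration property, such as continuous processing or GraphX, cannot be detected this way and would need to be caught in code review instead.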