Apache Kudu OverviewPDF version

Spark integration limitations

Here are the limitations that you should consider while integrating Kudu and Spark.

  • Spark 2.2 (and higher) requires Java 8 at runtime even though Kudu Spark 2.x integration is Java 7 compatible. Spark 2.2 is the default dependency version as of Kudu 1.5.0.

  • Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when registered as a temporary table.

  • Kudu tables with a column name containing upper case or non-ASCII characters must not be used with SparkSQL. Columns can be renamed in Kudu to work around this issue.

  • <> and ORpredicates are not pushed to Kudu, and instead will be evaluated by the Spark task. Only LIKE predicates with a suffix wildcard are pushed to Kudu. This means LIKE "FOO%" will be pushed, but LIKE "FOO%BAR" won't.

  • Kudu does not support all the types supported by Spark SQL. For example, Date and complex types are not supported.

  • Kudu tables can only be registered as temporary tables in SparkSQL.

  • Kudu tables cannot be queried using HiveContext.