Spark integration known issues and limitations
Consider the following limitations when integrating Kudu with Spark.
-
Spark 2.2 (and higher) requires Java 8 at runtime, even though the Kudu Spark 2.x integration is Java 7 compatible. Spark 2.2 is the default dependency version as of Kudu 1.5.0.
-
Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when registered as a temporary table.
-
Kudu tables with a column name containing upper case or non-ASCII characters must not be used with Spark SQL. To work around this issue, rename the affected columns in Kudu.
-
<> and OR predicates are not pushed to Kudu, and instead will be evaluated by the Spark task. Only LIKE predicates with a suffix wildcard are pushed to Kudu. This means LIKE "FOO%" will be pushed, but LIKE "FOO%BAR" won't.
-
Kudu does not support all the types supported by Spark SQL. For example, Date and complex types are not supported.
-
Kudu tables can only be registered as temporary tables in Spark SQL.
-
Kudu tables cannot be queried using HiveContext.
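
The naming and LIKE limitations above can be checked before a job is submitted. The following is a minimal Python sketch and is not part of Kudu or Spark. The helper names needs_alias, suggest_alias, and has_suffix_wildcard_only are hypothetical. The sketch assumes that _ counts as a LIKE wildcard alongside %, that patterns contain no escaped wildcards, and that a lower-case ASCII identifier is an acceptable alternate name.

```python
import re


def needs_alias(name: str) -> bool:
    """Return True if `name` contains an upper case or non-ASCII character.

    Per the limitations above, such a table name needs an alternate name
    when registered as a temporary table, and such a column name should be
    renamed in Kudu before it is used from Spark SQL.
    """
    return any(ch.isupper() or ord(ch) > 127 for ch in name)


def suggest_alias(name: str) -> str:
    """Propose one possible alternate name (an assumption, not a Kudu rule).

    The name is lower-cased, then every character outside a-z, 0-9, and _
    is replaced with an underscore.
    """
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


def has_suffix_wildcard_only(pattern: str) -> bool:
    """Return True if the only wildcard in `pattern` is one trailing '%'.

    This is the pattern shape that the list above describes as pushed to
    Kudu. Escaped wildcards are not handled (assumption).
    """
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    return "%" not in prefix and "_" not in prefix
```

For example, needs_alias("MyTable") is true, so that table would be registered under an alternate name such as suggest_alias("MyTable"), which is "mytable". Likewise, has_suffix_wildcard_only("FOO%") is true and has_suffix_wildcard_only("FOO%BAR") is false, matching the two LIKE cases in the list.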