Accessing Hive from Spark
The host from which the Spark application is submitted, or on which
spark-shell or pyspark runs, must have a Hive gateway role defined in
Cloudera Manager and client configurations deployed.
When a Spark job accesses a Hive view, Spark must have privileges to
read the data files in the underlying Hive tables. Currently, Spark cannot
use fine-grained privileges based on the columns or the WHERE clause in
the view definition. If Spark does not have the required privileges on
the underlying data files, a Spark SQL query against the view returns an
empty result set rather than an error.
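
Because a missing privilege shows up as an empty result rather than an
error, it can help to flag empty view results explicitly. The following is
a minimal PySpark sketch, not a definitive implementation. The helper
names (build_hive_session, empty_result_warning, check_view) and the view
name used in the usage note are hypothetical. It assumes pyspark is
installed and that the Hive client configurations are deployed on the
host.

```python
def build_hive_session(app_name="hive-view-check"):
    """Create a SparkSession with Hive support enabled.

    pyspark is imported lazily so the pure helper below can be used
    (and tested) on a machine without Spark installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()
        .getOrCreate()
    )


def empty_result_warning(view_name, row_count):
    """Return a warning message if the view query returned no rows, else None."""
    if row_count == 0:
        return (
            f"Query against {view_name} returned no rows. The view may be "
            "empty, or Spark may lack read privileges on the data files "
            "of the underlying Hive tables."
        )
    return None


def check_view(spark, view_name):
    """Count the rows visible through the view and flag an empty result."""
    row_count = spark.table(view_name).count()
    return empty_result_warning(view_name, row_count)
```

On a gateway host, this could be used as
check_view(build_hive_session(), "default.sales_view"), where
default.sales_view is a placeholder view name. A returned message does not
prove a permissions problem. It only signals that the file-level
privileges on the underlying tables are worth checking.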