Preventing SparkSQL incompatibility
You need to be aware of two SparkSQL incompatibilities and how to work around them. The upgrade process converted all CDH Hive tables to external tables; however, if you had managed, non-ACID tables that were prevented from being converted to external tables, those tables are not compatible with native SparkSQL. Also, you might encounter a problem reading Hive 2 external ORC tables from Spark.
Managed, non-ACID table problem
To work around this problem, use one of the following approaches:
- Convert ACID tables to external tables after the CDP upgrade.
- Use the Hive Warehouse Connector.
Run the SHOW CREATE TABLE statement on the original table to get the full definition of the table.
SHOW CREATE TABLE <tablename>;
Rename the managed table to *_old.
Migrate the data from the *_old table to a new external table, using the original name, in the historical or the default location.
CREATE EXTERNAL TABLE new_t AS SELECT * FROM old_t;
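As a sketch, the steps above might look like the following for a hypothetical managed table named t; the ALTER TABLE ... RENAME syntax is standard HiveQL:

```sql
-- Hypothetical example: t is a managed, non-ACID table kept through the upgrade.
-- 1. Capture the full table definition for reference.
SHOW CREATE TABLE t;

-- 2. Rename the managed table out of the way.
ALTER TABLE t RENAME TO t_old;

-- 3. Recreate the table as external under the original name and copy the data.
CREATE EXTERNAL TABLE t AS SELECT * FROM t_old;

-- 4. After verifying the copy, you can drop the old managed table:
-- DROP TABLE t_old;
```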
Reading a Hive external table in ORC from Spark
This problem occurs when both of the following conditions are true:
- You created the table using Hive CTAS (create table as select).
- One or more of the selected tables included UNION ALL.
When a table is created under these circumstances, its subdirectories are named /1, /2, /3, and so on; they do not have the HIVE_UNION_SUBDIR_ prefix that tables created in Hive 3 have. You cannot read these tables from Spark if the tables are in ORC format.
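For illustration, a CTAS statement of the kind that produces this layout might look like the following; the table and column names are hypothetical:

```sql
-- In Hive 2, this CTAS writes each branch of the UNION ALL into its own
-- numbered subdirectory (/1, /2) under the table location, without the
-- HIVE_UNION_SUBDIR_ prefix that Hive 3 would use.
CREATE TABLE orc_union STORED AS ORC AS
SELECT id, name FROM t1
UNION ALL
SELECT id, name FROM t2;
```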
The following workaround configures Spark to overcome this problem.
If you already started the Spark shell, quit the shell.
You cannot apply the workaround configuration to a session that is already running.
Start the Spark shell with the following configuration:
spark-shell ... --conf spark.sql.hive.convertMetastoreOrc=false
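Put together, a session using this workaround might look like the following sketch; the table name is hypothetical, and any other spark-shell options you normally pass remain unchanged:

```shell
# Start the shell with conversion to Spark's native ORC reader disabled,
# so Spark uses the Hive SerDe and can traverse the union subdirectories.
spark-shell --conf spark.sql.hive.convertMetastoreOrc=false

# Then, inside the shell (Scala), read the table as usual, for example:
#   spark.sql("SELECT COUNT(*) FROM orc_union").show()
```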