CDP One Known Issues and Limitations

This section lists the known issues and limitations applicable to CDP One.

HMS Partition Discovery is disabled for both Data Lakes and Data Hubs

Issue:

Hive Metastore (HMS) Partition Discovery is disabled for both Data Lakes and Data Hubs.

Workaround:

Run the MSCK REPAIR TABLE (metastore consistency check) Hive command whenever you need to synchronize partition metadata with the file system. For more information, see Partitioned tables.
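For example, the following PySpark sketch runs the repair through Spark SQL; the same MSCK REPAIR TABLE statement can also be run from Beeline or any other Hive client. The table name web_logs is hypothetical.

  from pyspark.sql import SparkSession

  spark = (SparkSession.builder
           .appName("msck-repair")
           .enableHiveSupport()
           .getOrCreate())

  # MSCK REPAIR TABLE scans the table's storage location and adds any
  # partition directories that are missing from the metastore.
  # "web_logs" is a hypothetical partitioned external table.
  spark.sql("MSCK REPAIR TABLE web_logs")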

Spark connections via SparklyR can only read external Hive tables

Issue:

Spark connections via SparklyR can only read external Hive tables, not managed ACID Hive tables.

Workaround:

To access managed Hive tables from Spark, you must use the Hive Warehouse Connector (HWC), either explicitly or implicitly.

Spark and Hive tables interoperate through the Hive Warehouse Connector and the Spark Direct Reader, which provide access to ACID managed tables. HWC is a Spark library/plugin, launched with the Spark application, that is designed to access managed ACID v2 Hive tables from Spark.
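As an illustration, the following PySpark sketch reads a managed ACID table through HWC. It assumes the HWC assembly JAR and the pyspark_llap Python module shipped with it have been supplied to the application (for example, via the --jars and --py-files options of spark-submit or pyspark) and that spark.sql.hive.hiveserver2.jdbc.url is configured for the cluster; the table name employees is hypothetical.

  from pyspark.sql import SparkSession
  from pyspark_llap import HiveWarehouseSession

  spark = SparkSession.builder.appName("hwc-read").getOrCreate()

  # Build an HWC session on top of the Spark session. This requires
  # spark.sql.hive.hiveserver2.jdbc.url to point at HiveServer2.
  hive = HiveWarehouseSession.session(spark).build()

  # Query a managed ACID table ("employees" is a hypothetical name).
  df = hive.executeQuery("SELECT * FROM employees")
  df.show()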

You can access external tables from Spark directly using Spark SQL; you do not need HWC to read or write Hive external tables. You can read Hive external tables in ORC or Parquet format, but you can write them in ORC format only.
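For example, this minimal PySpark sketch reads an external table with plain Spark SQL and appends the result to another ORC-format external table, with no HWC involved. The table names sales_ext and sales_archive are hypothetical, and sales_archive is assumed to already exist.

  from pyspark.sql import SparkSession

  spark = (SparkSession.builder
           .appName("external-tables")
           .enableHiveSupport()
           .getOrCreate())

  # Reading an external table (ORC or Parquet) needs no HWC.
  df = spark.sql("SELECT * FROM sales_ext WHERE region = 'EMEA'")

  # Writes to Hive external tables from Spark are supported in ORC
  # format only; "sales_archive" must already exist as an ORC table.
  df.write.mode("append").insertInto("sales_archive")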

Use the Spark Direct Reader and HWC for ETL jobs. For other jobs, consider using Apache Ranger and the HiveWarehouseConnector library to provide fine-grained access to the data at the row and column level.

HWC supports spark-submit and pyspark. The Spark Thrift Server is not supported.

To deploy JAR files for Sqoop, contact Cloudera Support

Issue:

Errors may occur if you attempt to add JAR files to the Sqoop directory.

Workaround:

Contact Cloudera Support for help with deploying JAR files for Sqoop.