Migration paths for Spark users
If you run Spark workloads in Hive LLAP mode on HDP and want to upgrade to CDP, follow the migration path that matches your security needs. The recommended approach is to upgrade to CDP Private Cloud Base and use either Hive Warehouse Connector (HWC) or native Spark readers to query Hive from Spark.
To replace the HWC LLAP execution mode in HDP, you can use HWC Secure Access mode in CDP Private Cloud Base. This mode offers fine-grained access control (FGAC), including column masking and row filtering, to secure the managed (ACID) or external Hive table data that you read from Spark.
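The recommended migration path depends on whether the user has access to the data in the file system, on the type of Hive table you are querying, and on the amount of data the query returns: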
- Use HWC JDBC Cluster mode if the user does not have access to the data in the file system and the query returns less than 1 GB of data.
- Use HWC Secure Access mode if the user does not have access to the data in the file system and the query returns more than 1 GB of data.
- Use HWC Direct Reader mode if the user has access to the data in the file system and you are querying Hive managed tables.
- Use the native Spark reader if the user has access to the data in the file system and you are querying Hive external tables.
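The selection rules in the list above can be sketched as a small decision function. This is only an illustration, not part of HWC or Spark: the names `MigrationPath` and `recommend_path` are hypothetical, 1 GB is assumed to mean 1024^3 bytes, and a result of exactly 1 GB (a case the list does not cover) is assumed to fall under Secure Access mode.

```python
from enum import Enum


class MigrationPath(Enum):
    """Hypothetical labels for the four recommended paths."""
    HWC_JDBC_CLUSTER = "HWC JDBC Cluster mode"
    HWC_SECURE_ACCESS = "HWC Secure Access mode"
    HWC_DIRECT_READER = "HWC Direct Reader mode"
    NATIVE_SPARK_READER = "native Spark reader"


# Assumption: "1 GB" is interpreted as 1024^3 bytes.
ONE_GB = 1024 ** 3


def recommend_path(has_fs_access: bool,
                   table_is_managed: bool,
                   result_bytes: int) -> MigrationPath:
    """Return the recommended path for one user/query combination.

    has_fs_access    -- the user can read the table data in the file system
    table_is_managed -- True for Hive managed (ACID) tables, False for external
    result_bytes     -- expected size of the query result, in bytes
    """
    if has_fs_access:
        # With file system access, the table type decides the reader.
        if table_is_managed:
            return MigrationPath.HWC_DIRECT_READER
        return MigrationPath.NATIVE_SPARK_READER

    # Without file system access, the result size decides the HWC mode.
    # Assumption: exactly 1 GB is treated as "large".
    if result_bytes < ONE_GB:
        return MigrationPath.HWC_JDBC_CLUSTER
    return MigrationPath.HWC_SECURE_ACCESS


if __name__ == "__main__":
    # A user without file system access running a query that returns 5 GB.
    print(recommend_path(False, True, 5 * ONE_GB).value)
```

Note that the table type only matters when the user has file system access, and the result size only matters when the user does not.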
The following diagram shows these recommended paths: