Migrating Spark to CDP Private Cloud
Migrating Spark workloads to Cloudera
Spark 1.6 to Spark 2.4 Refactoring
Handling prerequisites
Spark 1.6 to Spark 2.4 changes
New Spark entry point SparkSession
DataFrame API registerTempTable deprecated
union replaces unionAll
Empty schema not supported
Referencing a corrupt JSON/CSV record
Dataset and DataFrame API explode deprecated
CSV header and schema match
Table properties support
CREATE OR REPLACE VIEW and ALTER VIEW not supported
Managed table location
Write to Hive bucketed tables
Rounding in arithmetic operations
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 1.4 - 2.3 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Compiling and running a Java-based job
Compiling and running a Scala-based job
Running a Python-based job
Running a job interactively
Post-migration tasks
Spark 2.3 to Spark 2.4 Refactoring
Handling prerequisites
Spark 2.3 to Spark 2.4 changes
Empty schema not supported
CSV header and schema match
Table properties support
Managed table location
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 1.4 - 2.3 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Post-migration tasks