Migrating Spark to CDP Private Cloud
Migrating Spark workloads to CDP
Spark 1.6 to Spark 2.4 Refactoring
Handling prerequisites
Spark 1.6 to Spark 2.4 changes
New Spark entry point SparkSession
DataFrame API registerTempTable deprecated
union replaces unionAll
Empty schema not supported
Referencing a corrupt JSON/CSV record
Dataset and DataFrame API explode deprecated
CSV header and schema match
Table properties support
Managed table location
Write to Hive bucketed tables
Rounding in arithmetic operations
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Compiling and running a Java-based job
Compiling and running a Scala-based job
Running a Python-based job
Running a job interactively
Post-migration tasks
Spark 2.3 to Spark 2.4 Refactoring
Handling prerequisites
Spark 2.3 to Spark 2.4 changes
Empty schema not supported
CSV header and schema match
Table properties support
Managed table location
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Post-migration tasks
Spark 2.4 to Spark 3.2 Refactoring
Migrating Spark CDP to Cloudera Data Engineering
Cloudera Data Engineering Concepts
Convert Spark Submit commands to CDE CLI Spark Submit commands
Using the Cloudera Data Engineering CLI
Convert Spark Submits to CDE API Requests
Using Swagger Page
Getting Started with CDE Airflow
Using Airflow
Using spark-submit drop-in migration tool for migrating Spark workloads to CDE