Running Apache Spark Applications
Introduction
Apache Spark 3 Requirements
Running Spark 3 Applications
Updating Spark 2 apps for Spark 3
Running your first Spark application
Enabling Spark rolling event log files in CDP
Running sample Spark applications
Configuring Spark Applications
Configuring Spark application properties in spark-defaults.conf
Configuring Spark application logging properties
Submitting Spark applications
spark-submit command options
Spark cluster execution overview
Canary test for pyspark command
Fetching Spark Maven dependencies
Accessing the Spark History Server
Running Spark applications on YARN
Spark on YARN deployment modes
Submitting Spark Applications to YARN
Monitoring and Debugging Spark Applications
Example: Running SparkPi on YARN
Configuring Spark on YARN Applications
Dynamic allocation
Submitting Spark applications using Livy
Using the Livy API to run Spark jobs
Running an interactive session with the Livy REST API
Livy objects for interactive sessions
Setting Python path variables for Livy
Livy API reference for interactive sessions
Submitting batch applications using the Livy REST API
Livy batch object
Livy API reference for batch jobs
Submitting a Spark job to a Data Hub cluster using Livy
Configuring the Livy Thrift Server
Connecting to the Apache Livy Thrift Server
Using Livy with Spark
Using Livy with interactive notebooks
Using PySpark
Running PySpark in a virtual environment
Running Spark Python applications
Automating Spark Jobs with Oozie Spark Action