Cloudera Docs
Spark Guide
Validating the Spark Installation
To validate the Spark installation, run the following Spark jobs:
Spark Pi example
WordCount example
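
As a rough guide to what these two jobs compute, the following is a minimal local sketch in plain Python. It does not use Spark and is not the code shipped with the Spark examples; the function names estimate_pi and word_count and the sample input are hypothetical, chosen only for illustration. The Spark Pi example estimates Pi by random sampling and reports a value close to 3.14, and the WordCount example reports how often each word occurs in its input, so results of a similar shape from the cluster suggest the installation is working.

```python
import random
from collections import Counter


def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of Pi (illustrative stand-in, no Spark involved)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        # Pick a random point in the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it falls inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle-to-square area ratio is Pi / 4.
    return 4.0 * inside / samples


def word_count(lines):
    """Count whitespace-separated words across all lines (illustrative stand-in)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100_000))
    # Hypothetical sample input, not taken from the guide.
    print(word_count(["spark on yarn", "spark on hive"]))
```

On a cluster, the same work is spread across executors; this sketch only shows the logic each job performs, so that the job output is easier to recognize.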
© 2012–2021 by Cloudera, Inc.
Document licensed under the Creative Commons Attribution ShareAlike 4.0 License