Chapter 9. Troubleshooting Spark

When you run a Spark job, you will see a standard set of console messages.

In addition, the following information is available:

A list of running applications, where you can retrieve the application ID and check the application log:
yarn application -list
yarn logs -applicationId <app_id>
For information about a specific job, check the Spark web UI:
http://<host>:8088/proxy/<job_id>/environment/
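
The two commands above can be chained so that logs are fetched by application name rather than by a copied ID. The following is a minimal sketch, not a definitive script: the helper name app_id_for_name and the application name "Spark Pi" are illustrative, and the parsing assumes the tab-separated, padded columns that yarn application -list typically prints (Application-Id first, Application-Name second).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `yarn application -list` output on stdin and
# print the ID of the first application whose name equals $1.
# Assumes tab-separated columns: Application-Id, Application-Name, ...
app_id_for_name() {
  awk -F'\t' -v name="$1" '
    $1 ~ /application_/ {
      id = $1; n = $2
      gsub(/^[ \t]+|[ \t]+$/, "", id)   # strip column padding from the ID
      gsub(/^[ \t]+|[ \t]+$/, "", n)    # strip column padding from the name
      if (n == name) { print id; exit }
    }'
}

# On a cluster node (not executed here):
#   app_id=$(yarn application -list | app_id_for_name "Spark Pi")
#   yarn logs -applicationId "$app_id"
```

If several applications share a name, only the first match is printed, so check the full listing when in doubt.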

The following paragraphs describe specific issues and possible solutions.

Issue: Spark jobs submitted to YARN do not start, and the YARN ResourceManager logs show "bad substitution" errors for the application.

Solution: Make sure that your $SPARK_HOME/conf/spark-defaults.conf file includes your HDP version. For example:

   spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
   spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.0.0-2557

To check the HDP version for an Ambari-managed cluster, navigate to http://$AMBARI_SERVER:8080/#/main/admin/stack/versions, where $AMBARI_SERVER is the host name of your Ambari server.

To check the version via bash, run the following command:

   bash-4.1# hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'
   2.3.0.0-2557
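
To confirm that the configuration file actually carries the version reported by hdp-select, a check along the following lines may help. This is a sketch under stated assumptions: the helper name conf_has_hdp_version is hypothetical, and it assumes each property sits on one line in the whitespace-separated "key value" form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the spark-defaults.conf at $1 sets
# -Dhdp.version=$2 for both the driver and the YARN application master.
# Assumes "key value" lines with whitespace-separated Java options.
conf_has_hdp_version() {
  conf="$1"
  opt="-Dhdp.version=$2"
  for prop in spark.driver.extraJavaOptions spark.yarn.am.extraJavaOptions; do
    awk -v prop="$prop" -v opt="$opt" '
      $1 == prop { for (i = 2; i <= NF; i++) if ($i == opt) found = 1 }
      END { exit !found }' "$conf" || return 1
  done
}

# On a cluster node (not executed here):
#   version=$(hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/')
#   conf_has_hdp_version /path/to/spark-defaults.conf "$version" \
#     || echo "hdp.version is missing or does not match $version"
```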

Issue: A job stays in the "accepted" state and never runs. This can happen when the job requests more memory or cores than are currently available.

Solution: Review the cluster workload to see whether any resources can be released. You might need to stop unresponsive jobs to make room for the new job.
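
One way to assess the workload is to count applications per state and then stop those that are no longer needed. The sketch below assumes the tab-separated layout of yarn application -list, with State as the sixth column; the helper name count_by_state is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `yarn application -list` output on stdin and
# print one "STATE count" line per application state, sorted by state name.
# Assumes tab-separated columns with State in the sixth position.
count_by_state() {
  awk -F'\t' '
    $1 ~ /application_/ {
      s = $6
      gsub(/[ \t]+/, "", s)   # strip column padding
      count[s]++
    }
    END { for (s in count) print s, count[s] }' | sort
}

# On a cluster node (not executed here):
#   yarn application -list -appStates RUNNING,ACCEPTED | count_by_state
#   yarn application -kill <app_id>    # stop an unresponsive application
```

Many RUNNING applications alongside a growing ACCEPTED count suggests that the queue is short of memory or cores rather than that the new job is misconfigured.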

Issue: Insufficient HDFS access. This can lead to errors such as the following:

   Loading data to table default.testtable
   Failed with exception
   Unable to move source hdfs://blue1:8020/tmp/hive-spark/hive_2015-06-04_12-45-42_404_3643812080461575333-1/-ext-10000/kv1.txt to destination
   hdfs://blue1:8020/apps/hive/warehouse/testtable/kv1.txt

Solution: Make sure the user or group running the job has sufficient HDFS privileges to the location.
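
A quick way to see who owns the destination is hdfs dfs -ls -d on the target directory. The sketch below interprets one line of that output using only the owner, group, and mode bits. The helper name can_write is hypothetical, and the check deliberately ignores HDFS ACLs, superuser status, and secondary group membership, so treat its answer as a first hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given one line of `hdfs dfs -ls -d <path>` output on
# stdin, report whether user $1 with group $2 has the write bit.
# Considers only owner/group/other mode bits (no ACLs, no superuser check).
can_write() {
  awk -v user="$1" -v group="$2" '
    {
      perm = $1; owner = $3; grp = $4
      if (owner == user)      ok = (substr(perm, 3, 1) == "w")   # owner bits
      else if (grp == group)  ok = (substr(perm, 6, 1) == "w")   # group bits
      else                    ok = (substr(perm, 9, 1) == "w")   # other bits
      print (ok ? "writable" : "not writable")
    }'
}

# On a cluster node (not executed here):
#   hdfs dfs -ls -d /apps/hive/warehouse/testtable | can_write spark hadoop
# If access is missing, an HDFS administrator can adjust it, for example:
#   hdfs dfs -chown <user>:<group> /apps/hive/warehouse/testtable
#   hdfs dfs -chmod g+w /apps/hive/warehouse/testtable
```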

Issue: Beeline is pointed at the wrong host and reports an invalid URL error:

   Error: Invalid URL: jdbc:hive2://localhost:10001 (state=08S01,code=0)

Solution: Specify the correct host in the Beeline JDBC connection URL.
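
When comparing the URL from the error message against the machine where the Thrift server actually runs, it can help to split the URL into its host and port. The helper below is a small sketch with a hypothetical name (hive2_host_port); it handles only the plain jdbc:hive2://host:port form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "host port" for a jdbc:hive2://host:port URL,
# or print nothing if the URL does not have that form.
hive2_host_port() {
  printf '%s\n' "$1" |
    sed -n 's#^jdbc:hive2://\([^:/;]*\):\([0-9][0-9]*\).*$#\1 \2#p'
}

# Example with the URL from the error message above:
#   hive2_host_port 'jdbc:hive2://localhost:10001'
# After confirming the correct values, connect with (not executed here):
#   beeline -u "jdbc:hive2://<host>:<port>"
```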

Issue: Error: closed SQLContext.

Solution: Restart the Thrift server.