Configuring and Running Spark (Standalone Mode)
Configuring Spark
To change the default configuration, edit /etc/spark/conf/spark-env.sh. You can set the following variables:
- SPARK_MASTER_IP, to bind the master to a different IP address or hostname
- SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much memory Spark applications can use on this machine (for example, 1000m or 2g)
- SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
- SPARK_WORKER_DIR, to set the working directory of worker processes
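As an illustration, a few of these variables might be set in /etc/spark/conf/spark-env.sh as below. This is a sketch only: the hostname, directory, and sizes are hypothetical values chosen for the example, not defaults, and should be adapted to your nodes.

```shell
# Illustrative additions to /etc/spark/conf/spark-env.sh (hypothetical values).
# The file is sourced as a shell script when the Spark services start.
export SPARK_MASTER_IP=master01.example.com   # hypothetical master hostname
export SPARK_WORKER_CORES=4                    # cores each worker may use
export SPARK_WORKER_MEMORY=8g                  # memory Spark applications may use on this machine
export SPARK_WORKER_INSTANCES=1                # worker processes per node
export SPARK_WORKER_DIR=/var/run/spark/work    # hypothetical worker scratch directory
```

Changes take effect the next time the master and worker services start. If you run more than one worker per node, set SPARK_WORKER_CORES explicitly so that each worker does not try to use all the cores.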
Starting, Stopping, and Running Spark
- To start Spark:
  $ sudo service spark-master start
  $ sudo service spark-worker start
  Note: Start the master on only one node.
- To stop Spark:
  $ sudo service spark-worker stop
  $ sudo service spark-master stop
Service logs are stored in /var/log/spark.
The Spark Master web UI is available at http://<master_host>:18080.
Testing the Spark Service
To test the Spark service, start spark-shell on one of the nodes. You can, for example, run a word count application:
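// sc is the SparkContext that spark-shell creates at startup
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
// Split each line on spaces, pair each word with 1, then sum the 1s per word
val words = file.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _)
// Fails if the output directory already exists
counts.saveAsTextFile("hdfs://namenode:8020/output")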
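To sanity-check the result on a small sample, you can compute the same counts locally with standard Unix tools and compare them with what Spark writes. This is a minimal sketch, not part of CDH: the function name local_word_count is hypothetical, and it assumes words are separated by single spaces with no blank lines or leading/trailing spaces (in those cases Spark's split(" ") would also count empty strings, which this sketch drops).

```shell
#!/bin/sh
# Hypothetical helper: count space-separated words in a local file and print
# them as (word,count) lines, the same text form Spark writes for each tuple.
local_word_count() {
  tr ' ' '\n' < "$1" |        # one word per line
    grep -v '^$' |            # drop empty tokens
    LC_ALL=C sort |           # uniq needs identical words to be adjacent
    uniq -c |                 # "<count> <word>"
    awk '{ printf "(%s,%d)\n", $2, $1 }' |
    LC_ALL=C sort             # stable byte order for comparison
}
```

For example, after running the word count on the same sample file, compare the output of local_word_count with `hdfs dfs -cat hdfs://namenode:8020/output/part-* | LC_ALL=C sort`. Spark spreads its results across part files in no particular order, so both sides are sorted the same way before comparing.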
To see the Spark Shell application, its executors, and its logs, open the Spark Master UI, by default at http://<master_host>:18080.
Running Spark Applications
For details on running Spark applications in the YARN Client and Cluster modes, see Running Spark Applications.
Page generated September 3, 2015.