Configuring and Running Spark (Standalone Mode)
Configuring Spark
You can change the default configuration by editing /etc/spark/conf/spark-env.sh. The following variables can be set (see the example spark-env.sh after this list):
- SPARK_MASTER_IP, to bind the master to a different IP address or hostname
- SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much memory to use (for example 1000MB, 2GB)
- SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
- SPARK_WORKER_DIR, to set the working directory of worker processes
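For example, a minimal spark-env.sh might look like the following sketch. The hostname, ports, and resource values are placeholders; adjust them to your cluster:
$ cat /etc/spark/conf/spark-env.sh
export SPARK_MASTER_IP=master.example.com      # placeholder master hostname
export SPARK_MASTER_PORT=7077                  # default master port
export SPARK_MASTER_WEBUI_PORT=18080           # master web UI port
export SPARK_WORKER_CORES=4                    # cores offered by this worker
export SPARK_WORKER_MEMORY=8g                  # memory offered by this worker
export SPARK_WORKER_INSTANCES=1                # worker processes on this node
export SPARK_WORKER_DIR=/var/run/spark/work    # placeholder working directory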
Starting, Stopping, and Running Spark
- To start Spark:
$ sudo service spark-master start
$ sudo service spark-worker start
Note: Start the master on only one node.
- To stop Spark:
$ sudo service spark-worker stop
$ sudo service spark-master stop
Service logs are stored in /var/log/spark.
You can access the web UI for the Spark master at http://<master_host>:18080.
Testing the Spark Service
To test the Spark service, start spark-shell on one of the nodes. You can, for example, run a word count application:
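To point the shell at the standalone master rather than running locally, you can pass the master URL when launching it; the hostname is a placeholder and 7077 is the default master port (on older Spark releases you would instead set the MASTER environment variable before running spark-shell):
$ spark-shell --master spark://<master_host>:7077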
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:8020/output")
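Once the job finishes, you can check the result with the HDFS command-line tools. The paths below match the example above; the part-file name is a typical saveAsTextFile output name and may differ on your cluster:
$ hadoop fs -ls hdfs://namenode:8020/output
$ hadoop fs -cat hdfs://namenode:8020/output/part-00000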
You can go to the Spark Master UI, by default at http://spark-master:18080, to see the Spark Shell application, its executors, and its logs.
Running Spark Applications
For details on running Spark applications in the YARN Client and Cluster modes, see Running Spark Applications.
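As a brief illustration only, and assuming a Spark version that provides spark-submit, an application can be submitted to YARN in cluster mode as shown below; the jar path, class name, and resource settings are placeholders:
$ spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.WordCount \
    --num-executors 2 \
    --executor-memory 2g \
    /path/to/wordcount.jar hdfs://namenode:8020/path/to/input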