Running a job interactively
- Log into a Spark gateway node.
- If your cluster is Kerberized, ensure you have obtained the required security token so that you are authorized to compile and execute the workload.
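On a Kerberized cluster this typically means obtaining a Kerberos ticket before launching the shell; the principal shown here is a placeholder for your own:
kinit <your_kerberos_principal>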
- Launch the “spark-shell”. For example:
spark-shell --jars target/mylibrary-1.0-SNAPSHOT-jar-with-dependencies.jar
- Create a Spark context and run the workload scripts. For example:
scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS default.sales_spark_1(Region string, Country string, Item_Type string, Sales_Channel string, Order_Priority string, Order_Date date, Order_ID int, Ship_Date date, Units_sold string, Unit_Price string, Unit_cost string, Total_revenue string, Total_Cost string, Total_Profit string) row format delimited fields terminated by ','")
scala> sqlContext.sql("load data local inpath '/tmp/sales.csv' into table default.sales_spark_1")
scala> sqlContext.sql("show tables")
scala> sqlContext.sql("select * from default.sales_spark_1 limit 10").show()
scala> sqlContext.sql("select count(*) from default.sales_spark_1").show()
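If your cluster runs Spark 2.x or later, spark-shell also provides a pre-created SparkSession named spark, so the same Hive queries can be run without constructing a HiveContext. A minimal sketch, assuming the default.sales_spark_1 table created above and a Spark build with Hive support enabled:
scala> spark.sql("show tables").show()
scala> spark.sql("select * from default.sales_spark_1 limit 10").show()
scala> spark.sql("select count(*) from default.sales_spark_1").show()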
- Go to the Spark History Server web UI at http://<spark_history_server>:18088 and check the status and performance of the workload.