Running a job interactively
- Log into a Spark gateway node.
- If your cluster is Kerberized, ensure you have obtained the required security token so that you are authorized to compile and execute the workload.
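On a Kerberized cluster this typically means obtaining a Kerberos ticket before launching the shell; the principal shown here is a placeholder for your own:
kinit <your_kerberos_principal>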
- Launch the “spark-shell”. For example:
spark-shell --jars target/mylibrary-1.0-SNAPSHOT-jar-with-dependencies.jar
- Create a Spark context and run the workload scripts. For example:
scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS default.sales_spark_1(Region string, Country string, Item_Type string, Sales_Channel string, Order_Priority string, Order_Date date, Order_ID int, Ship_Date date, Units_sold string, Unit_Price string, Unit_cost string, Total_revenue string, Total_Cost string, Total_Profit string) row format delimited fields terminated by ','")
scala> sqlContext.sql("load data local inpath '/tmp/sales.csv' into table default.sales_spark_1")
scala> sqlContext.sql("show tables")
scala> sqlContext.sql("select * from default.sales_spark_1 limit 10").show()
scala> sqlContext.sql("select count(*) from default.sales_spark_1").show()
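If your cluster runs Spark 2.x or later, spark-shell also provides a pre-created SparkSession named spark, so the same Hive queries can be run without constructing a HiveContext. A minimal sketch, assuming the default.sales_spark_1 table created above and a Spark build with Hive support enabled:
scala> spark.sql("show tables").show()
scala> spark.sql("select * from default.sales_spark_1 limit 10").show()
scala> spark.sql("select count(*) from default.sales_spark_1").show()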
- Go to the Spark History Server web UI at http://<spark_history_server>:18088 and check the status and performance of the workload.