Closing HiveWarehouseSession operations
You need to know how to release locks that Apache Spark operations puts on Apache Hive resources. An example shows how and when to release these locks.
Spark can invoke operations, such as cache()
,
persist()
, and rdd()
, on a DataFrame you
obtain from running a HiveWarehouseSession .table()
or
.sql()
(or alternatively, .execute()
or
.executeQuery()
). The Spark operations can lock Hive resources.
You can release any locks and resources by calling the HiveWarehouseSession
close()
. Calling close()
invalidates the
HiveWarehouseSession instance and you cannot perform any further operations on the
instance.
close()
when you finish running all other operations on the instance of HiveWarehouseSession.
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("tpcds_bin_partitioned_orc_1000")
val df = hive.sql("select * from web_sales")
. . . //Any other operations
.close()
close()
at the end of an iteration if the
application is designed to run in a microbatch, or iterative, manner that does
not need to share previous states.table()
or sql()
(or alternatively,
.execute()
or .executeQuery()
).