Upsert option in Kudu Spark
The upsert operation in kudu-spark supports an extra write option, ignoreNull. If set to true, it avoids setting existing column values in the Kudu table to Null when the corresponding DataFrame column values are Null. If unspecified, ignoreNull defaults to false.
val dataFrame = spark.read
  .options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> simpleTableName))
  .format("kudu").load
dataFrame.createOrReplaceTempView(simpleTableName)
dataFrame.show()
// Below is the original data in the table 'simpleTableName'
+---+---+
|key|val|
+---+---+
|  0|foo|
+---+---+
// Upsert a row with existing key 0 and val Null with ignoreNull set to true
val nullDF = spark.createDataFrame(Seq((0, null.asInstanceOf[String]))).toDF("key", "val")
val wo = new KuduWriteOptions
wo.ignoreNull = true
kuduContext.upsertRows(nullDF, simpleTableName, wo)
dataFrame.show()
// The val field stays unchanged
+---+---+
|key|val|
+---+---+
|  0|foo|
+---+---+
// Upsert a row with existing key 0 and val Null with ignoreNull left at its default of false
kuduContext.upsertRows(nullDF, simpleTableName)
// Equivalent to:
// val wo = new KuduWriteOptions
// wo.ignoreNull = false
// kuduContext.upsertRows(nullDF, simpleTableName, wo)
dataFrame.show()
// The val field is set to Null this time
+---+----+
|key| val|
+---+----+
|  0|null|
+---+----+
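
The two outcomes above can also be reproduced without a cluster. The following is a minimal sketch of the semantics in plain Scala, not the kudu-spark implementation: the names IgnoreNullSketch, Row and upsert are hypothetical, an in-memory Map stands in for the Kudu table, and None stands in for Null. It models a single value column; in a real table the same rule would apply to each column independently.

```scala
// Plain-Scala model of the ignoreNull upsert semantics.
// Assumption: `Row`, `upsert` and the Map-backed "table" are illustrative
// stand-ins and are not part of the kudu-spark API.
object IgnoreNullSketch {
  // A row with an Int key and a nullable String value (None models Null).
  final case class Row(key: Int, value: Option[String])

  // Upsert `incoming` into `table` and return the updated table.
  def upsert(
      table: Map[Int, Row],
      incoming: Row,
      ignoreNull: Boolean = false): Map[Int, Row] =
    table.get(incoming.key) match {
      // The key exists, the incoming value is Null and ignoreNull is set:
      // keep the stored value instead of overwriting it with Null.
      case Some(existing) if ignoreNull && incoming.value.isEmpty =>
        table.updated(incoming.key, existing)
      // In every other case the incoming row replaces or creates the stored row.
      case _ =>
        table.updated(incoming.key, incoming)
    }

  def main(args: Array[String]): Unit = {
    val table = Map(0 -> Row(0, Some("foo")))
    val nullRow = Row(0, None)

    // ignoreNull = true: the stored value "foo" survives.
    println(upsert(table, nullRow, ignoreNull = true)(0))
    // ignoreNull left at its default of false: the value becomes None.
    println(upsert(table, nullRow)(0))
  }
}
```

Running main prints the row for key 0 after each upsert, mirroring the two dataFrame.show() results in the example above.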