Shell action for Spark 3
Learn how to execute Spark 3's spark3-submit through Oozie's Shell action.
Similar to the support for executing Spark 2's spark-submit through Oozie's Shell action, Cloudera also provides full support for executing Spark 3's spark3-submit through Oozie's Shell action.
Because Oozie uses delegation tokens instead of Kerberos tickets in its YARN applications, it is recommended that you unset the HADOOP_TOKEN_FILE_LOCATION environment variable in your Shell script before executing spark3-submit if you intend to run spark3-submit without relying on Oozie's default delegation tokens. This is because spark3-submit might not function properly when both delegation tokens and Kerberos tickets are present. However, for your Shell action to complete successfully, ensure that you restore the HADOOP_TOKEN_FILE_LOCATION environment variable after your custom Shell script segment has run. The following example illustrates how you can accomplish this:
#!/usr/bin/env bash
# By executing the commands within parentheses (a subshell),
# we can ensure that the parent environment remains untouched
(
  unset HADOOP_TOKEN_FILE_LOCATION
  kinit -kt /var/keytabs/user.keytab user
  /usr/bin/spark3-submit --master yarn --deploy-mode cluster \
    create_table_with_data_spark3.py tableUsingSpark3FromShellAction
  /usr/bin/spark3-submit --master yarn --deploy-mode cluster \
    read_created_table.py tableUsingSpark3FromShellAction
)
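
The following is a minimal sketch, not part of the product example, that shows why the parenthesized form leaves the parent environment intact, and how the variable could instead be saved and restored explicitly. The token file path and the names inside_subshell, after_subshell, saved_token_file, and during_unset are hypothetical and exist only so the sketch can run outside Oozie. No kinit or spark3-submit command is executed.

```shell
#!/usr/bin/env bash
# Hypothetical default path, used only when the sketch runs outside Oozie.
export HADOOP_TOKEN_FILE_LOCATION="${HADOOP_TOKEN_FILE_LOCATION:-/tmp/example_container_tokens}"

# Option 1: subshell. The unset is visible only inside the parentheses.
inside_subshell=$(
  unset HADOOP_TOKEN_FILE_LOCATION
  echo "${HADOOP_TOKEN_FILE_LOCATION:-unset}"
)
after_subshell="${HADOOP_TOKEN_FILE_LOCATION}"

# Option 2: save the value, unset it, then restore it explicitly.
saved_token_file="${HADOOP_TOKEN_FILE_LOCATION}"
unset HADOOP_TOKEN_FILE_LOCATION
during_unset="${HADOOP_TOKEN_FILE_LOCATION:-unset}"
# (the kinit and spark3-submit calls would go here)
export HADOOP_TOKEN_FILE_LOCATION="${saved_token_file}"

echo "inside subshell: ${inside_subshell}"
echo "after subshell:  ${after_subshell}"
echo "during unset:    ${during_unset}"
echo "after restore:   ${HADOOP_TOKEN_FILE_LOCATION}"
```

The subshell form needs no bookkeeping, because the variable is restored even if a command inside the parentheses fails. With the explicit form, the script itself is responsible for reaching the restore line.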