Use Spark 3 actions with a custom Python executable
Learn how to use a custom Python executable in a given Spark 3 action.
Similar to Spark 2, Spark 3 lets you define a custom Python executable for spark3-submit through the spark.pyspark.python Spark 3 conf argument. For more details, see the latest Spark 3 documentation. Therefore, if you include the spark.pyspark.python Spark 3 conf in your Oozie Spark 3 action, Oozie uses the Python executable you specify when it runs that action.
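For reference, setting the plain Spark 3 conf directly in an action might look like the following sketch. It assumes that the spark3 action accepts the same child elements, in the same order, as the upstream Oozie spark action (master, mode, name, jar, spark-opts); the action name, application name, and script path are hypothetical placeholders. For a Python job, the jar element holds the .py file, and the --conf option in spark-opts is passed on to spark3-submit.

```xml
<!-- Sketch: assumes spark3 mirrors the upstream Oozie spark action schema -->
<action name="spark_action">
    <spark3 xmlns="uri:oozie:spark3-action:1.0">
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>pyspark-example</name>
        <!-- Hypothetical PySpark script; for Python jobs the jar element holds the .py file -->
        <jar>${nameNode}/path/to/my_script.py</jar>
        <!-- Plain Spark 3 conf, passed on to spark3-submit -->
        <spark-opts>--conf spark.pyspark.python=/opt/python37-for-oozie/bin/python3</spark-opts>
    </spark3>
    <!-- "end" and "fail" are assumed to be defined elsewhere in the workflow -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Because the value is set in spark-opts, Oozie leaves it in place, as the note on precedence later in this section explains.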
To simplify using a custom Python executable with Oozie's Spark 3 action, you can use the oozie.service.Spark3ConfigurationService.spark.pyspark.python property. This property works like Spark 3's spark.pyspark.python conf argument: it specifies a custom Python executable, which Oozie then passes to the underlying Spark 3 application that it runs.
You can specify this configuration globally in Cloudera Manager, for a whole workflow, or for a single action.
Setting Spark 3 actions with a custom Python executable globally
- Navigate to Oozie's configuration page in Cloudera Manager.
- Search for Python Executable for Spark3 Actions.
- Specify its value to point to your custom Python executable. For example, if you installed Python 3.7 to /opt/python37-for-oozie, specify the value as /opt/python37-for-oozie/bin/python3.
- Save the modifications.
- Allow Cloudera Manager some time to recognize the changes.
- Redeploy Oozie.
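Setting Spark 3 actions with a custom Python executable per workflow
To apply a custom Python executable to every Spark 3 action in a workflow, set the property in the global configuration section of the workflow: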
<workflow-app name="spark_workflow" xmlns="uri:oozie:workflow:1.0">
    <global>
        <configuration>
            <property>
                <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                <value>/opt/python37-for-oozie/bin/python3</value>
            </property>
        </configuration>
    </global>
    <start to="spark_action"/>
    <action name="spark_action">
        ...
You can achieve the same workflow-level setting by defining the property in your job.properties file.
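Setting Spark 3 actions with a custom Python executable for a given Spark action only
To use a custom Python executable for a single Spark 3 action, set the property in the configuration section of that action: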
<workflow-app name="spark_workflow" xmlns="uri:oozie:workflow:1.0">
    <start to="spark_action"/>
    <action name="spark_action">
        <spark3 xmlns="uri:oozie:spark3-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                    <value>/opt/python37-for-oozie/bin/python3</value>
                </property>
            </configuration>
            ...
Oozie does not override the spark.pyspark.python setting in the <spark-opts> tag of your action definition if you have already set it there.
When the Oozie property is set in more than one place, Oozie applies the following order of precedence:
- If you have configured the property at the action level, it takes precedence over all other settings, and the remaining configurations are ignored.
- Otherwise, if you have configured the property in the global configuration of the workflow, that value is used.
- If the property is not set in either of those locations, the value configured in your job.properties file is used.
- Lastly, the global setting in Cloudera Manager takes effect.
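The precedence rules above can be illustrated with a minimal, hypothetical two-action workflow. The global configuration sets the workflow-level value; the first action has no action-level setting and therefore uses it, while the second action sets its own value, which takes precedence. The second Python path (/opt/python38-for-oozie) is a hypothetical installation, the script paths and node names are placeholders, and the spark3 element layout assumes it mirrors the upstream Oozie spark action schema.

```xml
<workflow-app name="spark_workflow" xmlns="uri:oozie:workflow:1.0">
    <global>
        <configuration>
            <!-- Workflow-level value: used by every Spark 3 action that does not set its own -->
            <property>
                <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                <value>/opt/python37-for-oozie/bin/python3</value>
            </property>
        </configuration>
    </global>
    <start to="uses_global"/>
    <!-- No action-level value: falls back to the global configuration above -->
    <action name="uses_global">
        <spark3 xmlns="uri:oozie:spark3-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <name>uses-global-python</name>
            <jar>${nameNode}/path/to/first_job.py</jar>
        </spark3>
        <ok to="uses_action_level"/>
        <error to="fail"/>
    </action>
    <!-- Action-level value: takes precedence over the global configuration -->
    <action name="uses_action_level">
        <spark3 xmlns="uri:oozie:spark3-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Hypothetical second Python installation -->
                <property>
                    <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                    <value>/opt/python38-for-oozie/bin/python3</value>
                </property>
            </configuration>
            <master>yarn</master>
            <name>uses-action-python</name>
            <jar>${nameNode}/path/to/second_job.py</jar>
        </spark3>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark 3 action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```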
If the Python Executable for Spark3 Actions property is set in Cloudera Manager to /opt/python37-for-oozie/bin/python3, but you want a workflow or a specific action to use the default Python executable configured for Spark 3, you can set the value of the property to default. For example:
<workflow-app name="spark_workflow" xmlns="uri:oozie:workflow:1.0">
    <global>
        <configuration>
            <property>
                <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                <value>default</value>
            </property>
        </configuration>
    </global>
    <start to="spark_action"/>
    <action name="spark_action">
        ...
In this case, the spark.pyspark.python Spark 3 conf is not set at all.
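Because the value default also works for a specific action, an action-level variant might look like the sketch below. Only this action falls back to Spark 3's default Python executable; other actions in the workflow still use the value from Cloudera Manager unless they override it themselves. As in the other examples, the script path and application name are hypothetical, and the element layout assumes the spark3 action mirrors the upstream Oozie spark action schema.

```xml
<action name="spark_action">
    <spark3 xmlns="uri:oozie:spark3-action:1.0">
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- "default": Oozie does not set spark.pyspark.python for this action,
                 so Spark 3 uses its own default Python executable -->
            <property>
                <name>oozie.service.Spark3ConfigurationService.spark.pyspark.python</name>
                <value>default</value>
            </property>
        </configuration>
        <master>yarn</master>
        <name>uses-spark-default-python</name>
        <!-- Hypothetical PySpark script -->
        <jar>${nameNode}/path/to/my_script.py</jar>
    </spark3>
    <!-- "end" and "fail" are assumed to be defined elsewhere in the workflow -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```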