Spark on ML Runtimes

Spark is supported for ML Runtimes with Python 3.6 and Python 3.7 kernels, provided that the following workaround is applied on the cluster:

Python must be installed on the YARN NodeManager nodes of the CDH cluster, and its version must match the Python version of the selected ML Runtime (that is, 3.6 or 3.7).
The path to this Python installation must be passed to Spark through the PYSPARK_PYTHON environment variable.
As an example for 3.7, one could specify the environment variable like this for the CDSW project:
- "PYSPARK_PYTHON": "/usr/local/bin/python3.7"
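
To sanity-check the setting from inside a session, a minimal sketch such as the following could be used. It assumes that the path in PYSPARK_PYTHON also exists on the host where the session runs, which may not be true in every deployment. It only compares the session's own Python version with the interpreter at that path, and it does not verify the installation on the YARN NodeManager nodes. The helper names interpreter_version and check_pyspark_python are hypothetical and are not part of CDSW or Spark.

```python
import os
import subprocess
import sys


def interpreter_version(path):
    """Return the 'major.minor' version reported by the interpreter at path."""
    result = subprocess.run(
        [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6-compatible spelling of text=True
        check=True,
    )
    return result.stdout.strip()


def check_pyspark_python(environ=None):
    """Compare PYSPARK_PYTHON with the Python version running this session.

    Returns a (matches, message) tuple. Hypothetical helper for illustration.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("PYSPARK_PYTHON")
    if not path:
        return False, "PYSPARK_PYTHON is not set"
    if not os.path.isfile(path):
        # Assumption: the path is expected to exist on this host as well.
        return False, "no interpreter found at " + path
    kernel = "%d.%d" % sys.version_info[:2]
    target = interpreter_version(path)
    if target != kernel:
        return False, "kernel is %s but %s is %s" % (kernel, path, target)
    return True, "kernel and PYSPARK_PYTHON are both Python " + kernel


if __name__ == "__main__":
    matches, message = check_pyspark_python()
    print(message)
```

A mismatch reported here suggests the project environment variable points at the wrong interpreter. A match does not guarantee that every NodeManager node has the same Python installed at that path.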