Spark on ML Runtimes
Spark is supported for ML Runtimes with Python 3.6 and Python 3.7 kernels, provided that the following workaround is applied on the cluster:
- Python must be installed on the YARN NodeManager nodes of the CDH cluster, and its version must match the Python version of the selected ML Runtime (i.e., 3.6 or 3.7).
- The path to this Python installation must be specified for Spark using the PYSPARK_PYTHON environment variable.
- For example, for Python 3.7, the environment variable could be set for the CDSW project as follows:
- "PYSPARK_PYTHON": "/usr/local/bin/python3.7"
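
Because a version mismatch between the kernel and the NodeManager nodes typically only surfaces once a Spark job is running, it can help to check the setting at the start of a session. The following is a minimal sketch, not part of the product: the helper names `version_from_path` and `check_pyspark_python` are hypothetical, and it assumes the interpreter path ends in `pythonX.Y` as in the example above. It only inspects the environment variable; it does not verify that the interpreter actually exists on the cluster nodes.

```python
import os
import re
import sys


def version_from_path(path):
    """Return (major, minor) parsed from a path ending in pythonX.Y, else None.

    Assumption: the interpreter is named like /usr/local/bin/python3.7.
    """
    match = re.search(r"python(\d+)\.(\d+)$", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_pyspark_python(env=None, kernel_version=None):
    """Compare PYSPARK_PYTHON against the Python version of the running kernel.

    Returns a short status string rather than raising, so it can be
    called at the top of a session without interrupting it.
    """
    env = os.environ if env is None else env
    if kernel_version is None:
        kernel_version = tuple(sys.version_info[:2])

    path = env.get("PYSPARK_PYTHON")
    if not path:
        return "PYSPARK_PYTHON is not set"

    executor_version = version_from_path(path)
    if executor_version is None:
        return "cannot infer a Python version from " + path

    if executor_version != tuple(kernel_version):
        return "mismatch: kernel is {}.{}, PYSPARK_PYTHON points at {}.{}".format(
            kernel_version[0], kernel_version[1],
            executor_version[0], executor_version[1],
        )
    return "ok"


if __name__ == "__main__":
    # Uses the real environment and the version of the current interpreter.
    print(check_pyspark_python())
```

Calling `check_pyspark_python()` with no arguments reads the session's own environment; passing an explicit dictionary and version tuple makes the check easy to try out without changing the project settings.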