Using Spark 2 from Python

Cloudera Data Science Workbench supports using Spark 2 from Python via PySpark.

Setting Up a PySpark Project

The default Cloudera Data Science Workbench engine currently includes Python 2.7.18 and Python 3.6.10.

Spark on ML Runtimes

Spark is supported for ML Runtimes with Python 3.6 and Python 3.7 kernels, provided that the following workaround is applied on the cluster:

Example: Monte Carlo Estimation

Within the template PySpark project, pi.py is a classic example that calculates Pi using Monte Carlo estimation.

Example: Locating and Adding JARs to Spark 2 Configuration

This example shows how to discover the location of the JAR files installed with Spark 2, and how to add them to the Spark 2 configuration.

Example: Distributing Dependencies on a PySpark Cluster

Although Python is a popular choice for data scientists, it is not straightforward to make a Python library available on a distributed PySpark cluster. To determine which dependencies are required on the cluster, keep in mind that Spark application code runs in executor processes distributed throughout the cluster. If the Python code you run uses any third-party libraries, the Spark executors need access to those libraries when they run your code on remote hosts.
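The contents of pi.py are not reproduced here, but the Monte Carlo approach it illustrates can be sketched as follows. The function names and sample count are illustrative, not the actual code of pi.py: random points are drawn in the unit square, and the fraction landing inside the quarter circle approximates Pi/4.

```python
import random


def is_inside_unit_circle(_):
    # Draw a random point in the unit square; return 1 if it falls
    # inside the quarter circle of radius 1, else 0.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(spark, num_samples=1_000_000):
    # Distribute the trials across the executors and count the hits.
    # `spark` is an active SparkSession.
    count = (
        spark.sparkContext.parallelize(range(num_samples))
        .map(is_inside_unit_circle)
        .sum()
    )
    return 4.0 * count / num_samples
```

The per-sample function is deliberately pure Python, so the same logic can be verified locally before running it on the cluster.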
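For the JAR-locating example, one possible sketch: assuming the Spark 2 installation directory is available (for example via the SPARK_HOME environment variable; the exact path varies by cluster), its jars directory can be enumerated and joined into a comma-separated value suitable for the spark.jars configuration property.

```python
import glob
import os


def list_spark_jars(spark_home):
    """Return a comma-separated list of the JARs under <spark_home>/jars,
    suitable for use as the spark.jars configuration property."""
    jars = sorted(glob.glob(os.path.join(spark_home, "jars", "*.jar")))
    return ",".join(jars)


# Example usage (assumes SPARK_HOME is set in the session's environment):
#   jar_list = list_spark_jars(os.environ["SPARK_HOME"])
#   then set spark.jars=<jar_list> in spark-defaults.conf or the builder config.
```

The same list can equally be passed per-application via `SparkSession.builder.config("spark.jars", jar_list)`.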
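One common way to distribute a pure-Python dependency to the executors is to zip the package on the driver and ship it with `SparkContext.addPyFile` (or `spark-submit --py-files`). The helper below is a sketch of that packaging step; `mylib` is a hypothetical package name, not part of the template project.

```python
import os
import zipfile


def package_module(module_dir, zip_path):
    # Zip a local Python package so it can be shipped to executors with
    # SparkContext.addPyFile(zip_path) or spark-submit --py-files.
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(module_dir):
            for name in files:
                full = os.path.join(root, name)
                # Archive paths relative to the package's parent directory,
                # so the top-level package name is preserved inside the zip.
                arcname = os.path.relpath(full, os.path.dirname(module_dir))
                zf.write(full, arcname)
    return zip_path


# On the driver (assumes an active SparkSession named `spark`):
#   spark.sparkContext.addPyFile(package_module("mylib", "mylib.zip"))
# After this, `import mylib` resolves inside functions that run on executors.
```

Note that this approach only covers pure-Python code; libraries with compiled extensions generally need to be installed on every cluster host instead.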