Architecture Overview

Cloudera Manager
Cloudera Manager is an end-to-end application used for managing CDH clusters. When a CDH service (such as Impala or Spark) is added to the cluster, Cloudera Manager configures cluster hosts with one or more functions, called roles.

Legacy Engines
Engines are responsible for running R, Python, and Scala code written by users and for intermediating access to the CDH cluster.

Cloudera Data Science Workbench Web Application
The Cloudera Data Science Workbench web application is typically hosted on the master host, at http://cdsw.<your_domain>.com.

CDS 2.x Powered by Apache Spark
Apache Spark is a general-purpose framework for distributed computing that offers high performance for both batch and stream processing. It exposes APIs for Java, Python, R, and Scala, as well as an interactive shell for running jobs.
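
As a rough illustration of the Python API mentioned above, the following is a minimal sketch of a word count written against PySpark's RDD interface. It assumes that the pyspark package is available in the Python environment and that Spark runs in local mode rather than against a CDH cluster. The function names tokenize and word_count and the application name are hypothetical and chosen only for this example.

```python
def tokenize(line):
    """Split a line into lowercase words (plain Python, no Spark required)."""
    return line.lower().split()


def word_count(lines):
    """Count word occurrences across the given lines using Spark's RDD API."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("word-count-sketch")  # hypothetical application name
        .master("local[*]")            # assumption: local mode, not a CDH cluster
        .getOrCreate()
    )
    try:
        counts = (
            spark.sparkContext.parallelize(lines)  # distribute the input lines
            .flatMap(tokenize)                     # one record per word
            .map(lambda word: (word, 1))           # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b)       # sum the counts per word
            .collect()                             # bring the results back to the driver
        )
        return dict(counts)
    finally:
        # Release the local Spark resources even if the job fails.
        spark.stop()
```

Calling word_count(["Spark runs batch jobs", "Spark runs streaming jobs"]) should return a dictionary that maps each lowercase word to the number of times it appears. On a real cluster the master setting would normally come from the deployment configuration instead of being hard-coded.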