Core Capabilities of Cloudera Data Science Workbench
For Data Scientists
- Projects
- Organize your data science efforts as isolated projects, which might include reusable code, configuration, artifacts, and libraries. Projects can also be connected to GitHub repositories for integrated version control and collaboration.
- Workbench
- A workbench for data scientists and data engineers that includes support for:
- Interactive user sessions with Python, R, and Scala through flexible and extensible engines.
- Project workspaces powered by Docker containers for control over environment configuration. You can install new packages or run command-line scripts directly from the built-in terminal.
- Distributing computations to your Cloudera Manager cluster using CDS 2.x Powered by Apache Spark and Apache Impala.
- Sharing, publishing, and collaboration of projects and results.
- Jobs
- Automate analytics workloads with a lightweight job and pipeline scheduling system that supports real-time monitoring, job history, and email alerts.
- Batch Experiments
- Use batch jobs to train and compare versioned, reproducible models. With experiments, data scientists can:
- Create versioned snapshots of model code, dependencies, and any configuration parameters required to train the model.
- Build and execute each training run in an isolated container.
- Track model metrics, performance, and model artifacts as required.
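The experiment workflow above can be sketched as a small training script. This is a minimal illustration, not the product's prescribed layout: it assumes the `cdsw` Python library and its `track_metric(key, value)` call are available inside the engine, and it falls back to printing when they are not. The names `train` and `run_experiment`, the toy data, and the metric keys are hypothetical.

```python
import json

try:
    # Assumption: inside a Cloudera Data Science Workbench engine the
    # `cdsw` library is importable and exposes track_metric(key, value).
    import cdsw
    track_metric = cdsw.track_metric
except ImportError:
    # Fallback for illustration only, so the sketch also runs elsewhere.
    def track_metric(key, value):
        print(json.dumps({"metric": key, "value": value}))


def train(slope_guess, xs, ys):
    """Toy 'training' step: score a fixed slope by mean squared error."""
    errors = [(slope_guess * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def run_experiment(slope_guess):
    """Run one training pass and record its parameter and metric."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # hypothetical data generated by y = 2x
    mse = train(slope_guess, xs, ys)
    track_metric("slope_guess", slope_guess)
    track_metric("mse", mse)
    return mse
```

Running the script once per parameter value (for example `run_experiment(1.0)` and `run_experiment(2.0)`) would give each run its own recorded metrics to compare.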
- Models
- Deploy and serve models as REST APIs. Data scientists can select a specific Python or R function within a project file to be deployed as a model, and Cloudera Data Science Workbench will:
- Create a snapshot of the model code, saved model parameters, and dependencies.
- Build an immutable executable container with the trained model and serving code.
- Deploy the model as a REST API along with a specified number of replicas, automatically load balanced.
- Save the built model container, along with metadata such as who built or deployed it.
- Allow data scientists to test and share the model.
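As a rough illustration of the kind of Python function that could be selected for deployment, the sketch below takes a dictionary of inputs and returns a JSON-serializable dictionary. The single-dictionary argument is an assumption about the calling convention, and the function name `predict`, the field names, and the fixed linear rule standing in for a trained model are all hypothetical.

```python
def predict(args):
    """Hypothetical model function.

    Assumes the request body arrives as a dictionary such as
    {"petal_length": 5.0} and that the return value must be
    JSON-serializable.
    """
    petal_length = float(args["petal_length"])
    # Stand-in for a trained model: a fixed linear rule, not a real fit.
    petal_width = 0.5 * petal_length - 0.5
    return {"petal_width": petal_width}
```

Keeping the function free of side effects and limited to plain dictionaries makes it easy to test locally, for example with `predict({"petal_length": 5.0})`, before it is built into a container.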
For IT Administrators
- Native Support for the Cloudera Enterprise Data Hub
- Direct integration with the Cloudera Enterprise Data Hub lets end users work with existing clusters without involving IT or compromising on security. No additional setup is required, so users can start coding right away.
- Enterprise Security
- Cloudera Data Science Workbench can leverage your existing authentication systems such as SAML or LDAP/Active Directory. It also includes native support for Kerberized Hadoop clusters.
- Native Spark 2 Support
- Cloudera Data Science Workbench connects to existing Spark-on-YARN clusters with no setup required.
- Flexible Deployment
- Deploy on-premises or in the cloud (on IaaS) and scale capacity as workloads change.
- Multitenancy Support
- A single Cloudera Data Science Workbench deployment can support different business groups sharing common infrastructure without interfering with one another, or placing additional demands on IT.