What's New

Cloudera on premises 1.5.4 includes the following features for Cloudera AI.

New features

Cloudera AI Service Accounts are available in Cloudera AI on premises

In Cloudera AI, the Kerberos principal for the Service Account may not be the same as your login information. Therefore, ensure you provide the Kerberos identity when you sign in to the Service Account. For more information, see Authenticating Hadoop for Cloudera AI Service Accounts.

Cloudera AI Registry is available in Cloudera on premises

Cloudera AI Registry is now generally available (GA) in Cloudera on premises. On premises, Cloudera AI Registry uses Apache Ozone to store model artifacts. To create a Cloudera AI Registry, you need the Ozone S3 gateway endpoint, the Ozone access key, and the Ozone secret key.
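
The three Ozone values listed above can be checked before you start creating the registry. The following is a minimal sketch of such a pre-flight check, not part of any Cloudera tooling: the class name, field names, and function name are hypothetical, and the check only verifies that the values are present and that the endpoint looks like an HTTP(S) URL. It does not contact Ozone.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OzoneRegistryInputs:
    """Hypothetical container for the three values the registry needs."""
    s3_gateway_endpoint: str
    access_key: str
    secret_key: str


def invalid_inputs(inputs: OzoneRegistryInputs) -> list:
    """Return the names of fields that are empty or malformed."""
    problems = []
    parsed = urlparse(inputs.s3_gateway_endpoint)
    # Assumption: the endpoint is given as an http:// or https:// URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("s3_gateway_endpoint")
    if not inputs.access_key.strip():
        problems.append("access_key")
    if not inputs.secret_key.strip():
        problems.append("secret_key")
    return problems
```

An empty list means all three values are at least well formed; whether they are accepted by your Ozone S3 gateway still has to be confirmed against the environment itself.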

If you deploy Cloudera AI Registry in an environment that contains one or more Cloudera AI Workbenches, you must synchronize the Cloudera AI Registry with the workbenches. For more information, see Prerequisites for creating Cloudera AI Registry and Synchronizing the Cloudera AI Registry with a workbench.

Heterogeneous GPU usage

When you run sessions and jobs on heterogeneous GPU clusters, you must select from the available GPU accelerator labels during workload creation. For more information, see Heterogeneous GPU clusters.

Data connections without auto discovery

Cloudera AI is a flexible, open platform, supporting connections to many data sources. The provided code samples demonstrate how to access local data for Cloudera AI workloads. For more information, see Connecting to Cloudera Data Warehouse.

Spark Log4j Configuration

Cloudera AI allows you to update Spark’s internal logging configuration on a per-project basis. Spark logging properties can be customized for every session and job through a file at a default path in the root of your project. You can also specify a custom location by using a custom environment variable. For more information, see Spark Log4j Configuration.
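
The lookup described above (a default file at the project root, or a custom location named by an environment variable) can be sketched as follows. This is an illustration only: the file name `log4j.properties`, the variable name `LOG4J_CONFIG`, and the rule that the custom location takes precedence are assumptions not stated in these release notes, and `resolve_log4j_config` is a hypothetical helper. Check the Spark Log4j Configuration topic for the actual names.

```python
import os
from pathlib import Path

# Assumed names; these release notes do not confirm them.
DEFAULT_LOG4J_FILE = "log4j.properties"
CUSTOM_LOCATION_VAR = "LOG4J_CONFIG"


def resolve_log4j_config(project_root, env=None):
    """Return the Log4j file that would apply, or None if it does not exist."""
    env = os.environ if env is None else env
    root = Path(project_root)
    custom = env.get(CUSTOM_LOCATION_VAR)
    # Assumption: a custom location, if set, replaces the default file and
    # is interpreted relative to the project root.
    candidate = root / custom if custom else root / DEFAULT_LOG4J_FILE
    return candidate if candidate.is_file() else None
```

Calling `resolve_log4j_config("/path/to/project")` in a session shows which file, if any, this sketch would pick up for that project.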

ML Metrics Collector service

The Metrics Collector service gathers data about how users and groups consume their resource quota: how much CPU, memory, and GPU capacity (if any) is allocated, and how much of that allocation the users or groups actually utilize. The Metrics Collector service runs by default, but to collect resource quota metrics, you must enable the Quota Management feature. For more information, see ML Metrics Collector Service overview.

Quota Management for group level

The Technical Preview (TP) release of Quota Management enables you to control how resources are allocated within your Cloudera AI Workbench at the user and group level. Yunikorn Gang Scheduling is also available and is the default scheduling mechanism in Cloudera AI. For more information, see Quota Management overview and Yunikorn Gang Scheduling.

Restarting a failed AMP setup

You can now retry failed AMP deployment steps and continue the AMP setup, which helps you recover from intermittent issues and configuration issues. For more information, see Restarting a failed AMP setup.

New Hadoop CLI Runtime Add-on versions are available

The HadoopCLI 7.1.8.3-601 Runtime Add-on is released for Cloudera on premises.