New Features and Changes in Cloudera Manager 6.2.0

The following sections describe new and changed features for Cloudera Manager 6.2.0:

Virtual Private Clusters - Separation of Compute and Storage services

A Virtual Private Cluster uses the Cloudera Shared Data Experience (SDX) to simplify deployment of both on-premises and cloud-based applications, and to let workloads running in different clusters share data securely and flexibly.

A new type of cluster is available in CDH 6.2, called a Compute cluster. A Compute cluster runs computational services such as Impala, Spark, or YARN, and you configure these services to access data hosted in another Regular CDH cluster, called the Base cluster. With this architecture, you can separate compute and storage resources in a variety of ways and use each resource more flexibly.

See Virtual Private Clusters and Cloudera SDX.

Ubuntu 18 Support

Support for Ubuntu 18.04 has been added for Cloudera Manager and CDH 6.2 and higher.

Cloudera Issue: OPSAPS-48410

Backup and Disaster Recovery (BDR)

Hive Direct Replication to S3/ADLS Backed Cluster

BDR now supports Hive direct replication from on-premises clusters to S3/ADLS-backed clusters, along with metadata replication to the Hive Metastore.

Using a single replication process, BDR can pull Hive data from HDFS to S3/ADLS clusters and use the "Hive-on-cloud" mode, in which the target Hive Metastore updates the table locations to point to S3/ADLS. This process simplifies data migration and synchronization between cloud and on-premises clusters.

For more information, see Hive/Impala Replication.

Replication to and from ADLS Gen2

You can now replicate HDFS files and Hive data to and from Microsoft ADLS Gen2. To use ADLS Gen2 as the source or destination, you must add Azure credentials to Cloudera Manager. Note that the URI format for ADLS Gen2 differs from the one used for ADLS Gen1. For ADLS Gen2, use the URI format abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/ (the abfss scheme uses TLS-encrypted connections).
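To make the difference between the two formats concrete, the following is a minimal Python sketch that assembles ADLS Gen2 and ADLS Gen1 URIs from their parts. The function names and the example file system and account names are hypothetical; the Gen1 form (adl://<account_name>.azuredatalakestore.net/<path>) is shown only for comparison.

```python
def abfs_uri(file_system: str, account_name: str, path: str = "", secure: bool = True) -> str:
    """Build an ADLS Gen2 URI: abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/"""
    scheme = "abfss" if secure else "abfs"  # abfss uses TLS-encrypted connections
    return f"{scheme}://{file_system}@{account_name}.dfs.core.windows.net/{_normalize(path)}"


def adl_uri(account_name: str, path: str = "") -> str:
    """Build an ADLS Gen1 URI, for comparison: adl://<account_name>.azuredatalakestore.net/<path>/"""
    return f"adl://{account_name}.azuredatalakestore.net/{_normalize(path)}"


def _normalize(path: str) -> str:
    # Drop leading/trailing slashes, then add back exactly one trailing slash.
    path = path.strip("/")
    return f"{path}/" if path else ""


if __name__ == "__main__":
    # Hypothetical file system "data" in storage account "myaccount".
    print(abfs_uri("data", "myaccount", "warehouse/sales"))
    print(adl_uri("myaccount", "warehouse/sales"))
```

Note that the file system name appears before the "@" in Gen2 URIs, while Gen1 URIs identify only the account.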

Hosts

Duplicate Host Detection and Hostname Migration

Cloudera Manager now detects duplicate hosts and prevents them from joining a cluster. It also gracefully tolerates hostname changes on managed hosts, which better supports automated deployments.

Installation

Accumulo Initialization

An Initialize Accumulo checkbox now displays in the Installation wizard.

Cloudera Issue: OPSAPS-48619

JDBC URL for the Hive Metastore Database Connection

You can now specify a JDBC URL when establishing a connection from the Hive service to a supported backend database (MySQL, PostgreSQL, or OracleDB). Enter the JDBC URL on the Setup Database page in the Create Cluster and Create Service wizards in Cloudera Manager.
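For reference, the following Python sketch builds JDBC URLs in the standard formats used by the MySQL, PostgreSQL, and Oracle (thin driver, service name) JDBC drivers. It is illustrative only: the helper name, host, and database names are hypothetical, and it is not a Cloudera Manager API. Check your driver documentation for any extra parameters (for example, TLS options) that your database requires.

```python
from typing import Optional

# URL templates and default ports for the supported backends. These are the
# vendors' standard JDBC URL formats; the Oracle entry connects by service name.
_JDBC_TEMPLATES = {
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", 1521),
}


def metastore_jdbc_url(db_type: str, host: str, database: str, port: Optional[int] = None) -> str:
    """Return a JDBC URL for a Hive Metastore backend database (hypothetical helper)."""
    try:
        template, default_port = _JDBC_TEMPLATES[db_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type!r}") from None
    return template.format(host=host, port=port if port is not None else default_port,
                           database=database)


if __name__ == "__main__":
    # Hypothetical host and database names.
    print(metastore_jdbc_url("postgresql", "db1.example.com", "metastore"))
```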

Cloudera Issue: OPSAPS-48668

Licensing

Start and Deactivation Dates for Cloudera Enterprise Licenses

Cloudera Enterprise licenses now include a start date and a deactivation date. Enterprise-only features are enabled on the start date and will be disabled after the deactivation date. If you install the license before the start date, a banner displays in the Cloudera Manager Admin console showing the number of days until the license becomes effective.

Cloudera Issue: OPSAPS-47500

Enhanced License Enforcement - Node Limit

When an Enterprise license expires, Cloudera Manager reverts to the Express version. This includes enforcing a maximum of 100 nodes across all CDH 6 clusters.

Cloudera Issue: OPSAPS-48611

Enhanced License Enforcement - Feature Availability

Features only available with a Cloudera Enterprise license are turned off after the deactivation date has passed. For legacy licenses that do not have a deactivation date, the features are turned off on the expiration date.

Cloudera Issue: OPSAPS-46864

Enhanced License Enforcement - KMS Configuration

Cloudera Manager does not allow KMS configuration changes after the deactivation date specified in the new license file, although the KMS remains functional. For legacy licenses, the deactivation date defaults to the expiration date specified in the license.
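The date rules described in the licensing entries above can be summarized in a short Python sketch. It is a hypothetical model, not Cloudera Manager code: it assumes that Enterprise features stay enabled through the deactivation date itself and are disabled from the following day, and that a legacy license is represented by passing None for the start and deactivation dates.

```python
from datetime import date
from typing import Optional


def license_state(today: date, start: Optional[date], deactivation: Optional[date],
                  expiration: date) -> str:
    """Classify an Enterprise license as "pending", "active", or "deactivated".

    Legacy licenses carry no start or deactivation date (pass None); their
    expiration date is used as the deactivation date instead.
    """
    effective_deactivation = deactivation if deactivation is not None else expiration
    if start is not None and today < start:
        return "pending"      # installed early: Enterprise features not yet enabled
    if today > effective_deactivation:
        return "deactivated"  # Enterprise-only features and KMS config changes disabled
    return "active"


def days_until_effective(today: date, start: date) -> int:
    """Days remaining before the license becomes effective (0 if already effective)."""
    return max((start - today).days, 0)


if __name__ == "__main__":
    # Hypothetical license installed one month before its start date.
    print(license_state(date(2019, 1, 1), date(2019, 2, 1), date(2020, 2, 1), date(2020, 1, 1)))
    print(days_until_effective(date(2019, 1, 1), date(2019, 2, 1)))
```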

Cloudera Issue: OPSAPS-48501

Cloudera Manager API

Cross-Cluster Network Bandwidth Test

Cloudera Manager now has an API to test network bandwidth between clusters, to help determine whether the infrastructure is suitable for separating storage and compute services.

API for Managing Expiring Cloudera Manager Sessions

A new Cloudera Manager API endpoint, /users/expireSessions/{UserName}, expires all of a particular user's active Cloudera Manager Admin Console sessions, local or external. It can be invoked by a user with the Full Administrator or User Administrator role. Please refer to the Cloudera Manager REST API documentation for more information.
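As an illustration, the following Python sketch builds, but does not send, a request to this endpoint using only the standard library. Several details are assumptions rather than facts from these notes: the HTTP method (POST), the API version string (v32), and the use of HTTP Basic authentication. Confirm all three in the Cloudera Manager REST API documentation before use. The host and user names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def expire_sessions_request(cm_host: str, user_name: str, admin_user: str, admin_password: str,
                            api_version: str = "v32", use_tls: bool = False,
                            port: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a request that expires all sessions of user_name.

    ASSUMPTIONS: the endpoint is called with POST, and the api_version default
    is a placeholder; confirm both against your REST API documentation.
    """
    scheme = "https" if use_tls else "http"
    if port is None:
        port = 7183 if use_tls else 7180  # Cloudera Manager's default HTTPS/HTTP ports
    # Percent-encode the user name so it forms a single, valid path segment.
    segment = urllib.parse.quote(user_name, safe="")
    url = f"{scheme}://{cm_host}:{port}/api/{api_version}/users/expireSessions/{segment}"
    credentials = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(url, method="POST",
                                  headers={"Authorization": f"Basic {credentials}"})


if __name__ == "__main__":
    req = expire_sessions_request("cm.example.com", "jdoe", "admin", "secret")
    print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen(req). The calling account needs the Full Administrator or User Administrator role.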

Cloudera Issue: OPSAPS-43756

Service Type Information in the ApiServiceRef

The ApiServiceRef object returned by the Cloudera Manager API now includes the service type. Please refer to the Cloudera Manager REST API documentation for more information.

Cloudera Issue: OPSAPS-48369

API to Emit All Features Available

A new property, features, has been added to the /cm/license API endpoint. It lists all the features available in the product for the given license. For example:
{
  "owner" : "John Smith",
  "uuid" : "12c8052f-d78f-4a8e-bba4-a55a2d141fcc",
  "features" : [
    { "name" : "PEERS", "description" : "Peers" },
    { "name" : "BDR", "description" : "BDR" },
    { "name" : "KERBEROS", "description" : "Kerberos" },
    . . .

Please refer to the Cloudera Manager REST API documentation for more information.
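A client script can read the new features attribute with a standard JSON parser. The following minimal Python sketch assumes only the response shape shown in the example above; the function name is hypothetical, and retrieving the response from Cloudera Manager is not shown.

```python
import json


def licensed_feature_names(license_json: str) -> list:
    """Return the feature names listed in a /cm/license response body."""
    payload = json.loads(license_json)
    # Responses that predate the "features" attribute yield an empty list.
    return [feature["name"] for feature in payload.get("features", [])]


if __name__ == "__main__":
    # The sample response above, truncated to its first three features.
    sample = """{
      "owner": "John Smith",
      "uuid": "12c8052f-d78f-4a8e-bba4-a55a2d141fcc",
      "features": [
        {"name": "PEERS", "description": "Peers"},
        {"name": "BDR", "description": "BDR"},
        {"name": "KERBEROS", "description": "Kerberos"}
      ]
    }"""
    names = licensed_feature_names(sample)
    print(names)           # ['PEERS', 'BDR', 'KERBEROS']
    print("BDR" in names)  # True
```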

Cloudera Issue: OPSAPS-49060

New Name Attribute for ApiAuthRole

ApiAuthRole entities can now be specified and looked up by a name string for the role. Please refer to the Cloudera Manager REST API documentation for more information.

Cloudera Issue: OPSAPS-46780

Kafka Configuration and Monitoring

New Kafka Metrics

The following metrics have been added:
  • kafka_topic_unclean_leader_election_enable_rate_and_time_ms
  • kafka_incremental_fetch_session_evictions_rate
  • kafka_num_incremental_fetch_partitions_cached
  • kafka_num_incremental_fetch_sessions
  • kafka_groups_completing_rebalance
  • kafka_groups_dead
  • kafka_groups_empty
  • kafka_groups_preparing_rebalance
  • kafka_groups_stable
  • kafka_zookeeper_request_latency
  • kafka_zookeeper_auth_failures
  • kafka_zookeeper_disconnects
  • kafka_zookeeper_expires
  • kafka_zookeeper_read_only_connects
  • kafka_zookeeper_sasl_authentications
  • kafka_zookeeper_sync_connects

The following metric is deprecated: kafka_responses_being_sent

Cloudera Issue: OPSAPS-48911, OPSAPS-48798, OPSAPS-48311, OPSAPS-48656

Kafka Broker ID Display

Kafka broker IDs are now displayed on the Kafka Instances page in Cloudera Manager.

Cloudera Issue: OPSAPS-44331

Kafka Topics in the diagnostic bundle

Diagnostic bundles for Kafka will now include the output of the following commands:
  • kafka-topics --describe
  • kafka-topics --list

Cloudera Issue: OPSAPS-36755

Kafka Configuration Properties for Delegation Tokens

The following new configuration parameters required to configure Kafka delegation tokens have been added:
  • delegation.token.max.lifetime.ms

    The maximum lifetime of a token, beyond which it can no longer be renewed. Default value: 7 days.

  • delegation.token.expiry.time.ms

    The token validity time in milliseconds before the token needs to be renewed. Default value: 1 day.
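The interaction between these two properties can be modeled in a short Python sketch. The sketch assumes that each renewal extends a token by one expiry period but never past its issue time plus the maximum lifetime, which is how Apache Kafka caps renewals. It is an illustration under that assumption, not Kafka's code. The function and constant names are hypothetical, and both defaults are converted to milliseconds, the unit the property names use.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Defaults described above, converted to milliseconds.
DEFAULT_MAX_LIFETIME_MS = 7 * MS_PER_DAY  # delegation.token.max.lifetime.ms
DEFAULT_EXPIRY_TIME_MS = 1 * MS_PER_DAY   # delegation.token.expiry.time.ms


def expiry_after_renewal(issue_ms: int, now_ms: int,
                         expiry_time_ms: int = DEFAULT_EXPIRY_TIME_MS,
                         max_lifetime_ms: int = DEFAULT_MAX_LIFETIME_MS) -> int:
    """Return the token's new expiry timestamp (ms) after a renewal at now_ms.

    A renewal extends validity by one expiry period, but never past the
    token's maximum lifetime measured from its issue time.
    """
    max_timestamp = issue_ms + max_lifetime_ms
    if now_ms > max_timestamp:
        raise ValueError("token has passed its maximum lifetime and cannot be renewed")
    return min(now_ms + expiry_time_ms, max_timestamp)


if __name__ == "__main__":
    # A token issued at t=0 and renewed after 6.5 days is capped at 7 days.
    print(expiry_after_renewal(0, 6 * MS_PER_DAY + MS_PER_DAY // 2) == 7 * MS_PER_DAY)  # True
```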

Cloudera Issue: OPSAPS-47051

Enhanced Security for Kafka in Zookeeper with ACLs

A new script, zookeeper-security-migration.sh, is now available to lock down Kafka data in ZooKeeper. See Kafka Security Hardening with Zookeeper ACLs.

Cloudera Issue: OPSAPS-47988

HiveServer2

New Graph for the Compilation Metrics

A new graph, Operations Awaiting Compilation, has been added for HiveServer2 compilation metrics.

Cloudera Issue: OPSAPS-47506

Secured ADLS Credentials for HS2

ADLS credentials are now stored securely via Cloudera Manager for use with HS2. This enables multi-user Hive-with-ADLS clusters.

Learn more at Configuring ADLS Access Using Cloudera Manager.

Cloudera Issue: OPSAPS-49076

Secured S3 Credentials for HS2

S3 credentials are now stored securely by Cloudera Manager for use with Hive. This enables multi-user Hive-on-S3 clusters.

Learn more at Configuring the Amazon S3 Connector.

The following sub-tasks are related to this feature:

  • Distribute the path of the HDFS credential store file and decryption password to HS2

    Adds job credstore path and decryption password propagation for HS2.

    Cloudera Issue: OPSAPS-48662

  • Manage an encrypted credential store in HDFS for HS2

    Adds a job-specific credstore for HS2.

    Cloudera Issue: OPSAPS-48661

  • Rotate the password and the encrypted credential file in HDFS on every HS2 restart

    Adds password and credstore file rotation on every HS2 role restart.

    Cloudera Issue: OPSAPS-48663

delegation.token.master.key Generation

delegation.token.master.key is now automatically generated by Cloudera Manager.

Cloudera Issue: OPSAPS-48525

New Warning for Hue Advanced Configuration Snippet

Cloudera Manager now emits a warning if the value of Hue Service Advanced Configuration Snippet or Hue Server Advanced Configuration Snippet is not formatted properly, for example, if it does not contain a configuration section such as [desktop].

Cloudera Issue: OPSAPS-27606

Increased Default Value for dfs.client.block.write.locateFollowingBlock.retries configuration

The default value of the HDFS configuration property dfs.client.block.write.locateFollowingBlock.retries has been changed from 5 to 7.
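To estimate what the extra retries mean in practice, the following Python sketch computes the worst-case cumulative wait across all retries. It rests on assumptions that should be checked against your Hadoop version. First, the client's initial retry delay is 400 ms, the usual default of dfs.client.block.write.locateFollowingBlock.initial.delay.ms. Second, the delay doubles after each retry with no upper cap. Newer Hadoop releases may cap the delay.

```python
def total_retry_wait_ms(retries: int, initial_delay_ms: int = 400) -> int:
    """Worst-case cumulative sleep when every locateFollowingBlock attempt must be retried.

    ASSUMPTION: the delay starts at initial_delay_ms and doubles after each
    retry, with no upper cap.
    """
    total, delay = 0, initial_delay_ms
    for _ in range(retries):
        total += delay
        delay *= 2
    return total


if __name__ == "__main__":
    for retries in (5, 7):
        print(retries, total_retry_wait_ms(retries))  # 5 -> 12400 ms, 7 -> 50800 ms
```

Under these assumptions, raising the default from 5 to 7 lets a client wait roughly four times as long (about 12.4 s versus 50.8 s) before giving up on a block write.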

Cloudera Issue: OPSAPS-48170

Support GPU Scheduling and Isolation for YARN

Added support for using GPUs in YARN applications and for custom YARN resource types.

Cloudera Issue: OPSAPS-48685

Health Test for Erasure Coding Policies

A new health test, Verify Erasure Coding Policies For Cluster Topology, has been introduced. The health test fails with a yellow status if there are not enough DataNodes or racks to support all enabled erasure coding policies.
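The following Python sketch shows the kind of check this health test performs. It assumes a rule that HDFS documentation commonly gives: each Reed-Solomon or XOR policy needs at least data + parity DataNodes and at least ceil((data + parity) / parity) racks. The actual thresholds used by Cloudera Manager's health test may differ. The class and function names are hypothetical. The policy names are the built-in HDFS erasure coding policy names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EcPolicy:
    name: str
    data_units: int
    parity_units: int


def topology_problems(policies, num_datanodes: int, num_racks: int) -> list:
    """List reasons the cluster cannot support the given enabled EC policies."""
    problems = []
    for policy in policies:
        # One DataNode per block in a full stripe (data + parity blocks).
        min_datanodes = policy.data_units + policy.parity_units
        # Enough racks that, with blocks spread evenly, losing one rack loses
        # at most parity_units blocks of a stripe.
        min_racks = math.ceil(min_datanodes / policy.parity_units)
        if num_datanodes < min_datanodes:
            problems.append(f"{policy.name}: needs {min_datanodes} DataNodes, found {num_datanodes}")
        if num_racks < min_racks:
            problems.append(f"{policy.name}: needs {min_racks} racks, found {num_racks}")
    return problems


if __name__ == "__main__":
    enabled = [EcPolicy("RS-6-3-1024k", 6, 3), EcPolicy("RS-10-4-1024k", 10, 4)]
    for problem in topology_problems(enabled, num_datanodes=12, num_racks=3):
        print(problem)
```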

Cloudera Issue: OPSAPS-48526

Disk Caching Configurations in Spark Service

Disk caching for the Spark History Server can now be enabled from Cloudera Manager.

Cloudera Issue: OPSAPS-48385

Decimal Support for Sqoop Clients

Sqoop decimal support for Parquet and Avro imports is now turned on by default for new CDH 6.2 (or higher) clusters. For a newly upgraded cluster, decimal support must be enabled manually.
  • Set the following property to enable decimal support in Avro: sqoop.avro.logical_types.decimal.enable=true
  • Set the following properties to enable decimal support in Parquet:

    sqoop.parquet.logical_types.decimal.enable=true

    parquetjob.configurator.implementation=hadoop

Note that changing any of these properties might break existing Sqoop jobs or alter their output in ways that disrupt downstream consumers.

Cloudera Issue: OPSAPS-48938

TLS

Apply Auto-TLS Configuration to Existing Services

You can now use Auto-TLS to add TLS to an existing cluster. This functionality is available both in the Cloudera Manager Admin Console and through the API. See Configuring TLS Encryption for Cloudera Manager and CDH Using Auto-TLS.

A new cluster-level Cloudera Manager API command, ConfigureAutoTlsServices, enables Auto-TLS for the services in a single cluster. Please refer to the Cloudera Manager REST API documentation for more information.

Cloudera Issue: OPSAPS-47349

HTTP Strict Transport Security

When TLS is enabled for the Cloudera Manager Admin Console, web responses now include the HTTP Strict-Transport-Security header. For more details about this header, see Strict-Transport-Security (Mozilla).
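To verify the header from a script, you can read it from a response returned by urllib.request.urlopen with response.headers.get("Strict-Transport-Security"). The following Python sketch then splits the value into its directives as defined in RFC 6797. These notes do not state which max-age value Cloudera Manager sends, so the header value in the example is hypothetical.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value (RFC 6797) into its directives.

    Directives with a value map to that value as a string; valueless
    directives such as includeSubDomains map to True.
    """
    directives = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; values may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else True
    return directives


if __name__ == "__main__":
    # Hypothetical header value, for illustration only.
    hsts = parse_hsts("max-age=31536000; includeSubDomains")
    print(int(hsts["max-age"]), hsts.get("includesubdomains", False))
```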

Cloudera Issue: OPSAPS-282290

Support for TLS Protocols and Cipher Suites in Custom Service Descriptors (CSDs)

Added the ability to specify the TLS protocol and the TLS cipher suites in CSDs.

Cloudera Issue: OPSAPS-48214

Expose the configurations to use TLS encryption to the Hive Metastore Database on the Hive Metastore (Hive) Configurations Page

Cloudera Manager now exposes properties that configure TLS from the Hive Metastore Server to the Hive Metastore Database. At a minimum, you must select the Enable TLS/SSL to the Hive Metastore Database checkbox, which is cleared by default. If the Hive Metastore TLS/SSL Client Truststore properties are provided, they are used. Otherwise, the default list of well-known certificate authorities is used. A JDBC URL override for connecting to the database is also exposed, and it overrides all other values used to construct the JDBC URL. This is an advanced configuration option and should be used only as a safety valve.

Cloudera Issue: OPSAPS-48666

Enable Auto-TLS Globally

There is now a Cloudera Manager API command, GenerateCmcaCommand, which will enable Auto-TLS for an existing Cloudera Manager deployment. This command creates an internal Cloudera Manager Certificate Authority (CMCA) and certificates for all existing hosts. Please refer to the Cloudera Manager REST API documentation for more information.

Cloudera Issue: OPSAPS-43102

Kafka/Flume Auto-TLS enhancements

Flume now supports Auto-TLS when used with Kafka.

Cloudera Issue: OPSAPS-46339

License Enforcement - Auto TLS

Auto-TLS is not available when using a Trial license. To enable Auto-TLS, you must have an Enterprise license.

Cloudera Issue: OPSAPS-48981

Custom certificates for Cloudera Manager Certificate Authority (CMCA)

When using Auto-TLS with custom certificates, you can use the new AddCustomCerts command to add certificates associated with a hostname to the Auto-TLS certificate database. Please refer to the Cloudera Manager REST API documentation for more details.

Cloudera Issue: OPSAPS-48678