What's New in CDH 5.11.x
What's New in CDH 5.11.2
This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.11.2.
What's New in CDH 5.11.1
This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.11.1.
What's New in CDH 5.11.0
The following sections describe new features introduced in 5.11.0.
Apache Hadoop
- Supported Apache Tomcat TLS ciphers for HttpFS are configurable using the HTTPFS_SSL_CIPHERS environment variable.
- Supported Apache Tomcat TLS ciphers for the KMS are configurable using the KMS_SSL_CIPHERS environment variable.
- Amazon S3 Consistency with Metadata Caching (S3Guard)
Data written to Amazon S3 buckets is subject to the "eventual consistency" guarantee provided by Amazon Web Services (AWS), which means that data written to S3 may not be immediately available for queries and listing operations. This can cause failures in multi-step ETL workflows, where data from a previous step is not available to the next step. To mitigate these consistency issues, you can now configure metadata caching for data stored in Amazon S3 using S3Guard. S3Guard requires that you provision a DynamoDB database from Amazon Web Services and configure S3Guard using the Cloudera Manager Admin Console or command-line tools. See Configuring and Managing S3Guard. A configuration sketch covering both S3Guard and SSE-KMS appears after this list.
- Amazon S3 Server-side Encryption with SSE-KMS
Clusters that use Amazon S3 storage can now use Amazon Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS) to encrypt data, giving you two choices for data-at-rest encryption on Amazon S3: SSE-S3 and SSE-KMS. Use the Cloudera Manager Admin Console to configure the cluster to use this new feature, as detailed in How to Configure Encryption for Amazon S3.
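For reference, the sketch below shows roughly how these two features map to the underlying hadoop-aws (s3a) properties when set programmatically from a Java client; in a CDH cluster you would normally configure them through the Cloudera Manager Admin Console instead. The bucket, DynamoDB table name, and KMS key ARN are placeholders, and the property names are the upstream hadoop-aws ones, so confirm them against your CDH release.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AFeatureSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // S3Guard: keep S3 metadata in a DynamoDB table (table and bucket names are placeholders).
        conf.set("fs.s3a.metadatastore.impl",
            "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore");
        conf.set("fs.s3a.s3guard.ddb.table", "example-s3guard-table");
        conf.setBoolean("fs.s3a.s3guard.ddb.table.create", true);

        // SSE-KMS: encrypt new objects with an AWS KMS-managed key (the key ARN is a placeholder).
        conf.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
        conf.set("fs.s3a.server-side-encryption.key",
            "arn:aws:kms:us-east-1:111122223333:key/example-key-id");

        FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
        // Listing operations now consult the S3Guard metadata store as well as S3 itself.
        for (FileStatus status : fs.listStatus(new Path("/warehouse"))) {
          System.out.println(status.getPath());
        }
      }
    }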
Apache HBase
- CDH 5.11.0 introduces “region server groups”. This feature allows specific tables to be tied to specific RegionServers. The primary benefit of region server groups is application isolation: multiple applications running on HBase can be assured there will be no I/O, CPU, or memory contention, as long as they access only tables in mutually exclusive region server groups. The main trade-off is performance. Tables assigned to a region server group can use only the hardware resources of that group, a subset of the cluster, so their maximum possible throughput is lower than that of tables with access to all RegionServers in the cluster.
- Use the MOB_COMPACT_PARTITION_POLICY setting to reduce the number of MOB files stored in HDFS. You can choose from daily, weekly, and monthly options. A sketch showing how this policy can be set on a column family follows this list.
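The following is a minimal Java sketch of creating a MOB-enabled column family with a compaction partition policy, using the generic HColumnDescriptor attribute strings rather than any dedicated setters; the table name, family name, and threshold value are placeholders, and the same settings can also be applied from the HBase shell.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class MobPolicySketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
          HColumnDescriptor mobFamily = new HColumnDescriptor("docs");
          mobFamily.setValue("IS_MOB", "true");            // store large cells as MOB files
          mobFamily.setValue("MOB_THRESHOLD", "102400");   // cells larger than ~100 KB become MOBs
          // daily, weekly, or monthly; coarser partitions mean fewer MOB files in HDFS
          mobFamily.setValue("MOB_COMPACT_PARTITION_POLICY", "weekly");

          HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mob_demo"));
          table.addFamily(mobFamily);
          admin.createTable(table);   // table and family names here are placeholders
        }
      }
    }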
Apache Hive
- Hive on Amazon S3 performance optimizations for:
  - HIVE-14204: Dynamic partitioning writes and the INSERT OVERWRITE statement
  - HIVE-15546: Parallel input path listing
  - HIVE-13901, HIVE-15879: Performance and stability of the MSCK command for recovering partitions
- Support for Microsoft Azure Data Lake Store (ADLS) as a secondary filesystem for Hive on MapReduce2 (YARN). You can use Hive on MapReduce2 or Hive-on-Spark to read and write data stored on ADLS.
- AWS cloud clusters can now share a single persistent instance of Amazon Relational Database Service (RDS) as the Hive metastore backend database, enabling persistent sharing of metadata beyond a cluster's life cycle. See How To Set Up a Shared Amazon RDS as Your Hive Metastore for CDH. A sketch of the metastore connection properties involved appears after this list.
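As a rough illustration of what the shared-metastore setup amounts to, the sketch below lists the standard Hive metastore connection properties that every participating cluster would point at the same RDS instance; the endpoint, database name, and credentials are placeholders, and in practice these values live in hive-site.xml or are set through Cloudera Manager rather than in application code.

    import java.util.Properties;

    public class SharedRdsMetastoreSketch {
      public static void main(String[] args) {
        // Hypothetical hive-site.xml values; every cluster pointed at the same
        // ConnectionURL sees the same databases and tables.
        Properties hiveSite = new Properties();
        hiveSite.setProperty("javax.jdo.option.ConnectionURL",
            "jdbc:mysql://example-metastore.abc123.us-east-1.rds.amazonaws.com:3306/hive_metastore");
        hiveSite.setProperty("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        hiveSite.setProperty("javax.jdo.option.ConnectionUserName", "hive");
        hiveSite.setProperty("javax.jdo.option.ConnectionPassword", "changeme");

        hiveSite.forEach((key, value) -> System.out.println(key + "=" + value));
      }
    }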
Hue
- Integrate Navigator with Hue: Phase 1, Metadata Discovery
  - Search and tag partitions, databases, views, tables, and columns.
  - Off by default. Check both "Enable" fields to turn it on.
  - See How to Use Governance-Based Data Discovery.
- Embed new create table wizard within Editor and Assist
  - Safely import multiple formats such as Kudu, Parquet, JSON, and CSV.
  - More easily create table partitions.
- Continued SQL improvements
  - Visually more pleasant colors and text.
  - No more hanging spinner in the Editor.
- HUE-5742: Allow non-public PostgreSQL schemas.
- HUE-5608: Add the ability to DESCRIBE a table without TABLE-level privilege.
Apache Impala
Apache Oozie
- Supported TLS ciphers for Apache Tomcat are configurable using the OOZIE_HTTPS_CIPHERS environment variable.
Apache Spark
Blacklisting. This feature reduces the chance of application failure by not scheduling work on hosts that are experiencing intermittent disk failures. See this blog post for background information.
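As a rough sketch, assuming the upstream spark.blacklist.* property names, the snippet below shows how blacklisting might be enabled for a single application; in a CDH deployment these settings are normally made through Cloudera Manager or spark-defaults.conf rather than in code, and the exact property names should be confirmed for the Spark version you run.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class BlacklistingSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("blacklisting-sketch")
            .setMaster("local[2]")                                     // local master only for illustration
            .set("spark.blacklist.enabled", "true")                    // avoid hosts/executors with repeated task failures
            .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2");  // attempts on one node before it is avoided
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          System.out.println("Blacklisting enabled: " + sc.getConf().get("spark.blacklist.enabled"));
        }
      }
    }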
You can enable Kerberos authentication and TLS/SSL encryption for the Spark History Server through Cloudera Manager configuration settings, rather than including the password in clear text in an Advanced Configuration Snippet field. See these settings in the Cloudera Manager user interface:
- history_server_spnego_enabled - for Kerberos authentication
- history_server_admin_users
- spark.ssl.historyServer.enabled
- spark.ssl.historyServer.protocol
- spark.ssl.historyServer.port
- spark.ssl.historyServer.enabledAlgorithms
- spark.ssl.historyServer.keyStore
- spark.ssl.historyServer.keyStorePassword
With authentication enabled, only Kerberos-authorized users can read data from the Spark History Server, and non-admin users can only see information about their own jobs.
With TLS/SSL enabled, you provide the location of the keystore and its password, similar to the security configuration for other components.
Navigator lineage. The former Spark lineage extractor that was enabled through a safety valve is superseded by a more robust lineage collection mechanism. See Apache Spark Known Issues for some limitations and restrictions with this feature.
Support for Azure Data Lake Store (ADLS) as a secondary filesystem. You can use Spark jobs to read and write data stored on ADLS. Hive-on-Spark and Spark with Kudu are not currently supported for ADLS data.
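A minimal sketch of a Spark job reading and writing ADLS data follows, assuming the standard fs.adl.oauth2.* properties of the Hadoop ADLS connector; the account name, paths, and credentials are placeholders, and in practice the credentials are usually configured cluster-wide rather than in application code.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class AdlsReadSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("adls-read-sketch").getOrCreate();

        // OAuth2 credentials for the ADLS connector; all values below are placeholders.
        spark.sparkContext().hadoopConfiguration()
            .set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
        spark.sparkContext().hadoopConfiguration().set("fs.adl.oauth2.client.id", "example-client-id");
        spark.sparkContext().hadoopConfiguration().set("fs.adl.oauth2.credential", "example-client-secret");
        spark.sparkContext().hadoopConfiguration()
            .set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/example-tenant/oauth2/token");

        // Read and write directly against an adl:// path (account and paths are placeholders).
        Dataset<Row> events = spark.read().parquet("adl://example.azuredatalakestore.net/data/events");
        events.write().parquet("adl://example.azuredatalakestore.net/data/events_copy");

        spark.stop();
      }
    }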
Cloudera Search
- Supported TLS ciphers for Apache Tomcat are configurable using the SOLR_CIPHERS_CONFIG environment variable.
ZooKeeper
Server-Server Mutual Authentication
All ZooKeeper servers in an ensemble can now be configured to support quorum peer (server-server) mutual authentication, mitigating risk of spoofing by a rogue server on an unsecured network. The feature leverages Kerberos authentication through the SASL framework, so Kerberos is required.
This feature is easy to enable using Cloudera Manager Admin Console. See Enabling Server-Server Mutual Authentication in the ZooKeeper Authentication page of Cloudera Security for details.
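For reference, the sketch below lists the quorum SASL properties introduced upstream by ZOOKEEPER-1045, which is roughly what such a configuration amounts to at the ZooKeeper level; Cloudera Manager writes the equivalent settings for you when the feature is enabled there, so this is only an illustration, and the exact property names should be verified for your release.

    import java.util.Properties;

    public class QuorumAuthSketch {
      public static void main(String[] args) {
        Properties zooCfg = new Properties();
        zooCfg.setProperty("quorum.auth.enableSasl", "true");          // turn on quorum SASL support
        zooCfg.setProperty("quorum.auth.learnerRequireSasl", "true");  // follower side must authenticate
        zooCfg.setProperty("quorum.auth.serverRequireSasl", "true");   // leader side accepts only authenticated peers
        zooCfg.forEach((key, value) -> System.out.println(key + "=" + value));
      }
    }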