What's New In CDH 5.5.x

Continue reading:

What's New in CDH 5.5.0
What's New in CDH 5.5.1
What's New in CDH 5.5.2
What's New in CDH 5.5.4
What's New in CDH 5.5.5
What's New in CDH 5.5.6

What's New in CDH 5.5.0

The following sections describe new features introduced in CDH 5.5.0.

Operating System and Database Support
Apache Flume
Apache Hadoop
Apache HBase
Cloudera Search
Apache Sentry (incubating)
Apache Spark
Apache Sqoop

Operating System and Database Support

Operating Systems - Support for RHEL/CentOS 6.6 (in SELinux mode), 6.7, and 7.1, and Oracle Enterprise Linux 7.1.
Important: Cloudera supports RHEL 7 with the following limitations:
- Only RHEL 7.1 is supported. RHEL 7.0 is not supported.
- Only new installations of RHEL 7.1 are supported by Cloudera. For upgrades to RHEL 7.1, contact your OS vendor and see Does Red Hat support upgrades between major versions of Red Hat Enterprise Linux?
Databases - Supports MariaDB 5.5, Oracle 12c, and PostgreSQL 9.4.

Apache Flume

The CDH 5.5 release is rebased on Flume 1.6.
FLUME-2498 Taildir source.
FLUME-2215 ResettableFileInputStream support for UCS-4 characters.
FLUME-2729 PollableSource backoff times made configurable.
FLUME-2628 Netcat source support for different source encodings.
FLUME-2753 Support for empty replace string in Search and Replace interceptor.
FLUME-2763 Flume_env script support to handle JVM parameters.
FLUME-2095 JMS source support for username and password.

Apache Hadoop

HADOOP-1540 - DistCp supports file exclusions with a new filter option, -filters <pathToFilterFile>, to prevent files from being copied. The argument is the path to a file that contains a list of Java regex patterns (one per line). If a file matches an exclusion pattern, it is not copied. To use, pass -filters <pathToFilterFile> to the distcp command.
HADOOP-8989 - The Hadoop shell now has a find utility, like that in UNIX, that allows users to search for files by name. Run hadoop fs -help find for more information.
HADOOP-11219, HADOOP-7280 - WebImageViewer was upgraded to Netty 4. This does not affect the external classpath of Hadoop.
HADOOP-11827 - DistCp buildListing() now uses a thread pool to improve performance. To use, pass -numListstatusThreads <numThreads> to the distcp command. The default value is 1.
HDFS-6133 - HDFS balancer supports the exclusion of subtrees because running the HDFS balancer can destroy local data that is important for applications such as the HBase RegionServer.
HDFS-8828 - DistCp leverages HDFS snapshot diff to more easily build file and directory lists. The snapshot diff report provides diff information between two snapshots or between a snapshot and a non-HDFS directory.
Improvements to HDFS scalability and performance:
- HDFS-7279 - In CDH 5.5.0 and higher, the DataNode WebHDFS implementation uses Netty as its HTTP server instead of Jetty. With improved buffer and connection management, Netty reduces the risk of DataNode latency and OutOfMemoryError (OOM) failures.
- HDFS-7435 and HDFS-8867 add more efficient over-the-wire encoding.
- HDFS-7923 adds rate-limiting for block reports so that the NameNode is not swamped by DataNodes sending too many block reports at once.
- HDFS-7923 and HDFS-7999 eliminate some cases on the DataNode side where I/O errors lead to scans being repeated on the local disks.
- HDFS-8581 fixes some cases where a lock is held for too long.
- HDFS-8792 and HDFS-7609 optimize data structures on the NameNode side.
- HDFS-9107 fixes a bug that could limit scalability on larger clusters by causing the NameNode to falsely consider DataNodes to be dead.
- Other bug fixes included: HADOOP-11785, HADOOP-12172, HADOOP-11659, HDFS-8845
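
The DistCp exclusion filter (HADOOP-1540) described above can be illustrated with a small sketch. This is not DistCp code: Python's re module stands in for Java regex, the patterns and paths are hypothetical, and matching a pattern against the whole path is an assumption about how the filter is applied.

```python
import re

def load_patterns(lines):
    """Compile one regex per non-empty line, mirroring the filters file layout."""
    return [re.compile(line.strip()) for line in lines if line.strip()]

def should_copy(path, patterns):
    """Return False when any exclusion pattern matches the whole path (assumed semantics)."""
    return not any(p.fullmatch(path) for p in patterns)

# Hypothetical filters file contents: skip trash directories and temporary files.
filters_file = [r".*/\.Trash/.*", r".*\.tmp"]
patterns = load_patterns(filters_file)

# Hypothetical source paths.
candidates = [
    "/data/logs/part-00000",
    "/user/alice/.Trash/old.csv",
    "/data/staging/job.tmp",
]
copied = [p for p in candidates if should_copy(p, patterns)]
print(copied)
```

With these sample patterns, only /data/logs/part-00000 survives the filter. Java and Python regex dialects differ in places, so test real patterns against DistCp itself.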

Apache HBase

CDH now includes a scanner heartbeat check, which enforces a time limit on the execution of scan RPC requests. When the server receives a scan RPC request, a time limit is calculated to be half of the smaller of the two values hbase.client.scanner.timeout.period and hbase.rpc.timeout. When the time limit is reached, the server will return the results it has accumulated up to that point. For more information, see Configuring the HBase Scanner Heartbeat.
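
The time-limit rule above can be sketched as follows. This is an illustration only, not HBase code: the function names, the simulated per-row cost, and the timeout values are assumptions chosen for the example.

```python
def scan_time_limit_ms(scanner_timeout_period_ms, rpc_timeout_ms):
    """Half of the smaller of the two timeouts, in milliseconds."""
    return min(scanner_timeout_period_ms, rpc_timeout_ms) // 2

def scan_with_heartbeat(rows, cost_ms_per_row, limit_ms):
    """Accumulate rows until the simulated elapsed time reaches the limit.

    Returns (results, finished); finished is False when only the rows
    gathered so far are returned.
    """
    results, elapsed = [], 0
    for row in rows:
        if elapsed >= limit_ms:
            return results, False
        results.append(row)
        elapsed += cost_ms_per_row
    return results, True

# Illustrative values for hbase.client.scanner.timeout.period and hbase.rpc.timeout.
limit = scan_time_limit_ms(60000, 20000)
partial, finished = scan_with_heartbeat(range(100), 2500, limit)
print(limit, len(partial), finished)
```

In this simulation the limit is 10000 ms, so the scan returns after four rows with finished set to False, and the client would issue another request to continue.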

Cloudera Search

Cloudera Search adds support for Kerberos authentication for hosts running Solr behind a proxy server. For additional information, see:
Cloudera Search adds support for using LDAP and Active Directory for authentication. For additional information, see:
- Solr Authentication
- Enabling LDAP Authentication for Solr

solrctl supports the Config API.

solrctl includes a config command that uses the Config API to directly manage configurations represented in Config objects. Config objects represent collection configuration information as specified by the solrctl collection --create -c configName command. instancedir and Config objects handle the same information, meeting the same need from the Solr server perspective, but there are a number of differences between these two implementations.

Config and instancedir Comparison
Attribute	Config	instancedir
Security	Security support provided. In a Kerberos-enabled cluster, the ZooKeeper hosts associated with configurations created using the Config API automatically have proper ZooKeeper ACLs. Sentry can be used to control access to the Config API, providing access control. For more information, see Configuring Sentry Authorization for Cloudera Search.	No ZooKeeper security support. Any user can create, delete, or modify an `instancedir` directly in ZooKeeper. Because `instancedir` updates ZooKeeper directly, it is the client's responsibility to add the proper ACLs, which can be cumbersome.
Creation method	Generated from existing `config` or `instancedir` in ZooKeeper using the ConfigSet API.	Manually edited locally and re-uploaded directly to ZooKeeper using `solrctl` utility.
Template support	Several predefined templates are available. These can be used as the basis for creating additional configs. Additional templates can be created by creating configs that are immutable. Mutable templates that use a Managed Schema can be modified using the Schema API as opposed to being manually edited. As a result, configs are less flexible, but they are also less error-prone than instancedirs.	One standard template.
Sentry support	Configs include a number of templates, each with Sentry-enabled and non-Sentry-enabled versions. To enable Sentry, choose a Sentry-enabled template.	instancedirs include a single template that supports enabling Sentry. To enable Sentry with instancedirs, overwrite the original `solrconfig.xml` file with `solrconfig.xml.secure` as described in Enabling Solr as a Client for the Sentry Service Using the Command Line.

Solr includes a set of built-in immutable configurations.

These templates are instantiated when Solr is initialized. This means these templates are not automatically available after an upgrade. To enable these templates on upgraded installations, use solrctl init or initialize Solr using Cloudera Manager. The newly included templates and the functionality each template supports are as follows:

Available Config Templates and Attributes
Template Name	Supports Schema API	Uses Schemaless Solr	Supports Sentry
predefinedTemplate	No	No	No
managedTemplate	Yes	No	No
schemalessTemplate	Yes	Yes	No
predefinedTemplateSecure	No	No	Yes
managedTemplateSecure	Yes	No	Yes
schemalessTemplateSecure	Yes	Yes	Yes

Apache Sentry (incubating)

Sentry is rebased on Apache Sentry 1.5.1.
Sentry introduces column-level access control for tables in Hive and Impala. Previously, Sentry supported privilege granularity only at the table level. To restrict access to a column of sensitive data, you needed to first create a view for a subset of columns, and then grant privileges on that view. Instead, Sentry now allows you to assign the SELECT privilege on a subset of columns in a table. See Hive SQL Syntax for Use with Sentry.
Support for enabling Kerberos authentication for the Sentry web server.

Apache Spark

Spark is rebased on Apache Spark 1.5.0.
Dynamic allocation is enabled by default. You can explicitly disable dynamic allocation by using the option: spark.dynamicAllocation.enabled = false. Dynamic allocation is implicitly disabled if --num-executors is specified in the job.
The following Spark libraries are now supported:
- Spark SQL (including DataFrames). The following Spark SQL features are not supported:
  - Thrift JDBC/ODBC server
  - Spark SQL CLI
  See Using Spark SQL.
- MLlib. The following MLlib features are not supported:
  - spark.ml
  - ML pipeline APIs
  See Using Spark MLlib.
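
The dynamic allocation rules described above can be summarized in a short sketch. This is not Spark code: the function name and the dictionary-based configuration are hypothetical, and the default of "true" reflects the CDH 5.5 behavior stated in this section.

```python
def dynamic_allocation_active(conf, num_executors=None):
    """Decide whether dynamic allocation applies to a job.

    conf: dict of Spark properties with string values.
    num_executors: value passed with --num-executors, or None if not given.
    """
    # Specifying --num-executors implicitly disables dynamic allocation.
    if num_executors is not None:
        return False
    # Otherwise the property decides; it is enabled by default in CDH 5.5.
    return conf.get("spark.dynamicAllocation.enabled", "true").lower() == "true"

print(dynamic_allocation_active({}))
print(dynamic_allocation_active({"spark.dynamicAllocation.enabled": "false"}))
print(dynamic_allocation_active({}, num_executors=4))
```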

Apache Sqoop

Sqoop is rebased on Apache Sqoop 1.4.6.

What's New in CDH 5.5.1

This is a maintenance release that fixes important issues in Apache Commons and Apache HBase; for details, see Issues Fixed in CDH 5.5.1.

What's New in CDH 5.5.2

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.5.2.

What's New in CDH 5.5.4

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.5.4.

What's New in CDH 5.5.5

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.5.5.

What's New in CDH 5.5.6

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.5.6.
