What's New in CDH 5.1.x

What's New in CDH 5.1.0

This is a minor release, which includes new features, changes, and fixed issues. See also Issues Fixed in CDH 5.1.0.

New Features and Changes in CDH 5.1.0

This is a minor release which introduces the following new features and changes, organized by component. See also What's New in Apache Impala.

Operating System Support

CDH 5.1 adds support for version 6.5 of RHEL and related platforms. See CDH and Cloudera Manager Supported Operating Systems.

Apache Crunch

  • CDH 5.1.0 implements Crunch 0.10.0.

Apache Flume

  • CDH 5.1.0 implements Flume 1.5.0.

Apache Hadoop

HDFS

POSIX Access Control Lists: As of CDH 5.1, HDFS supports POSIX Access Control Lists (ACLs), an addition to the traditional POSIX permissions model already supported. ACLs provide fine-grained control of permissions for HDFS files by providing a way to set different permissions for specific named users or named groups. For more information, see HDFS Extended ACLs.
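
As a rough illustration of what named-user and named-group entries add to the owner/group/other model, the following Python sketch parses an ACL specification string and computes the permissions a user would end up with. It is a simplified model written for this example, not the NameNode's implementation; the function names and the users and groups in the usage note (alice, bruce, carol, sales, staff) are hypothetical.

```python
def parse_acl_spec(spec):
    """Parse 'user:bruce:rw-,group::r-x' into (kind, name, perms) tuples."""
    entries = []
    for part in spec.split(","):
        kind, name, perms = part.strip().split(":")
        if kind not in ("user", "group", "mask", "other"):
            raise ValueError("unknown ACL entry type: %r" % kind)
        # Each position must be its letter (r, w, x) or a dash.
        if len(perms) != 3 or not all(p in (c, "-") for p, c in zip(perms, "rwx")):
            raise ValueError("bad permission string: %r" % perms)
        entries.append((kind, name, set(perms) - {"-"}))
    return entries


def effective_perms(entries, user, groups, owner, owning_group):
    """Return the set of permissions ('r', 'w', 'x') granted to `user`."""
    by_key = {(kind, name): perms for kind, name, perms in entries}
    # Without an explicit mask entry, nothing is filtered out.
    mask = by_key.get(("mask", ""), {"r", "w", "x"})
    if user == owner:
        return by_key.get(("user", ""), set())
    if ("user", user) in by_key:
        # Named-user entries are filtered by the mask.
        return by_key[("user", user)] & mask
    matched = [
        perms
        for (kind, name), perms in by_key.items()
        if kind == "group"
        and (name in groups or (name == "" and owning_group in groups))
    ]
    if matched:
        # Union of all matching group entries, filtered by the mask.
        return set().union(*matched) & mask
    return by_key.get(("other", ""), set())
```

For example, with the specification "user::rwx,user:bruce:rwx,group::r-x,group:sales:rw-,mask::r-x,other::---" on a file owned by alice, the named user bruce is limited by the mask to read and execute, while a member of the sales group who has no entry of their own gets read access only.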

NFS Gateway Improvements: CDH 5.1 makes the following improvements to the HDFS NFS gateway capability:
  • Subdirectory mounts:
    • Previously, clients could mount only the HDFS root directory.
    • As of CDH 5.1, a single mount point, configured via the nfs.export.point property in hdfs-site.xml on the NFS gateway node, is available to clients.
  • Improved support for Kerberized clusters (HDFS-5898):
    • Previously the NFS Gateway could connect to a secure cluster, but didn’t support logging in from a keytab.
    • As of CDH 5.1, set the nfs.kerberos.principal and nfs.keytab.file properties in hdfs-site.xml to allow users to log in from a keytab.
  • Support for port monitoring (HDFS-6406):
    • Previously, the NFS Gateway would always accept connections from any client.
    • As of CDH 5.1, set nfs.port.monitoring.disabled to false in hdfs-site.xml to allow connections only from privileged ports (ports below 1024, which normally require root access to bind).
  • Static UID/GID mapping for NFS clients that are not in sync with the NFS Gateway (HDFS-6435):
    • NFS sends UIDs and GIDs over the network from client to server, so the UIDs and GIDs must be in sync between client and server machines for users and groups to be set appropriately for file access and file creation. This is usually, but not always, the case.
    • As of CDH 5.1, you can configure a static UID/GID mapping file, by default /etc/nfs.map.
    • You can change the default (to use a different file path) by means of the nfs.static.mapping.file property in hdfs-site.xml.
    • The following sample entries illustrate the format of the file:
      uid 10 100 # Map the remote UID 10 to the local UID 100
      gid 11 101 # Map the remote GID 11 to the local GID 101
  • Hadoop portmap, or insecure system portmap, no longer required:
    • Many supported operating systems have known portmap bugs.
    • CDH 5.1 allows you to circumvent the problems by starting the NFS gateway as root, whether you install CDH from packages or parcels.
    • Cloudera Manager starts the gateway as root by default.
  • Support for AIX NFS clients (HDFS-6549):
    • To deploy AIX NFS clients, set nfs.aix.compatibility.mode.enabled to true in hdfs-site.xml.
    • This enables code that handles bugs in the AIX implementation of NFS.
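
To show how the properties named above fit together in hdfs-site.xml, here is a minimal Python sketch that renders them as property elements. Only the property names and the default /etc/nfs.map path come from the text; the export path, principal, and keytab location are placeholder values assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Property names are from the release notes; values marked "assumed" are
# placeholders for illustration, not recommended settings.
NFS_SETTINGS = [
    ("nfs.export.point", "/data/shared"),                  # assumed path
    ("nfs.kerberos.principal", "nfs/_HOST@EXAMPLE.COM"),   # assumed principal
    ("nfs.keytab.file", "/etc/hadoop/conf/nfs.keytab"),    # assumed path
    ("nfs.port.monitoring.disabled", "false"),
    ("nfs.static.mapping.file", "/etc/nfs.map"),
    ("nfs.aix.compatibility.mode.enabled", "true"),
]


def build_configuration(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(build_configuration(NFS_SETTINGS), encoding="unicode"))
```

In practice you would merge only the properties you need into the existing hdfs-site.xml on the NFS gateway node rather than generating a new file.
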
For more information, see Configuring an NFSv3 Gateway Using the Command Line.
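
The static mapping file format shown in the sample entries above can be read with a few lines of Python. This is a sketch for checking a mapping file before deploying it, assuming the format is exactly "uid|gid <remote> <local>" with optional "#" comments, as in the sample; the function name is hypothetical and this is not the gateway's own parser.

```python
def parse_static_mapping(text):
    """Return (uid_map, gid_map) dicts mapping remote IDs to local IDs."""
    uid_map, gid_map = {}, {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] not in ("uid", "gid"):
            raise ValueError(
                "line %d: expected 'uid|gid <remote> <local>'" % lineno
            )
        remote, local = int(fields[1]), int(fields[2])
        target = uid_map if fields[0] == "uid" else gid_map
        target[remote] = local
    return uid_map, gid_map
```

Applied to the two sample entries, the function returns a UID map of {10: 100} and a GID map of {11: 101}.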

MapReduce and YARN

YARN with Impala supports Dynamic Prioritization.

Apache HBase

  • CDH 5.1.0 implements HBase 0.98.
  • As of CDH 5.1.0, HBase fully supports BucketCache, which was introduced as an experimental feature in CDH 5 Beta 1.
  • HBase now supports access control for EXEC permissions.
  • CDH 5.1.0 HBase introduces a reverse scan API, allowing you to scan a table in reverse.
  • You can now run a MapReduce job over a snapshot from HBase, rather than being limited to live data.
  • A new stateless streaming scanner is available over the REST API.
  • The delete* methods of the Delete class of the HBase Client API now use the timestamp from the constructor, the same behavior as the Put class. (In HBase versions before CDH 5.1, the delete* methods ignored the constructor's timestamp, and used the value of HConstants.LATEST_TIMESTAMP. This behavior was different from the behavior of the add() methods of the Put class.)
  • The SnapshotInfo tool has been enhanced in the following ways:
    • A new option, -list-snapshots, has been added to the SnapshotInfo command. This option allows you to list snapshots on either a local or remote server.
    • You can now pass the -size-in-bytes flag to print the size of snapshot files in bytes rather than the default human-readable format.
    • The size of each snapshot file in bytes is checked against the size reported in the manifest, and if the two sizes differ, the tool reports the file as corrupt.
  • A new -target option for ExportSnapshot allows you to give the snapshot a different name on the target cluster from its name on the source cluster.
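
The size check described for SnapshotInfo can be pictured with a small Python sketch. It models only the comparison and the two output formats; the dictionaries standing in for the manifest and the file system, the function names, and the "missing" status are assumptions made for this example, not the tool's actual data structures or output.

```python
def format_size(num_bytes, size_in_bytes=False):
    """Render a size as raw bytes or in a short human-readable form."""
    if size_in_bytes:
        return str(num_bytes)
    units = ["B", "K", "M", "G", "T"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return "%.1f %s" % (size, unit)
        size /= 1024


def check_snapshot_files(manifest_sizes, actual_sizes):
    """Compare each file's size on disk with the size recorded in the manifest."""
    report = {}
    for path, expected in manifest_sizes.items():
        actual = actual_sizes.get(path)
        if actual is None:
            report[path] = "missing"   # assumed status, not from the text
        elif actual != expected:
            report[path] = "corrupt"   # sizes differ, as described above
        else:
            report[path] = "ok"
    return report
```

A file recorded as 2048 bytes in the manifest but found with 1024 bytes on disk would be reported as corrupt, and format_size(2048) renders as "2.0 K" unless the bytes format is requested.
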
In addition, Cloudera has fixed some binary incompatibilities between HBase 0.96 and 0.98. As a result, the incompatibilities introduced by HBASE-10452 and HBASE-10339 do not affect CDH 5.1 HBase, as explained below:
  • HBASE-10452 introduced a new exception and error message in setTimeStamp(), for an extremely unlikely event in which getting a TimeRange could fail because of an integer overflow. CDH 5.1 suppresses the new exception to retain compatibility with HBase 0.96, but logs the error.
  • HBASE-10339 contained code which inadvertently changed the signatures of the getFamilyMap method. CDH 5.1 restores these signatures to those used in HBase 0.96, to retain compatibility.
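
The integer overflow mentioned for HBASE-10452 can be worked through in a few lines of Python. The sketch assumes that a single timestamp is turned into the half-open range [ts, ts + 1) using signed 64-bit arithmetic; that mechanism is an assumption made for illustration and is not stated in these release notes. Python integers do not overflow, so the wrap-around is emulated explicitly.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_int64(value):
    """Emulate signed 64-bit wrap-around for an arbitrary Python int."""
    return (value + 2**63) % 2**64 - 2**63


def single_timestamp_range(ts):
    """Return (min, max) for the assumed range [ts, ts + 1)."""
    upper = wrap_int64(ts + 1)
    if upper < ts:
        # ts + 1 wrapped around to a negative number: the range is invalid.
        raise OverflowError("timestamp %d has no valid successor" % ts)
    return ts, upper
```

Under this assumption only the largest representable timestamp triggers the failure, which is consistent with the event being described as extremely unlikely.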

Apache Hive

  • Permission inheritance fixes
  • Support for decimal computation, and for reading and writing decimal-format data from and to Parquet and Avro

Hue

CDH 5.1.0 implements Hue 3.6.

New Features:

  • Search App v2:
    • 100% Dynamic dashboard
    • Drag-and-Drop dashboard builder
    • Text, Timeline, Pie, Line, Bar, Map, Filters, Grid and HTML widgets
    • Solr Index creation wizard (from a file)
  • Ability to view compressed Snappy, Avro and Parquet files
  • Impala HA
  • Ability to close Impala and Hive sessions, queries, and commands

Apache Mahout

  • CDH 5.1.0 implements Mahout 0.9.

See also Apache Mahout Incompatible Changes and Limitations.

Apache Oozie

  • You can now submit Sqoop jobs from the Oozie command line.
  • LAST_ONLY execution mode now works correctly (OOZIE-1319).

Cloudera Search

New Features:

  • A Quick Start script that automates using Search to query data from the Enron Email dataset. The script downloads the data, expands it, moves it to HDFS, indexes, and pushes the results live. The documentation now also includes a companion quick start guide, which describes the tasks the script completes, as well as customization options.
  • solrctl now has built-in support for schema-less Solr. For more information, see Schemaless Mode Overview and Best Practices.
  • Sentry-based document-level security for role-based access control of a collection. Document-level access control associates authorization tokens with each document in the collection, so that Sentry roles can be granted access to sets of documents in a collection.
  • Cloudera Search includes a version of Kite 0.10.0, which includes all morphlines-related backports of all fixes and features in Kite 0.15.0. For additional information on Kite, see:
  • Support for the Parquet file format is included with this version of Kite 0.10.0.
  • Inclusion of hbase-indexer-1.5.1, a new version of the Lily HBase Indexer. This new version of the indexer includes the 0.10.0 version of Kite mentioned above. This 0.10.0 version of Kite includes the backports and fixes included in Kite 0.15.0.
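
The document-level security model described above, in which each document carries authorization tokens that are matched against a user's roles, can be sketched as a simple filter. This is a toy model in Python, not how Search or Sentry implement it; the field name sentry_auth, the function name, and the sample roles are assumptions made for the example.

```python
def visible_documents(documents, user_roles, auth_field="sentry_auth"):
    """Return the documents whose tokens overlap with the user's roles."""
    roles = set(user_roles)
    return [
        doc for doc in documents
        # A document with no tokens is visible to nobody in this model.
        if roles & set(doc.get(auth_field, []))
    ]


if __name__ == "__main__":
    docs = [
        {"id": "1", "sentry_auth": ["analyst"]},
        {"id": "2", "sentry_auth": ["admin"]},
        {"id": "3", "sentry_auth": ["analyst", "admin"]},
    ]
    for doc in visible_documents(docs, ["analyst"]):
        print(doc["id"])
```

In this model, granting a role access to a set of documents amounts to adding that role's token to each document in the set.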

Apache Sentry (incubating)

  • CDH 5.1.0 implements Sentry 1.2. This includes a database-backed Sentry service that uses the more traditional GRANT/REVOKE statements instead of the previous policy file approach, making it easier to maintain and modify privileges.
  • Revised authorization privilege model for Hive and Impala. For more details, see The Sentry Service.

Apache Spark

  • CDH 5.1.0 implements Spark 1.0.
  • The spark-submit command abstracts across the variety of deployment modes that Spark supports and takes care of assembling the classpath for you.
  • Application History Server (SparkHistoryServer) improves monitoring capabilities.
  • You can launch PySpark applications against YARN clusters. PySpark currently only works in YARN Client mode.
Other improvements include:
  • Streaming integration with Kerberos
  • Addition of more algorithms to MLlib (Sparse Vector Support)
  • Improvements to Avro integration
  • Spark SQL alpha release (new SQL engine). Spark SQL allows you to run SQL statements inside a Spark application that manipulate and produce RDDs.
  • Authentication of all Spark communications
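
As an illustration of a PySpark application that could be launched against a YARN cluster in client mode, here is a minimal word-count sketch. The file name wordcount_sketch.py, the application name, and the input path argument are hypothetical; the pyspark import is deferred to main() so that the helper function can be exercised without a Spark installation.

```python
import sys


def tokenize(line):
    """Split a line into lowercase words."""
    return line.lower().split()


def main(input_path):
    # Imported here so the module loads even where PySpark is not installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCountSketch")
    counts = (
        sc.textFile(input_path)
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    # Print a small sample of (word, count) pairs on the driver.
    for word, count in counts.take(10):
        print("%s\t%d" % (word, count))
    sc.stop()


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Assuming the script is saved as wordcount_sketch.py, it could be submitted with something like: spark-submit --master yarn-client wordcount_sketch.py <input path>. The master is passed on the command line rather than hard-coded, so the same script can run in other deployment modes.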

What's New in CDH 5.1.2

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.1.2.

What's New in CDH 5.1.3

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.1.3.

What's New in CDH 5.1.4

This is a maintenance release that fixes important security issues. See Issues Fixed in CDH 5.1.4.

What's New in CDH 5.1.5

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.1.5.