Issues Fixed in CDH 5.1.x

The following topics describe issues fixed in CDH 5.1.x, from newest to oldest release. You can also review What's New in CDH 5.1.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.1.7
Issues Fixed in CDH 5.1.5
Issues Fixed in CDH 5.1.4
Issues Fixed in CDH 5.1.3
Issues Fixed in CDH 5.1.2
Issues Fixed in CDH 5.1.0

Issues Fixed in CDH 5.1.7

Upstream Issues Fixed

CDH 5.1.7 includes the following issues fixed upstream:

HDFS-7960 - The full block report should prune zombie storages even if they're not empty
HDFS-7278 - Add a command that allows sysadmins to manually trigger full block reports from a DN
HDFS-6831 - Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
HDFS-7596 - NameNode should prune dead storages from storageMap
HDFS-7208 - NN doesn't schedule replication when a DN storage fails
HDFS-7575 - Upgrade should generate a unique storage ID for each volume
HDFS-6529 - Trace logging for RemoteBlockReader2 to identify remote datanode and file being read
YARN-570 - Time strings are formatted in different timezone
YARN-2251 - Avoid negative elapsed time in JHS/MRAM web UI and services
YARN-2588 - Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
HIVE-8634 - HiveServer2 fair scheduler queue mapping doesn't handle the secondary groups rules correctly
HIVE-6403 - uncorrelated subquery is failing with auto.convert.join=true
HIVE-5945 - ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
HIVE-8916 - Handle user@domain username under LDAP authentication
HIVE-8874 - Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
HIVE-9716 - Map job fails when table's LOCATION does not have scheme
HIVE-8784 - Querying partition does not work with JDO enabled against PostgreSQL
HUE-2484 - [beeswax] Configure support for Hive Server2 LDAP authentication
HUE-2446 - Migrating from CDH 4.7 to CDH 5.0.1+/Hue 3.5+ will fail
PARQUET-107 - Add option to disable summary metadata aggregation after MR jobs
SOLR-6268 - HdfsUpdateLog has a race condition that can expose a closed HDFS FileSystem instance and should close its FileSystem instance if either inherited close method is called.
SOLR-6393 - Improve transaction log replay speed on HDFS.
SOLR-6403 - TransactionLog replay status logging.
IMPALA-1801 - external-data-source-executor leaking global jni refs
IMPALA-1794 - Fix infinite loop opening/closing file w/ invalid metadata
IMPALA-1674 - Fix serious memory leak in TSaslTransport
IMPALA-1668 - Fix leak of transport objects in TSaslServerTransport::Factory
IMPALA-1556 - TSaslTransport.read() should return available data before next frame
IMPALA-1565 - Python sasl client transport perf issue
IMPALA-1556 - Sasl transport should be wrapped with buffered transport
IMPALA-1442 - Better fix for non-buffered SASL transports

Apache Commons Collections deserialization vulnerability

Cloudera has learned of a potential security vulnerability in a third-party library called the Apache Commons Collections. This library is used in products distributed and supported by Cloudera (“Cloudera Products”), including core Apache Hadoop. The Apache Commons Collections library is also in widespread use beyond the Hadoop ecosystem. At this time, no specific attack vector for this vulnerability has been identified as present in Cloudera Products.

In an abundance of caution, we are currently in the process of incorporating a version of the Apache Commons Collections library with a fix into the Cloudera Products. In most cases, this will require coordination with the projects in the Apache community. One example of this is tracked by HADOOP-12577.

The Apache Commons Collections potential security vulnerability is titled “Arbitrary remote code execution with InvokerTransformer” and is tracked by COLLECTIONS-580. MITRE has not issued a CVE, but related CVE-2015-4852 has been filed for the vulnerability. CERT has issued Vulnerability Note #576313 for this issue.

Releases affected: CDH 5.5.0; CDH 5.4.8 and lower; CDH 5.3.8 and lower; CDH 5.2.8 and lower; CDH 5.1.7 and lower; Cloudera Manager 5.5.0; Cloudera Manager 5.4.8 and lower; Cloudera Manager 5.3.8 and lower; Cloudera Manager 5.2.8 and lower; Cloudera Manager 5.1.6 and lower; Cloudera Manager 5.0.7 and lower; Cloudera Navigator 2.4.0; Cloudera Navigator 2.3.8 and lower.

Users affected: All

Impact: This potential vulnerability may enable an attacker to execute arbitrary code from a remote machine without requiring authentication.

Immediate action required: Upgrade to one of the following release pairs: Cloudera Manager 5.5.1 and CDH 5.5.1; Cloudera Manager 5.4.9 and CDH 5.4.9; Cloudera Manager 5.3.9 and CDH 5.3.9; Cloudera Manager 5.2.9 and CDH 5.2.9; Cloudera Manager 5.1.7 and CDH 5.1.7; or Cloudera Manager 5.0.8 and CDH 5.0.8.

Issues Fixed in CDH 5.1.5

This is a maintenance release that fixes the following issues:

HDFS-7960 - The full block report should prune zombie storages even if they're not empty
HDFS-7278 - Add a command that allows sysadmins to manually trigger full block reports from a DN
HDFS-6831 - Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
HDFS-7596 - NameNode should prune dead storages from storageMap
HDFS-7208 - NN doesn't schedule replication when a DN storage fails
HDFS-7575 - Upgrade should generate a unique storage ID for each volume
HDFS-6529 - Trace logging for RemoteBlockReader2 to identify remote datanode and file being read
YARN-570 - Time strings are formatted in different timezone
YARN-2251 - Avoid negative elapsed time in JHS/MRAM web UI and services
YARN-2588 - Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
HIVE-8634 - HiveServer2 fair scheduler queue mapping doesn't handle the secondary groups rules correctly
HIVE-6403 - uncorrelated subquery is failing with auto.convert.join=true
HIVE-5945 - ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
HIVE-8916 - Handle user@domain username under LDAP authentication
HIVE-8874 - Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
HIVE-9716 - Map job fails when table's LOCATION does not have scheme
HIVE-8784 - Querying partition does not work with JDO enabled against PostgreSQL
HUE-2484 - [beeswax] Configure support for Hive Server2 LDAP authentication
HUE-2446 - Migrating from CDH 4.7 to CDH 5.0.1+/Hue 3.5+ will fail
PARQUET-107 - Add option to disable summary metadata aggregation after MR jobs
SOLR-6268 - HdfsUpdateLog has a race condition that can expose a closed HDFS FileSystem instance and should close its FileSystem instance if either inherited close method is called.
SOLR-6393 - Improve transaction log replay speed on HDFS.
SOLR-6403 - TransactionLog replay status logging.
IMPALA-1801 - external-data-source-executor leaking global jni refs
IMPALA-1794 - Fix infinite loop opening/closing file w/ invalid metadata
IMPALA-1674 - Fix serious memory leak in TSaslTransport
IMPALA-1668 - Fix leak of transport objects in TSaslServerTransport::Factory
IMPALA-1556 - TSaslTransport.read() should return available data before next frame
IMPALA-1565 - Python sasl client transport perf issue
IMPALA-1556 - Sasl transport should be wrapped with buffered transport
IMPALA-1442 - Better fix for non-buffered SASL transports

Issues Fixed in CDH 5.1.4

CDH 5.1.4 fixes the following issues, organized by component. See What's New in CDH 5.1.x for a list of the most important upstream problems fixed in this release.

HTTPS does not work on the HTTPS configured port

If you enable HTTPS (TLS/SSL) for YARN services, these services (including the ResourceManager, NodeManager, and Job History Server) stop serving non-secure HTTP, but HTTPS does not work on the configured HTTPS port.

Bug: YARN-1553

Workaround: None.

Upstream Issues Fixed

In addition to the above, CDH 5.1.4 includes the following issues fixed upstream:

DATAFU-68 - SampleByKey can throw NullPointerException
HADOOP-11243 - SSLFactory shouldn't allow SSLv3
HADOOP-11156 - DelegateToFileSystem should implement getFsStatus(final Path f).
HDFS-7391 - Reenable SSLv2Hello in HttpFS
HDFS-7235 - DataNode#transferBlock should report blocks that don't exist using reportBadBlock
HDFS-7274 - Disable SSLv3 in HttpFS
HDFS-7005 - DFS input streams do not timeout
HDFS-6376 - Distcp data between two HA clusters requires another configuration
HDFS-6621 - Hadoop Balancer prematurely exits iterations
YARN-2273 - NPE in ContinuousScheduling thread when we lose a node
YARN-2566 - DefaultContainerExecutor should pick a working directory randomly
YARN-2588 - Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
YARN-2641 - Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
YARN-2608 - FairScheduler: Potential deadlocks in loading alloc files and clock access
HBASE-12376 - HBaseAdmin leaks ZK connections if failure starting watchers (ConnectionLossException)
HBASE-12366 - Add login code to HBase Canary tool
HBASE-12098 - User granted namespace table create permissions can't create a table
HBASE-12087 - [0.98] Changing the default setting of hbase.security.access.early_out to true
HBASE-11896 - LoadIncrementalHFiles fails in secure mode if the namespace is specified
HBASE-12054 - bad state after NamespaceUpgrade with reserved table names
HBASE-12460 - Moving Chore to hbase-common module
HIVE-5643 - ZooKeeperHiveLockManager.getQuorumServers incorrectly appends the custom zk port to quorum hosts
HIVE-8675 - Increase thrift server protocol test coverage
HIVE-8827 - Remove SSLv2Hello from list of disabled protocols
HIVE-8182 - beeline fails when executing multiple-line queries with trailing spaces
HIVE-8330 - HiveResultSet.findColumn() parameters are case sensitive
HIVE-5994 - ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits)
HIVE-7629 - Problem in SMB Joins between two Parquet tables
HIVE-6670 - ClassNotFound with Serde
HIVE-6409 - FileOutputCommitterContainer::commitJob() cancels delegation tokens too early.
HIVE-7647 - Beeline does not honor --headerInterval and --color when executing with \
HIVE-7441 - Custom partition scheme gets rewritten with hive scheme upon concatenate
HIVE-5871 - Use multiple-characters as field delimiter
HIVE-1363 - SHOW TABLE EXTENDED LIKE command does not strip single/double quotes
HIVE-5989 - Hive metastore authorization check is not threadsafe
HUE-2438 - [core] Disable SSLv3 for Poodle vulnerability
HUE-2291 - [oozie] Faster dashboard display
IMPALA-1334 - Impala does not map principals to lowercase, affecting Sentry authorisation
IMPALA-1251 - High-offset queries hang
IMPALA-1338 - HDFS does not return all ACLs in getAclStatus()
IMPALA-1279 - Impala does not employ ACLs when checking path permissions for LOAD and INSERT
OOZIE-2034 - Disable SSLv3 (POODLEbleed vulnerability)
OOZIE-2063 - Cron syntax creates duplicate actions
SENTRY-428 - Sentry service should periodically renew the server kerberos ticket
SENTRY-431 - Sentry db provider client should attempt to refresh kerberos ticket before connection
SPARK-3606 - Spark-on-Yarn AmIpFilter does not work with Yarn HA

Issues Fixed in CDH 5.1.3

CDH 5.1.3 fixes the following issues, organized by component. See New Features and Changes in CDH 5 for a list of the most important upstream problems fixed in this release.

Apache Hadoop

The default setting of dfs.client.block.write.replace-datanode-on-failure.policy can cause an unrecoverable error in small clusters

The default setting of dfs.client.block.write.replace-datanode-on-failure.policy (DEFAULT) can cause an unrecoverable error in a small cluster during HBase rolling restart.

Bug: HDFS-4257

Workaround: Set dfs.client.block.write.replace-datanode-on-failure.policy to NEVER for 1-, 2-, or 3-node clusters, and leave it as DEFAULT for all other clusters. Leave dfs.client.block.write.replace-datanode-on-failure.enable set to true.
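
The workaround above could be expressed as the following configuration fragment for a cluster of three or fewer nodes. This is a minimal sketch: it assumes the properties are set in the client-side hdfs-site.xml, which the text above does not name, and the way the file is managed (for example, through Cloudera Manager) depends on the deployment.

```xml
<!-- Assumed location: client-side hdfs-site.xml (not stated in the release note). -->
<!-- Small clusters (1 to 3 nodes): never try to replace a failed DataNode in the write pipeline. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<!-- Leave the feature itself enabled, as the workaround recommends. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
```

On larger clusters, the policy property would stay at DEFAULT, so the first block can simply be omitted.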

Upstream Issues Fixed

In addition to the above, CDH 5.1.3 includes the following issues fixed upstream.

HADOOP-11035 - distcp on mr1(branch-1) fails with NPE using a short relative source path.
HBASE-10188 - Hide ServerName constructor
HBASE-10012 - Hide ServerName constructor
HBASE-11349 - [Thrift] support authentication/impersonation
HBASE-11446 - Reduce the frequency of RNG calls in SecureWALCellCodec#EncryptedKvEncoder
HBASE-11457 - Increment HFile block encoding IVs accounting for cipher's internal use
HBASE-11474 - [Thrift2] support authentication/impersonation
HBASE-11565 - Stale connection could stay for a while
HBASE-11627 - RegionSplitter's rollingSplit terminated with "/ by zero", and the _balancedSplit file was not deleted properly
HBASE-11788 - hbase is not deleting the cell when a Put with a KeyValue, KeyValue.Type.Delete is submitted
HBASE-11828 - Callers of ServerName.valueOf should use equals and not ==
HDFS-4257 - The ReplaceDatanodeOnFailure policies could have a forgiving option
HDFS-6776 - Using distcp to copy data between insecure and secure cluster via webhdfs does not work
HDFS-6908 - Incorrect snapshot directory diff generated by snapshot deletion
HUE-2247 - [Impala] Support pass-through LDAP authentication
HUE-2295 - [librdbms] External oracle DB connection is broken due to a typo
HUE-2273 - [desktop] Blacklisting apps with existing document will break home page
HUE-2318 - [desktop] Documents shared with write group permissions are not editable
HIVE-5087 - Rename npath UDF to matchpath
HIVE-6820 - HiveServer(2) ignores HIVE_OPTS
HIVE-7635 - Query having same aggregate functions but different case throws IndexOutOfBoundsException
IMPALA-958 - Excessively long query plan serialization time in FE when querying huge tables
IMPALA-1091 - Improve TScanRangeLocation struct and associated code
OOZIE-1989 - NPE during a rerun with forks
YARN-1458 - FairScheduler: Zero weight can lead to livelock

Issues Fixed in CDH 5.1.2

CDH 5.1.2 fixes the following issues, organized by component. See What's New in CDH 5.1.x for a list of the most important upstream problems fixed in this release.

Apache Hadoop

Jobs can hang on NodeManager decommission owing to a race condition when continuous scheduling is enabled.

Bug: YARN-2273

Workaround: Disable continuous scheduling by setting yarn.scheduler.fair.continuous-scheduling-enabled to false.
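
As a rough illustration of this workaround, the property could be set as shown below. The sketch assumes the setting lives in yarn-site.xml on the ResourceManager host; the release note itself does not say which file to edit.

```xml
<!-- Assumed location: yarn-site.xml on the ResourceManager (not stated in the release note). -->
<!-- Turn off Fair Scheduler continuous scheduling to avoid the race tracked by YARN-2273. -->
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>false</value>
</property>
```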

Apache HBase

Sending a large amount of invalid data to the Thrift service can cause it to crash

Bug: HBASE-11052

Workaround: None. This is a longstanding problem, not a new issue in CDH 5.1.

The metric `ageOfLastShippedOp` never decreases

This can cause it to appear as though the cluster is in an inconsistent state even when there is no problem.

Bug: HBASE-11143

Workaround: None.

Upstream Issues Fixed

In addition to the above, CDH 5.1.2 includes the following issues fixed upstream.

Issues Fixed in CDH 5.1.0

CDH 5.1.0 fixes the following issues, organized by component.

Apache Hadoop
Apache HBase
Hue
Apache Oozie

Apache Hadoop

HDFS

The same DataNodes may appear in the NameNode web UI in both the live and dead node lists

Bug: HDFS-6180

Workaround: None

MapReduce

YARN Fair Scheduler's Cluster Utilization Threshold check is broken

Bug: YARN-1640

Workaround: Set the yarn.scheduler.fair.preemption.cluster-utilization-threshold property in yarn-site.xml to -1.
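
A minimal sketch of the yarn-site.xml entry this workaround describes, using only the property name and value given above:

```xml
<!-- yarn-site.xml: a value of -1 is the workaround value for the broken
     cluster utilization threshold check (YARN-1640). -->
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>-1</value>
</property>
```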

ResourceManager High Availability with manual failover does not work on secure clusters

Bug: YARN-2155

Workaround: Enable automatic failover; this requires ZooKeeper.
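
One way this workaround might look in yarn-site.xml is sketched below. The property names are the standard YARN ResourceManager HA settings rather than anything named in this release note, and the ZooKeeper hostnames are placeholders; check both against the documentation for your release before relying on them.

```xml
<!-- Assumed location: yarn-site.xml on each ResourceManager. -->
<!-- ResourceManager HA must already be enabled for failover to apply. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Switch from manual to automatic failover. -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- ZooKeeper quorum used for leader election; hostnames are hypothetical. -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```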

Apache HBase

MapReduce over HBase Snapshot bypasses HBase-level security

MapReduce over HBase snapshots bypasses HBase-level security completely because the files are read directly from HDFS. The user running the scan or job must have read permissions on the data and snapshot files.

Bug: HBASE-8369

Workaround: MapReduce users must be trusted to process/view all data in HBase.

HBase snapshots now saved to the /<hbase>/.hbase-snapshot directory

HBase snapshots are now saved to the /<hbase>/.hbase-snapshot directory instead of the /.snapshot directory. The change resolves a naming conflict introduced by the HDFS snapshot feature in Hadoop 2.2/CDH 5 HDFS.

Bug: HBASE-8352

Workaround: This should be handled in the upgrade process.

Hue

Oozie jobs don't support ResourceManager HA in YARN

If the ResourceManager fails, the workflow will fail.

Bug: None

Severity: Medium

Workaround: None

Apache Oozie

Oozie HA does not work properly with HCatalog integration or SLA notifications

This issue appears when you are using HCatalog as a data dependency in a coordinator; using HCatalog from an action (for example, Pig) works correctly.

Bug: OOZIE-1492

Workaround: None
