Issues Fixed in CDH 5.0.x

Issues Fixed in CDH 5.0.6

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.0.6:
  • HDFS-7960 - The full block report should prune zombie storages even if they're not empty
  • HDFS-7278 - Add a command that allows sysadmins to manually trigger full block reports from a DN
  • HDFS-6831 - Inconsistency between hdfs dfsadmin and hdfs dfsadmin -help
  • HDFS-7596 - NameNode should prune dead storages from storageMap
  • HDFS-7208 - NN does not schedule replication when a DN storage fails
  • HDFS-7575 - Upgrade should generate a unique storage ID for each volume
  • YARN-570 - Time strings are formatted in different timezone
  • YARN-2251 - Avoid negative elapsed time in JHS/MRAM web UI and services
  • HIVE-8874 - Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
  • SOLR-6268 - HdfsUpdateLog has a race condition that can expose a closed HDFS FileSystem instance and should close its FileSystem instance if either inherited close method is called.
  • SOLR-6393 - Improve transaction log replay speed on HDFS.
  • SOLR-6403 - TransactionLog replay status logging.

Issues Fixed in CDH 5.0.5

“POODLE” Vulnerability on TLS/SSL enabled ports

The POODLE (Padding Oracle On Downgraded Legacy Encryption) attack takes advantage of a cryptographic flaw in the obsolete SSLv3 protocol, after first forcing the use of that protocol. The only solution is to disable SSLv3 entirely. This requires changes across a wide variety of components of CDH and Cloudera Manager in all current versions. CDH 5.0.5 provides these changes for CDH 5.0.x deployments.

For more information, see the Cloudera Security Bulletin.

Apache Hadoop Distributed Cache Vulnerability

The Distributed Cache Vulnerability allows a malicious cluster user to expose private files owned by the user running the YARN NodeManager process. For more information, see the Cloudera Security Bulletin.

Upstream Issues Fixed

CDH 5.0.5 includes the following issues fixed upstream.
  • HADOOP-11243 - SSLFactory shouldn't allow SSLv3
  • HDFS-7274 - Disable SSLv3 in HttpFS
  • HDFS-7391 - Reenable SSLv2Hello in HttpFS
  • HBASE-12376 - HBaseAdmin leaks ZK connections if failure starting watchers (ConnectionLossException)
  • HIVE-8675 - Increase thrift server protocol test coverage
  • HIVE-8827 - Remove SSLv2Hello from list of disabled protocols
  • HUE-2438 - [core] Disable SSLv3 for Poodle vulnerability
  • OOZIE-2034 - Disable SSLv3 (POODLEbleed vulnerability)
  • OOZIE-2063 - Cron syntax creates duplicate actions

Issues Fixed in CDH 5.0.4

Upstream Issues Fixed

Issues Fixed in CDH 5.0.3

The following topics describe known issues fixed in CDH 5.0.3. See What's New in CDH 5.0.x for a list of the most important upstream problems fixed in this release.

Apache Hadoop

MapReduce

YARN Fair Scheduler's Cluster Utilization Threshold check is broken

Bug: YARN-2155

Workaround: Set the yarn.scheduler.fair.preemption.cluster-utilization-threshold property in yarn-site.xml to -1.
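For example, if you manage yarn-site.xml directly rather than through Cloudera Manager, the property could be set with a snippet like the following (the property name and value are those given in the workaround above):

<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>-1</value>
</property>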

Apache Oozie

When Oozie is configured to use MRv1 and TLS/SSL, the YARN (MRv2) libraries are erroneously included in the classpath instead of the MRv1 libraries

This problem renders much of the configured Oozie functionality unusable.

Bug: None

Workaround: Use a different configuration (non-TLS/SSL or YARN), if possible.

Upstream Issues Fixed

Issues Fixed in CDH 5.0.2

The following topics describe known issues fixed in CDH 5.0.2. See What's New in CDH 5.0.x for a list of the most important upstream problems fixed in this release.

Apache Hadoop

CDH 5 clients running releases 5.0.1 and earlier cannot use WebHDFS to connect to a CDH 4 cluster

For example, a hadoop fs -ls webhdfs command run from the CDH 5 client to the CDH 4 cluster produces an error such as the following:
Found 21 items
ls: Invalid value for webhdfs parameter "op": No enum const class org.apache.hadoop.hdfs.web.resources.GetOpParam.Op.GETACLSTATUS

Bug: HDFS-6326

Workaround: None; note that this is fixed as of CDH 5.0.2.

Apache HBase

Endless Compaction Loop

If an empty HFile whose max timestamp is past its TTL (time-to-live) is selected for compaction, it is compacted into another empty HFile, which is in turn selected for compaction, creating an endless loop.

Bug: HBASE-10371

Workaround: None

Upstream Issues Fixed

In addition to the above, CDH 5.0.2 includes the following issues fixed upstream.

Issues Fixed in CDH 5.0.1

CDH 5.0.1 fixes the following issues, organized by component.

Apache Hadoop

HDFS

NameNode LeaseManager may crash

Bug: HDFS-6148/HDFS-6094

Workaround: Restart the NameNode.

Some group mapping providers can cause the NameNode to crash

In certain environments, some group mapping providers can cause the NameNode to segfault and crash.

Bug: HADOOP-10442

Workaround: Configure ShellBasedUnixGroupsMapping in Hadoop, or configure SSSD in the operating system on the NameNode host.
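For the ShellBasedUnixGroupsMapping option, a minimal core-site.xml sketch might look like the following; hadoop.security.group.mapping is the standard Hadoop property that selects the group mapping provider, and the class shown is the stock shell-based implementation:

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>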

Apache Hive

CREATE TABLE AS SELECT (CTAS) does not work with Parquet files

Because CTAS does not work with Parquet files, the following example returns null values.
CREATE TABLE test_data(column1 string);
LOAD DATA LOCAL INPATH './data.txt' OVERWRITE INTO TABLE test_data;

CREATE TABLE parquet_test
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
   STORED AS
    INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
   AS
     SELECT column1 FROM test_data;

SELECT * FROM parquet_test;
SELECT column1 FROM parquet_test;

Bug: HIVE-6375

Workaround: Follow up the CREATE TABLE query with an INSERT OVERWRITE TABLE ... SELECT * statement, as in the example below.
CREATE TABLE parquet_test (column1 string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
   INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
   OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
INSERT OVERWRITE TABLE parquet_test SELECT * from test_data;

Apache Oozie

The oozie-workflow-0.4.5 schema has been removed

Workflows using schema 0.4.5 will no longer be accepted by Oozie because this schema definition version has been removed.

Bug: OOZIE-1768

Workaround: Use schema 0.5. It is backward compatible with 0.4.5, so updating the workflow is as simple as changing the schema version number, as shown in the example below.
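For example, only the namespace declaration on the workflow's root element needs to change; the workflow name my-workflow is a placeholder:

<!-- No longer accepted: -->
<workflow-app name="my-workflow" xmlns="uri:oozie:workflow:0.4.5">
    ...
</workflow-app>

<!-- Accepted: -->
<workflow-app name="my-workflow" xmlns="uri:oozie:workflow:0.5">
    ...
</workflow-app>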

Upstream Issues Fixed

In addition to the above, CDH 5.0.1 includes the following issues fixed upstream.
  • HADOOP-10442 - Group look-up can cause segmentation fault when a certain JNI-based mapping module is used.
  • HADOOP-10456 - Bug in Configuration.java exposed by Spark (ConcurrentModificationException)
  • HDFS-5064 - Standby checkpoints should not block concurrent readers
  • HDFS-6039 - Uploading a File under a Dir with default ACLs throws "Duplicated ACLFeature"
  • HDFS-6094 - The same block can be counted twice towards safe mode threshold
  • HDFS-6231 - DFSClient hangs infinitely if using hedged reads and all eligible DataNodes die
  • HIVE-6495 - TableDesc.getDeserializer() should use correct classloader when calling Class.forName()
  • HIVE-6575 - select * fails on parquet table with map data type
  • HIVE-6648 - Fixed permission inheritance for multi-partitioned tables
  • HIVE-6740 - Fixed addition of Avro JARs to classpath
  • HUE-2061 - Task logs are not retrieved if containers not on the same host
  • OOZIE-1794 - java-opts and java-opt in the Java action don't always work properly in YARN
  • SOLR-5608 - Frequently reproducible failures in CollectionsAPIDistributedZkTest#testDistribSearch
  • YARN-1924 - STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it

Issues Fixed in CDH 5.0.0

CDH 5.0.0 fixes the following issues, organized by component.

Apache Flume

AsyncHBaseSink does not work in CDH 5 Beta 1 and CDH 5 Beta 2

Bug: None

Workaround: Use the HBASE sink (org.apache.flume.sink.hbase.HBaseSink) to write to HBase in CDH 5 Beta releases.

Apache Hadoop

HDFS

DataNode can consume 100 percent of one CPU

A narrow race condition can cause one of the threads in the DataNode process to get stuck in a tight loop and consume 100 percent of one CPU.

Bug: HDFS-5922

Workaround: Restart the DataNode process.

HDFS NFS gateway does not work with Kerberos-enabled clusters

Bug: HDFS-5898

Workaround: None.

Cannot browse filesystem via NameNode Web UI if any directory has the sticky bit set

When you list a directory that contains an entry with the sticky bit permission set (for example, /tmp is often set this way), nothing appears where the list of files or directories should be.

Bug: HDFS-5921

Workaround: Use the Hue File Browser.

Appending to a file that has been snapshotted previously will append to the snapshotted file as well

If you append content to a file that exists in a snapshot, the same content is appended to the file in the snapshot, invalidating the original snapshot.

Bug: See also HDFS-5343

Workaround: None

MapReduce

In MRv2 (YARN), the JobHistory Server has no information about a job if the ApplicationMaster fails while the job is running

Bug: None

Workaround: None.

Apache HBase

An empty rowkey is treated as the first row of a table

An empty rowkey is allowed in HBase, but it was treated as the first row of the table, even if it was not in fact the first row. Also, multiple rows with empty rowkeys caused issues.

Bug: HBASE-3170

Workaround: Do not use empty rowkeys.

Apache Hive

Hive queries that combine multiple splits and query large tables fail on YARN

Hive queries that scan large tables or perform map-side joins may fail with the following exception when the query is run using YARN:
java.io.IOException: Max block location exceeded for split:
InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
splitsize: 21 maxsize: 10
at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:540)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)

Bug: MAPREDUCE-5186

Workaround: Set mapreduce.job.max.split.locations to a high value such as 100.
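For example, if you manage the configuration files directly, the property could be set cluster-wide in mapred-site.xml with a snippet like the following; the value 100 is the example value from the workaround above, and the property can also be set per job:

<property>
  <name>mapreduce.job.max.split.locations</name>
  <value>100</value>
</property>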

Files in Avro tables no longer have .avro extension

As of CDH 4.3.0, Hive no longer creates files in Avro tables with the .avro extension by default. This does not cause any problems in Hive, but it could affect downstream components such as Pig, MapReduce, or Sqoop 1 that expect files with the .avro extension.

Bug: None

Workaround: Manually set the extension to .avro before using a query that inserts data into your Avro table. Use the following set statement:

set hive.output.file.extension=".avro";

Apache Oozie

Oozie does not work seamlessly with ResourceManager HA

Oozie workflows are not recovered on ResourceManager failover when ResourceManager HA is enabled. Further, users cannot specify the clusterId to have the JobTracker work against either ResourceManager.

Bug: None

Workaround: On non-secure clusters, specify the host:port of either ResourceManager. On secure clusters, specify the host:port of the active ResourceManager.
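As a sketch of where that value goes, an Oozie map-reduce action carries it in its <job-tracker> element; the hostnames and action name below are placeholders, and 8032 and 8020 are the default ResourceManager and NameNode ports:

<action name="mr-node">
    <map-reduce>
        <job-tracker>active-rm.example.com:8032</job-tracker>
        <name-node>hdfs://namenode.example.com:8020</name-node>
        <!-- remaining action configuration unchanged -->
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>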

When using Oozie HA with security enabled, some znodes have world ACLs

Oozie High Availability with security enabled will still work, but a malicious user or program can alter znodes used by Oozie for locking, possibly causing Oozie to be unable to finish processing certain jobs.

Bug: OOZIE-1608

Workaround: None

Oozie and Sqoop 2 may need additional configuration to work with YARN

In CDH 5, MRv2 (YARN) is recommended over the Hadoop 0.20-based MRv1. However, unless you are using Cloudera Manager, the default configuration for Oozie and Sqoop 2 in CDH 5 Beta 2 may not reflect this.

Bug: None

Workaround: Check the value of CATALINA_BASE in /etc/oozie/conf/oozie-env.sh (if you are running an Oozie server) and in /etc/default/sqoop2-server (if you are running a Sqoop 2 server). Also ensure that CATALINA_BASE is set correctly in your environment if you invoke /usr/bin/sqoop2-server directly instead of using the service init scripts. For Oozie, CATALINA_BASE should be set to /usr/lib/oozie/oozie-server for YARN, or /usr/lib/oozie/oozie-server-0.20 for MRv1. For Sqoop 2, CATALINA_BASE should be set to /usr/lib/sqoop2/sqoop-server for YARN, or /usr/lib/sqoop2/sqoop-server-0.20 for MRv1.

Cloudera Search

Creating cores using the web UI with default values causes the system to become unresponsive

You can use the Solr Server web UI to create new cores. If you click Create Core without making any changes to the default attributes, the server may become unresponsive. Checking the log for the server shows a repeated error that begins:

ERROR org.apache.solr.cloud.Overseer: Exception in Overseer main queue loop
java.lang.IllegalArgumentException: Path must not end with / character

Bug: SOLR-5813

Workaround: Do not create cores without first updating values for the new core in the web UI; for example, enter a new name for the core to be created.

If you created a core with default settings and are seeing this error, you can address the problem by finding the problematic node and removing it. Find the problematic node by using a tool that can inspect ZooKeeper, such as the Solr Admin UI. With such a tool, examine the items in the ZooKeeper queue and review each item's properties; the problematic node has an item in its queue with the property collection="".

Remove the node whose item has the collection="" property by using a ZooKeeper management tool; for example, you can remove nodes with the ZooKeeper command-line tool or recent versions of Hue.