Issues Fixed in CDH 5.6.x

The following topics describe issues fixed in CDH 5.6.x, from newest to oldest release. You can also review What's New In CDH 5.6.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.6.1

CDH 5.6.1 fixes the following issues.

Apache Hadoop

FSImage may get corrupted after deleting snapshot

Bug: HDFS-9406

Cloudera Bug: CDH-33224

When deleting a snapshot that contains the last record of a given INode, the fsimage may become corrupt because the create list of the snapshot diff in the previous snapshot and the child list of the parent INodeDirectory are not cleaned.

Apache HBase

The ReplicationCleaner process can abort if its connection to ZooKeeper is inconsistent

Bug: HBASE-15234

If the connection with ZooKeeper is inconsistent, the ReplicationCleaner may abort, and the following event is logged by the HMaster:

WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Aborting ReplicationLogCleaner
because Failed to get list of replicators

Unprocessed WALs accumulate.

The seekBefore() method calculates the size of the previous data block by assuming that data blocks are contiguous, and HFile v2 and higher store Bloom blocks and leaf-level INode blocks with the data. As a result, reverse scans do not work when Bloom blocks or leaf-level INode blocks are present when HFile v2 or higher is used.

Workaround: Restart the HMaster occasionally. The ReplicationCleaner restarts if necessary and processes the unprocessed WALs.

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.6.1:

  • FLUME-2632 - High CPU on KafkaSink
  • FLUME-2712 - Optional channel errors slow down the Source to Main channel event rate
  • FLUME-2781 - Kafka Channel with parseAsFlumeEvent=true should write data as is, not as flume events
  • FLUME-2886 - Optional Channels can cause OOMs
  • FLUME-2891 - Revert FLUME-2712 and FLUME-2886
  • FLUME-2897 - AsyncHBase sink NPE when Channel.getTransaction() fails
  • HADOOP-7139 - Allow appending to existing SequenceFiles
  • HADOOP-7817 - RawLocalFileSystem.append() should give FSDataOutputStream with accurate .getPos()
  • HADOOP-11171 - Enable using a proxy server to connect to S3a
  • HADOOP-11321 - copyToLocal cannot save a file to an SMB share unless the user has Full Control permissions
  • HADOOP-11687 - Ignore x-* and response headers when copying an Amazon S3 object
  • HADOOP-11722 - Some Instances of Services using ZKDelegationTokenSecretManager go down when old token cannot be deleted
  • HADOOP-12240 - Fix tests requiring native library to be skipped in non-native profile
  • HADOOP-12280 - Skip unit tests based on maven profile rather than NativeCodeLoader.isNativeCodeLoaded
  • HADOOP-12604 - Exception may be swallowed in KMSClientProvider
  • HADOOP-12605 - Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
  • HADOOP-12668 - Support excluding weak Ciphers in HttpServer2 through ssl-server.conf
  • HADOOP-12699 - TestKMS#testKMSProvider intermittently fails during 'test rollover draining'
  • HADOOP-12715 - TestValueQueue#testgetAtMostPolicyALL fails intermittently
  • HADOOP-12718 - Incorrect error message by fs -put local dir without permission
  • HADOOP-12736 - TestTimedOutTestsListener#testThreadDumpAndDeadlocks sometimes times out
  • HADOOP-12825 - Log slow name resolutions
  • HADOOP-12954 - Add a way to change hadoop.security.token.service.use_ip
  • HADOOP-12972 - Lz4Compressor#getLibraryName returns the wrong version number
  • HDFS-6520 - hdfs fsck passes invalid length value when creating BlockReader
  • HDFS-8211 - DataNode UUID is always null in the JMX counter
  • HDFS-8496 - Calling stopWriter() with FSDatasetImpl lock held may block other threads
  • HDFS-8576 - Lease recovery should return true if the lease can be released and the file can be closed
  • HDFS-8785 - TestDistributedFileSystem is failing in trunk
  • HDFS-8855 - Webhdfs client leaks active NameNode connections
  • HDFS-9264 - Minor cleanup of operations on FsVolumeList#volumes
  • HDFS-9289 - Make DataStreamer#block thread safe and verify genStamp in commitBlock
  • HDFS-9347 - Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
  • HDFS-9350 - Avoid creating temporary strings in Block.toString() and getBlockName()
  • HDFS-9358 - TestNodeCount#testNodeCount timed out
  • HDFS-9514 - TestDistributedFileSystem.testDFSClientPeerWriteTimeout failing; exception being swallowed
  • HDFS-9576 - HTrace: collect position/length information on read operations
  • HDFS-9589 - Block files which have been hardlinked should be duplicated before the DataNode appends to them
  • HDFS-9612 - DistCp worker threads are not terminated after jobs are done
  • HDFS-9655 - NN should start JVM pause monitor before loading fsimage
  • HDFS-9688 - Test the effect of nested encryption zones in HDFS downgrade
  • HDFS-9701 - DN may deadlock when hot-swapping under load
  • HDFS-9721 - Allow Delimited PB OIV tool to run upon fsimage that contains INodeReference
  • HDFS-9949 - Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
  • HDFS-10223 - peerFromSocketAndKey performs SASL exchange before setting connection timeouts
  • HDFS-10267 - Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
  • MAPREDUCE-4785 - TestMRApp occasionally fails
  • MAPREDUCE-6460 - TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
  • MAPREDUCE-6528 - Memory leak for HistoryFileManager.getJobSummary()
  • MAPREDUCE-6580 - Test failure: TestMRJobsWithProfiler
  • MAPREDUCE-6620 - Jobs that did not start are shown as starting in 1969 in the JHS web UI
  • YARN-2749 - Fix some test cases from TestLogAggregationService that fail in trunk
  • YARN-2871 - TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
  • YARN-2902 - Killing a container that is localizing can orphan resources in the DOWNLOADING state
  • YARN-3104 - Fixed RM to not generate new AMRM tokens on every heartbeat between rolling and activation
  • YARN-3446 - FairScheduler headroom calculation should exclude nodes in the blacklist
  • YARN-3727 - For better error recovery, check if the directory exists before using it for localization
  • YARN-4155 - TestLogAggregationService.testLogAggregationServiceWithInterval failing
  • YARN-4168 - Fixed a failing test TestLogAggregationService.testLocalFileDeletionOnDiskFull
  • YARN-4354 - Public resource localization fails with NPE
  • YARN-4380 - TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
  • YARN-4393 - Fix intermittent test failure for TestResourceLocalizationService#testFailedDirsResourceRelease
  • YARN-4546 - ResourceManager crash due to scheduling opportunity overflow
  • YARN-4573 - Fix test failure in TestRMAppTransitions#testAppRunningKill and testAppKilledKilled
  • YARN-4613 - Fix test failure in TestClientRMService#testGetClusterNodes
  • YARN-4704 - TestResourceManager#testResourceAllocation() fails when using FairScheduler
  • YARN-4717 - TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails intermittently due to IllegalArgumentException from cleanup
  • HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths
  • HBASE-14374 - Backport 'Stuck FSHLog' issue to 1.1; also includes HBASE-14807 (TestWALLockup is flakey)
  • HBASE-14759 - Avoid using Math.abs when selecting SyncRunner in FSHLog
  • HBASE-15019 - Replication stuck when HDFS is restarted
  • HBASE-15052 - Use EnvironmentEdgeManager in ReplicationSource
  • HBASE-15152 - Automatically include prefix-tree module in MR jobs if present
  • HBASE-15157 - Add *PerformanceTest for Append, CheckAnd*
  • HBASE-15206 - Fix flakey testSplitDaughtersNotInMeta
  • HBASE-15213 - Fix increment performance regression caused by HBASE-8763 on branch-1.0
  • HBASE-15234 - Don't abort ReplicationLogCleaner on ZooKeeper errors
  • HBASE-15456 - CreateTableProcedure/ModifyTableProcedure needs to fail when there is no family in table descriptor
  • HBASE-15479 - No more garbage or beware of autoboxing
  • HBASE-15582 - SnapshotManifestV1 too verbose when there are no regions
  • HIVE-6099 - Multi insert does not work properly with distinct count
  • HIVE-7653 - Hive AvroSerDe does not support circular references in Schema
  • HIVE-9617 - UDF from_utc_timestamp throws NPE if the second argument is null
  • HIVE-10115 - HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled
  • HIVE-10213 - MapReduce jobs using dynamic-partitioning fail on commit
  • HIVE-10303 - HIVE-9471 broke forward compatibility of ORC files
  • HIVE-11054 - Handle varchar/char partition columns in vectorization
  • HIVE-11097 - HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
  • HIVE-11135 - Fix the Beeline set and save command in order to avoid the NullPointerException
  • HIVE-11285 - ObjectInspector for partition columns in FetchOperator in SMBJoin causes exception
  • HIVE-11288 - Avro SerDe InstanceCache returns incorrect schema
  • HIVE-11408 - HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
  • HIVE-11427 - Location of temporary table for CREATE TABLE SELECT broken by HIVE-7079
  • HIVE-11488 - Add sessionId and queryId info to HS2 log (combined with HIVE-12456 as "Support sessionId and queryId logging")
  • HIVE-12456 - QueryId can't be stored in the configuration of the SessionState since multiple queries can run in a single session
  • HIVE-11583 - When PTF is used over large partitions, the result could be corrupted
  • HIVE-11590 - AvroDeserializer is very chatty
  • HIVE-11828 - beeline -f fails on scripts with tabs between column type and comment
  • HIVE-11919 - Hive Union Type Mismatch
  • HIVE-12315 - Fix Vectorized double divide by zero
  • HIVE-12354 - MapJoin with double keys is slow on MR
  • HIVE-12431 - Support timeout for compile lock
  • HIVE-12469 - Apache Commons Collections
  • HIVE-12506 - SHOW CREATE TABLE command creates a table that does not work for RCFile format
  • HIVE-12706 - Incorrect output from from_utc_timestamp()/to_utc_timestamp when local timezone has DST
  • HIVE-12790 - Metastore connection leaks in HiveServer2
  • HIVE-12885 - LDAP Authenticator improvements
  • HIVE-12941 - Unexpected result when using MIN() on struct with NULL in first field
  • HIVE-12946 - alter table should also add default scheme and authority for the location similar to create table
  • HIVE-13039 - BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
  • HIVE-13055 - Add unit tests for HIVE-11512
  • HIVE-13065 - Hive throws NPE when writing map type data to a HBase backed table
  • HIVE-13082 - Enable constant propagation optimization in query with left semi join
  • HIVE-13200 - Aggregation functions returning empty rows on partitioned columns
  • HIVE-13243 - Hive drop table on encryption zone fails for external tables
  • HIVE-13251 - Hive can't read the decimal in AVRO file generated from previous version
  • HIVE-13286 - Query ID is being reused across queries
  • HIVE-13295 - Improvement to LDAP search queries in HS2 LDAP Authenticator
  • HIVE-13401 - Kerberized HS2 with LDAP auth enabled fails kerberos/delegation token authentication
  • HIVE-13527 - Using deprecated APIs in HBase client causes zookeeper connection leaks
  • HIVE-13570 - Some queries with Union all fail when CBO is off
  • HUE-3106 - [filebrowser] Add support for full paths in zip file uploads
  • HUE-3110 - [oozie] Fix bundle submission when coordinator points to multiple bundles
  • HUE-3132 - [core] Fix Sync Ldap users and groups for anonymous binds
  • HUE-3180 - [useradmin] Override duplicate username validation message
  • HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
  • HUE-3303 - [core] PostgreSQL requires data update and alter table operations in separate transactions
  • HUE-3310 - [jobsub] Prevent browsing job designs by API
  • HUE-3334 - [editor] Update test, now send empty query instead of error, skip checking for multi queries if there is no semicolon
  • HUE-3398 - [beeswax] Filter out sessions with empty guid or secret key
  • HUE-3436 - [oozie] Retain old dependencies when saving a workflow
  • HUE-3437 - [core] PamBackend does not honor ignore_username_case
  • HUE-3523 - [oozie] Modify find_jobs_with_no_doc method to exclude jobs with no name
  • HUE-3528 - [oozie] Call correct metrics api to avoid 500 error
  • HUE-3594 - [fb] Smarter DOM based XSS filter on hashes
  • HUE-3637 - [sqoop] Avoid decode errors on attribute values
  • HUE-3650 - [beeswax] Notify of caught errors in the watch logs process
  • HUE-3651 - [core] Upgrade Moment.js
  • IMPALA-852, IMPALA-2215 - Analyze HAVING clause before aggregation
  • IMPALA-1092 - Fix estimates for trivial coord-only queries
  • IMPALA-1170 - Fix URL parsing when path contains '@'
  • IMPALA-1934 - Allow shell to retrieve LDAP password from shell cmd
  • IMPALA-2093 - Disallow NOT IN aggregate subqueries with a constant lhs expr
  • IMPALA-2184 - Don't inline timestamp methods with try/catch blocks in IR
  • IMPALA-2425 - Broadcast join hint not enforced when low memory limit is set
  • IMPALA-2503 - Add missing String.format() arg in error message
  • IMPALA-2539 - Unmark collections slots of empty union operands
  • IMPALA-2554 - Change default buffer size for RPC servers and clients
  • IMPALA-2565 - Planner tests are flaky due to file size mismatches
  • IMPALA-2592 - DataStreamSender::Channel::CloseInternal() does not close the channel on an error
  • IMPALA-2599 - Pseudo-random sleep before acquiring kerberos ticket possibly not really pseudo-random
  • IMPALA-2711 - Fix memory leak in Rand()
  • IMPALA-2732 - Timestamp formats with non-padded values
  • IMPALA-2734 - Correlated EXISTS subqueries with HAVING clause return wrong results
  • IMPALA-2742 - Avoid unbounded MemPool growth with AcquireData()
  • IMPALA-2749 - Fix decimal multiplication overflow
  • IMPALA-2765 - Preserve return type of subexpressions substituted in isTrueWithNullSlots()
  • IMPALA-2788 - conv(bigint num, int from_base, int to_base) returns wrong result
  • IMPALA-2798 - Bring in AVRO-1617 fix and add test case for it
  • IMPALA-2818 - Fix cancellation crashes/hangs due to BlockOnWait() race
  • IMPALA-2820 - Support unquoted keywords as struct-field names
  • IMPALA-2832 - Fix cloning of FunctionCallExpr
  • IMPALA-2844 - Allow count(*) on RC files with complex types
  • IMPALA-2870 - Fix failing metadata.test_ddl.TestDdlStatements.test_create_table test
  • IMPALA-2894 - Move regression test into a different .test file
  • IMPALA-2906 - Fix an edge case with materializing TupleIsNullPredicates in analytic sorts
  • IMPALA-2914 - Fix DCHECK Check failed: HasDateOrTime()
  • IMPALA-2926 - Fix off-by-one bug in SelectNode::CopyRows()
  • IMPALA-2940 - Fix leak of dictionaries in Parquet scanner
  • IMPALA-3000 - Fix BitReader::Reset()
  • IMPALA-3034 - Verify all consumed memory of a MemTracker is always released at destruction time
  • IMPALA-3047 - Separate create table test with nested types
  • IMPALA-3054 - Disable probe side filters when spilling
  • IMPALA-3071 - Fix assignment of On-clause predicates belonging to an inner join
  • IMPALA-3085 - Unregister data sinks' MemTrackers at their Close() functions
  • IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
  • IMPALA-3095 - Add configurable whitelist of authorized internal principals
  • IMPALA-3151 - Impala crash for avro table when casting to char data type
  • IMPALA-3194 - Allow queries materializing scalar type columns in RC/sequence files
  • KITE-1114 - Kite CLI json-import HDFS temp file path not multiuser safe, fix missing license header
  • OOZIE-2419 - HBase credentials are not correctly proxied
  • OOZIE-2466 - Repeated failure of TestMetricsInstrumentation.testSamplers
  • OOZIE-2486 - TestSLAEventsGetForFilterJPAExecutor is flakey
  • OOZIE-2490 - Oozie can't set hadoop.security.token.service.use_ip
  • SENTRY-748 - Improve test coverage of Sentry + Hive using complex views
  • SENTRY-835 - Drop table leaves a connection open when using metastorelistener
  • SENTRY-922 - INSERT OVERWRITE DIRECTORY permission not working correctly
  • SENTRY-972 - Include sentry-tests-hive hadoop test script in maven project
  • SENTRY-991 - Roles of Sentry Permission needs to be case insensitive
  • SENTRY-994 - SentryAuthorizationInfoX should override isSentryManaged
  • SENTRY-1002 - PathsUpdate.parsePath(path) will throw an NPE when parsing relative paths
  • SENTRY-1003 - Support "reload" by updating the classpath of Sentry function aux jar path during runtime
  • SENTRY-1007 - Sentry column-level performance for wide tables
  • SENTRY-1008 - Path should not be updated if the create/drop table/partition event fails
  • SENTRY-1015 - Improve Sentry + Hive error message when user has insufficient privileges
  • SENTRY-1044 - Tables with non-hdfs locations break HMS startup
  • SENTRY-1169 - MetastorePlugin#renameAuthzObject log message prints oldpathname as newpathname
  • SENTRY-1184 - Clean up HMSPaths.renameAuthzObject
  • SOLR-6631 - DistributedQueue spinning on calling zookeeper getChildren()
  • SOLR-6820 - The sync on the VersionInfo bucket in DistributedUpdateProcessor#addDocument appears to be a large bottleneck when using replication
  • SOLR-7281 - Add an overseer action to publish an entire node as 'down'
  • SOLR-7332 - Seed version buckets with max version from index
  • SOLR-7493 - Requests aren't distributed evenly if the collection isn't present locally. Merges r1683946 and r1683948 from trunk
  • SOLR-7587 - TestSpellCheckResponse stalled and never timed out -- possible VersionBucket bug?
  • SOLR-7625 - Version bucket seed not updated after new index is installed on a replica
  • SOLR-8215 - Only active replicas should handle incoming requests against a collection
  • SOLR-8371 - Try and prevent too many recovery requests from stacking up and clean up some faulty cancel recovery logic
  • SOLR-8451 - We should not call method.abort in HttpSolrClient or HttpSolrCall#remoteQuery and HttpSolrCall#remoteQuery should not close streams
  • SOLR-8453 - Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it
  • SOLR-8575 - Fix HDFSLogReader replay status numbers and a performance bug where we can reopen FSDataInputStream too often
  • SOLR-8578 - Successful or not, requests are not always fully consumed by Solrj clients and we count on HttpClient or the JVM
  • SOLR-8615 - Just like creating cores, we should use multiple threads when closing cores
  • SOLR-8633 - DistributedUpdateProcess processCommit/deleteByQuery calls finish on DUP and SolrCmdDistributor, which violates the lifecycle and can cause bugs
  • SOLR-8683 - Always consume the full request on the server, not just in the case of an error, tune down stream closed logging
  • SOLR-8720 - ZkController#publishAndWaitForDownStates should use #publishNodeAsDown
  • SOLR-8771 - Multi-threaded core shutdown creates executor per core
  • SOLR-8855 - The HDFS BlockDirectory should not clean up its cache on shutdown
  • SOLR-8856 - Do not cache merge or 'read once' contexts in the hdfs block cache
  • SOLR-8857 - HdfsUpdateLog does not use configured or new default number of version buckets and is hard coded to 256
  • SOLR-8869 - Optionally disable printing field cache entries in SolrFieldCacheMBean
  • SPARK-10859 - Predicates pushed to InmemoryColumnarTableScan are not evaluated correctly
  • SPARK-10914 - UnsafeRow serialization breaks when two machines have different Oops size
  • SPARK-11009 - RowNumber in HiveContext returns negative values in cluster mode
  • SPARK-11442 - Reduce numSlices for local metrics test of SparkListenerSuite
  • SPARK-12617 - Socket descriptor leak killing streaming app
  • SPARK-14477 - Allow custom mirrors for downloading artifacts in build/mvn
  • SQOOP-2847 - Sqoop --incremental + missing parent --target-dir reports success with no data

Issues Fixed in CDH 5.6.0

Apache Avro

Concurrent schema parsing can result in processes hanging because of an unsafe shared cache

Bug: AVRO-1781

Cloudera Bug: CDH-36057

Severity: High

Workaround: Use a global lock around all calls to Schema.parse and Schema.Parser#parse to ensure only one thread is parsing at a time.
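The workaround above can be sketched in Java as follows. This is a minimal illustration, not Cloudera or Avro code: the class name GuardedSchemaParsing and its parse method are hypothetical, and the actual parse call is passed in as a function so that the sketch compiles without Avro on the classpath. With Avro available, the function would typically wrap new Schema.Parser().parse(json) or Schema.parse(json).

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical helper: funnels all schema parsing through one process-wide lock. */
final class GuardedSchemaParsing {
    // One lock shared by every caller in the JVM, so only one thread parses at a time.
    private static final ReentrantLock PARSE_LOCK = new ReentrantLock();

    private GuardedSchemaParsing() {}

    /**
     * Runs the supplied parse function while holding the global lock.
     * With Avro on the classpath, the function would typically be
     * json -> new Schema.Parser().parse(json).
     */
    static <T> T parse(String schemaJson, Function<String, T> parser) {
        PARSE_LOCK.lock();
        try {
            return parser.apply(schemaJson);
        } finally {
            // Always release the lock, even if parsing throws.
            PARSE_LOCK.unlock();
        }
    }
}
```

The lock only helps if every parse call in the process goes through the same helper; any code path that calls the parser directly bypasses it.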

Apache HBase

Values of some metrics may appear to be negative.

Bug: HBASE-12961

Some metric values are stored as integers and cannot accommodate large real-world values. When a value exceeds the integer range, it overflows and appears to be negative.
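The effect can be reproduced with a few lines of Java. This sketch is illustrative only and is not HBase code: MetricOverflowDemo and its methods are hypothetical names. It shows a counter held in a 32-bit int wrapping to a negative number, next to the same arithmetic in a 64-bit long. Whether the upstream fix widened the type in this way is not stated here.

```java
/** Illustrative only, not HBase code: shows why a 32-bit counter can read as negative. */
final class MetricOverflowDemo {
    private MetricOverflowDemo() {}

    /** Adds a delta to a counter stored in a 32-bit int; wraps silently on overflow. */
    static int addToIntCounter(int current, int delta) {
        return current + delta;
    }

    /** Same operation with a 64-bit long, which has room for far larger totals. */
    static long addToLongCounter(long current, long delta) {
        return current + delta;
    }

    public static void main(String[] args) {
        int narrow = addToIntCounter(Integer.MAX_VALUE, 1);
        long wide = addToLongCounter(Integer.MAX_VALUE, 1);
        System.out.println(narrow); // prints -2147483648
        System.out.println(wide);   // prints 2147483648
    }
}
```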

Workaround: None.

The HBase Shell cannot handle Scan filters which contain non-UTF8 characters.

Bug: HBASE-15032

The HBase Shell incorrectly handles filter strings which contain non-UTF8 characters.

Workaround: None.

HDFS

Checkpointing can fail due to an InvalidSignatureException in a secure cluster

Bug: HDFS-7798

Cloudera Bug: CDH-33411

Severity: High

Workaround: This problem occurs occasionally due to a race condition. The error is transient, and a subsequent checkpoint may still succeed.