Fixed Issues in CDH 6.1.0

Hive Jobs Are Submitted to a Single Queue When Sentry Is Deployed

Hive jobs are not submitted to the correct YARN queue when Hive uses Sentry, because Hive does not use the YARN API to resolve the user or group of the job's original submitter. As a result, YARN places the job in a queue by applying the placement rules to the hive user rather than to the original submitter. In addition, the HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.
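
The effect of evaluating placement rules against the wrong user can be illustrated with a simplified model. The following Python sketch is not Hive or YARN code: the rule names borrow Fair Scheduler placement-rule names, but the user-to-group map, the queue list, and the place() function are hypothetical, and on-demand queue creation is ignored. It shows every submitter resolving to the same queue once the rules are applied to the hive user.

```python
from typing import Dict, List

# Hypothetical user -> groups map; the first entry is the primary group.
USER_GROUPS: Dict[str, List[str]] = {
    "alice": ["analysts", "etl"],
    "bob": ["finance"],
    "hive": ["hive"],
}

# Hypothetical set of queues that already exist (no on-demand creation).
EXISTING_QUEUES = {"root.analysts", "root.etl", "root.finance", "root.default"}

RULES = ["primaryGroup", "secondaryGroupExistingQueue", "default"]


def place(user: str, rules: List[str]) -> str:
    """Return the first existing queue matched by the ordered rules."""
    groups = USER_GROUPS.get(user, [])
    for rule in rules:
        candidates: List[str] = []
        if rule == "primaryGroup" and groups:
            candidates = [f"root.{groups[0]}"]
        elif rule == "secondaryGroupExistingQueue":
            candidates = [f"root.{g}" for g in groups[1:]]
        elif rule == "default":
            candidates = ["root.default"]
        for queue in candidates:
            if queue in EXISTING_QUEUES:
                return queue
    raise LookupError(f"no placement rule matched for {user!r}")


for submitter in ("alice", "bob"):
    # Intended: the rules are evaluated for the original submitter.
    intended = place(submitter, RULES)
    # Behavior described above: the rules are evaluated for the hive user.
    actual = place("hive", RULES)
    print(f"{submitter}: intended={intended} actual={actual}")
```

In this model, alice and bob would land in root.analysts and root.finance, but both end up in root.default because the hive user's groups match no existing queue.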

Workaround: If you are a Hive and Sentry user, do not upgrade to CDH 6.0.0; this issue is fixed in CDH 6.0.1 and CDH 6.1.0. If you must use Hive and Sentry in CDH 6.0.0, see "YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled" for additional workarounds.

Affected Version: CDH 6.0.0

Fixed Versions: CDH 6.0.1, CDH 6.1.0 and later

Cloudera Issue: CDH-51596

Hadoop LdapGroupsMapping does not support LDAPS for self-signed LDAP server

Hadoop LdapGroupsMapping does not work with LDAP over SSL (LDAPS) if the LDAP server certificate is self-signed. This use case is not supported even if the Hadoop User Group Mapping LDAP TLS/SSL Enabled, Hadoop User Group Mapping LDAP TLS/SSL Truststore, and Hadoop User Group Mapping LDAP TLS/SSL Truststore Password properties are configured correctly.

Affected Versions: CDH 5.x and CDH 6.0.x

Fixed Versions: CDH 6.1.0

Apache Issue: HADOOP-12862

Cloudera Issue: CDH-37926

ZooKeeper JMX did not support TLS when managed by Cloudera Manager

Technical Service Bulletin 2019-310 (TSB)

The ZooKeeper service optionally exposes a JMX port used for reporting and metrics. By default, Cloudera Manager enables this port, but prior to Cloudera Manager 6.1.0, it did not support mutual TLS authentication on this connection. While JMX has a password-based authentication mechanism that Cloudera Manager enables by default, weaknesses have been found in that mechanism, and Oracle now advises enabling mutual TLS authentication on JMX connections in addition to password-based authentication. A successful attack may leak data, cause denial of service, or even allow arbitrary code execution on the Java process that exposes a JMX port. Beginning in Cloudera Manager 6.1.0, it is possible to configure mutual TLS authentication on ZooKeeper’s JMX port.

Products affected: ZooKeeper

Releases affected: Cloudera Manager 6.1.0 and lower, Cloudera Manager 5.16 and lower

Users affected: All

Date/time of detection: June 7, 2018

Severity (Low/Medium/High): 9.8 High (CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

Impact: Remote code execution

CVE: CVE-2018-11744

Immediate action required: Upgrade to Cloudera Manager 6.1.0 and enable TLS for the ZooKeeper JMX port by turning on the configuration settings “Enable TLS/SSL for ZooKeeper JMX” and “Enable TLS client authentication for JMX port” on the ZooKeeper service and configuring the appropriate TLS settings. Alternatively, disable the ZooKeeper JMX port by turning off the “Enable JMX Agent” configuration setting on the ZooKeeper service.

Addressed in release/refresh/patch: Cloudera Manager 6.1.0
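
The severity line above gives the CVSS v3.0 vector for CVE-2018-11744. As a worked check of how that vector yields 9.8, here is a minimal Python sketch of the CVSS v3.0 base-score formula. The weights and equations follow the FIRST CVSS v3.0 specification; base_score() and roundup() are illustrative names, and the sketch deliberately handles only Scope:Unchanged (S:U) vectors, which is what this vector uses.

```python
import math

# CVSS v3.0 base-metric weights (FIRST CVSS v3.0 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # weights when S:U
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, avoiding floating-point noise.

    Uses the integer-based approach that CVSS v3.1 recommends, which
    implements the same "smallest one-decimal value >= input" rule.
    """
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a Scope:Unchanged CVSS:3.0 vector."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.0":
        raise ValueError("expected a CVSS:3.0 vector")
    m = dict(item.split(":") for item in metrics)
    if m["S"] != "U":
        raise ValueError("this sketch handles only Scope:Unchanged (S:U)")

    # Impact sub-score: combine confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    # Exploitability sub-score: attack vector, complexity, privileges, UI.
    exploitability = (
        8.22 * AV[m["AV"]] * AC[m["AC"]] * PR_UNCHANGED[m["PR"]] * UI[m["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector, the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum, about 9.76, rounds up to 9.8.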

Spark Streaming jobs loop if a Kafka topic is missing

Spark jobs can loop endlessly if the Kafka topic is deleted while a Kafka streaming job (which uses KafkaSource) is in progress.

Cloudera Issue: CDH-57903, CDH-64513

Long-running Spark applications on a secure cluster might fail if driver is restarted

If you submit a long-running application on a secure cluster in cluster mode using the --principal and --keytab options, and a failure causes the driver to restart more than 7 days after submission (7 days is the default maximum HDFS delegation token lifetime), the new driver fails with an error similar to the following:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token <token_info> can't be found in cache
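
The failure condition above comes down to a time comparison between when the application's delegation tokens were issued and when the driver restarts. The following Python sketch is illustrative only: restart_exceeds_token_lifetime() is a hypothetical helper, and it assumes the tokens were issued at submission and that the cluster uses the 7-day default (in HDFS, the maximum lifetime is set by dfs.namenode.delegation.token.max-lifetime).

```python
from datetime import datetime, timedelta

# 7 days: the default maximum HDFS delegation token lifetime cited above.
# Pass a different value if the cluster overrides
# dfs.namenode.delegation.token.max-lifetime.
DEFAULT_MAX_TOKEN_LIFETIME = timedelta(days=7)


def restart_exceeds_token_lifetime(
    submitted_at: datetime,
    driver_restarted_at: datetime,
    max_lifetime: timedelta = DEFAULT_MAX_TOKEN_LIFETIME,
) -> bool:
    """Return True if the restart happens after tokens issued at submission
    have passed their maximum lifetime (a simplifying assumption)."""
    return driver_restarted_at - submitted_at > max_lifetime


submitted = datetime(2018, 6, 1, 9, 0)
print(restart_exceeds_token_lifetime(submitted, submitted + timedelta(days=3)))  # False
print(restart_exceeds_token_lifetime(submitted, submitted + timedelta(days=8)))  # True
```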

Apache Issue: SPARK-23361

Cloudera Issue: CDH-64865

Kafka May Be Stuck with Under-replicated Partitions after ZooKeeper Session Expires

This problem can manifest as a large number of under-replicated partitions in your Kafka cluster. One or more broker logs include messages such as the following:

[2016-01-17 03:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
[2016-01-17 03:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

One or more Kafka broker logs will also show the ZooKeeper session expiring at around the same time as the previous messages:

INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)

The log is typically in /var/log/kafka on each host where a Kafka broker is running. The location is set by the property kafka.log4j.dir in Cloudera Manager. The log name is kafka-broker-hostname.log. In diagnostic bundles, the log is under logs/hostname-ip-address/.
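
To check a broker log for this combination of messages, a small scan like the following can help. This is a hypothetical Python helper, not a Cloudera or Kafka tool; its regular expressions are derived only from the sample messages quoted above, so message formats in other Kafka versions may differ.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns derived from the sample broker log messages above.
ZK_VERSION_MISMATCH = re.compile(
    r"Partition \[([^\]]+)\] on broker \d+: Cached zkVersion \[\d+\] "
    r"not equal to that in zookeeper"
)
SESSION_EXPIRED = re.compile(r"zookeeper state changed \(Expired\)")


def scan_broker_log(lines: Iterable[str]) -> Tuple[bool, List[str]]:
    """Report whether a session expiry was logged and which partitions
    logged a zkVersion mismatch (first-seen order, no duplicates)."""
    expired = False
    stuck: List[str] = []
    for line in lines:
        if SESSION_EXPIRED.search(line):
            expired = True
        match = ZK_VERSION_MISMATCH.search(line)
        if match and match.group(1) not in stuck:
            stuck.append(match.group(1))
    return expired, stuck


sample = [
    "[2016-01-17 03:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] "
    "on broker 3: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] "
    "from 6,5 to 5 (kafka.cluster.Partition)",
    "[2016-01-17 03:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] "
    "on broker 3: Cached zkVersion [66] not equal to that in zookeeper, "
    "skip updating ISR (kafka.cluster.Partition)",
    "INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)",
]

expired, stuck = scan_broker_log(sample)
print(expired, stuck)  # True ['__samza_checkpoint_event-creation_1,3']
```

To scan a real broker log, pass the lines of an open file object for the log described above instead of the sample list.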

Workaround: To move forward after seeing this problem, restart the affected Kafka brokers. You can restart individual brokers from the Instances tab on the Kafka service page in Cloudera Manager.

To reduce the likelihood of the problem recurring, make ZooKeeper session expiration less likely:
  • Reduce the potential for long garbage collection pauses by brokers:
    • Use a better garbage collection mechanism in the JVM, such as G1GC. You can do this by adding -XX:+UseG1GC to broker_java_opts.
    • Increase the broker heap size (broker_max_heap_size) if it is too small. Be careful not to choose a heap size that can cause out-of-memory problems, given all the services running on the node.
  • Increase the ZooKeeper session timeout on brokers (zookeeper.session.timeout.ms) to reduce the likelihood that sessions expire.
  • Ensure ZooKeeper itself is well resourced and not overwhelmed so that it can respond. For example, it is highly recommended to locate the ZooKeeper log directory on its own disk.

Affected Versions: CDK 1.4.x, 2.0.x, 2.1.x, 2.2.x

Fixed Versions:
  • Full Fix: CDH 6.1.0
  • Partial Fix: CDH 6.0.0. Kafka deployments on CDH 6.0.0 are less likely to encounter this issue.

Apache Issue: KAFKA-2729

Cloudera Issue: CDH-42514

Upstream Issues Fixed

Apache Accumulo

There are no notable fixed issues in this release.

Apache Avro

There are no notable fixed issues in this release.

Apache Crunch

There are no notable fixed issues in this release.

Apache Flume

The following issues are fixed in CDH 6.1.0:

  • FLUME-2442 - Need an alternative to providing clear text passwords in flume config
  • FLUME-2973 - Deadlock in hdfs sink
  • FLUME-2977 - Upgrade RAT to 0.12
  • FLUME-3050 - add counters for error conditions and expose to monitor URL
  • FLUME-3182 - add support for SSL/TLS for syslog (tcp) sources
  • FLUME-3222 - Fix for NoSuchFileException thrown when files are being deleted
  • FLUME-3223 - Flume HDFS Sink should retry close prior recover lease
  • FLUME-3227 - Add Rate Limiter to stresssource
  • FLUME-3239 - Do not rename files in SpoolDirectorySource
  • FLUME-3246 - Validate flume configuration to prevent larger source batchsize than
  • FLUME-3269 - Support JSSE keystore/truststore -D system properties
  • FLUME-3278 - Handling -D keystore parameters in Kafka components

Apache Hadoop

HDFS

The following issues are fixed in CDH 6.1.0:

  • HADOOP-9214 - Enhance the hadoop fs touchz command so that it can now modify atime and mtime.
  • HADOOP-12502 - Fixed an issue where setting the replication of an HDFS folder recursively can run out of memory.
  • HADOOP-13649 - s3guard: Implement time-based (TTL) expiry for LocalMetadataStore.
  • HADOOP-13761 - S3Guard: Implement retries for DDB failures and throttling; translate exceptions.
  • HADOOP-14212 - Expose SecurityEnabled boolean field in JMX for other services besides NameNode.
  • HADOOP-14507 - Extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret.key.
  • HADOOP-14758 - Improve S3GuardTool.prune to handle UnsupportedOperationException.
  • HADOOP-14759 - Improve S3GuardTool.prune to prune specific bucket entries.
  • HADOOP-14913 - Implement sticky bit for rename() operation in Azure WASB.
  • HADOOP-14935 - Fix an issue where Azure POSIX permissions are taking effect in access() method even when authorization is enabled.
  • HADOOP-14965 - Change the S3a input stream "normal" fadvise mode to be adaptive.
  • HADOOP-15054 - Upgrade hadoop dependency on commons-codec to 1.11.
  • HADOOP-15086 - Fix an issue where the NativeAzureFileSystem file rename is not atomic.
  • HADOOP-15121 - Fix a NullPointerException when using DecayRpcScheduler.
  • HADOOP-15141 - Support IAM Assumed roles in S3A.
  • HADOOP-15143 - Fix an NPE due to Invalid KerberosTicket in UGI.
  • HADOOP-15151 - Fix an issue where MapFile.fix creates a wrong index file for block-compressed data files.
  • HADOOP-15176 - Enhance IAM Assumed Role support in S3A client.
  • HADOOP-15206 - Fix an issue where BZip2 drops and duplicates records when input split size is small.
  • HADOOP-15209 - Enhance DistCp to eliminate needless deletion of files under already deleted directories.
  • HADOOP-15212 - Add independent secret manager method for logging expired tokens.
  • HADOOP-15215 - Enhance s3guard set-capacity command to fail on read/write of 0.
  • HADOOP-15217 - Enhance FsUrlConnection to handle paths with spaces.
  • HADOOP-15250 - Fix an issue where, in a multihomed server network cluster, the IPC client binds the wrong address.
  • HADOOP-15267 - S3A multipart upload fails when SSE-C encryption is enabled.
  • HADOOP-15391 - Add missing CSS file in hadoop-aws, hadoop-aliyun, hadoop-azure and hadoop-azure-datalake modules.
  • HADOOP-15423 - Merge fileCache and dirCache into one single cache in LocalMetadataStore
  • HADOOP-15441 - Log kms url and token service at debug level.
  • HADOOP-15446 - WASB: PageBlobInputStream.skip breaks HBASE replication.
  • HADOOP-15449 - Increase default timeout of ZK session to avoid frequent NameNode failover.
  • HADOOP-15469 - Fix an issue where the S3A directory committer commit job fails if a _temporary directory is created under the destination.
  • HADOOP-15478 - Fix an issue with WASB that caused an hflush() and hsync() regression.
  • HADOOP-15541 - Fix an issue where the AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions.
  • HADOOP-15598 - Fix an issue where DataChecksum checksum calculation experiences contention on hashtable synchronization.
  • HADOOP-15612 - Improve exception when tfile fails to load LzoCodec.
  • HADOOP-15633 - Fix an issue where fs.TrashPolicyDefault cannot create trash directory.
  • HADOOP-15679 - Make the ShutdownHookManager shutdown time configurable and extend it.
  • HADOOP-15684 - Fix an issue where triggerActiveLogRoll gets stuck on a dead NameNode when a ConnectTimeoutException occurs.
  • HADOOP-15719 - Fail fast when using OAuth over HTTP.
  • HADOOP-15850 - Enhance CopyCommitter#concatFileChunks to check that the blocks per chunk is not 0.
  • HADOOP-15861 - Move DelegationTokenIssuer to the correct path.
  • HDFS-9049 - Make Datanode Netty reverse proxy port configurable.
  • HDFS-10183 - Prevent race condition during class initialization.
  • HDFS-11701 - Fix an issue where an NPE from an unresolved host causes permanent DFSInputStream failures.
  • HDFS-11719 - Fix the wrong index passed to Arrays.fill() in the BlockSender.readChecksum() exception handling.
  • HDFS-11900 - Fix an issue where hedged reads thread pool creation is not synchronized.
  • HDFS-12070 - Fix an issue where failed block recovery leaves files open indefinitely and at risk for data loss.
  • HDFS-12574 - Add CryptoInputStream to WebHdfsFileSystem read call.
  • HDFS-12907 - Allow read-only access to reserved raw for non-superusers.
  • HDFS-12978 - Add fine-grained locking while consuming journal stream.
  • HDFS-13027 - Handle possible NPEs due to deleted blocks in race condition.
  • HDFS-13048 - Fix an issue where the LowRedundancyReplicatedBlocks metric can be negative.
  • HDFS-13052 - Add support for snapshot diff with WebHDFS.
  • HDFS-13060 - Add a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver.
  • HDFS-13081 - Allow SASL and privileged HTTP with Datanode#checkSecureConfig.
  • HDFS-13087 - Make snapshotted encryption zone information immutable.
  • HDFS-13145 - Fix an issue where an SBN crash occurs when transitioning to ANN with in-progress edit tailing enabled.
  • HDFS-13225 - Fix an issue where the IOException information from StripeReader#checkMissingBlocks() is incomplete.
  • HDFS-13280 - Fix NPE in get snapshottable directory list call.
  • HDFS-13330 - Fix an issue where ShortCircuitCache#fetchOrCreate never retries.
  • HDFS-13448 - Ignore locality for First Block Replica.
  • HDFS-13493 - Reduce the HttpServer2 thread count on DataNodes.
  • HDFS-13641 - Add metrics for edit log tailing.
  • HDFS-13658 - Expose HighestPriorityLowRedundancy blocks statistics.
  • HDFS-13668 - Fix an issue where FSPermissionChecker may throw ArrayIndexOutOfBoundsException when checking inode permission.
  • HDFS-13686 - Add overall metrics for FSNamesystemLock.
  • HDFS-13728 - Fix an issue where the Disk Balancer fails if volume usage is greater than capacity.
  • HDFS-13731 - Fix an issue where ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints.
  • HDFS-13738 - Fix an issue where fsck -list-corruptfileblocks encounters an infinite loop if the user is not privileged.
  • HDFS-13758 - Enhance DatanodeManager to throw exception if it has BlockRecoveryCommand but the block is not under construction.
  • HDFS-13820 - Add an ability to disable CacheReplicationMonitor.
  • HDFS-13830 - Add support for getting snapshottable directory list.
  • HDFS-13831 - Make block increment deletion number configurable.
  • HDFS-13833 - Improve BlockPlacementPolicyDefault's consider load logic.
  • HDFS-13838 - Fix an issue where WebHdfsFileSystem.getFileStatus() does not return correct "snapshot enabled" status.
  • HDFS-13846 - Fix an issue where safe blocks counter is not decremented correctly if the block is striped.
  • HDFS-13868 - Fix an NPE with the GETSNAPSHOTDIFF API when the parameter "snapshotname" is given but "oldsnapshotname" is not.
  • HDFS-13876 - Implement ALLOWSNAPSHOT/DISALLOWSNAPSHOT for HttpFS.
  • HDFS-13877 - Implement GETSNAPSHOTDIFF for HttpFS.
  • HDFS-13878 - Implement GETSNAPSHOTTABLEDIRECTORYLIST for HttpFS.
  • HDFS-13882 - Set a maximum delay for retrying locateFollowingBlock.
  • HDFS-13885 - Add debug logs in dfsclient around decrypting EDEK.
  • HDFS-13886 - Fix an issue where HttpFSFileSystem.getFileStatus() doesn't return "snapshot enabled" bit.
  • HDFS-14009 - Fix an issue where FileStatus#setSnapShotEnabledFlag throws InvocationTargetException when attribute set is emptySet.

MapReduce 2

The following issues are fixed in CDH 6.1.0:

YARN

The following issues are fixed in CDH 6.1.0:

  • YARN-7159 - Normalize unit of resource objects in ResourceManager to avoid unit conversion in critical path.
  • YARN-7237 - Cleanup usages of ResourceProfiles.
  • YARN-7728 - Expose container preemptions related information in Capacity Scheduler queue metrics.
  • YARN-7738 - CapacityScheduler: Support refresh maximum allocation for multiple resource types.
  • YARN-7948 - Enable fair scheduler to refresh maximum allocation for multiple resource types.
  • YARN-8338 - Fixed an issue where TimelineService V1.5 does not come up after HADOOP-15406.
  • YARN-8566 - Add diagnostic message for unschedulable containers.
  • YARN-8842 - Expose metrics for custom resource types in QueueMetrics.
  • YARN-8990 - Fix fair scheduler race condition in app submit and queue cleanup.

Apache HBase

The following issues are fixed in CDH 6.1.0:

  • HBASE-18451 - PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request, fix logging
  • HBASE-18549 - Add metrics for failed replication queue recovery
  • HBASE-19418 - configurable range of delay in PeriodicMemstoreFlusher
  • HBASE-20193 - Basic Replication Web UI - Regionserver
  • HBASE-20375 - Remove use of getCurrentUserCredentials in hbase-spark module
  • HBASE-20469 - Directory used for sidelining old recovered edits files should be made configurable
  • HBASE-20732 - Shutdown scan pool when master is stopped
  • HBASE-20734 - Colocate recovered edits directory with hbase.wal.dir
  • HBASE-20741 - Split of a region with replicas creates all daughter regions
  • HBASE-20792 - info:servername and info:sn inconsistent for OPEN region
  • HBASE-20808 - (Addendum) Remove duplicate calls for cancelling of chores
  • HBASE-20846 - Restore procedure locks when master restarts
  • HBASE-20857 - balancer status tag in jmx metrics
  • HBASE-20865 - CreateTableProcedure is stuck in retry loop in CREATE_TABLE_WRITE_FS_LAYOUT state
  • HBASE-20892 - [UI] Start / End keys are empty on table.jsp
  • HBASE-20942 - Revert "Fix Array Index Out Of Bounds Exception for RpcServer TRACE logging"
  • HBASE-20965 - Separate region server report requests to new handlers
  • HBASE-20985 - add two attributes when we do normalization
  • HBASE-20986 - Separate the config of block size when we do log splitting and write Hlog
  • HBASE-21001 - ReplicationObserver fails to load in HBase 2.0.0
  • HBASE-21023 - Added bypassProcedure() API to HbckService
  • HBASE-21032 - ScanResponses contain only one cell each
  • HBASE-21055 - NullPointerException when balanceOverall() but server balance info is null
  • HBASE-21072 - Addendum do not write lock file when running TestHBaseFsckReplication
  • HBASE-21073 - Redo concept of maintenance mode
  • HBASE-21095 - The timeout retry logic for several procedures is broken after master restarts
  • HBASE-21125 - 'HBASE-20942 Improve RpcServer TRACE logging' to branch-2.1
  • HBASE-21126 - "Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes" to branch-2.1
  • HBASE-21127 - TableRecordReader need to handle cursor result too
  • HBASE-21132 - return wrong result in rest multiget
  • HBASE-21144 - AssignmentManager.waitForAssignment is not stable
  • HBASE-21155 - Save on a few log strings and some churn in wal splitter by skipping out early if no logs in dir
  • HBASE-21156 - [hbck2] Queue an assign of hbase:meta and bulk assign/unassign
  • HBASE-21158 - Empty qualifier cell is always returned when using QualifierFilter
  • HBASE-21164 - reportForDuty should do backoff rather than retry
  • HBASE-21171 - [amv2] Tool to parse a directory of MasterProcWALs standalone
  • HBASE-21172 - Reimplement the retry backoff logic for ReopenTableRegionsProcedure
  • HBASE-21174 - [REST] Failed to parse empty qualifier in TableResource#getScanResource
  • HBASE-21179 - Fix the number of actions in responseTooSlow log
  • HBASE-21181 - Use the same filesystem for wal archive directory and wal directory
  • HBASE-21182 - Failed to execute start-hbase.sh
  • HBASE-21185 - WALPrettyPrinter: Additional useful info to be printed by wal printer tool, for debugability purposes
  • HBASE-21190 - Log files and count of entries in each as we load from the MasterProcWAL store
  • HBASE-21191 - Add a holding-pattern if no assign for meta or namespace (Can happen if masterprocwals have been cleared).
  • HBASE-21196 - HTableMultiplexer clears the meta cache after every put operation
  • HBASE-21200 - Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
  • HBASE-21204 - NPE when scan raw DELETE_FAMILY_VERSION and codec is not set
  • HBASE-21206 - Scan with batch size may return incomplete cells
  • HBASE-21207 - Add client side sorting functionality in master web UI for table and region server details
  • HBASE-21208 - Bytes#toShort doesn't work without unsafe
  • HBASE-21212 - Wrong flush time when update flush metric
  • HBASE-21214 - [hbck2] setTableState just sets hbase:meta state, not in-memory state
  • HBASE-21223 - [amv2] Remove abort_procedure from shell
  • HBASE-21228 - Memory leak since AbstractFSWAL caches Thread object and never cleans it up later
  • HBASE-21232 - Show table state in Tables view on Master home page
  • HBASE-21233 - Allow the procedure implementation to skip persistence of the state after an execution
  • HBASE-21242 - Revert "[amv2] Miscellaneous minor log and assign procedure create improvements; ADDENDUM Fix TestHRegionInfo"
  • HBASE-21248 - Implement exponential backoff when retrying for ModifyPeerProcedure
  • HBASE-21249 - Add jitter for ProcedureUtil.getBackoffTimeMs
  • HBASE-21250 - Addendum remove unused modification in hbase-server module
  • HBASE-21250 - Refactor WALProcedureStore and add more comments for better understanding the implementation
  • HBASE-21254 - Need to find a way to limit the number of proc wal files
  • HBASE-21259 - [amv2] Revived deadservers; recreated serverstatenode
  • HBASE-21260 - The whole balancer plans might be aborted if there is more than one plan to move the same region
  • HBASE-21263 - Mention compression algorithm along with other storefile details
  • HBASE-21266 - Not running balancer because processing dead regionservers, but empty dead rs list
  • HBASE-21280 - Add anchors for each heading in UI
  • HBASE-21287 - Allow configuring test master initialization wait time.
  • HBASE-21288 - HostingServer in UnassignProcedure is not accurate
  • HBASE-21292 - IdLock.getLockEntry() may hang if interrupted
  • HBASE-21299 - List counts of actual region states in master UI tables section
  • HBASE-21303 - [shell] clear_deadservers with no args fails
  • HBASE-21323 - Revert "Should not skip force updating for a sub procedure even if"
  • HBASE-21425 - 2.1.1 fails to start over 1.x data; namespace not assigned

Apache Hive

The following issues are fixed in CDH 6.1.0:

Code Changes Might Be Required

The following fixes might require code changes for the CDH 6.1.0 release of Apache Hive:

  • HIVE-14388 - Add number of rows inserted message after insert command in Beeline
  • HIVE-17799 - Add Ellipsis For Truncated Query In Hive Lock
  • HIVE-19344 - Change default value of msck.repair.batch.size

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-6980 - Drop table by using direct SQL
  • HIVE-10296 - Cast exception observed when hive runs a multi-join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applying cast
  • HIVE-13900 - HiveStatement.executeAsync() may not work properly when hive.server2.async.exec.async.compile is turned on
  • HIVE-14162 - Allow disabling of a long-running job on Hive On Spark On YARN
  • HIVE-14560 - Support exchange partition between s3 and HDFS tables
  • HIVE-14690 - Query fail when hive.exec.parallel=true, with conflicting session dir
  • HIVE-14984 - Hive-WebUI access results in Request is a replay (34) attack
  • HIVE-15104 - Hive on Spark generates more shuffle data than Hive on MR
  • HIVE-15180 - Extend JSONMessageFactory to store additional information about metadata objects on different table events
  • HIVE-15250 - Reuse partitions info generated in MoveTask to its subscribers (StatsTask)
  • HIVE-15712 - New HiveConf in SQLOperation.getSerDe() impacts CPU on Hiveserver2
  • HIVE-15995 - Syncing metastore table with serde schema
  • HIVE-16071 - HoS RPCServer misuses the timeout in its RPC handshake
  • HIVE-16143 - Improve msck repair batching
  • HIVE-16172 - Switch to a fairness lock to synchronize HS2 thrift client
  • HIVE-16219 - Metastore notification_log contains serialized message with non-functional fields
  • HIVE-16285 - Servlet for dynamically configuring log levels
  • HIVE-16346 - inheritPerms should be conditional based on the target filesystem
  • HIVE-16348 - HoS query is canceled but error message shows RPC is closed
  • HIVE-16431 - Support Parquet StatsNoJobTask for Spark & Tez engine
  • HIVE-16607 - ColumnStatsAutoGatherContext regenerates HiveConf.HIVEQUERYID
  • HIVE-16664 - Add join related Hive blobstore tests
  • HIVE-16736 - General Improvements to BufferedRows
  • HIVE-17300 - WebUI query plan graphs
  • HIVE-17401 - Hive session idle timeout doesn't function properly
  • HIVE-17747 - HMS DropTableMessage should include the full table object
  • HIVE-18031 - Support replication for Alter Database operation
  • HIVE-18118 - Explain Extended should indicate if a file being read is an EC file
  • HIVE-18652 - Print Spark metrics on console
  • HIVE-18690 - Integrate with Spark OutputMetrics
  • HIVE-18696 - The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs
  • HIVE-18705 - Improve HiveMetaStoreClient.dropDatabase
  • HIVE-18743 - CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy
  • HIVE-18766 - Race condition during shutdown of RemoteDriver, error messages aren't always sent
  • HIVE-18778 - Needs to capture input/output entities in explain
  • HIVE-18906 - Lower Logging for "Using direct SQL".
  • HIVE-18916 - SparkClientImpl doesn't error out if spark-submit fails.
  • HIVE-19008 - Improve Spark session id logging
  • HIVE-19053 - RemoteSparkJobStatus#getSparkJobInfo treats all exceptions as timeout errors
  • HIVE-19079 - Add extended query string to Spark job description
  • HIVE-19370 - Issue: ADD Months function on timestamp datatype fields in Hive
  • HIVE-19371 - Add table ownerType to HMS thrift API
  • HIVE-19372 - Add table ownerType to JDO/SQL and ObjectStore
  • HIVE-19374 - Parse and process ALTER TABLE SET OWNER command syntax
  • HIVE-19477 - Hiveserver2 in HTTP mode not emitting metric default.General.open_connections
  • HIVE-19486 - Discrepancy in HikariCP config naming
  • HIVE-19508 - SparkJobMonitor getReport doesn't print stage progress in order
  • HIVE-19525 - Spark task logs print PLAN PATH excessive number of times
  • HIVE-19559 - SparkClientImpl shouldn't name redirector thread RemoteDriver
  • HIVE-19718 - Adding partitions in bulk also fetches table for each partition
  • HIVE-19733 - RemoteSparkJobStatus#getSparkStageProgress inefficient implementation
  • HIVE-19766 - Show the number of rows inserted when execution engine is Spark
  • HIVE-19783 - Retrieve only locations in HiveMetaStore.dropPartitionsAndGetLocations
  • HIVE-19786 - RpcServer cancelTask log message is incorrect
  • HIVE-19787 - Log message when spark-submit has completed
  • HIVE-19814 - RPC Server port is always random for spark
  • HIVE-19899 - Support stored as JsonFile
  • HIVE-19937 - Intern fields in MapWork on deserialization
  • HIVE-19942 - Hive Notification: All events for indexes should have table name
  • HIVE-19986 - Add logging of runtime statistics indicating when Hdfs Erasure Coding is used by MR
  • HIVE-20032 - Don't serialize hashCode for repartitionAndSortWithinPartitions
  • HIVE-20056 - SparkPartitionPruner shouldn't be triggered by Spark tasks
  • HIVE-20098 - Statistics: NPE when getting Date column partition statistics
  • HIVE-20212 - Hiveserver2 in http mode emitting metric default.General.open_connections incorrectly
  • HIVE-20374 - Write Hive version information to Parquet footer
  • HIVE-20466 - Improve org.apache.hadoop.hive.ql.exec.FunctionTask Experience
  • HIVE-20505 - upgrade org.openjdk.jmh:jmh-core to 1.21
  • HIVE-20544 - TOpenSessionReq logs password and username
  • HIVE-20545 - Exclude parameters that can have potentially large size from HMS notification message JSON
  • HIVE-20601 - EnvironmentContext null in ALTER_PARTITION event in DbNotificationListener
  • HIVE-20603 - "Wrong FS" error when inserting to partition after changing table location filesystem
  • HIVE-20678 - HiveHBaseTableOutputFormat should implement HiveOutputFormat to ensure compatibility
  • HIVE-20695 - HoS Query fails with hive.exec.parallel=true.
  • HIVE-20711 - Race Condition when Multi-Threading in SessionState.createRootHDFSDir
  • HIVE-20742 - SparkSessionManagerImpl maintenance thread only cleans up session once

Hue

The following issues are fixed in CDH 6.1.0:

  • HUE-7407 - [useradmin] Added superuser group priv to useradmin
  • HUE-7698 - [oozie] Added warning when there is a space in the shell action
  • HUE-7698 - [oozie] Files of a Shell document action in a workflow are not being generated in the XML
  • HUE-7860 - [core] Update greenlet from 0.4.12 to 0.4.15
  • HUE-7860 - [core] Add monotonic 1.5
  • HUE-7860 - [core] Update Gunicorn from 19.7.1 to 19.9.0
  • HUE-7860 - [core] Update eventlet from 0.21.0 to 0.24.1
  • HUE-7860 - [core] Add dnspython 1.15.0
  • HUE-8139 - [core] Fix django-debug-toolbar 1.9.1 to work with django_debug_panel
  • HUE-8140 - [editor] Automatically continue execution after DDL statements in batch mode
  • HUE-8330 - [cluster] Keep only external cluster configs in [[clusters]]
  • HUE-8330 - [core] API should not check for remote cloud clusters if they are not configured
  • HUE-8339 - [impala] Fix typo in smart pooling ini configuration
  • HUE-8391 - [importer] Improve Create table from File UX when loading data from parent directory not readable by hive/impala
  • HUE-8488 - [fb] Disable drag&drop when show_upload_button=false
  • HUE-8507 - [editor] Add types to sqlalchemy results.
  • HUE-8507 - [editor] SQL alchemy result set column headers are missing.
  • HUE-8509 - [oozie] Schedule repetitive remote jobs
  • HUE-8509 - [oozie] Support sending a SQL query to a remote cluster
  • HUE-8509 - [jb] Clean-up of the listing of remote jobs
  • HUE-8509 - [oozie] Properly set the capture output flag of shell document action
  • HUE-8509 - [oozie] Remote job action
  • HUE-8509 - [kafka] Do not break left panel
  • HUE-8514 - [core] Log metrics when calling is_alive
  • HUE-8516 - [cluster] List more namespaces and filter out invalid ones
  • HUE-8518 - [editor] Fix sample Kudu
  • HUE-8519 - [jb] Impala API can now directly return json
  • HUE-8521 - [auth] Protect against empty LDAP login username
  • HUE-8522 - [jb] Make paused tasks more obvious. Add queued state to Impala
  • HUE-8523 - [jb] Display Impala backends & instances
  • HUE-8524 - [impala] Provide the root cause of INVALIDATE METADATA failures
  • HUE-8527 - [editor] Fix concatenation type exception in namespace call
  • HUE-8528 - [frontend] Temporarily disable namespace caching
  • HUE-8529 - [frontend] Create a context selector component
  • HUE-8531 - [sqoop] Properly name the table import job
  • HUE-8532 - [core] Fix database migration test.
  • HUE-8533 - [importer] Properly display failed import progress bar as red and not orange
  • HUE-8534 - [jb] Django url name does not exist and breaks page
  • HUE-8535 - [sqoop] Use the proper engine name and not the connection nice name as jdbc prefix
  • HUE-8536 - [sqoop] Include hive-site.xml automatically when importing data to hive
  • HUE-8537 - [sqoop] List the proper column type when importing to a hive table
  • HUE-8538 - [sqoop] Allow table preview from manual input not JDBC
  • HUE-8538 - [importer] Automatically fill-up the db driver list when selecting sqoop
  • HUE-8539 - [importer] Clean-up configuration and turn sqoop and solr imports to on by default
  • HUE-8540 - [sqoop] Add ability to set default jdbc driver path for any sqoop job
  • HUE-8541 - [oozie] Workflow rerun does not restart polling for job status
  • HUE-8542 - [frontend] Add a custom left nav for multi cluster mode
  • HUE-8542 - [frontend] Polish cloud cluster and require multi cluster mode to be on
  • HUE-8544 - [importer] Support sending file data into a kafka topic
  • HUE-8545 - [search] Fix filtering in the index selection dropdown
  • HUE-8546 - [assist] Limit assist refresh to the active namespace for DDL statement executions
  • HUE-8546 - [assist] Make sure the assist gets refreshed after multiple DDL statement executions
  • HUE-8547 - [jb] Fix navigation from create schedule to view schedule.
  • HUE-8547 - [jb] Fix refresh on coordinator page.
  • HUE-8548 - [jb] Fix invalid date in workflow task
  • HUE-8549 - [autocomplete] Improve CTE alias suggestions when there's a trailing ";"
  • HUE-8550 - [jb] Use the context selector component in the job browser
  • HUE-8550 - [frontend] Make last selected compute and namespace sticky
  • HUE-8550 - [jb] Default to the last selected type of compute in the job browser
  • HUE-8550 - [jb] Refresh job browser tabs on compute selection
  • HUE-8551 - [importer] Support setting basic Flume configs
  • HUE-8553 - [kafka] Link create topic API to the UI
  • HUE-8553 - [kafka] Add a workaround API for creating a topic
  • HUE-8554 - [indexer] Protect against empty sample data that can be null
  • HUE-8554 - [importer] Support latest Spark version 2 natively
  • HUE-8554 - [manager] Adding a check if service is installed API
  • HUE-8554 - [cluster] Create data warehouse cluster skeleton
  • HUE-8554 - [core] Support dist Spark installed when running envelope via shell
  • HUE-8554 - [cluster] Avoid double escaping of data warehouse results
  • HUE-8554 - [cluster] Rename analytic cluster API command to dataware
  • HUE-8555 - [cluster] Do not submit remote coordinator jobs by default
  • HUE-8555 - [jb] Refactor job browser preview to support multi cluster
  • HUE-8555 - [jb] Support killing data warehouse cluster
  • HUE-8555 - [jb] Sort clusters with the most recent first
  • HUE-8555 - [jb] List data warehouse clusters
  • HUE-8555 - [jb] Auto select the first cluster if possible at init
  • HUE-8556 - [fb] Overuse of trash folder checking
  • HUE-8557 - [sqoop] DB name and table names variables were already present
  • HUE-8557 - [sqoop] Offer to rename the table or select a different existing Hive database
  • HUE-8558 - [jb] Add tracking URL to Spark Jobs and remove url and killUrl
  • HUE-8559 - [jb] Hue shows incorrect color for failed oozie jobs
  • HUE-8560 - [tb] Make sure the default DB is opened by default in the Table Browser
  • HUE-8560 - [tb] Stick to the same view when switching namespaces in the Table Browser
  • HUE-8561 - [editor] Don't show databases for spark editor
  • HUE-8562 - [frontend] Make sure the context popover is shown above the jobs panel
  • HUE-8564 - [useradmin] Fix last activity update for notebook/api/check_status
  • HUE-8564 - [useradmin] Fix last activity update for jobbrowser/api/jobs requests
  • HUE-8565 - [fb] Parent directory should not be selectable
  • HUE-8565 - [fb] Current directory should not be deletable.
  • HUE-8566 - [useradmin] Update message for duplicate user creation.
  • HUE-8567 - [jb] Fix id max length in mini jb
  • HUE-8568 - [jb] Prevent mini jb actions from taking content width
  • HUE-8568 - [jb] Activate smart file links from the logs by also checking for prefixes
  • HUE-8570 - [assist] Extract a separate column sample component
  • HUE-8570 - [frontend] Right align the Hue dropdown when rendered outside the window
  • HUE-8570 - [assist] Add distinct as an option for column samples in the context popover
  • HUE-8570 - [editor] Enable click to insert from sample popover to SQL variables
  • HUE-8570 - [assist] Add inline autocomplete for column samples
  • HUE-8570 - [editor] Enable optional operation on the sample API endpoints
  • HUE-8570 - [assist] Limit context popover sample operations to Impala and Hive
  • HUE-8570 - [assist] Add min and max to column sample popover
  • HUE-8571 - [sentry] navigator_api ERROR for PRIVILEGE_HIERARCHY[hierarchy[server][SENTRY_PRIVILEGE_KEY]['action']]
  • HUE-8572 - [cluster] Bubble up authentication errors on remote clusters
  • HUE-8572 - [tb] Fix JS exception when clearing table browser selection via pubsub
  • HUE-8572 - [tb] Add compute and namespace to DROP table endpoint
  • HUE-8572 - [tb] Fix log overflow in history panel
  • HUE-8573 - [sqoop] Out of the box import of a MySQL table
  • HUE-8573 - [sqoop] Avoid unrelated casting error when testing the connection
  • HUE-8574 - [importer] Adding Flume flows
  • HUE-8574 - [flume] Support updating Flume agent config
  • HUE-8574 - [importer] Nav Kafka stream import to Solr and Kudu part 1
  • HUE-8574 - [importer] Allow audit logs to be sent to Solr
  • HUE-8574 - [importer] Automatically set up Flume to grab Hue HTTPD logs and put them into the sample collection
  • HUE-8574 - [importer] Feature flag for showing the Field Editor
  • HUE-8574 - [cluster] Auto scaling data warehouse cluster API skeleton
  • HUE-8574 - [importer] Button caret to call for getting the job config
  • HUE-8575 - [importer] Add external multi table support
  • HUE-8575 - [importer] Fix file to table import.
  • HUE-8576 - [editor] Add backticked suggestion to the syntax checked for reserved keywords
  • HUE-8577 - [autocomplete] Add all currently reserved keywords for Impala
  • HUE-8577 - [editor] Rebuild Ace with updated dependencies
  • HUE-8577 - [autocomplete] Add support for Impala SHOW GRANT ROLE/USER statements
  • HUE-8577 - [autocomplete] Add support for Impala ALTER TABLE/VIEW SET OWNER
  • HUE-8577 - [autocomplete] Add Impala METHOD to reserved keywords
  • HUE-8577 - [autocomplete] Make previously non-reserved keywords reserved for Impala
  • HUE-8577 - [autocomplete] Fix issue where the statement type location is added twice
  • HUE-8578 - [importer] Auto select id column if present in Kudu tables
  • HUE-8578 - [importer] Implement Flume output
  • HUE-8578 - [importer] Get basic Flume ingest step integrated
  • HUE-8578 - [manager] Restrict API calls to admin
  • HUE-8579 - [core] Blacklisting certain apps like filebrowser and oozie can fail
  • HUE-8580 - [editor] Fix jdbc assist.
  • HUE-8580 - [importer] Improve usability of table import
  • HUE-8580 - [importer] Fix RDBMS support for sqoop configured import.
  • HUE-8581 - [importer] Fix timing related JS exceptions
  • HUE-8581 - [importer] Improve query type selection layout for the field editor
  • HUE-8581 - [importer] Fix JS error on target namespace selection and improve layout for table import
  • HUE-8581 - [importer] Improve the stream import form layout
  • HUE-8581 - [importer] Allow typed paths in the hivechooser binding
  • HUE-8581 - [importer] Fix JS error for field query editor in importer
  • HUE-8582 - [jb] Make back button from editing a file more obvious
  • HUE-8583 - [fb] Surface too many buckets error
  • HUE-8588 - [core] Fix PAM backend conflict with timer metrics
  • HUE-8589 - [core] Split cluster listing to its own API
  • HUE-8589 - [jb] Switch from compute to the cluster API endpoint in the job browser
  • HUE-8591 - [cluster] Integration skeleton for Data Warehouse v2 API
  • HUE-8591 - [impala] Properly pickup the selected compute cluster
  • HUE-8591 - [cluster] Remove extra debug info
  • HUE-8591 - [cluster] Step of logic simplification of the multi cluster configuration
  • HUE-8591 - [cluster] Display impalad hostname
  • HUE-8591 - [impala] Properly point to the selected cluster hostname
  • HUE-8591 - [cluster] Protect against override of cluster name
  • HUE-8591 - [core] Show the S3 browser by default in cloud mode
  • HUE-8591 - [cluster] Move port to 21050
  • HUE-8591 - [cluster] Safeguard against localhost
  • HUE-8591 - [cluster] Properly use the correct cluster hostname in the editor
  • HUE-8591 - [cluster] Add hostname check in the cluster hostname log trace
  • HUE-8591 - [cluster] Split cluster template between static and dynamic clusters
  • HUE-8591 - [cluster] Clear the compute cache on namespace refresh from left assist
  • HUE-8591 - [cluster] Avoid failing when cluster is None
  • HUE-8591 - [cluster] Wire in API for listing and creating k8 clusters
  • HUE-8591 - [cluster] Plug in the list of clusters
  • HUE-8591 - [cluster] Adding cluster resize capabilities on the cluster page
  • HUE-8591 - [cluster] Add Thrift client used for the specific query server
  • HUE-8591 - [cluster] Refresh the context selector when namespaces are refreshed
  • HUE-8591 - [cluster] Hook in remote Impala coordinator URL of selected cluster
  • HUE-8591 - [cluster] Use default port if not set in a selected remote cluster
  • HUE-8591 - [cluster] Add impalad link to cluster page
  • HUE-8591 - [cluster] Add logic to get the corresponding Impalad name
  • HUE-8591 - [cluster] Add some progress bar color and effect on cluster resize
  • HUE-8591 - [cluster] Add proper cluster page
  • HUE-8591 - [cluster] Use name as clusterName throughout the calls
  • HUE-8591 - [cluster] Move API url to a config property
  • HUE-8591 - [cluster] Prevent red error popups
  • HUE-8591 - [cluster] Fix name of default cluster
  • HUE-8591 - [cluster] Properly use Impala Thrift Client on remote Impala cluster direct connection
  • HUE-8592 - [frontend] Enable default click to navigate for catalog entries table
  • HUE-8592 - [frontend] Add option to automatically refresh samples in the catalog entries table
  • HUE-8592 - [frontend] Create a polling catalog entries list component that waits until an entity exists
  • HUE-8594 - [editor] Avoid js error when lastSelectedCompute does not exist
  • HUE-8595 - [flume] Collect and ingest Hue balancer logs out of the box
  • HUE-8597 - [frontend] Use the default SQL interpreter as source type in the global search results
  • HUE-8599 - [frontend] Add pubSub to force clear the context catalog from the job browser
  • HUE-8599 - [frontend] Improve stability of the context selector
  • HUE-8600 - [tb] Limit Table Browser namespace selection to namespaces with active computes
  • HUE-8601 - [jb] Fix issue where context selector in mini jb is hidden behind expand text
  • HUE-8602 - [sentry] Remove ALTER and DROP table privileges for now
  • HUE-8602 - [sentry] Remove ALTER and DROP in the Hive section
  • HUE-8603 - [editor] Always show the query compatibility check results
  • HUE-8604 - [frontend] Use the latest opened database by default throughout
  • HUE-8606 - [s3] Opening S3 browser makes a call to HDFS
  • HUE-8607 - [tb] Include namespace when querying a table from the table browser
  • HUE-8607 - [tb] Fix broken drop table action in the table browser
  • HUE-8607 - [tb] Fix query and view table actions in the table browser
  • HUE-8609 - [tb] Fix exception in describe table call from the Table Browser
  • HUE-8610 - [tb] Make sure the created notebook for samples requests has the provided compute
  • HUE-8610 - [tb] Include compute in stats and describe table calls from the table browser
  • HUE-8610 - [core] Always send the full cluster instead of id to the APIs
  • HUE-8610 - [tb] Include compute when fetching samples from the table browser
  • HUE-8611 - [assist] Send cluster parameter with the invalidate calls
  • HUE-8612 - [editor] Improve the editor shortcut search to show results from all categories
  • HUE-8612 - [editor] Add missing keyboard shortcuts to the editor help
  • HUE-8613 - [tb] Send cluster when dropping databases from the table browser
  • HUE-8614 - [tb] Fix the create new database action in the Table Browser
  • HUE-8615 - [frontend] Make sure namespaces and computes always have a name in the context selector
  • HUE-8617 - [frontend] Add pubSub to the context selector for setting cluster/compute/namespace
  • HUE-8618 - [editor] Prevent js exception when typing while the context is loading
  • HUE-8619 - [tb] Include cluster in the partitions API call
  • HUE-8619 - [tb] Switch to POST for partitions API call
  • HUE-8621 - [editor] Add a custom Ace mode for the dark theme
  • HUE-8621 - [editor] Add keyboard shortcut to toggle dark mode
  • HUE-8621 - [editor] Add ace option to toggle dark mode
  • HUE-8621 - [editor] Add dark mode keyboard shortcut to the editor help
  • HUE-8623 - [frontend] Send cluster when checking if a table or database exists in the importer
  • HUE-8624 - [beeswax] Fix tests on create database to redirect to a v4 page
  • HUE-8625 - [editor] Prevent js exception when dragging from top search to the editor after visiting the importer
  • HUE-8626 - [security] Fix navigation issues after visiting the security app
  • HUE-8627 - [frontend] Add partition result view to the top search
  • HUE-8628 - [assist] Indicate context in the left assist filter placeholder
  • HUE-8629 - [assist] Don't show a database icon in the breadcrumb of non sql type assist panels
  • HUE-8629 - [assist] Customise the assist icons for streams
  • HUE-8629 - [assist] Add a dedicated streams assist panel
  • HUE-8629 - [assist] Make sure entries are loaded in left assist for non sql types
  • HUE-8629 - [assist] Improve assist context menu for kafka
  • HUE-8630 - [core] Fix TestMetastoreWithHadoop.test_basic_flow _get_apps
  • HUE-8630 - Fix TestRdbmsIndexer missing RdbmsIndexer
  • HUE-8630 - [fb] Fix TestFileBrowserWithHadoop.test_index home_directory
  • HUE-8634 - HUE-8111 [core] Perform 4.3 release
  • HUE-8635 - [editor] Add the correct styles to the language reference context popover
  • HUE-8639 - [metadata] Do not do Sentry filtering when Sentry is not configured
  • HUE-8639 - [metadata] Include the docstring into the configuration
  • HUE-8650 - [importer] Fix make_notebook default namespace & compute
  • HUE-8652 - [frontend] Fix JS exception in jquery.hiveautocomplete when no namespaces are returned
  • HUE-8654 - [editor] Prevent setting empty object for namespace and compute
  • HUE-8654 - [editor] Guarantee a namespace and compute are set in single cluster mode
  • HUE-8655 - [editor] Have the location handler wait for a compute and namespace to be set
  • HUE-8656 - [tb] Make sure a compute is always set in the table browser
  • HUE-8660 - [assist] Fix file preview in left assist for files with # in the name
  • HUE-8660 - [core] Fix page routing issues with file browser paths containing #
  • HUE-8660 - [assist] Support multiple # in file names for assist preview
  • HUE-8662 - [core] Fix missing static URLs

Apache Impala

The following issues are fixed in CDH 6.1.0:

  • IMPALA-6202 - The mod() function now behaves the same as the % operator.
  • IMPALA-6373 - Allow primitive type widening on Parquet tables. Impala only supports conversions that do not lose precision:
    • TINYINT (INT32) -> SMALLINT (INT32), INT (INT32), BIGINT (INT64), DOUBLE
    • SMALLINT (INT32) -> INT (INT32), BIGINT (INT64), DOUBLE
    • INT (INT32) -> BIGINT (INT64), DOUBLE
    • FLOAT -> DOUBLE
  • IMPALA-6442 - Fixed the incorrectly reported Parquet file offset in error messages.
  • IMPALA-6568 - The Query Compilation section was added to profile outputs.
  • IMPALA-6844 - Impala now correctly handles a possible null pointer in the to_date() function.
  • IMPALA-7272 - Fixed potential crash when a min-max runtime filter is generated for a string value.
  • IMPALA-7449 - Fixed network throughput calculation by measuring the network throughput of each individual RPC and using a summary counter to track the avg/min/max network throughput.
  • IMPALA-7585 - Impala now always explicitly sets user credentials after creating an RPC proxy.
  • IMPALA-7668 - Impala now closes URLClassLoader instances and cleans up any open temporary jar files to avoid file descriptor leaks and disk space issues.
  • IMPALA-7824 - Running INVALIDATE METADATA with authorization enabled no longer causes a hang when Sentry is unavailable.
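The widening rules listed for IMPALA-6373 amount to a small lookup table from the type stored in the Parquet file to the wider table column types Impala will read it as. The following is a minimal sketch of that table, not Impala code; ALLOWED_WIDENINGS and is_allowed_widening are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the IMPALA-6373 widening rules (not Impala source).
# Keys: column type stored in the Parquet file.
# Values: table column types Impala can read it as without losing precision.
ALLOWED_WIDENINGS = {
    "TINYINT": {"SMALLINT", "INT", "BIGINT", "DOUBLE"},
    "SMALLINT": {"INT", "BIGINT", "DOUBLE"},
    "INT": {"BIGINT", "DOUBLE"},
    "FLOAT": {"DOUBLE"},
}


def is_allowed_widening(file_type: str, table_type: str) -> bool:
    """Return True if a Parquet column of file_type can be read as table_type."""
    file_type, table_type = file_type.upper(), table_type.upper()
    if file_type == table_type:
        return True  # identical types need no conversion at all
    return table_type in ALLOWED_WIDENINGS.get(file_type, set())


if __name__ == "__main__":
    print(is_allowed_widening("INT", "BIGINT"))     # widening: allowed
    print(is_allowed_widening("BIGINT", "DOUBLE"))  # not in the list: rejected
```

Note that BIGINT to DOUBLE is absent from the list because a 64-bit integer cannot always be represented exactly as a double, while every INT32 value can.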

Apache Kafka

The following issues are fixed in CDH 6.1.0:

  • KAFKA-2983 - Remove Scala consumers and related code
  • KAFKA-3702 - Change log level of SSL close_notify failure
  • KAFKA-4950 - Fix ConcurrentModificationException on assigned-partitions metric update
  • KAFKA-5098 - KafkaProducer should reject sends to invalid topics
  • KAFKA-5588 - Remove deprecated --new-consumer tools option
  • KAFKA-5697 - Use nonblocking poll in Streams
  • KAFKA-5891 - Proper handling of LogicalTypes in Cast
  • KAFKA-5919 - Adding checks on "version" field for tools using it
  • KAFKA-6054 - Add 'version probing' to Kafka Streams rebalance
  • KAFKA-6264 - Split log segments as needed if offsets overflow the indexes
  • KAFKA-6538 - Changes to enhance ByteStore exceptions thrown from RocksDBStore with more human readable info
  • KAFKA-6546 - Use LISTENER_NOT_FOUND error for missing listener
  • KAFKA-6562 - Make jackson-databind an optional clients dependency
  • KAFKA-6648 - Fetcher.getTopicMetadata() should return all partitions for each requested topic
  • KAFKA-6697 - Broker should not die if getCanonicalPath fails
  • KAFKA-6704 - InvalidStateStoreException from IQ when StreamThread closes store
  • KAFKA-6711 - GlobalStateManagerImpl should not write offsets of in-memory stores in checkpoint file
  • KAFKA-6726 - Fine Grained ACL for CreateTopics (KIP-277)
  • KAFKA-6730 - Simplify State Store Recovery
  • KAFKA-6743 - ConsumerPerformance fails to consume all messages [KIP-281]
  • KAFKA-6749 - Fixed TopologyTestDriver to process stream processing guarantee as exactly once
  • KAFKA-6750 - Add listener name to authentication context (KIP-282)
  • KAFKA-6760 - Fix response logging in the Controller
  • KAFKA-6782 - Fixed restoration of aborted messages for GlobalStateStore and GlobalKTable
  • KAFKA-6805 - Enable broker configs to be stored in ZK before broker start
  • KAFKA-6809 - Count inbound connections in the connection-creation metric
  • KAFKA-6813 - return to double-counting for count topology names
  • KAFKA-6841 - Support Prefixed ACLs (KIP-290)
  • KAFKA-6859 - Do not send LeaderEpochRequest for undefined leader epochs
  • KAFKA-6860 - Fix NPE in Kafka Streams with EOS enabled
  • KAFKA-6884 - Consumer group command should use new admin client
  • KAFKA-6897 - Prevent KafkaProducer.send from blocking when producer is closed
  • KAFKA-6906 - MINOR: code cleanup follow up for
  • KAFKA-6906 - Fixed to commit transactions if data is produced via wall clock punctuation
  • KAFKA-6927 - Chunked down-conversion to prevent out of memory errors on broker [KIP-283]
  • KAFKA-6935 - Add config for allowing optional optimization
  • KAFKA-6936 - Implicit materialized for aggregate, count and reduce
  • KAFKA-6944 - Add system tests testing the new throttling behavior using older clients/brokers
  • KAFKA-6946 - Keep the session id for incremental fetch when fetch responses are throttled
  • KAFKA-6949 - alterReplicaLogDirs() should grab partition lock when accessing log of the future replica
  • KAFKA-6955 - Use Java AdminClient in DeleteRecordsCommand
  • KAFKA-6967 - TopologyTestDriver does not allow pre-populating state stores that have change logging
  • KAFKA-6973 - Validate topic config message.timestamp.type
  • KAFKA-6975 - Fix replica fetching from non-batch-aligned log start offset
  • KAFKA-6979 - Add `default.api.timeout.ms` to KafkaConsumer (KIP-266)
  • KAFKA-6981 - Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes
  • KAFKA-6986 - Export Admin Client metrics through Stream Threads
  • KAFKA-6991 - Fix ServiceLoader issue with PluginClassLoader
  • KAFKA-6997 - Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE
  • KAFKA-7001 - Rename errors.allowed.max property in Connect to errors.tolerance
  • KAFKA-7002 - Add a config property for DLQ topic's replication factor
  • KAFKA-7003 - Set error context in message headers
  • KAFKA-7005 - Remove duplicate resource class.
  • KAFKA-7006 - remove duplicate Scala ResourceNameType in preference to...
  • KAFKA-7007 - Use JSON for /kafka-acl-extended-changes path
  • KAFKA-7010 - Rename ResourceNameType to PatternType
  • KAFKA-7011 - Remove ResourceNameType field from Java Resource class.
  • KAFKA-7012 - Don't process SSL channels without data to process
  • KAFKA-7019 - Make reading metadata lock-free by maintaining an atomically-updated read snapshot
  • KAFKA-7021 - Reuse source based on config
  • KAFKA-7021 - Update upgrade guide section for reusing source topic
  • KAFKA-7023 - Move prepareForBulkLoad() call after customized RocksDBConfigSetter
  • KAFKA-7028 - Properly authorize custom principal objects
  • KAFKA-7029 - Update ReplicaVerificationTool not to use SimpleConsumer
  • KAFKA-7030 - Add configuration to disable message down-conversion (KIP-283)
  • KAFKA-7031 - Connect API shouldn't depend on Jersey
  • KAFKA-7032 - The TimeUnit is neglected by KafkaConsumer#close(long, Tim...
  • KAFKA-7039 - Create an instance of the plugin only if it's a Versioned Plugin
  • KAFKA-7043 - Modified plugin isolation whitelist with recently added converters
  • KAFKA-7044 - Fix Fetcher.fetchOffsetsByTimes and NPE in describe consumer group
  • KAFKA-7047 - Added SimpleHeaderConverter to plugin isolation whitelist
  • KAFKA-7048 - NPE when creating connector
  • KAFKA-7050 - Decrease default consumer request timeout to 30s
  • KAFKA-7055 - Update InternalTopologyBuilder to throw TopologyException if a processor or sink is added with no upstream node attached
  • KAFKA-7056 - Moved Connect's new numeric converters to runtime
  • KAFKA-7058 - Comparing schema default values using Objects#deepEquals()
  • KAFKA-7066 - Added better logging for serialization issues
  • KAFKA-7068 - Handle null config values during transform
  • KAFKA-7076 - Skip rebuilding producer state when using old message format
  • KAFKA-7080 - pass segmentInterval to CachingWindowStore
  • KAFKA-7082 - Concurrent create topics may throw NodeExistsException
  • KAFKA-7091 - AdminClient should handle FindCoordinatorResponse errors
  • KAFKA-7097 - HOTFIX: Set create time default to -1L in VerifiableProducer
  • KAFKA-7097 - VerifiableProducer does not work properly with --message-create-time argument
  • KAFKA-7104 - More consistent leader's state in fetch response
  • KAFKA-7111 - Log error connecting to node at a higher log level
  • KAFKA-7112 - MINOR: Only resume restoration if state is still PARTITIONS_ASSIGNED after poll
  • KAFKA-7119 - Handle transient Kerberos errors as non-fatal exceptions
  • KAFKA-7119 - Handle transient Kerberos errors on server side
  • KAFKA-7126 - Reduce number of rebalance for large consumer group after a topic is created
  • KAFKA-7128 - Follower has to catch up to offset within current leader epoch to join ISR
  • KAFKA-7136 - Avoid deadlocks in synchronized metrics reporters
  • KAFKA-7144 - Fix task assignment to be even
  • KAFKA-7147 - ReassignPartitionsCommand should be able to connect to broker over SSL
  • KAFKA-7164 - Follower should truncate after every missed leader epoch change
  • KAFKA-7168 - Treat connection close during SSL handshake as retriable
  • KAFKA-7182 - SASL/OAUTHBEARER client response missing %x01 seps
  • KAFKA-7185 - Allow empty resource name when matching ACLs
  • KAFKA-7192 - Follow-up: update checkpoint to the reset beginning offset
  • KAFKA-7192 - Wipe out if EOS is turned on and checkpoint file does not exist
  • KAFKA-7194 - Fix buffer underflow if onJoinComplete is retried after failure
  • KAFKA-7216 - Ignore unknown ResourceTypes while loading acl cache
  • KAFKA-7228 - Set errorHandlingMetrics for dead letter queue
  • KAFKA-7231 - Ensure NetworkClient uses overridden request timeout
  • KAFKA-7242 - Reverse xform configs before saving
  • KAFKA-7250 - switch scala transform to TransformSupplier
  • KAFKA-7250 - Fix transform function in Scala DSL to accept TransformerSupplier
  • KAFKA-7255 - Fix timing issue with create/update in SimpleAclAuthorizer
  • KAFKA-7261 - Record 1.0 for total metric when Count stat is used for rate
  • KAFKA-7278 - replaceSegments() should not call asyncDeleteSegment() for segments which have been removed from segments list
  • KAFKA-7280 - Synchronize consumer fetch request/response handling
  • KAFKA-7284 - streams should unwrap fenced exception
  • KAFKA-7285 - Create new producer on each rebalance if EOS enabled
  • KAFKA-7286 - Avoid getting stuck loading large metadata records
  • KAFKA-7287 - Set open ACL for old consumer znode path
  • KAFKA-7296 - Handle coordinator loading error in TxnOffsetCommit
  • KAFKA-7298 - Raise UnknownProducerIdException if next sequence number is unknown
  • KAFKA-7301 - Fix streams Scala join ambiguous overload
  • KAFKA-7316 - Fix Streams Scala filter recursive call #5538
  • KAFKA-7322 - Fix race condition between log cleaner thread and log retention thread when topic cleanup policy is updated
  • KAFKA-7347 - Return not leader error for OffsetsForLeaderEpoch requests to non-replicas
  • KAFKA-7353 - Connect logs 'this' for anonymous inner classes
  • KAFKA-7354 - Fix IdlePercent and NetworkProcessorAvgIdlePercent metric
  • KAFKA-7369 - Handle retriable errors in AdminClient list groups API
  • KAFKA-7385 - Fix log cleaner behavior when only empty batches are retained
  • KAFKA-7386 - streams-scala should not cache serdes
  • KAFKA-7388 - equal sign in property value for password
  • KAFKA-7414 - Out of range errors should never be fatal for follower
  • KAFKA-7434 - Fix NPE in DeadLetterQueueReporter
  • KAFKA-7453 - Expire registered channels not selected within idle timeout
  • KAFKA-7454 - Use lazy allocation for SslTransportLayer buffers and null them on close
  • KAFKA-7459 - Use thread-safe Pool for RequestMetrics.requestRateInternal
  • KAFKA-7460 - Fix Connect Values converter date format pattern

Apache Kudu

The following issues are fixed in CDH 6.1.0:

  • KUDU-844 - [webui] and other /tablet-rowsetlayout-svg improvements
  • KUDU-972 - Fixed an issue where Kudu’s block cache memory tracking (as seen on the /mem-trackers web UI page) wasn’t accounting for all of the overhead of the cache itself.
  • KUDU-1038 - When a tablet is deleted, its write-ahead log recovery directory is also deleted, if it exists.
  • KUDU-2179 - Fixed an issue where kudu cluster ksck running a snapshot checksum scan would use a single snapshot timestamp for all tablets. This caused the checksum process to fail if the checksum process took a long time and the number of tablets was sufficiently large. The tool should now be able to checksum tables even if the process takes many hours.
  • KUDU-2260 - Fixed a rare issue where system failure could leave unexpected null bytes at the end of metadata files, causing Kudu to be unable to restart.
  • KUDU-2293 - Fixed an issue with failed tablet copies that would cause subsequent tablet copies to crash the tablet server.
  • KUDU-2322 - Fixed a bug where the leader logged excessively when the followers fell behind.
  • KUDU-2324 - Add gflags to disable individual maintenance ops.
  • KUDU-2335 - Fixed reporting of leader health during lifecycle transitions.
  • KUDU-2364 - When a tablet server was wiped and recreated with the same RPC address, ksck listed it twice, both as healthy, even though only one of them was there. This bug is now fixed by verifying the UUID of the server.
  • KUDU-2406 - Fixed an issue preventing Kudu from starting when using Vormetric’s encrypted filesystem (secfs2) on ext4.
  • KUDU-2414 - Fixed an issue where the C++ client would fail to reopen an expired scanner; instead, the client would retry in a tight loop and eventually timeout.
  • KUDU-2437 - Split a tablet into primary key ranges by size.
  • KUDU-2443 - Fixed moving single-replica tablets.
  • KUDU-2447 - Fixed a tablet server crash when a tablet is scanned with two predicates on its primary key and the predicates do not overlap.
  • KUDU-2463 - Fixed a bug in which incorrect results would be returned in scans following a server restart.
  • KUDU-2509 - Fixed use-after-free in case of WAL replay error.
  • KUDU-2510 - Fixed symmetric difference logging.
  • KUDU-2521 - Java Implementation for BloomFilter.
  • KUDU-2525 - Fixed an issue where the Kudu MapReduce connector’s KuduTableInputFormat may exhaust its scan too early.
  • KUDU-2531 - (part 1) Ignore invalid tablet metadata files.
  • KUDU-2531 - (part 2) Add -nobackup flag to pbc edit tool.
  • KUDU-2540 - Fixed a bug causing a tablet server crash when a write batch request from a client failed coarse-grained authorization.
  • KUDU-2580 - Fixed authentication token reacquisition in the C++ client.
  • KUDU-2601 - Correctly print newly created files.
  • Fixed an error that would cause the Kudu CLI tool to unexpectedly exit when the connection to the master or tserver was abruptly closed.

Apache Oozie

The following issues are fixed in CDH 6.1.0:

  • OOZIE-2427 - [Kerberos] Authentication failure for the javascript resources under /ext-2.2
  • OOZIE-2791 - ShareLib installation may fail on busy Hadoop clusters
  • OOZIE-2867 - [Coordinators] Emphasize Region/City timezone format
  • OOZIE-2883 - ProxyUserService: invalid configuration error message is misleading
  • OOZIE-2914 - Consolidate trim calls
  • OOZIE-2934 - [sharelib/spark] Fix Findbugs error
  • OOZIE-2967 - TestStatusTransitService.testBundleStatusCoordSubmitFails fails intermittently in Apache Oozie Core 5.0.0-SNAPSHOT
  • OOZIE-2968 - TestJavaActionExecutor.testCredentialsSkip fails intermittently
  • OOZIE-3134 - Potential inconsistency between the in-memory SLA map and the Oozie database
  • OOZIE-3155 - [ui] Job DAG is not refreshed when a job is finished
  • OOZIE-3217 - Enable definition of admin users using oozie-site.xml
  • OOZIE-3221 - Rename DEFAULT_LAUNCHER_MAX_ATTEMPS
  • OOZIE-3224 - Upgrade Jetty to 9.3
  • OOZIE-3229 - [client] [ui] Improved SLA filtering options
  • OOZIE-3229 - [build] test-patch-30-distro improvement
  • OOZIE-3233 - Remove DST shift from the coordinator job's end time
  • OOZIE-3235 - Upgrade ActiveMQ to 5.15.3
  • OOZIE-3251 - Disable JMX for ActiveMQ in the tests
  • OOZIE-3257 - TestHiveActionExecutor#testHiveAction still fails
  • OOZIE-3260 - [sla] Remove stale item above max retries on JPA related errors from in-memory SLA map
  • OOZIE-3298 - [MapReduce action] External ID is not filled properly and failing MR job is treated as SUCCEEDED
  • OOZIE-3303 - Oozie UI does not work after Jetty 9.3 upgrade
  • OOZIE-3309 - Runtime error during /v2/sla filtering for bundle
  • OOZIE-3310 - SQL error during /v2/sla filtering
  • OOZIE-3348 - [Hive action] Remove dependency hive-contrib
  • OOZIE-3354 - [core] [SSH action] SSH action gets hung
  • OOZIE-3369 - [core] Upgrade guru.nidi:graphviz-java to 0.7.0
  • OOZIE-3370 - Property filtering is not consistent across job submission
  • OOZIE-3376 - [tests] TestGraphGenerator should assume JDK8 minor version at least 1.8.0_u40
  • OOZIE-3378 - Coordinator action's status is SUBMITTED after E1003 error

Apache Parquet

The following issues are fixed in CDH 6.1.0:

  • PARQUET-952 - Avro union with single type fails with 'is not a group'
  • PARQUET-1417 - BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with different lengths

Apache Pig

There are no notable fixed issues in this release.

Cloudera Search

The following issues are fixed in CDH 6.1.0:

  • SOLR-12541 - Metrics handler throws an error if there are transient cores.
  • SOLR-12594 - MetricsHistoryHandler.getOverseerLeader fails when hostname contains hyphen.
  • SOLR-12683 - HashQuery will throw an exception if more than 4 partitionKeys are specified.
  • SOLR-12704 - Guard AddSchemaFieldsUpdateProcessorFactory against null field names and field values.
  • SOLR-12750 - Migrate API should lock the collection instead of shard.
  • SOLR-12765 - Incorrect format of JMX cache stats.
  • SOLR-12836 - ZkController creates a cloud solr client with no connection or read timeouts.

For more information on the fixes, see the upstream release notes:

Apache Sentry

The following issues are fixed in CDH 6.1.0:

  • SENTRY-853 - Handle show grant on auth failure correctly
  • SENTRY-1572 - SentryMain() shouldn't dynamically load tool class
  • SENTRY-1896 - Optimize retrieving entities by other entity types
  • SENTRY-1944 - Optimize DelegateSentryStore.getGroupsByRoles() and update SentryGenericPolicyProcessor to retrieve roles to group mapping in a single transaction
  • SENTRY-2085 - Sentry error handling exposes SentryGroupNotFoundException externally.
  • SENTRY-2092 - Drop Role log message shows "Creating role"
  • SENTRY-2115 - Incorrect behavior of HMSFollower when HDFSSync feature is disabled.
  • SENTRY-2127 - Fix unstable unit test TestColumnEndToEnd.testCrossDbTableOperations
  • SENTRY-2141 - Sentry Privilege TimeStamp is not converted to grantTime in HivePrivilegeInfo correctly
  • SENTRY-2143 - Table renames should synchronize with Sentry
  • SENTRY-2168 - Altering table will not update sentry permissions when HDFS sync is disabled
  • SENTRY-2194 - Upgrade Sentry hadoop-version dependency to 2.7.5
  • SENTRY-2198 - Update to Kafka 1.0.0.
  • SENTRY-2199 - Bump Hive version from 2.3.2 to 2.3.3
  • SENTRY-2200 - Migrate 3.x Datanucleus unsupported configurations to 4.1 Datanucleus
  • SENTRY-2209 - Incorrect class in SentryHdfsMetricsUtil.java.
  • SENTRY-2210 - AUTHZ_PATH should have index on the foreign key AUTHZ_OBJ_ID
  • SENTRY-2213 - Increase schema version from 2.0.0 to 2.1.0
  • SENTRY-2214 - Sentry should not allow URI grants to EMPTY or NULL locations
  • SENTRY-2224 - Support SHOW GRANT on HIVE_OBJECT
  • SENTRY-2231 - Fix URI check on List Privileges by Provider in SentryStore
  • SENTRY-2238 - Explicitly set Database on SentryHivePrivilegeObjectDesc
  • SENTRY-2244 - Altering the Sentry role or user when granting a privilege can avoid an extra query to the database
  • SENTRY-2245 - Remove privileges that do not associate with a role or a user
  • SENTRY-2251 - Update user privileges based on changes to authorizables
  • SENTRY-2252 - Normalize the Sentry store APIs to handle both user/role privileges
  • SENTRY-2255 - alter table set owner command can be executed only by user with proper privilege
  • SENTRY-2258 - Remove user when it is not associated with other objects
  • SENTRY-2259 - SQL CONSTRAINT name is too long for Oracle 11.2
  • SENTRY-2261 - Implement JSONAlterDatabaseMessage to write HMS alter database events
  • SENTRY-2262 - Sentry client is not compatible when connecting to Sentry 2.0
  • SENTRY-2264 - It is possible to elevate privileges from DROP using alter table rename
  • SENTRY-2270 - Illegal privileges on columns can be granted on Hive
  • SENTRY-2271 - Wrong log messages/method names in SentrySchema related classes.
  • SENTRY-2273 - Create the SHOW GRANT USER task for Hive
  • SENTRY-2280 - The request received in SentryPolicyStoreProcessor.sentry_notify_hms_event is null
  • SENTRY-2281 - list_privileges_by_user() fails with a JDODetachedFieldAccessException
  • SENTRY-2293 - Fix logging parameters on SentryHDFSServiceProcessor
  • SENTRY-2294 - Add requestorUsername to client.notifyHmsEvent() method
  • SENTRY-2295 - Owner privileges should not be granted to sentry admin users
  • SENTRY-2307 - Avoid HMS event synchronization while sentry is fetching full snapshot
  • SENTRY-2309 - Catch NPE thrown when fetching Partitions with no corresponding SDS entry
  • SENTRY-2310 - Sentry is not able to fetch a full update subsequently when HMS restarts during the snapshot process.
  • SENTRY-2312 - Update owner privileges for table when owner is changed.
  • SENTRY-2313 - alter database set owner command can be executed only by user with proper privilege
  • SENTRY-2315 - The grant all operation is not dropping the create/alter/drop/index/lock privileges
  • SENTRY-2324 - Allow sentry to fetch configurable notifications from HMS
  • SENTRY-2332 - Load hadoop default configuration when starting sentry service
  • SENTRY-2333 - Create index AUTHZ_PATH_FK_IDX at table AUTHZ_PATH for Postgres only when it does not exist
  • SENTRY-2352 - User roles with ALTER on a table cannot show or describe the table on which they have ALTER
  • SENTRY-2359 - Object owner is unable to grant privileges: SentryAccessDeniedException
  • SENTRY-2373 - Incorrect WARN message when processing add partition messages
  • SENTRY-2376 - Bump Jackson libraries versions to 1.9.13 and 2.9.6
  • SENTRY-2392 - Add metrics statistics to list_user_privileges and list_role_privileges API
  • SENTRY-2395 - ALTER VIEW AS SELECT is asking for CREATE privileges instead of ALTER
  • SENTRY-2403 - Incorrect naming in RollingFileWithoutDeleteAppender
  • SENTRY-2406 - Make sure inputHierarchy and outputHierarchy have unique values
  • SENTRY-2409 - ALTER TABLE SET OWNER does not allow changing the table if using only the table name
  • SENTRY-2417 - LocalGroupMappingService class docs do not accurately reflect required INI format
  • SENTRY-2419 - Log where sentry stands in the process of persisting the snapshot
  • SENTRY-2423 - Increase the allocation size for auto-increment of id's for Snapshot tables.
  • SENTRY-2427 - Use Hadoop KerberosName class to derive shortName
  • SENTRY-2429 - Transfer database owner drops table owner
  • SENTRY-2432 - The case of a username is ignored when determining object ownership
  • SENTRY-2433 - Dropping object privileges does not include update of dropping user privileges

Apache Spark

The following issues are fixed in CDH 6.1.0:

  • SPARK-4502 - [SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled
  • SPARK-19355 - Revert [SPARK-25352]
  • SPARK-19724 - [SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix
  • SPARK-20327 - [YARN] Follow up: fix resource request tests on Hadoop 3.
  • SPARK-20327 - [CORE][YARN] Add CLI support for YARN custom resources, like GPUs
  • SPARK-20360 - [PYTHON] reprs for interpreters
  • SPARK-20594 - Adjust fix for CDH version of Hive.
  • SPARK-21318 - [SQL] Improve exception message thrown by `lookupFunction`
  • SPARK-21402 - [SQL] Fix java array of structs deserialization
  • SPARK-22666 - [ML][FOLLOW-UP] Improve testcase to tolerate different schema representation
  • SPARK-23401 - [PYTHON][TESTS] Add more data types for PandasUDFTests
  • SPARK-23429 - [CORE] Add executor memory metrics to heartbeat and expose in executors REST API
  • SPARK-23549 - [SQL] Rename config spark.sql.legacy.compareDateTimestampInTimestamp
  • SPARK-23715 - Revert "[SQL] the input of to/from_utc_timestamp can not have timezone"
  • SPARK-23907 - [SQL] Revert regr_* functions entirely
  • SPARK-23972 - Revert "[BUILD][SQL] Update Parquet to 1.10.0."
  • SPARK-24157 - [SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
  • SPARK-24324 - [PYTHON][FOLLOW-UP] Rename the Conf to spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
  • SPARK-24518 - Revert "[CORE] Using Hadoop credential provider API to store password"
  • SPARK-24519 - [CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once
  • SPARK-24709 - [SQL][FOLLOW-UP] Make schema_of_json's input json as literal only
  • SPARK-24709 - [SQL][2.4] use str instead of basestring in isinstance
  • SPARK-24777 - [SQL] Add write benchmark for AVRO
  • SPARK-24787 - [CORE] Revert hsync in EventLoggingListener and make FsHistoryProvider to read lastBlockBeingWritten data for logs
  • SPARK-24918 - [CORE] Executor Plugin API
  • SPARK-25021 - [K8S][BACKPORT] Add spark.executor.pyspark.memory limit for K8S
  • SPARK-25044 - [FOLLOW-UP] Change ScalaUDF constructor signature
  • SPARK-25314 - [SQL] Fix Python UDF accessing attributes from both side of join in join conditions
  • SPARK-25318 - Add exception handling when wrapping the input stream during the fetch or stage retry in response to a corrupted block
  • SPARK-25321 - [ML] Fix local LDA model constructor
  • SPARK-25384 - [SQL] Clarify fromJsonForceNullableSchema will be removed in Spark 3.0
  • SPARK-25416 - [SQL] ArrayPosition function may return incorrect result when right expression is implicitly down casted
  • SPARK-25417 - [SQL] ArrayContains function may return incorrect result when right expression is implicitly down casted
  • SPARK-25422 - [CORE] Don't memory map blocks streamed to disk.
  • SPARK-25425 - [SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2
  • SPARK-25450 - [SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation
  • SPARK-25454 - [SQL] add a new config for picking minimum precision for integral literals
  • SPARK-25460 - [BRANCH-2.4][SS] DataSourceV2: SS sources do not respect SessionConfigSupport
  • SPARK-25468 - [WEBUI] Highlight current page index in the spark UI
  • SPARK-25469 - [SQL] Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
  • SPARK-25495 - [SS] FetchedData.reset should reset all fields
  • SPARK-25502 - [CORE][WEBUI] Empty Page when page number exceeds the retainedTask size.
  • SPARK-25503 - [CORE][WEBUI] Total task message in stage page is ambiguous
  • SPARK-25505 - [SQL] The output order of grouping columns in Pivot is different from the input order
  • SPARK-25505 - [SQL][FOLLOWUP] Fix for attributes cosmetically different in Pivot clause
  • SPARK-25509 - [CORE] Windows doesn't support POSIX permissions
  • SPARK-25519 - [SQL] ArrayRemove function may return incorrect result when right expression is implicitly downcasted.
  • SPARK-25521 - [SQL] Job id showing null in the logs when insert into command Job is finished.
  • SPARK-25522 - [SQL] Improve type promotion for input arguments of elementAt function
  • SPARK-25533 - [CORE][WEBUI] AppSummary should hold the information about succeeded Jobs and completed stages only
  • SPARK-25535 - [CORE] Work around bad error handling in commons-crypto.
  • SPARK-25536 - [CORE] metric value for METRIC_OUTPUT_RECORDS_WRITTEN is incorrect
  • SPARK-25538 - [SQL] Zero-out all bytes when writing decimal
  • SPARK-25543 - [K8S] Print debug message iff execIdsRemovedInThisRound is not empty.
  • SPARK-25546 - [CORE] Don't cache value of EVENT_LOG_CALLSITE_LONG_FORM.
  • SPARK-25568 - [CORE] Continue to update the remaining accumulators when failing to update one accumulator
  • SPARK-25579 - [SQL] Use quoted attribute names if needed in pushed ORC predicates
  • SPARK-25591 - [PYSPARK][SQL] Avoid overwriting deserialized accumulator
  • SPARK-25601 - [PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement
  • SPARK-25602 - [SQL] SparkPlan.getByteArrayRdd should not consume the input when not necessary
  • SPARK-25636 - [CORE] spark-submit cuts off the failure reason when there is an error connecting to master
  • SPARK-25644 - [SS] Fix java foreachBatch in DataStreamWriter
  • SPARK-25660 - [SQL] Fix for the backward slash as CSV fields delimiter
  • SPARK-25669 - [SQL] Check CSV header only when it exists
  • SPARK-25673 - [BUILD] Remove Travis CI which enables Java lint check
  • SPARK-25674 - [SQL] If the records are incremented by more than 1 at a time, the number of bytes might rarely ever get updated
  • SPARK-25674 - [FOLLOW-UP] Update the stats for each ColumnarBatch
  • SPARK-25690 - [SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely
  • SPARK-25697 - [CORE] When zstd compression enabled, InProgress application is throwing Error in the history webui
  • SPARK-25704 - [CORE] Allocate a bit less than Int.MaxValue
  • SPARK-25708 - [SQL] HAVING without GROUP BY means global aggregate
  • SPARK-25714 - Fix Null Handling in the Optimizer rule BooleanSimplification
  • SPARK-25718 - [SQL] Detect recursive reference in Avro schema and throw exception
  • SPARK-25727 - [SQL] Add outputOrdering to otherCopyArgs in InMemoryRelation
  • SPARK-25738 - [SQL] Fix LOAD DATA INPATH for hdfs port
  • SPARK-25741 - [WEBUI] Long URLs are not rendered properly in web UI
  • SPARK-25768 - [SQL] fix constant argument expecting UDAFs
  • SPARK-25776 - [CORE] The disk write buffer size must be greater than 12
  • SPARK-25793 - [ML] call SaveLoadV2_0.load for classNameV2_0
  • SPARK-25816 - [SQL] Fix attribute resolution in nested extractors
  • SPARK-25822 - [PYSPARK] Fix a race condition when releasing a Python worker
  • SPARK-25827 - [CORE] Avoid converting incoming encrypted blocks to byte buffers
  • SPARK-25840 - [BUILD] `make-distribution.sh` should not fail due to missing LICENSE-binary
  • SPARK-25842 - [SQL] Deprecate rangeBetween APIs introduced in SPARK-21608
  • SPARK-25854 - [BUILD] fix `build/mvn` not to fail during Zinc server shutdown
  • SPARK-25855 - [CORE] Don't use erasure coding for event logs by default
  • SPARK-25871 - [STREAMING] Don't use EC for streaming WAL
  • SPARK-25904 - [CORE] Allocate arrays smaller than Int.MaxValue
  • SPARK-25918 - [SQL] LOAD DATA LOCAL INPATH should handle a relative path

Apache Sqoop

The following issues are fixed in CDH 6.1.0:

  • SQOOP-2567 - SQOOP import for Oracle fails with invalid precision/scale for decimal
  • SQOOP-2949 - SQL Syntax error when split-by column is of character type and min or max value has single quote inside it
  • SQOOP-3042 - Sqoop does not clear compile directory under /tmp/sqoop-username/compile automatically
  • SQOOP-3052 - Introduce Gradle based build for Sqoop to make it more developer friendly / open
  • SQOOP-3082 - Sqoop import fails after TCP connection reset if split by datetime column
  • SQOOP-3224 - Mainframe FTP transfer should have an option to use binary mode for transfer
  • SQOOP-3225 - Mainframe module FTP listing parser should cater for larger datasets on disk
  • SQOOP-3267 - Incremental import to HBase deletes only last version of column
  • SQOOP-3288 - Changing OracleManager to use CURRENT_TIMESTAMP instead of
  • SQOOP-3300 - Implement JDBC and Kerberos tools for HiveServer2 support
  • SQOOP-3309 - Implement HiveServer2 client
  • SQOOP-3326 - Mainframe FTP listing for GDG should filter out non-GDG datasets in a heterogeneous listing
  • SQOOP-3327 - Mainframe FTP needs to Include "Migrated" datasets when parsing the FTP list
  • SQOOP-3328 - Implement an alternative solution for Parquet reading and writing
  • SQOOP-3330 - Sqoop --append does not work with -Dmapreduce.output.basename
  • SQOOP-3331 - Add Mainframe FTP integration test for GDG dataset.
  • SQOOP-3333 - Change default behavior of the MS SQL connector to non-resilient.
  • SQOOP-3335 - Add Hive support to the new Parquet writing implementation
  • SQOOP-3353 - Sqoop should not check incremental constraints for HBase imports
  • SQOOP-3378 - Error during direct Netezza import/export can interrupt process in uncontrolled ways

Apache ZooKeeper

The following issues are fixed in CDH 6.1.0:

  • ZOOKEEPER-706 - Large numbers of watches can cause session re-establishment to fail
  • ZOOKEEPER-1382 - Zookeeper server holds onto dead/expired session ids in the watch data structures