Atlas

Core Capabilities
- Show attributes for time-bound classification or business catalog mapping (ATLAS-2457)
- Support Business Terms and Categories
- Migrate Atlas data from Titan graph DB to JanusGraph DB
- Atlas HBase hook to capture metadata and lineage
- Tag propagation from object to child object or derivative asset (ATLAS-1821)
- Address Storm Atlas hook compatibility with Storm 1.2

Druid

Integration
- Kafka-Druid ingest. You can now map a Kafka topic to a Druid table; the events are automatically ingested and available for querying in near real time.

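In HDP, this mapping can be expressed through Hive's Druid storage handler. A minimal sketch, assuming a Kafka broker at kafkahost:9092 and an illustrative topic name and event schema:

```sql
-- Map a Kafka topic to a Druid-backed table; events stream in continuously.
CREATE EXTERNAL TABLE wiki_events (
  `__time` TIMESTAMP,  -- Druid requires a timestamp column
  page STRING,
  added INT
)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "kafka.bootstrap.servers" = "kafkahost:9092",
  "kafka.topic" = "wiki-events"
);

-- Start the continuous ingestion from Kafka into Druid.
ALTER TABLE wiki_events SET TBLPROPERTIES ("druid.kafka.ingestion" = 'START');
```

Once ingestion is started, queries against wiki_events in Hive see events shortly after they land on the topic.
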
HDFS

Core Capabilities
- Balance utilization across disks of varied capacities inside a DataNode (HDFS-1312)
- Reduce HDFS storage overhead with directory-level Reed-Solomon erasure coding (HDFS-7285)
- Support two Standby NameNodes for NameNode High Availability
- NFS Gateway support in front of ViewFS (file access to the unified namespace)
- Expose encrypted zones and erasure-coded zones via the WebHDFS API (HDFS-11394, HDFS-13512)
- Hive support on erasure-coded directories (HDFS-7285)
- Cloud testing for HDFS: cloud failure modes/availability
- Cloud: connect/reattach to Elastic Block Volumes to use centralized block storage for better TCO (vs. local disks)

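Several of the items above concern erasure coding. As a sketch of how a directory-level Reed-Solomon policy is applied from the command line (the path /data/warehouse is illustrative):

```
# List the erasure coding policies available on the cluster.
hdfs ec -listPolicies

# Apply a Reed-Solomon (6 data, 3 parity) policy to a directory;
# new files written under it are erasure-coded rather than replicated.
hdfs ec -setPolicy -path /data/warehouse -policy RS-6-3-1024k

# Verify which policy is in effect on the directory.
hdfs ec -getPolicy -path /data/warehouse
```
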
HBase

Core Capabilities
- Procedure V2. Procedure V2, or procv2, is an updated framework for executing multi-step HBase administrative operations reliably in the presence of failures. All master operations, such as creating, modifying, and deleting tables, are implemented using procv2, which should remove the need for tools like hbck in the future. Other subsystems, such as the new AssignmentManager, are also implemented using procv2.
- Fully off-heap read/write path. When you write data into HBase through a Put operation, the cell objects do not enter the JVM heap until the data is flushed to disk in an HFile. This reduces the total heap usage of a RegionServer, and copying less data makes the write path more efficient.
- Use of Netty for the RPC layer, and an async API. The old Java NIO RPC server is replaced with a Netty RPC server, which also makes it easy to provide an asynchronous Java client API.
- In-memory compactions. Periodic reorganization of the data in the MemStore can reduce overall I/O, that is, data written to and read from HDFS. Net performance increases when more data is kept in memory for a longer period of time.
- Better dependency management. HBase now internally shades commonly incompatible dependencies to prevent issues for downstream users. You can use shaded client jars to reduce the burden on existing applications.
- Coprocessor and Observer API rewrite. Minor changes were made to the API to remove ambiguous, misleading, and dangerous calls.

Hive

Core Capabilities
- Workload management for LLAP. You can now run LLAP in a multi-tenant environment without worrying about resource competition.
- ACID v2, with ACID on by default. ACID v2 brings performance improvements in both the storage format and the execution engine; performance is equal to or better than that of non-ACID tables. ACID is enabled by default to allow full support for data updates.
- Materialized views. Hive's query engine now supports materialized views and automatically uses them, when available, to speed up your queries.
- Information schema. Hive now exposes the metadata of the database (tables, columns, etc.) directly via the Hive SQL interface.
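A brief sketch of these capabilities in HiveQL; the employees table and its columns are hypothetical:

```sql
-- ACID is on by default for managed tables, so row-level updates work directly.
UPDATE employees SET salary = salary * 1.1 WHERE dept = 'eng';

-- Define a materialized view; the optimizer can automatically rewrite
-- matching aggregate queries to read from it instead of the base table.
CREATE MATERIALIZED VIEW mv_dept_salaries AS
SELECT dept, SUM(salary) AS total_salary
FROM employees
GROUP BY dept;

-- Query database metadata directly through SQL via the information schema.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'default';
```
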
Integration
- Hive Warehouse Connector for Spark. The Hive Warehouse Connector allows you to connect Spark applications to Hive data warehouses. The connector automatically handles ACID tables.
- JDBC storage connector. You can now map any JDBC database's tables into Hive and query those tables in conjunction with other tables.
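The JDBC mapping is declared as an external table backed by the JdbcStorageHandler. A sketch, assuming an illustrative PostgreSQL database and table:

```sql
-- Expose a table from an external JDBC database inside Hive.
CREATE EXTERNAL TABLE postgres_orders (
  order_id INT,
  customer_id INT,
  amount DOUBLE
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver"   = "org.postgresql.Driver",
  "hive.sql.jdbc.url"      = "jdbc:postgresql://dbhost:5432/sales",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "secret",
  "hive.sql.table"         = "orders"
);

-- The mapped table can be joined with native Hive tables.
SELECT c.name, o.amount
FROM postgres_orders o
JOIN customers c ON c.id = o.customer_id;
```
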
Kafka

Core Capabilities
- Cache lastEntry in TimeIndex to avoid unnecessary disk access (KAFKA-6172)
- AbstractIndex should cache the index file to avoid unnecessary disk access during resize() (KAFKA-6175)

Security
- SSLTransportLayer should keep reading from the socket until either the buffer is full or the socket has no more data (KAFKA-6258)

Knox

Usability
- Admin UI, along with service discovery and topology generation, for simplifying and accelerating Knox configuration

Security
- Added SSO support for Zeppelin, YARN, MR2, HDFS, and Oozie
- Added Knox Proxy support for YARN, Oozie, SHS (Spark History Server), HDFS, MR2, Livy, and SmartSense

Phoenix

Core Capabilities
- Query log. A new system table, SYSTEM.LOG, captures information about queries that are run against the cluster (client-driven).
- Column encoding. This is new to HDP. You can use a custom encoding scheme for data in the HBase table to reduce the amount of space taken. Because there is less data to read, performance increases and storage shrinks; the gain is 30% or more for sparse tables.
- Support for GRANT and REVOKE commands. Index ACLs are updated automatically when access changes on the data table or view.
- Support for sampling tables.
- Support for atomic update (ON DUPLICATE KEY).
- Support for snapshot scanners for MapReduce-based queries.
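Several of these SQL-level features can be illustrated together; the events table and its columns are hypothetical:

```sql
-- Atomic update: if the row exists, atomically increment the counter;
-- otherwise insert it with the listed values.
UPSERT INTO events (id, hits) VALUES ('row1', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- Table sampling: read roughly 10 percent of the table's rows.
SELECT COUNT(*) FROM events TABLESAMPLE(10);

-- Query log: inspect recent queries captured in the SYSTEM.LOG table.
SELECT * FROM SYSTEM."LOG" LIMIT 10;
```
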
Integration
- HBase 2.0 support.
- Python driver for Phoenix Query Server. This provides a Python DB API 2.0 implementation.
- Hive 3.0 support for Phoenix. This provides an updated phoenix-hive StorageHandler for the new Hive version.
- Spark 2.3 support for Phoenix. This provides an updated phoenix-spark driver for the new Spark version.

Ranger

Security
- Time-bound authorization policies
- Hive UDF execution authorization
- Hive workload management authorization
- RangerKafkaAuthorizer support for new operations and resources added in Kafka 1.0
- Read-only Ranger user roles for auditing purposes
- Auditing for usersync operations
- HDFS Federation support
- Support for metadata authorization changes in Atlas 1.0
- Ability to specify passwords for admin accounts during Ranger install
- Consolidated DB schema script for all supported DB flavors

Ease of Use
- Show the actual Hive query in the Ranger Audit UI
- Group policies using labels
- Install and turn on Ranger and Atlas by default in HDP 3

Spark

Core Capabilities
- Spark 2.3.1 GA on HDP 3.0
- Structured Streaming support for ORC
- Enable security and ACLs in the History Server
- Support running Spark jobs in a Docker container
- Upgrade Spark/Zeppelin/Livy from HDP 2.6 to HDP 3.0
- Cloud: Spark testing with S3Guard/S3A committers
- Certification of the Staging Committer with Spark
- Integrate with the new Metastore Catalog feature
- Beeline support for the Spark Thrift Server

Integration
- Support per-notebook interpreter configuration
- Livy support for ACLs
- Knox proxying of the Spark History Server UI
- Structured Streaming support for the Hive Streaming library
- Transparent write to the Hive warehouse

Storm

Core Capabilities
- Storm has been upgraded from 1.1.0 to 1.2.1. Storm 1.2.1 supports all HDP 3.0 components, including Hadoop/HDFS 3.0, HBase 2.0, and Hive 3.

YARN

Core Capabilities
- Support intra-queue preemption to balance between apps from different users and priorities in the same queue
- Support async scheduling (vs. per node-heartbeat) for better response time in a large YARN cluster
- Support generalized resource placement in YARN: affinity/anti-affinity
- Application priority scheduling support in the Capacity Scheduler
- Expose a framework for more powerful apps-to-queues mappings (YARN-8016, YARN-3635)
- Support the application timeout feature in YARN
- Support GPU scheduling/isolation on YARN
- Support Docker containers running on YARN
- Support assemblies on YARN (YARN-6613)
- YARN Service framework: Slider functionality in YARN
- Support a simplified services REST API on Slider/YARN
- Simplified discovery of services via DNS
- Support HDP on (YARN + Slider + Docker/YCloud) via Cloudbreak integration
- NodeManager support for automatic restart of service containers
- Support auto-spawning of admin-configured system services
- Migrate LLAP-on-Slider to LLAP-on-YARN-Service-Framework
- Support dockerized Spark jobs on YARN
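The YARN Service framework items above take over much of what Slider provided: a long-running service is described declaratively and submitted through the services REST API or the yarn CLI. A minimal, illustrative service spec (all names and values are examples, not a definitive schema):

```json
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```

With DNS-based discovery enabled, each container of the service becomes resolvable under a predictable hostname derived from the component, service, and user names.
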
Ease of Use
- Timeline Service V2
- Resource Manager web UI authorization control (users can only see their own jobs)
- Support better classpath isolation for users; remove Guava conflicts from user runtime
- A more user-friendly and developer-friendly YARN web UI
- YARN/MapReduce integration with SSO/Proxy (via Knox)

Enterprise Readiness
- Enable Capacity Scheduler preemption by default
- Cgroup support for YARN in a non-secure cluster, with LinuxContainerExecutor always on by default
- Enable cgroups and CPU scheduling for YARN containers by default
- Support for deleting queues without requiring an RM restart
- Enhancements to STOP queue handling
- Support side-by-side HDFS tarball-based install of multiple Spark auxiliary services in YARN (YARN-1151)
- Create a log aggregation tool (HAR files) to reduce NameNode load (YARN-4086)
- YARN queue ACL support when doAs=false
- Provide an API in YARN to get the queue mapping result before application submission

Zeppelin

Core Capabilities
- Change the Zeppelin UI across the board to not display stack traces
- Option for user name case conversion (ZEPPELIN-3312)
- Update to the 0.8 release of Zeppelin