Atlas

Core Capabilities
- Show attributes for time-bound classification or business catalog mapping (ATLAS-2457)
- Support Business Terms and Categories
- Migrate Atlas data from Titan graph DB to JanusGraph DB
- Atlas HBase hook to capture metadata and lineage
- Tag propagation from object to child object or derivative asset (ATLAS-1821)
- Address Storm Atlas hook compatibility with Storm 1.2

Druid

Integration
- Kafka-Druid ingest. You can now map a Kafka topic to a Druid table; the events are automatically ingested and available for querying in near real time.

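In HDP, this mapping can be expressed through Hive's Druid storage handler. A minimal sketch, assuming a Kafka broker at kafkahost:9092 and an illustrative topic name and event schema:

```sql
-- Map a Kafka topic to a Druid-backed table; events stream in continuously.
CREATE EXTERNAL TABLE wiki_events (
  `__time` TIMESTAMP,  -- Druid requires a timestamp column
  page STRING,
  added INT
)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "kafka.bootstrap.servers" = "kafkahost:9092",
  "kafka.topic" = "wiki-events"
);

-- Start the continuous ingestion from Kafka into Druid.
ALTER TABLE wiki_events SET TBLPROPERTIES ("druid.kafka.ingestion" = 'START');
```

Once ingestion is started, queries against wiki_events in Hive see events shortly after they land on the topic.
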
HDFS

Core Capabilities
- Balance utilization across disks of varied capacities inside a DataNode (HDFS-1312)
- Reduce HDFS storage overhead with directory-level Reed-Solomon erasure coding (HDFS-7285)
- Support two Standby NameNodes for NameNode High Availability
- NFS Gateway support in front of ViewFS (file access to the unified namespace)
- Expose encrypted zones and erasure-coded zones via the WebHDFS API (HDFS-11394, HDFS-13512)
- Hive support on erasure-coded directories (HDFS-7285)
- Cloud testing for HDFS: cloud failure modes/availability
- Cloud: connect/reattach to Elastic Block Volumes to use centralized block storage for better TCO (vs. local disks)

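Several of the items above concern erasure coding. As a sketch of how a directory-level Reed-Solomon policy is applied from the command line (the path /data/warehouse is illustrative):

```
# List the erasure coding policies available on the cluster.
hdfs ec -listPolicies

# Apply a Reed-Solomon (6 data, 3 parity) policy to a directory;
# new files written under it are erasure-coded rather than replicated.
hdfs ec -setPolicy -path /data/warehouse -policy RS-6-3-1024k

# Verify which policy is in effect on the directory.
hdfs ec -getPolicy -path /data/warehouse
```
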
HBase

Core Capabilities
- Procedure V2. Procedure V2, or procv2, is an updated framework for executing multi-step HBase administrative operations reliably in the presence of failures. All master operations, such as creating, modifying, and deleting tables, are implemented using procv2, which should remove the need for tools like hbck in the future. Other subsystems, such as the new AssignmentManager, are also implemented using procv2.
- Fully off-heap read/write path. When you write data into HBase through a Put operation, the cell objects do not enter the JVM heap until the data is flushed to disk in an HFile. This reduces the total heap usage of a RegionServer, and copying less data makes the write path more efficient.
- Use of Netty for the RPC layer, and an async API. The old Java NIO RPC server is replaced with a Netty RPC server, which also makes it easy to provide an asynchronous Java client API.
- In-memory compactions. Periodic reorganization of the data in the MemStore can reduce overall I/O, that is, data written to and read from HDFS. Net performance increases when more data is kept in memory for a longer period of time.
- Better dependency management. HBase now internally shades commonly incompatible dependencies to prevent issues for downstream users. You can use shaded client jars to reduce the burden on existing applications.
- Coprocessor and Observer API rewrite. Minor changes were made to the API to remove ambiguous, misleading, and dangerous calls.

Hive

Core Capabilities
- Workload management for LLAP. You can now run LLAP in a multi-tenant environment without worrying about resource competition.
- ACID v2, with ACID on by default. ACID v2 brings performance improvements in both the storage format and the execution engine; performance is equal to or better than that of non-ACID tables. ACID is enabled by default to allow full support for data updates.
- Materialized views. Hive's query engine now supports materialized views and automatically uses them, when available, to speed up your queries.
- Information schema. Hive now exposes the metadata of the database (tables, columns, etc.) directly via the Hive SQL interface.
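A brief sketch of these capabilities in HiveQL; the employees table and its columns are hypothetical:

```sql
-- ACID is on by default for managed tables, so row-level updates work directly.
UPDATE employees SET salary = salary * 1.1 WHERE dept = 'eng';

-- Define a materialized view; the optimizer can automatically rewrite
-- matching aggregate queries to read from it instead of the base table.
CREATE MATERIALIZED VIEW mv_dept_salaries AS
SELECT dept, SUM(salary) AS total_salary
FROM employees
GROUP BY dept;

-- Query database metadata directly through SQL via the information schema.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'default';
```
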
Integration
- Hive Warehouse Connector for Spark. The Hive Warehouse Connector allows you to connect Spark applications to Hive data warehouses. The connector automatically handles ACID tables.
- JDBC storage connector. You can now map any JDBC database's tables into Hive and query those tables in conjunction with other tables.
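The JDBC mapping is declared as an external table backed by the JdbcStorageHandler. A sketch, assuming an illustrative PostgreSQL database and table:

```sql
-- Expose a table from an external JDBC database inside Hive.
CREATE EXTERNAL TABLE postgres_orders (
  order_id INT,
  customer_id INT,
  amount DOUBLE
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver"   = "org.postgresql.Driver",
  "hive.sql.jdbc.url"      = "jdbc:postgresql://dbhost:5432/sales",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "secret",
  "hive.sql.table"         = "orders"
);

-- The mapped table can be joined with native Hive tables.
SELECT c.name, o.amount
FROM postgres_orders o
JOIN customers c ON c.id = o.customer_id;
```
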
Kafka

Core Capabilities
- Cache lastEntry in TimeIndex to avoid unnecessary disk access (KAFKA-6172)
- AbstractIndex should cache the index file to avoid unnecessary disk access during resize() (KAFKA-6175)

Security
- SSLTransportLayer should keep reading from the socket until either the buffer is full or the socket has no more data (KAFKA-6258)

Knox

Usability
- Admin UI, along with service discovery and topology generation, for simplifying and accelerating Knox configuration

Security
- Added SSO support for Zeppelin, YARN, MR2, HDFS, and Oozie
- Added Knox Proxy support for YARN, Oozie, SHS (Spark History Server), HDFS, MR2, Livy, and SmartSense

Phoenix

Core Capabilities
- Query log. A new system table, SYSTEM.LOG, captures information about queries that are run against the cluster (client-driven).
- Column encoding. This is new to HDP. You can use a custom encoding scheme for data in the HBase table to reduce the amount of space taken. Because there is less data to read, performance increases and storage shrinks; the gain is 30% or more for sparse tables.
- Support for GRANT and REVOKE commands. Index ACLs are updated automatically when access changes on the data table or view.
- Support for sampling tables.
- Support for atomic update (ON DUPLICATE KEY).
- Support for snapshot scanners for MapReduce-based queries.
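Several of these SQL-level features can be illustrated together; the events table and its columns are hypothetical:

```sql
-- Atomic update: if the row exists, atomically increment the counter;
-- otherwise insert it with the listed values.
UPSERT INTO events (id, hits) VALUES ('row1', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- Table sampling: read roughly 10 percent of the table's rows.
SELECT COUNT(*) FROM events TABLESAMPLE(10);

-- Query log: inspect recent queries captured in the SYSTEM.LOG table.
SELECT * FROM SYSTEM."LOG" LIMIT 10;
```
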
Integration
- HBase 2.0 support.
- Python driver for Phoenix Query Server. This provides a Python DB API 2.0 implementation.
- Hive 3.0 support for Phoenix. This provides an updated phoenix-hive StorageHandler for the new Hive version.
- Spark 2.3 support for Phoenix. This provides an updated phoenix-spark driver for the new Spark version.

Ranger

Security
- Time-bound authorization policies
- Hive UDF execution authorization
- Hive workload management authorization
- RangerKafkaAuthorizer support for new operations and resources added in Kafka 1.0
- Read-only Ranger user roles for auditing purposes
- Auditing for usersync operations
- HDFS Federation support
- Support for metadata authorization changes in Atlas 1.0
- Ability to specify passwords for admin accounts during Ranger install
- Consolidated DB schema script for all supported DB flavors

Ease of Use
- Show the actual Hive query in the Ranger Audit UI
- Group policies using labels
- Install and turn on Ranger and Atlas by default in HDP 3

Spark

Core Capabilities
- Spark 2.3.1 GA on HDP 3.0
- Structured Streaming support for ORC
- Enable security and ACLs in the History Server
- Support running Spark jobs in a Docker container
- Upgrade Spark/Zeppelin/Livy from HDP 2.6 to HDP 3.0
- Cloud: Spark testing with S3Guard/S3A committers
- Certification of the Staging Committer with Spark
- Integrate with the new Metastore Catalog feature
- Beeline support for the Spark Thrift Server

Integration
- Support per-notebook interpreter configuration
- Livy support for ACLs
- Knox proxying of the Spark History Server UI
- Structured Streaming support for the Hive Streaming library
- Transparent write to the Hive warehouse

Storm

Core Capabilities
- Storm has been upgraded from 1.1.0 to 1.2.1. Storm 1.2.1 supports all HDP 3.0 components, including Hadoop/HDFS 3.0, HBase 2.0, and Hive 3.

YARN

Core Capabilities
- Support intra-queue preemption to balance between apps from different users and priorities in the same queue
- Support async scheduling (vs. per node-heartbeat) for better response time in a large YARN cluster
- Support generalized resource placement in YARN: affinity/anti-affinity
- Application priority scheduling support in the Capacity Scheduler
- Expose a framework for more powerful apps-to-queues mappings (YARN-8016, YARN-3635)
- Support the application timeout feature in YARN
- Support GPU scheduling/isolation on YARN
- Support Docker containers running on YARN
- Support assemblies on YARN (YARN-6613)
- YARN Service framework: Slider functionality in YARN
- Support a simplified services REST API on Slider/YARN
- Simplified discovery of services via DNS
- Support HDP on (YARN + Slider + Docker/YCloud) via Cloudbreak integration
- NodeManager support for automatic restart of service containers
- Support auto-spawning of admin-configured system services
- Migrate LLAP-on-Slider to LLAP-on-YARN-Service-Framework
- Support dockerized Spark jobs on YARN
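The YARN Service framework items above take over much of what Slider provided: a long-running service is described declaratively and submitted through the services REST API or the yarn CLI. A minimal, illustrative service spec (all names and values are examples, not a definitive schema):

```json
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```

With DNS-based discovery enabled, each container of the service becomes resolvable under a predictable hostname derived from the component, service, and user names.
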
Ease of Use
- Timeline Service V2
- Resource Manager web UI authorization control (users can only see their own jobs)
- Support better classpath isolation for users; remove Guava conflicts from user runtime
- A more user-friendly and developer-friendly YARN web UI
- YARN/MapReduce integration with SSO/Proxy (via Knox)

Enterprise Readiness
- Enable Capacity Scheduler preemption by default
- Cgroup support for YARN in a non-secure cluster, with LinuxContainerExecutor always on by default
- Enable cgroups and CPU scheduling for YARN containers by default
- Support for deleting queues without requiring an RM restart
- Enhancements to STOP queue handling
- Support side-by-side HDFS tarball-based install of multiple Spark auxiliary services in YARN (YARN-1151)
- Create a log aggregation tool (HAR files) to reduce NameNode load (YARN-4086)
- YARN queue ACL support when doAs=false
- Provide an API in YARN to get the queue mapping result before application submission

Zeppelin

Core Capabilities
- Change the Zeppelin UI across the board to not display stack traces
- Option for user name case conversion (ZEPPELIN-3312)
- Update to the 0.8 release of Zeppelin