Using Apache HBase to store and access data
Prerequisites
You must complete the following steps before configuring Hive and HBase integration.
Install ZooKeeper, HBase, and Hive through Ambari.
Install the required version of Hadoop.
Add all of the required JARs (see the example after this list):
Ensure that hive-hbase-handler.jar is available on the Hive client auxpath.
ZooKeeper JAR
HBase server JAR
HBase client JAR
Parent topic: Understanding Apache HBase Hive integration
© 2012–2019, Hortonworks, Inc. Document licensed under the Creative Commons Attribution ShareAlike 4.0 License.