Data Access
JIRAs
Issue tracking for Hive bugs and improvements can be found on the Apache Hive site.
© 2012–2021 by Cloudera, Inc. Document licensed under the Creative Commons Attribution ShareAlike 4.0 License.