
- Release Notes
- Concepts
- Apache Hive Overview
- Data Governance Overview
- Apache Spark Overview
- Apache Zeppelin Overview
- HDP Security Overview
- Apache Kafka Overview
- Apache Storm Overview
- Installation & Upgrade
- Installation
- Apache Ambari Installation
- Apache Ambari Installation for IBM Power Systems
- Installing Apache Atlas
- Installing Apache Spark
- Installing Apache Zeppelin
- Installing Apache Ranger
- Installing Ranger Using Ambari Overview
- Set Up Hadoop Group Mapping for LDAP/AD
- Ranger Password Requirements
- Configuring a Database Instance for Ranger
- Start the Ranger Installation
- Customize Services: Admin
- Customize Services: Audit
- Customize Services: Plugins
- Customize Services: User Sync
- Customize Services: Tagsync
- Customize Services: Authentication
- Complete the Ranger Installation
- Additional Ranger Plugin Configuration Steps for Kerberos Clusters
- Installing Apache Ranger KMS
- Installing Apache Knox
- Installing and Configuring Apache Storm
- Installing and Configuring Apache Kafka
- Upgrade
- How To
- Data Storage & Data OS
- Managing Data Operating System
- Introduction
- Application Development
- Using the YARN REST APIs to Manage Applications
- Collect Application Data with the Timeline Server
- Timeline Service 2.0 Overview
- Timeline Server 1.5 Overview
- Upgrade Timeline Server 1.0 to 1.5
- Configure the Timeline Server
- Enable Generic Data Collection
- Configure Per-framework Data Collection
- Configure the Timeline Server Store
- Configure Timeline Server Security
- Run the Timeline Server
- Access Generic Data from the Command Line
- Publish Per-framework Data in Applications
- Application Management
- Cluster Management
- Allocating Resources with the Capacity Scheduler
- Capacity Scheduler Overview
- Enable the Capacity Scheduler
- Set up Queues
- Control Access to Queues with ACLs
- Define Queue Mapping Policies
- Manage Cluster Capacity with Queues
- Set Queue Priorities
- Resource Distribution Workflow
- Resource Distribution Workflow Example
- Set User Limits
- Application Reservations
- Set Flexible Scheduling Policies
- Start and Stop Queues
- Set Application Limits
- Enable Preemption
- Enable Priority Scheduling
- Configure ACLs for Application Priorities
- Enable Intra-Queue Preemption
- Monitoring Clusters using YARN Web User Interface
- Fault Tolerance
- Scaling Namespaces and Optimizing Data Storage
- Introduction
- Scaling namespaces
- Optimizing data storage
- Balancing data across disks of a DataNode
- Increasing storage capacity with HDFS erasure coding
- Increasing storage capacity with HDFS compression
- Setting archival storage policies
- Balancing data across an HDFS cluster
- Optimizing performance
- Using the NFS Gateway for accessing HDFS
- Data storage metrics
- APIs for accessing HDFS
- Administering HDFS
- Cluster Maintenance
- Decommissioning slave nodes
- Manually add slave nodes to an HDP cluster
- Using DistCp to Copy Files
- Using DistCp
- Command Line Options
- Update and Overwrite
- DistCp and Security Settings
- Secure-to-Secure: Kerberos Principal Name
- Secure-to-Secure: ResourceManager mapping rules
- DistCp between HA clusters
- DistCp and HDP version
- DistCp data copy matrix
- Copying Data from HDP-2.x to HDP-1.x Clusters
- DistCp Architecture
- DistCp Frequently Asked Questions
- DistCp additional considerations
- Ports and Services Reference
- Configuring ports
- Accumulo service ports
- Atlas service ports
- Flume service ports
- HBase service ports
- HDFS service ports
- Hive service ports
- Hue service port
- Kafka service ports
- Kerberos service ports
- Knox service ports
- MapReduce service ports
- MySQL service ports
- Oozie service ports
- Ranger service ports
- Sqoop service ports
- Storm service ports
- Tez ports
- YARN service ports
- Zeppelin service port
- ZooKeeper service ports
- Controlling HDP services manually
- Configuring Fault Tolerance
- High Availability on Non-Ambari Clusters
- Configuring High Availability for the Hive Metastore
- Deploying Multiple HiveServer2 Instances for High Availability
- Configuring HiveServer2 High Availability Using ZooKeeper
- Configuring High Availability for HBase
- Introduction to HBase High Availability
- Propagating Writes to Region Replicas
- Timeline Consistency
- Configuring HA Reads for HBase
- Creating Highly Available HBase Tables with the HBase Java API
- Creating Highly Available HBase Tables with the HBase Shell
- Querying Secondary Regions
- Monitoring Secondary Region Replicas
- HBase Cluster Replication for Geographic Data Distribution
- Configuring NameNode High Availability
- Configuring ResourceManager High Availability
- Configuring Apache Ranger High Availability
- Data Protection
- Data Access
- Starting Apache Hive
- Using Apache Hive
- Apache Hive 3 tables
- Hive 3 ACID transactions
- Using materialized views
- Apache Hive queries
- Create partitions dynamically
- Query a SQL data source using the JdbcStorageHandler
- Creating a user-defined function
- Managing Apache Hive
- Managing Apache Hive Workloads
- Configuring Apache Hive
- Securing Apache Hive
- Integrating Apache Hive with Spark and BI
- Migrating Data
- Apache Hive Performance Tuning
- Optimizing an Apache Hive data warehouse
- LLAP ports
- Preparations for tuning performance
- Setting up LLAP
- LLAP and HiveServer Interactive properties
- Use HiveServer Interactive UI
- Connect a JDBC client to LLAP
- Configuring YARN queues for Hive
- Set up multiple HiveServer instances
- Key components of Hive warehouse processing
- Query result cache and metastore cache
- Tez execution engine properties
- Monitoring Apache Hive performance
- Maximizing storage resources using ORC
- Improving performance using partitions
- Handling bucketed tables
- Improving performance using the cost-based optimizer
- Adding Druid to a cluster
- Apache Druid introduction
- Apache Druid architectural overview
- Apache Druid content roadmap
- Setting up and using Apache Druid
- Configure Apache Druid for high availability
- Visualizing Druid data in Superset
- Apache Superset aggregations
- Securing Apache Druid using Kerberos
- Enable Kerberos authentication in Apache Druid
- Access Kerberos-protected HTTP endpoints
- Using Druid and Apache Hive
- Using HBase to Store and Access Data
- What's New in Apache HBase
- Overview of Apache HBase
- Apache HBase installation
- Managing Apache HBase clusters
- Backing up and restoring Apache HBase datasets
- Planning a backup-and-restore strategy for your environment
- Best practices for backup-and-restore
- Running the backup-and-restore utility
- Medium Object (MOB) storage support in Apache HBase
- Methods to enable MOB storage support
- Method 1: Enable MOB storage support using configuration options in the command line
- Method 2: Invoke MOB support parameters in a Java API
- Test the MOB storage support configuration
- MOB storage cache properties
- HBase quota management
- HBase Best Practices
- Using Phoenix to Store and Access Data
- What's New in Apache Phoenix
- Orchestrating SQL and APIs with Apache Phoenix
- Creating and using user-defined functions (UDFs) in Phoenix
- Overview of mapping Phoenix schemas to HBase namespaces
- Associating tables of a schema to a namespace
- Understanding the Apache Phoenix-Spark connector
- Understanding the Apache Phoenix-Hive connector
- Python library for Apache Phoenix
- Using indexes in Phoenix
- Phoenix repair tool
- Run the Phoenix repair tool
- Accessing Cloud Data
- Streaming
- Developing Apache Storm Applications
- Using Storm to Move Data
- Working with Storm Topologies
- Mirroring Data Across Clusters with Kafka MirrorMaker
- Creating Kafka Topics
- Developing Kafka Producers and Consumers
- Governance
- Data Science
- Configuring Apache Spark
- Running Apache Spark Applications
- Introduction
- Running Sample Spark Applications
- Running Spark in Docker Containers on YARN
- Submitting Spark Applications Through Livy
- Running PySpark in a Virtual Environment
- Automating Spark Jobs with Oozie Spark Action
- Developing Apache Spark Applications
- Introduction
- Using the Spark DataFrame API
- Using Spark SQL
- Using the Hive Warehouse Connector with Spark
- Calling Hive User-Defined Functions
- Using Spark Streaming
- HBase Data on Spark with Connectors
- Accessing HDFS Files from Spark
- Accessing ORC Data in Hive Tables
- Using Custom Libraries with Spark
- Using Spark from R: SparkR
- Tuning Apache Spark
- Configuring Apache Zeppelin
- Configuring Apache Zeppelin Security
- Introduction
- Getting Started
- Configure Zeppelin for Authentication: Non-Production Use
- Configure Zeppelin for Authentication: LDAP and Active Directory
- Enabling Access Control for Zeppelin Elements
- Configure SSL for Zeppelin
- Configure Zeppelin for a Kerberos-Enabled Cluster
- Shiro Settings: Reference
- shiro.ini Example
- Using Apache Zeppelin
- IBM Data Science Experience
- Security
- Configuring Proxy with Apache Knox
- Set Up Knox Proxy
- Configuring the Knox Gateway
- Audit Gateway Activity
- Manually Configuring Knox Topology Files
- Defining Cluster Topologies
- Configuring a Server for Knox
- Mapping the Internal Nodes to External URLs
- Configuring an Authentication Provider
- Configuring a Federation Provider
- Configuring Identity Assertion
- Set Up an Authorization Provider
- Setting Up Knox Services for HA
- Configuring Knox With Kerberos
- Configuring Authentication with Kerberos
- Kerberos Overview
- Kerberos Principals Overview
- Enabling SPNEGO Authentication for Hadoop
- Enabling Kerberos Authentication Using Ambari
- Configuring HDP Components for Kerberos
- Configuring Kafka for Kerberos
- Configuring Storm for Kerberos
- Securing Apache HBase in a production environment
- Configuring Ambari Authentication with LDAP/AD
- Configuring Ranger Authentication with UNIX, LDAP, or AD
- Configuring Knox SSO
- Providing Authorization with Ranger
- Using Ranger to Provide Authorization in Hadoop
- Ranger Policies Overview
- Using the Ranger Console
- Resource-Based Services and Policies
- Configuring Resource-Based Services
- Configure a Resource-based Service: HBase
- Configure a Resource-based Service: HDFS
- Configure a Resource-based Service: Hive
- Configure a Resource-based Service: Kafka
- Configure a Resource-based Service: Knox
- Configure a Resource-based Service: Solr
- Configure a Resource-based Service: Storm
- Configure a Resource-based Service: YARN
- Configure a Resource-based Service: Atlas
- Configuring Resource-Based Policies
- Configure a Resource-based Policy: HBase
- Configure a Resource-based Policy: HDFS
- Configure a Resource-based Policy: Hive
- Configure a Resource-based Policy: Kafka
- Configure a Resource-based Policy: Knox
- Configure a Resource-based Policy: Solr
- Configure a Resource-based Policy: Storm
- Configure a Resource-based Policy: YARN
- Configure a Resource-based Policy: Atlas
- Wildcards and Variables in Resource-based Policies
- Importing and Exporting Resource-Based Policies
- Row-level Filtering and Column Masking in Hive
- Tag-Based Services and Policies
- Create a Time-bound Policy
- Administering Ranger Users/Groups and Permissions
- Administering Ranger Reports
- Adding a New Component to Apache Ranger
- Configuring Advanced Authorization Settings
- Managing Auditing
- Audit Overview
- Manually Enabling Audit Settings in Ambari Clusters
- Managing Auditing in Ranger
- Using Apache Solr for Ranger Audits
- Create Read-Only Admin User (Auditor)
- Configuring Wire Encryption
- Wire Encryption
- Enable RPC Encryption
- Enable Data Transfer Protocol
- Enabling SSL: Understanding the Hadoop SSL Keystore Factory
- Creating and Managing SSL Certificates
- Enabling SSL for HDP Components
- Enable SSL for WebHDFS, MapReduce, Tez, and YARN
- Enable SSL for HttpFS
- Enable SSL on Oozie
- Enable SSL on the HBase REST Server
- Enable SSL on the HBase Web UI
- Enable SSL on HiveServer2
- Enable SSL for Kafka Clients
- Enable SSL for Accumulo
- Enable SSL for Apache Atlas
- SPNEGO setup for WebHCat
- Configure SSL for Knox
- Set Up SSL for Ambari
- Configure Ranger SSL
- Configuring Public CA Certificates (Ranger SSL)
- Configuring a Self-Signed Certificate (Ranger SSL)
- Configure Ranger Admin Database for SSL-Enabled MySQL (Ranger SSL)
- Connecting to SSL-Enabled Components
- Configuring Advanced Security Options for Ambari
- Configuring HDFS Encryption
- HDFS Encryption
- Ranger KMS Administration
- Store Master Key in a Hardware Security Module (HSM)
- Enable Ranger KMS Audit
- Enable SSL for Ranger KMS
- Install Multiple Ranger KMS
- Using the Ranger Key Management Service
- Ranger KMS Properties
- Troubleshooting Ranger KMS
- HDFS "Data at Rest" Encryption
- HDFS Encryption Overview
- Configuring and Starting the Ranger Key Management Service (Ranger KMS)
- Configuring and Using HDFS "Data at Rest" Encryption
- Configuring HDP Services for HDFS Encryption
- Running DataNodes as Non-Root
- Securing Credentials
- Reference
- Cluster Planning
- Apache Hive Workload Management Commands
- Workload management command summary
- ALTER MAPPING
- ALTER POOL
- ALTER RESOURCE PLAN
- ALTER TRIGGER
- CREATE MAPPING
- CREATE POOL
- CREATE RESOURCE PLAN
- CREATE TRIGGER
- DISABLE WORKLOAD MANAGEMENT
- DROP MAPPING
- DROP POOL
- DROP RESOURCE PLAN
- DROP TRIGGER
- REPLACE RESOURCE PLAN WITH
- REPLACE ACTIVE RESOURCE PLAN WITH
- SHOW RESOURCE PLAN
- SHOW RESOURCE PLANS
- Workload trigger counters
- Materialized View Commands
- HDFS ACLs
- ZooKeeper ACLs
- Apache ZooKeeper ACLs Best Practices
- ZooKeeper ACLs Best Practices: Accumulo
- ZooKeeper ACLs Best Practices: Ambari Solr
- ZooKeeper ACLs Best Practices: Atlas
- ZooKeeper ACLs Best Practices: HBase
- ZooKeeper ACLs Best Practices: HDFS/WebHDFS
- ZooKeeper ACLs Best Practices: Hive/HCatalog
- ZooKeeper ACLs Best Practices: Kafka
- ZooKeeper ACLs Best Practices: Oozie
- ZooKeeper ACLs Best Practices: Ranger
- ZooKeeper ACLs Best Practices: Ranger KMS/Hadoop KMS
- ZooKeeper ACLs Best Practices: Storm
- ZooKeeper ACLs Best Practices: WebHCat
- ZooKeeper ACLs Best Practices: YARN
- ZooKeeper ACLs Best Practices: YARN Registry
- ZooKeeper ACLs Best Practices: ZooKeeper
- Audit Reference
- Security Reference
- Non-Ambari Security Overview
- Setting Up Kerberos Authentication for Non-Ambari Clusters
- Preparing Kerberos
- Configuring HDP for Kerberos
- Create Mappings Between Principals and UNIX Usernames
- Adding Security Information to Configuration Files
- Configuring HBase and ZooKeeper
- Configure HBase Master
- Create JAAS configuration files
- Start HBase and ZooKeeper services
- Configure Secure Client-Side Access for HBase
- Optional: Configure Client-Side Operation for Secure Operation - Thrift Gateway
- Optional: Configure Client-Side Operation for Secure Operation - REST Gateway
- Configure HBase for Access Control Lists (ACL)
- Configure Phoenix Query Server
- Set up One-Way Trust with Active Directory
- Configuring Proxy Users
- Configure Non-Ambari Ranger SSL
- Enable Audit Logging in Non-Ambari Clusters
- Knox Reference
- Ranger Install Reference
- Apache Ranger Public REST APIs
- HBase Java API Reference
- Teradata Connector User Guide