Data Access

Cloudera Data Platform Runtime includes Apache Hive 3 and Apache Impala for storing and accessing data in the Hive metastore database. Hive 3 addresses enterprise data warehouse demands for transactional data in the ORC file format. Impala performs high-performance, low-latency SQL queries on data in Parquet and other formats. Hue is a web-based interactive editor for querying the Hive metastore that also creates Oozie workflows. Data Analytics Studio (DAS) is a web application for performing operations on Hive tables; it also provides recommendations for optimizing the performance of your queries.

Data Analytics Studio

Using Data Analytics Studio

Describes how to work with queries, manage databases and tables, and generate reports.

Apache Hive

Working with Hive Metastore

Describes how Hive metastore (HMS) detects client types and stores compatible tables, authorizes access to tables from Spark, and stores metadata of multiple services, such as Hive and Impala.

Starting Apache Hive

Describes how to launch Hive, execute commands, and issue queries from Beeline.

Using Hive

Covers how to use Hive 3 to query flat and transactional data using SQL statements.

Managing Apache Hive

Includes information about mature ACID v2 operations on transactions, compaction of files that accumulate during ingestion, and query vectorization.

Configuring Apache Hive

Describes how to set up Hive to generate statistics and control the number of concurrent connections to Hive.

Securing Apache Hive

Discusses how to choose an authorization model based on your use case.

Integrating Apache Hive with Apache Spark and BI

Covers accessing Hive tables from Spark through the Spark Direct Reader or Hive Warehouse Connector, using JdbcStorageHandler to access an external data source, and connecting to Business Intelligence (BI) tools.

Apache Hive Performance Tuning

Explains low-latency analytical processing, caching, and tuning options.

Migrating Data Using Sqoop

Explains how to move data from relational databases directly to Hive or to the file system or object store and how to move data back to Hive.

Apache Impala

Starting and Stopping Apache Impala

Presents the task topics for configuring client access to Impala and for starting and stopping Impala.

Securing Apache Impala

Describes a set of security features Impala provides to protect your critical and sensitive data.

Configuring Apache Impala

Describes how to customize your environment after installing Impala.

Tuning Apache Impala

Describes how to tune Impala queries and other SQL operations.

Managing Apache Impala

Presents the task topics for managing resources and metadata in Impala.

Monitoring Apache Impala

Describes how to monitor the Impala service so that it runs smoothly and avoids conflicts with other components running on the same cluster.

Hue

Using Hue

Describes how to use Hue to query Apache Impala data sets and how to use it to browse metadata in Apache Atlas.

Administering Hue

Describes how to configure Hue, customize its web UI, and enable integration with Apache Atlas.

Securing Hue

Describes how to set Hue user and application permissions and how to configure SSL connections, LDAP authentication, and integration with Apache Ranger and Knox.

Tuning Hue

Describes how to add a load balancer and configure high availability for Hue and between Hue and other components, such as Hive, Impala, and HDFS.

Search

Search Tutorial

A tutorial on using Cloudera Search.

Securing Cloudera Search

Describes how to secure Solr network connections and configure authentication and authorization.

Tuning Cloudera Search

Describes how to optimize Cloudera Search performance for various use cases.

Managing Cloudera Search

Describes how to configure and manage Cloudera Search.

Cloudera Search ETL

Describes how to perform ETL using Cloudera Search and Morphlines.

Indexing Data Using Cloudera Search

Describes how to index data using Cloudera Search.