Overview of Cloudera and the Cloudera Documentation Set
Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.
- CDH—The Cloudera distribution of Apache Hadoop and other related open-source projects,
including Impala and Cloudera Search. CDH also provides security and integration with numerous hardware and software solutions.
- Cloudera Impala—A massively parallel processing SQL engine for interactive analytics and business intelligence. Its highly optimized architecture makes it ideally suited for traditional BI-style queries with joins, aggregations, and subqueries. It can query Hadoop data files from a variety of sources, including those produced by MapReduce jobs or loaded into Hive tables. The YARN and Llama resource management components let Impala coexist on clusters running batch workloads concurrently with Impala SQL queries. You can manage Impala alongside other Hadoop components through the Cloudera Manager user interface, and secure its data through the Sentry authorization framework.
- Cloudera Search—Provides near real-time access to data stored in or ingested into Hadoop and HBase. Search provides near real-time indexing, batch indexing, full-text exploration and navigated drill-down, as well as a simple, full-text interface that requires no SQL or programming skills. Fully integrated in the data-processing platform, Search uses the flexible, scalable, and robust storage system included with CDH. This eliminates the need to move large data sets across infrastructures to perform business tasks.
- Cloudera Manager—A sophisticated application used to deploy, manage, monitor, and diagnose issues with your CDH deployments. Cloudera Manager provides the Admin Console, a web-based user interface that makes administration of your enterprise data simple and straightforward. It also includes the Cloudera Manager API, which you can use to obtain cluster health information and metrics, as well as configure Cloudera Manager.
- Cloudera Navigator—An end-to-end data management and security tool for the CDH platform. Cloudera Navigator enables administrators, data managers, and analysts to explore the large amounts of data in Hadoop, and simplifies the storage and management of encryption keys. The robust auditing, data management, lineage management, lifecycle management, and encryption key management in Cloudera Navigator allow enterprises to adhere to stringent compliance and regulatory requirements.
Documentation Overview
Guide |
Description |
|
---|---|---|
Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected. |
||
This guide contains release and download information for installers and administrators. It includes release notes as well as information about versions and downloads. The guide also provides a release matrix that shows which major and minor release version of a product is supported with which release version of Cloudera Manager, CDH and, if applicable, Cloudera Impala. |
||
This guide describes how to quickly install Cloudera software and create initial deployments for proof of concept (POC) or development. It describes how to download and use the QuickStart virtual machines, which provide everything you need to start a basic installation. It also shows you how to create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. QuickStart installations should be used for demonstrations and POC applications only and are not recommended for production. |
||
installation.html#xd_583c10bfdbd326ba--5a52cca-1476e7473cd--7fdb |
This guide provides Cloudera software requirements and installation information for production deployments. This guide also provides specific port information for Cloudera software. |
|
This guide describes how to configure and administer a Cloudera deployment. Administrators manage resources, availability, and backup and recovery configurations. In addition, this guide shows how to implement high availability, and discusses integration. |
||
This guide describes how to perform governance using Cloudera Navigator. Governance activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects. |
||
This guide shows how to monitor the health of a Cloudera deployment and diagnose issues. You can obtain metrics and usage information and view processing activities. This guide also describes how to examine logs and reports to troubleshoot issues with cluster configuration and operation as well as monitor compliance. |
||
This guide is intended for system administrators who want to secure a cluster using data encryption, user authentication, and authorization techniques. This topic also provides information about Hadoop security programs and shows you how to set up a gateway to restrict access. |
||
This guide describes Cloudera Impala, its features and benefits, and how it works with CDH. This topic introduces Impala concepts, describes how to plan your Impala deployment, and provides tutorials for first-time users as well as more advanced tutorials that describe scenarios and specialized features. You will also find a language reference, performance tuning, instructions for using the Impala shell, troubleshooting information, and frequently asked questions. |
||
This guide provides instructions for installing Cloudera software. |
||
This guide contains glossaries of terms for Cloudera components. |