High-level Design and Best Practices

Abstract

The Cloudera Base on premises Reference Architecture is a high-level design and best-practices guide for deploying Cloudera Base on premises in customer data centers.

Cloudera Base on premises is the on-premises version of Cloudera. This new product combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise along with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.

Cloudera Base on premises supports a variety of hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization, and governance.

Cloudera Base on premises comprises a variety of components, such as Apache HDFS, Apache Hive 3, Apache HBase, and Apache Impala, along with many other services for specialized workloads. You can select any combination of these services to create clusters that address your business requirements and workloads. Several pre-configured packages of services are also available for common workloads. These include:
  • Data Engineering: Ingest, transform, and analyze data

    Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive metastore, Hive on Tez, Spark, Oozie, Hue, and Data Analytics Studio (DAS)

  • Data Mart: Browse, query, and explore your data in an interactive way

    Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive metastore, Impala, and Hue

  • Operational Database: Low-latency writes, reads, and persistent access to data for Online Transactional Processing (OLTP) use cases

    Services: HDFS, Ranger, Atlas, and HBase
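
The package definitions above can be compared programmatically when deciding which one fits a workload. The following is a minimal Python sketch, not part of any Cloudera tool or API: the service lists are copied from the descriptions above, while the names PACKAGES, common_services, and packages_providing are hypothetical and introduced only for illustration.

```python
# Illustrative only: service lists are copied from the package descriptions
# above. The names PACKAGES, common_services, and packages_providing are
# hypothetical helpers, not part of any Cloudera API.
PACKAGES = {
    "Data Engineering": [
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive metastore", "Hive on Tez", "Spark", "Oozie", "Hue",
        "Data Analytics Studio (DAS)",
    ],
    "Data Mart": [
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive metastore", "Impala", "Hue",
    ],
    "Operational Database": ["HDFS", "Ranger", "Atlas", "HBase"],
}


def common_services(packages):
    """Return the services present in every package, sorted by name."""
    service_sets = [set(services) for services in packages.values()]
    return sorted(set.intersection(*service_sets))


def packages_providing(packages, required):
    """Return the names of packages whose services cover all required ones."""
    needed = set(required)
    return [name for name, services in packages.items()
            if needed <= set(services)]


if __name__ == "__main__":
    # Services shared by all three pre-configured packages.
    print(common_services(PACKAGES))
    # Packages that include both the Hive metastore and Hue.
    print(packages_providing(PACKAGES, ["Hive metastore", "Hue"]))
```

In this sketch, the services shared by all three packages are HDFS, Ranger, and Atlas, which matches the storage, authorization, and governance foundation described earlier.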

Installing a Cloudera Base on premises cluster involves installing a parcel called Cloudera Runtime, which contains all of the services, along with tools to manage, govern, and secure your cluster. For a complete list of the included components, see Cloudera Runtime Component Versions.