Abstract
The Cloudera Base on premises Reference Architecture is a high-level design and best-practices guide for deploying Cloudera Base on premises in customer data centers.
Cloudera Base on premises is the on-premises version of Cloudera. This new product combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise along with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.
Cloudera Base on premises supports a variety of hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization, and governance.
Common workloads and the services they use include:
- Data Engineering: Ingest, transform, and analyze data
Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive metastore, Hive on Tez, Spark, Oozie, Hue, and Data Analytics Studio (DAS)
- Data Mart: Browse, query, and explore your data in an interactive way
Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive metastore, Impala, and Hue
- Operational Database: Low latency writes, reads, and persistent access to data for Online Transactional Processing (OLTP) use cases
Services: HDFS, Ranger, Atlas, and HBase
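The workload-to-service listing above can be treated as a simple lookup table when planning which services a cluster needs. The following is a minimal sketch in Python, not a Cloudera API: the names `WORKLOAD_SERVICES`, `services_for`, and `common_services` are hypothetical and exist only for illustration, and the service lists are copied from the text above.

```python
# Illustrative planning aid only; not part of any Cloudera tool or API.
# Service lists are copied from the workload descriptions above.
WORKLOAD_SERVICES = {
    "Data Engineering": [
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive metastore", "Hive on Tez", "Spark", "Oozie", "Hue",
        "Data Analytics Studio (DAS)",
    ],
    "Data Mart": [
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive metastore", "Impala", "Hue",
    ],
    "Operational Database": ["HDFS", "Ranger", "Atlas", "HBase"],
}


def services_for(workloads):
    """Return the sorted union of services needed by the given workloads."""
    needed = set()
    for name in workloads:
        needed.update(WORKLOAD_SERVICES[name])
    return sorted(needed)


def common_services():
    """Return the sorted list of services shared by every listed workload."""
    service_sets = [set(services) for services in WORKLOAD_SERVICES.values()]
    return sorted(set.intersection(*service_sets))


if __name__ == "__main__":
    # Services required by a cluster that hosts two of the workloads.
    print(services_for(["Data Mart", "Operational Database"]))
    # Services that appear in all three workloads.
    print(common_services())
```

Under these assumptions, the intersection shows that HDFS, Ranger, and Atlas appear in all three workloads, which matches the storage, authorization, and governance foundation described earlier.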
Installing a Cloudera Base on premises cluster involves installing a parcel called Cloudera Runtime, which contains all of the services, along with the tools used to manage, govern, and secure your cluster. For a complete list of the included components, see Cloudera Runtime Component Versions.