Data Storage & Data OS Documentation

Apache HDFS is a Java-based file system for storing large volumes of data. Designed to span large clusters of commodity servers, HDFS provides scalable and reliable data storage.

Apache YARN is the resource management layer for distributed applications that run across multiple machines in a cluster. YARN lets you use various data processing engines for batch, interactive, and real-time stream processing of data stored in HDFS.

HDFS and YARN form the data management layer of Apache Hadoop. YARN provides the resource management while HDFS provides the storage.

