Data is maintained on a file system or object store service such as HDFS or Ozone. Data
may also exist in higher-level abstractions maintained by a service such as HBase or Solr, which
create their own file formats on HDFS. In other cases, for services such as Kudu or Kafka, the
data is placed and maintained directly on a local disk.
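
The placement described above can be summarized as a small lookup table. The following is a minimal Python sketch for illustration only: the `StorageLayer` enum, the `SERVICE_STORAGE` mapping, and the `services_on` helper are hypothetical names, not part of any CDP API, and the mapping covers only the four services named in the preceding paragraph.

```python
from enum import Enum


class StorageLayer(Enum):
    """Hypothetical labels for where a service keeps its data."""
    HDFS_BACKED = "own file formats on HDFS"
    LOCAL_DISK = "directly on local disk"


# Illustrative mapping taken from the paragraph above; not a CDP API.
SERVICE_STORAGE = {
    "HBase": StorageLayer.HDFS_BACKED,
    "Solr": StorageLayer.HDFS_BACKED,
    "Kudu": StorageLayer.LOCAL_DISK,
    "Kafka": StorageLayer.LOCAL_DISK,
}


def services_on(layer):
    """Return the service names stored on the given layer, sorted by name."""
    return sorted(
        name for name, stored_on in SERVICE_STORAGE.items() if stored_on is layer
    )
```

Calling `services_on(StorageLayer.LOCAL_DISK)` lists the services whose data lives outside HDFS and Ozone, which is a quick way to spot data that file system level replication alone would not cover.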

File systems

CDP has multiple deployment methodologies in both public cloud and on-premises. These form factors have multiple, interrelated mechanisms for storing data at various organizational levels within the cluster. Each layer has specific requirements related to how and whether data should be replicated. These layers include local file storage on individual nodes, HDFS, and Ozone.

Data stores

Separate data stores might exist on top of HDFS and Ozone or at the local file system level. These data stores might have their own replication or backup and restore mechanisms used in conjunction with traditional Hadoop replication strategies. Services might have additional availability capabilities and configurations to help establish intra-cluster high availability. This creates a robust and fault-tolerant environment when paired with cross-cluster replication.

Hive and Impala data

Hive and Impala data generally reside on HDFS. Hive and Impala replication enables you to copy your Hive metastore and data from one cluster to another. You can synchronize the Hive metastore and data on the destination cluster with the source, based on a specified replication policy.