Cluster configuration best practices

Review the cluster configuration best practices.

ZooKeeper
Learn why it is recommended to install ZooKeeper on a node where it has unobstructed access to the disk.

HDFS
Learn about the considerations and bottlenecks to account for when planning cluster configuration for the HDFS service.

YARN
The YARN service manages MapReduce and Spark tasks. Applications run in YARN containers, which use Linux Cgroups for resource management and process isolation.

Impala
The Impala service is a distributed, MPP database engine for running interactive SQL queries over large data sets. Impala performs best when it can operate on data in memory. Therefore, Impala is often configured with a very large heap size.

Spark
Cloudera supports Spark on YARN-managed deployments for a more flexible and consistent resource management approach.

HBase
By default, major compactions happen every 7 days, and the next major compaction starts 7 days after the last one has finished. Because the schedule depends on when the previous compaction finished, a major compaction can start at a time that impacts production processes. This is not ideal if you want compactions to run at a known off-peak hour, such as 3 AM.

Search
Cloudera Search is a service based on Apache Solr that provides a distributed search engine. Search engines are often expected to provide fast, interactive performance, so it is important to allocate sufficient RAM to the Search service.

Oozie
Writing Oozie XML configuration files can be tedious and error-prone. Cloudera recommends that you use the Oozie editor embedded in Hue for creating, scheduling, and executing Oozie workflows.

Kafka
Kafka’s default configuration with Cloudera Manager is suited to getting started with development quickly. Several default settings should be changed before deploying a Cloudera Kafka cluster in production.

Kudu
Review the partitioning guidelines and limitations before deploying the Kudu service on your cluster.
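
To illustrate the HBase scheduling behavior described above, the following is a minimal sketch of how the start time of a major compaction can drift away from an off-peak hour. It models only what the description states (the next compaction starts 7 days after the previous one finished). The function name, the 2.5-hour compaction duration, and the start date are hypothetical values chosen for illustration, not HBase defaults or measurements.

```python
from datetime import datetime, timedelta

# Default interval between major compactions, as stated in the HBase note.
MAJOR_COMPACTION_INTERVAL = timedelta(days=7)


def compaction_start_times(first_start, duration, count):
    """Return the start times of `count` consecutive major compactions.

    Each compaction starts one interval after the previous one finished,
    so every cycle shifts the start time by the compaction's duration.
    """
    starts = []
    start = first_start
    for _ in range(count):
        starts.append(start)
        finished = start + duration
        start = finished + MAJOR_COMPACTION_INTERVAL
    return starts


# Hypothetical values: first compaction at 3 AM, each run takes 2.5 hours.
schedule = compaction_start_times(
    first_start=datetime(2024, 1, 1, 3, 0),
    duration=timedelta(hours=2, minutes=30),
    count=4,
)
for start in schedule:
    print(start.strftime("%Y-%m-%d %H:%M"))
```

Under these assumptions the start time moves 2.5 hours later each cycle, so a compaction that first ran at 3 AM starts mid-morning by the fourth cycle.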