Server and Client Configuration
Cloudera Manager generates server and client configuration files from its database.
Administrators are sometimes surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect. That is because service instances started by Cloudera Manager do not read configurations from the default locations. To use HDFS as an example, when not managed by Cloudera Manager, there would usually be one HDFS configuration per host, located at /etc/hadoop/conf/hdfs-site.xml. Server-side daemons and clients running on the same host would all use that same configuration.
Cloudera Manager distinguishes between server and client configuration. In the case of HDFS, the file /etc/hadoop/conf/hdfs-site.xml contains only configuration relevant to an HDFS client. That is, by default, if you run a program that needs to communicate with Hadoop, it will get the addresses of the NameNode and JobTracker, and other important configurations, from that directory. A similar approach is taken for /etc/hbase/conf and /etc/hive/conf.
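As a rough illustration of how a client program might pick up the NameNode address from the deployed client configuration, the sketch below parses Hadoop-style configuration XML into a dictionary. The helper name load_hadoop_conf, the sample content, and the host name namenode.example.com are assumptions made for the example; on a real host the text would be read from a file under /etc/hadoop/conf.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict.

    Hypothetical helper for illustration; not part of Cloudera Manager.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Illustrative content only. On a host with deployed client configuration,
# this text would come from a file such as /etc/hadoop/conf/core-site.xml.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""

conf = load_hadoop_conf(SAMPLE_CORE_SITE)
print(conf["fs.defaultFS"])
```

Because the helper takes the XML as a string, the same function can be pointed at any of the client directories mentioned above by reading the relevant file first.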
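Service instances started by Cloudera Manager instead read their configuration from a private per-process directory, under /var/run/cloudera-scm-agent/process/unique-process-name.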
Giving each process its own private execution and configuration environment allows Cloudera Manager to control each process independently. For example, here are the contents of an example 879-hdfs-NAMENODE process directory:
$ tree -a /var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
/var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
├── cloudera_manager_agent_fencer.py
├── cloudera_manager_agent_fencer_secret_key.txt
├── cloudera-monitor.properties
├── core-site.xml
├── dfs_hosts_allow.txt
├── dfs_hosts_exclude.txt
├── event-filter-rules.json
├── hadoop-metrics2.properties
├── hdfs.keytab
├── hdfs-site.xml
├── log4j.properties
├── logs
│   ├── stderr.log
│   └── stdout.log
├── topology.map
└── topology.py
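When several process directories exist for the same role, it can help to pick one programmatically. The sketch below selects the entry with the highest numeric prefix. It assumes the naming pattern suggested by the 879-hdfs-NAMENODE example (a numeric ID, a hyphen, then the service and role), and it assumes that a higher ID means a more recent start, which the text above does not state. The function name latest_process_dir and the sample listing are hypothetical.

```python
import re

# Assumed naming pattern, inferred from the "879-hdfs-NAMENODE" example:
# <numeric id>-<service>-<ROLE>
PROCESS_DIR = re.compile(r"^(\d+)-(.+)$")


def latest_process_dir(names, suffix):
    """Return the name with the highest numeric prefix whose remainder
    equals `suffix`, or None if nothing matches."""
    best = None
    best_id = -1
    for name in names:
        match = PROCESS_DIR.match(name)
        if match is None or match.group(2) != suffix:
            continue
        process_id = int(match.group(1))  # compare as numbers, not strings
        if process_id > best_id:
            best_id = process_id
            best = name
    return best


# Hypothetical listing, such as os.listdir() might return for
# /var/run/cloudera-scm-agent/process on a managed host.
names = ["99-hdfs-NAMENODE", "879-hdfs-NAMENODE", "880-hdfs-DATANODE"]
print(latest_process_dir(names, "hdfs-NAMENODE"))
```

The prefix is compared as an integer because a plain string sort would rank 99 above 879.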
Distinguishing between server and client configuration provides several advantages:
- Sensitive information in the server-side configuration, such as the password for the Hive Metastore RDBMS, is not exposed to the clients.
- A service that depends on another service may deploy with customized configuration. For example, to get good HDFS read performance, Impala needs a specialized version of the HDFS client configuration, which may be harmful to a generic client. This is achieved by separating the HDFS configuration for the Impala daemons (stored in the per-process directory mentioned above) from that of the generic client (/etc/hadoop/conf).
- Client configuration files are much smaller and more readable. This also avoids confusing non-administrator Hadoop users with irrelevant server-side properties.
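One way to see the separation described in the list above is to compare the property names in a server-side file with those in the matching client file. The sketch below is an illustration only: the function names, the sample XML, and the placeholder values are assumptions, and the files Cloudera Manager actually generates will contain different properties.

```python
import xml.etree.ElementTree as ET


def property_names(xml_text):
    """Collect the <name> of every <property> in Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip()
        for prop in root.findall("property")
        if prop.findtext("name")
    }


def server_only(server_xml, client_xml):
    """Return property names present on the server side but not the client side."""
    return sorted(property_names(server_xml) - property_names(client_xml))


# Hypothetical contents; all values are placeholders.
SERVER_HIVE_SITE = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://metastore.example.com:9083</value></property>
  <property><name>javax.jdo.option.ConnectionPassword</name><value>placeholder</value></property>
</configuration>
"""

CLIENT_HIVE_SITE = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://metastore.example.com:9083</value></property>
</configuration>
"""

print(server_only(SERVER_HIVE_SITE, CLIENT_HIVE_SITE))
```

In this made-up example the database password property appears only in the server-side input, which mirrors the first advantage listed above.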