Server and Client Configuration
Cloudera Manager generates server and client configuration files from its database.
Administrators are sometimes surprised that modifying /etc/hadoop/conf
and then restarting HDFS has no effect. That is because service instances
started by Cloudera Manager do not read configurations from the default
locations. Take HDFS as an example: when it is not managed by Cloudera
Manager, there is usually one HDFS configuration per host, located at
/etc/hadoop/conf/hdfs-site.xml. Server-side daemons and clients running
on the same host all use that same configuration.
Cloudera Manager distinguishes between server and client configuration.
In the case of HDFS, the file /etc/hadoop/conf/hdfs-site.xml contains only
configuration relevant to an HDFS client. That is, by default, if you run
a program that needs to communicate with Hadoop, it will get the addresses
of the NameNode and JobTracker, and other important configurations, from
that directory. A similar approach is taken for /etc/hbase/conf and
/etc/hive/conf.
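As a rough illustration of what "getting the addresses from that directory" means for a client, the sketch below reads the name/value pairs out of a Hadoop-style site file. It assumes the standard Hadoop layout of a configuration element containing property elements with name and value children. The function name read_hadoop_conf is hypothetical and is not part of Cloudera Manager or Hadoop.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(path):
    """Return the name/value pairs from a Hadoop-style *-site.xml file.

    Hypothetical helper for illustration only.
    """
    props = {}
    root = ET.parse(path).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            # Skip malformed entries that have no <name> child.
            continue
        # A missing or empty <value> is stored as an empty string.
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props
```

On a host with a deployed client configuration, calling read_hadoop_conf("/etc/hadoop/conf/core-site.xml") and looking up "fs.defaultFS" would typically show the NameNode address that clients use, assuming that property is set in the deployed file.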
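In contrast, server role instances (for example, the NameNode) obtain
their configuration from a private per-process directory under
/var/run/cloudera-scm-agent/process/unique-process-name.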
Giving each process its own private execution and configuration
environment allows Cloudera Manager to control each process independently.
For example, here are the contents of the 879-hdfs-NAMENODE process
directory:
$ tree -a /var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
/var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
├── cloudera_manager_agent_fencer.py
├── cloudera_manager_agent_fencer_secret_key.txt
├── cloudera-monitor.properties
├── core-site.xml
├── dfs_hosts_allow.txt
├── dfs_hosts_exclude.txt
├── event-filter-rules.json
├── hadoop-metrics2.properties
├── hdfs.keytab
├── hdfs-site.xml
├── log4j.properties
├── logs
│   ├── stderr.log
│   └── stdout.log
├── topology.map
└── topology.py
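To inspect the configuration a running role actually uses, you first have to find its process directory. The sketch below picks the highest-numbered directory for a given role. It assumes that directory names follow the id-service-ROLE pattern seen in 879-hdfs-NAMENODE and that a larger numeric prefix means a more recent process. Both are assumptions drawn from the example above, not documented guarantees. The name latest_process_dir is hypothetical, and reading this directory may require elevated privileges.

```python
import os
import re

# Location of the per-process directories described above.
PROCESS_ROOT = "/var/run/cloudera-scm-agent/process"


def latest_process_dir(role_suffix, root=PROCESS_ROOT):
    """Return the highest-numbered directory named <id>-<role_suffix>, or None.

    Hypothetical helper; assumes a larger numeric prefix is more recent.
    """
    pattern = re.compile(r"^(\d+)-" + re.escape(role_suffix) + r"$")
    best_id, best_path = -1, None
    for entry in os.listdir(root):
        match = pattern.match(entry)
        full = os.path.join(root, entry)
        if match and os.path.isdir(full):
            # Compare prefixes numerically so that 1002 sorts after 879.
            proc_id = int(match.group(1))
            if proc_id > best_id:
                best_id, best_path = proc_id, full
    return best_path
```

For example, latest_process_dir("hdfs-NAMENODE") would return the path of the directory listed above if 879 is the largest NameNode process id on that host.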
Distinguishing between server and client configuration provides several advantages:
- Sensitive information in the server-side configuration, such as the password for the Hive Metastore RDBMS, is not exposed to the clients.
- A service that depends on another service may deploy with customized configuration. For example, to get good HDFS read performance, Impala needs a specialized version of the HDFS client configuration, which may be harmful to a generic client. This is achieved by separating the HDFS configuration for the Impala daemons (stored in the per-process directory mentioned above) from that of the generic client (/etc/hadoop/conf).
- Client configuration files are much smaller and more readable. This also avoids confusing non-administrator Hadoop users with irrelevant server-side properties.