Use the following instructions to manually configure the cluster properties file for deploying HDP from the command-line interface or in a script.

Create a file for the cluster properties, or use the sample clusterproperties.txt file extracted from the HDP Installation zip file. You'll pass the name of the cluster properties file to the msiexec call when you install HDP. The following examples use the file name clusterproperties.txt.

Add the properties to the clusterproperties.txt file as described in the table below. As you add properties, keep in mind the following:

- All properties in the cluster properties file must be separated by a newline character.
- Directory paths cannot contain whitespace characters. (For example, c:\Program Files\Hadoop is an invalid directory path for HDP.)
- Use Fully Qualified Domain Names (FQDNs) to specify the network host name for each cluster host. The FQDN is a DNS name that uniquely identifies the computer on the network. By default, it is a concatenation of the host name, the primary DNS suffix, and a period.
- When specifying the host lists in the cluster properties file, if the hosts are multi-homed or have multiple NICs, make sure that each name or IP address is the preferred name or IP address by which the hosts can communicate among themselves. In other words, these should be the addresses used internal to the cluster, not those used for addressing cluster nodes from outside the cluster.
- To enable NameNode HA, you must include the HA properties and exclude the SECONDARY_NAMENODE_HOST definition.
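The constraints above can be checked mechanically before running the installer. The sketch below is a hypothetical helper (not part of HDP) that parses a cluster properties file, flags whitespace in directory paths, and flags a SECONDARY_NAMENODE_HOST defined alongside NameNode HA:

```python
# Sketch: validate a cluster properties file against the rules above.
# The file format and rules come from this guide; the helper itself is
# hypothetical and not part of the HDP installer.

def validate_cluster_properties(text):
    """Return a list of problems found in a cluster properties file."""
    problems = []
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        key, _, value = line.partition("=")   # newline-separated key=value pairs
        props[key.strip()] = value.strip()

    # Directory paths must not contain whitespace characters.
    for key in ("HDP_LOG_DIR", "HDP_DATA_DIR",
                "HDFS_NAMENODE_DATA_DIR", "HDFS_DATANODE_DATA_DIR"):
        for path in props.get(key, "").split(","):
            if " " in path:
                problems.append(f"{key}: path '{path}' contains whitespace")

    # NameNode HA requires excluding the SECONDARY_NAMENODE_HOST definition.
    if props.get("HA", "").lower() == "yes" and "SECONDARY_NAMENODE_HOST" in props:
        problems.append("HA=yes but SECONDARY_NAMENODE_HOST is defined")
    return problems


sample = ("HDP_LOG_DIR=c:\\Program Files\\Hadoop\n"
          "HA=yes\n"
          "SECONDARY_NAMENODE_HOST=nn2.acme.com\n")
print(validate_cluster_properties(sample))
```

Running the helper on the deliberately broken sample reports both violations: the space in c:\Program Files\Hadoop and the conflicting SECONDARY_NAMENODE_HOST.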
Table 2.11. Configuration Values for Deploying HDP

| Configuration Property Name | Description | Example Value | Mandatory/Optional |
|---|---|---|---|
| HDP_LOG_DIR | HDP's operational logs are written to this directory on each cluster host. Ensure that you have sufficient disk space for storing these log files. | d:\hadoop\logs | Mandatory |
| HDP_DATA_DIR | HDP data is stored in this directory on each cluster node. You can add multiple comma-separated data locations for multiple data directories. | d:\hdp\data | Mandatory |
| HDFS_NAMENODE_DATA_DIR | Determines where on the local file system the HDFS NameNode should store the name table (fsimage). You can add multiple comma-separated data locations for multiple data directories. | d:\hadoop\data\hdfs\nn,c:\hdpdata,d:\hdpdatann | Mandatory |
| HDFS_DATANODE_DATA_DIR | Determines where on the local file system an HDFS DataNode should store its blocks. You can add multiple comma-separated data locations for multiple data directories. | d:\hadoop\data\hdfs\dn,c:\hdpdata,d:\hdpdatadn | Mandatory |
| NAMENODE_HOST | The FQDN for the cluster node that will run the NameNode master service. | NAMENODE-MASTER.acme.com | Mandatory |
| SECONDARY_NAMENODE_HOST | The FQDN for the cluster node that will run the Secondary NameNode master service. | SECONDARY-NN-MASTER.acme.com | Mandatory when HA is not enabled |
| RESOURCEMANAGER_HOST | The FQDN for the cluster node that will run the YARN ResourceManager master service. | RESOURCE-MANAGER.acme.com | Mandatory |
| HIVE_SERVER_HOST | The FQDN for the cluster node that will run the Hive Server master service. | HIVE-SERVER-MASTER.acme.com | Mandatory |
| OOZIE_SERVER_HOST | The FQDN for the cluster node that will run the Oozie Server master service. | OOZIE-SERVER-MASTER.acme.com | Mandatory |
| WEBHCAT_HOST | The FQDN for the cluster node that will run the WebHCat master service. | WEBHCAT-MASTER.acme.com | Mandatory |
| FLUME_HOSTS | A comma-separated list of FQDNs for the cluster nodes that will run the Flume service. | FLUME-SERVICE1.acme.com, FLUME-SERVICE2.acme.com, FLUME-SERVICE3.acme.com | Mandatory |
| HBASE_MASTER | The FQDN for the cluster node that will run the HBase master. | HBASE-MASTER.acme.com | Mandatory |
| HBASE_REGIONSERVERS | A comma-separated list of FQDNs for the cluster nodes that will run the HBase RegionServer services. | slave1.acme.com, slave2.acme.com, slave3.acme.com | Mandatory |
| SLAVE_HOSTS | A comma-separated list of FQDNs for the cluster nodes that will run the DataNode and TaskTracker services. | slave1.acme.com, slave2.acme.com, slave3.acme.com | Mandatory |
| ZOOKEEPER_HOSTS | A comma-separated list of FQDNs for the cluster nodes that will run the ZooKeeper service. | ZOOKEEPER-HOST.acme.com | Optional |
| FALCON_HOST | A comma-separated list of FQDNs for the cluster nodes that will run the Falcon service. | falcon.acme.com, falcon1.acme.com, falcon2.acme.com | Optional |
| KNOX_HOST | The FQDN of the Knox Gateway host. | KNOX-HOST.acme.com | Optional |
| STORM_SUPERVISORS | A comma-separated list of FQDNs for the cluster nodes that will run the Storm Supervisor service. | supervisor.acme.com, supervisor1.acme.com, supervisor2.acme.com | Optional |
| STORM_NIMBUS | The FQDN of the Storm Nimbus Server. | STORM-HOST.acme.com | Optional |
| DB_FLAVOR | Database type for the Hive and Oozie metastores (supported databases are SQL Server and Derby). To use the default embedded Derby instance, set this property to derby. To use an existing SQL Server instance as the metastore database, set it to mssql. | mssql or derby | Mandatory |
| DB_PORT | Port number, required only if you are using SQL Server for the Hive and Oozie metastores. | 1433 (default) | Optional |
| DB_HOSTNAME | FQDN of the node where the metastore database service is installed. If using SQL Server, set the value to your SQL Server host name. If using Derby for the Hive metastore, set the value to HIVE_SERVER_HOST. | sqlserver1.acme.com | Mandatory |
| HIVE_DB_NAME | Database for the Hive metastore. If using SQL Server, ensure that you create the database on the SQL Server instance. | hivedb | Mandatory |
| HIVE_DB_USERNAME | User name for the Hive metastore database instance. Ensure that this user account has appropriate permissions. | hive_user | Mandatory |
| HIVE_DB_PASSWORD | Password for the Hive metastore database instance. | hive_pass | Mandatory |
| OOZIE_DB_NAME | Database for the Oozie metastore. If using SQL Server, ensure that you create the database on the SQL Server instance. | ooziedb | Mandatory |
| OOZIE_DB_USERNAME | User name for the Oozie metastore database instance. Ensure that this user account has appropriate permissions. | oozie_user | Mandatory |
| OOZIE_DB_PASSWORD | Password for the Oozie metastore database instance. | oozie_pass | Mandatory |
| DEFAULT_FS | Default file system. | HDFS | |
| IS_TEZ | Installs the Tez component on the Hive host. | YES or NO | Optional |
| ENABLE_LZO | Enables the LZO codec for compression in HBase cells. | YES or NO | Optional |
| IS_PHOENIX | Installs Phoenix on the HBase hosts. | YES or NO | Optional |
| IS_HDFS_HA | Specifies whether to enable High Availability for HDFS. | YES or NO | Mandatory |
| SPARK_JOB_SERVER | Specifies the Spark job history server. | onprem-ranger1 | Optional |
| SPARK_HIVE_METASTORE | Specifies the Hive metastore for Spark. | metastore | Optional |
| HIVE_DR | Indicates whether to install HiveDR. | YES or NO | Optional |
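As a concrete illustration of the database rows above, a Derby-backed metastore configuration reduces to the fragment below (the database names and credentials are placeholder values; with Derby, DB_HOSTNAME is set to the same host as HIVE_SERVER_HOST, and DB_PORT is omitted):

```
DB_FLAVOR=derby
DB_HOSTNAME=HIVE-SERVER-MASTER.acme.com
HIVE_DB_NAME=hivedb
HIVE_DB_USERNAME=hive_user
HIVE_DB_PASSWORD=hive_pass
OOZIE_DB_NAME=ooziedb
OOZIE_DB_USERNAME=oozie_user
OOZIE_DB_PASSWORD=oozie_pass
```

For an existing SQL Server instance, set DB_FLAVOR=mssql, point DB_HOSTNAME at the SQL Server host, and add DB_PORT (1433 by default).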
Configuration Values: High Availability
To ensure that a multi-node cluster remains available, configure and enable High Availability. Configuring High Availability includes defining locations and names of hosts in a cluster that are available to act as journal nodes and a standby name node in the event that the primary name node fails. To configure High Availability, add the following properties to your cluster properties file, and set their values as follows:
**Note:** To enable High Availability, you must also run several HA-specific commands when you start cluster services.
Table 2.12. High Availability configuration information

| Configuration Property Name | Description | Example Value | Mandatory/Optional |
|---|---|---|---|
| HA | Whether to deploy a highly available NameNode. | yes or no | Optional |
| NN_HA_JOURNALNODE_HOSTS | A comma-separated list of FQDNs for the cluster nodes that will run the JournalNode processes. | journalnode1.acme.com, journalnode2.acme.com, journalnode3.acme.com | Optional |
| NN_HA_CLUSTER_NAME | This name is used for both the configuration and the authority component of absolute HDFS paths in the cluster. | hdp2-ha | Optional |
| NN_HA_JOURNALNODE_EDITS_DIR | The absolute path on the JournalNode machines where the edits and other local state used by the JournalNodes (JNs) are stored. You can only use a single path for this configuration. | d:\hadoop\journal | Optional |
| NN_HA_STANDBY_NAMENODE_HOST | The host for the standby NameNode. | STANDBY_NAMENODE.acme.com | Optional |
| RM_HA_CLUSTER_NAME | A logical name for the ResourceManager cluster. | HA Resource Manager | Optional |
| RM_HA_STANDBY_RESOURCEMANAGER_HOST | The FQDN of the standby ResourceManager host. | rm-standby-host.acme.com | Optional |
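Taken together, these rows correspond to a properties-file fragment like the following (host names and paths are placeholders drawn from the example values above); note that SECONDARY_NAMENODE_HOST is omitted, as NameNode HA requires:

```
HA=yes
NN_HA_CLUSTER_NAME=hdp2-ha
NN_HA_JOURNALNODE_HOSTS=journalnode1.acme.com,journalnode2.acme.com,journalnode3.acme.com
NN_HA_JOURNALNODE_EDITS_DIR=d:\hadoop\journal
NN_HA_STANDBY_NAMENODE_HOST=STANDBY_NAMENODE.acme.com
RM_HA_CLUSTER_NAME=rm-ha-cluster
RM_HA_STANDBY_RESOURCEMANAGER_HOST=rm-standby-host.acme.com
```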
Configuration Values: Ranger
**Note:** "Mandatory" means that the property must be specified if Ranger is enabled.
Table 2.13. Ranger configuration information

| Configuration Property Name | Description | Example Value | Mandatory/Optional/Conditional |
|---|---|---|---|
| RANGER_HOST | Host name of the node where the Ranger-Admin and Ranger-UserSync services will be installed. | WIN-Q0E0PEACTR | Mandatory |
| RANGER_ADMIN_DB_HOST | MySQL server instance for use by the Ranger-Admin database. (MySQL should be up and running at installation time.) | localhost | Mandatory |
| RANGER_ADMIN_DB_PORT | Port number for the Ranger-Admin database server. | 3306 | Mandatory |
| RANGER_ADMIN_DB_ROOT_PASSWORD | Database root password (required for policy/audit database creation). | adm2 | Mandatory |
| RANGER_ADMIN_DB_DBNAME | Ranger-Admin policy database name. | ranger (default) | Mandatory |
| RANGER_ADMIN_DB_USERNAME | Ranger-Admin policy database user name. | rangeradmin (default) | Mandatory |
| RANGER_ADMIN_DB_PASSWORD | Password for the RANGER_ADMIN_DB_USERNAME user. | RangerAdminPassW0Rd | Mandatory |
| RANGER_AUDIT_DB_HOST | Host for the Ranger audit database. (MySQL should be up and running at installation time.) This can be the same as RANGER_ADMIN_DB_HOST or a different server. | localhost | Mandatory |
| RANGER_AUDIT_DB_PORT | Port number where Ranger-Admin runs the audit service. | 3306 | Mandatory |
| RANGER_AUDIT_DB_ROOT_PASSWORD | Database root password (required for audit database creation). | RangerAuditPassW0Rd | Mandatory |
| RANGER_EXTERNAL_URL | URL used for Ranger. | localhost:8080 | Optional |
| RANGER_AUDIT_DB_DBNAME | Ranger audit database name. This can be a different database in the same database server mentioned above. | ranger_audit (default) | Mandatory |
| RANGER_AUDIT_DB_USERNAME | Database user that performs all audit logging operations from Ranger plugins. | rangerlogger (default) | Mandatory |
| RANGER_AUDIT_DB_PASSWORD | Database password for the RANGER_AUDIT_DB_USERNAME user. | RangerAuditPassW0Rd | Mandatory |
| RANGER_AUTHENTICATION_METHOD | Authentication method used to log in to the Policy Admin Tool. | None: allows only users created within the Policy Admin Tool (default). LDAP: allows users to be authenticated using corporate LDAP. AD: allows users to be authenticated using Active Directory. | Mandatory |
| RANGER_LDAP_URL | URL for the LDAP service. | ldap://71.127.43.33:386 | Mandatory if authentication method is LDAP |
| RANGER_LDAP_USERDNPATTERN | LDAP DN pattern used to uniquely locate the login user. | uid={0},ou=users,dc=ranger2,dc=net | Mandatory if authentication method is LDAP |
| RANGER_LDAP_GROUPSEARCHBASE | Defines the part of the LDAP directory tree under which group searches are performed. | ou=groups,dc=ranger2,dc=net | Mandatory if authentication method is LDAP |
| RANGER_LDAP_GROUPSEARCHFILTER | LDAP search filter used to retrieve groups for the login user. | (member=uid={0},ou=users,dc=ranger2,dc=net) | Mandatory if authentication method is LDAP |
| RANGER_LDAP_GROUPROLEATTRIBUTE | The name of the authority defined by the group entry; used to retrieve the group names from the group search filters. | cn | Mandatory if authentication method is LDAP |
| RANGER_LDAP_AD_DOMAIN | Active Directory domain name used for AD login. | rangerad.net | Mandatory if authentication method is Active Directory |
| RANGER_LDAP_AD_URL | Active Directory LDAP URL for user authentication. | ldap://ad.rangerad.net:389 | Mandatory if authentication method is Active Directory |
| RANGER_POLICY_ADMIN_URL | URL used within the Policy Admin Tool when a link to its own page is generated on the Policy Admin Tool website. | localhost:6080 | Optional |
| RANGER_HDFS_REPO | The repository name used in the Policy Admin Tool for defining policies for HDFS. | hadoopdev | Mandatory if using Ranger on HDFS |
| RANGER_HIVE_REPO | The repository name used in the Policy Admin Tool for defining policies for Hive. | hivedev | Mandatory if using Ranger on Hive |
| RANGER_HBASE_REPO | The repository name used in the Policy Admin Tool for defining policies for HBase. | hbasedev | Mandatory if using Ranger on HBase |
| RANGER_KNOX_REPO | The repository name used in the Policy Admin Tool for defining policies for Knox. | knoxdev | Mandatory if using Ranger on Knox |
| RANGER_STORM_REPO | The repository name used in the Policy Admin Tool for defining policies for Storm. | stormdev | Mandatory if using Ranger on Storm |
| RANGER_SYNC_INTERVAL | Specifies the interval (in minutes) between synchronization cycles. Note: the second sync cycle does not start until the first sync cycle is complete. | 5 | Mandatory |
| RANGER_SYNC_LDAP_URL | LDAP URL for synchronizing users. | ldap://ldap.example.com:389 | Mandatory |
| RANGER_SYNC_LDAP_BIND_DN | LDAP bind DN used to connect to LDAP and query for users and groups. This must be a user with admin privileges to search the directory for users/groups. | cn=admin,ou=users,dc=hadoop,dc=apache,dc=org | Mandatory |
| RANGER_SYNC_LDAP_BIND_PASSWORD | Password for the LDAP bind DN. | LdapAdminPassW0Rd | Mandatory |
| RANGER_SYNC_LDAP_USER_SEARCH_SCOPE | Scope for the user search. | base, one, and sub are supported values | Mandatory |
| RANGER_SYNC_LDAP_USER_OBJECT_CLASS | Object class that identifies user entries. | person (default) | Mandatory |
| RANGER_SYNC_LDAP_USER_NAME_ATTRIBUTE | Attribute of the user entry that is treated as the user name. | cn (default) | Mandatory |
| RANGER_SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE | Attribute of the user entry whose values are treated as group names to be pushed into the Policy Manager database. | One or more attribute names separated by commas, such as memberof,ismemberof | Mandatory |
| RANGER_SYNC_LDAP_USERNAME_CASE_CONVERSION | Converts all user names to lowercase or uppercase. | none: no conversion; keep as-is in the sync source. lower (default): convert to lowercase when saving user names to the Ranger database. upper: convert to uppercase when saving user names to the Ranger database. | Mandatory |
| RANGER_SYNC_LDAP_GROUPNAME_CASE_CONVERSION | Converts all group names to lowercase or uppercase. | (same values as RANGER_SYNC_LDAP_USERNAME_CASE_CONVERSION) | Mandatory |
| RANGER_SYNC_LDAP_USER_SEARCH_BASE | Search base for users. | ou=users,dc=hadoop,dc=apache,dc=org | Mandatory |
| AUTHSERVICEHOSTNAME | Server name (or IP address) where the Ranger-UserSync module is running (along with the Unix Authentication Service). | localhost (default) | Mandatory |
| AUTHSERVICEPORT | Port number where the Ranger-UserSync module runs the Unix Authentication Service. | 5151 (default) | Mandatory |
| POLICYMGR_HTTP_ENABLED | Flag to enable/disable the HTTP protocol for downloading policies by Ranger plugin modules. | true (default) | Mandatory |
| REMOTELOGINENABLED | Flag to enable/disable remote login via Unix Authentication Mode. | true (default) | Mandatory |
| SYNCSOURCE | Specifies the source from which user/group information is extracted into the Ranger database. | LDAP | |
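The conditional "Mandatory if authentication method is LDAP/AD" rules in the table can also be checked up front. The following sketch is a hypothetical helper (not HDP code) that reports which conditionally mandatory Ranger properties are missing for the chosen authentication method:

```python
# Sketch: enforce the conditional "Mandatory if authentication method is
# LDAP/AD" rules from Table 2.13. Hypothetical helper, not part of HDP.

LDAP_REQUIRED = [
    "RANGER_LDAP_URL",
    "RANGER_LDAP_USERDNPATTERN",
    "RANGER_LDAP_GROUPSEARCHBASE",
    "RANGER_LDAP_GROUPSEARCHFILTER",
    "RANGER_LDAP_GROUPROLEATTRIBUTE",
]
AD_REQUIRED = ["RANGER_LDAP_AD_DOMAIN", "RANGER_LDAP_AD_URL"]


def missing_ranger_props(props):
    """Return the conditionally mandatory Ranger properties that are absent."""
    method = props.get("RANGER_AUTHENTICATION_METHOD", "None")
    required = {"LDAP": LDAP_REQUIRED, "AD": AD_REQUIRED}.get(method, [])
    return [name for name in required if not props.get(name)]


props = {
    "RANGER_AUTHENTICATION_METHOD": "LDAP",
    "RANGER_LDAP_URL": "ldap://71.127.43.33:386",
    "RANGER_LDAP_USERDNPATTERN": "uid={0},ou=users,dc=ranger2,dc=net",
}
print(missing_ranger_props(props))
```

With the example properties above, the helper reports the three missing LDAP group-search properties; with RANGER_AUTHENTICATION_METHOD=None, nothing is required.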
Sample Cluster Properties File
The following snapshot illustrates a sample cluster properties file for a typical Hadoop cluster:

```
#Log directory
HDP_LOG_DIR=d:\hadoop\logs
#Data directory
HDP_DATA_DIR=d:\hadoop\data
HDFS_NAMENODE_DATA_DIR=d:\hadoop\data\hdfs\nn,c:\hdpdata,d:\hdpdatann
HDFS_DATANODE_DATA_DIR=d:\hadoop\data\hdfs\dn,c:\hdpdata,d:\hdpdatadn
#Hosts
NAMENODE_HOST=onprem-ranger1
SECONDARY_NAMENODE_HOST=onprem-ranger1
HIVE_SERVER_HOST=onprem-ranger1
OOZIE_SERVER_HOST=onprem-ranger1
WEBHCAT_HOST=onprem-ranger1
FLUME_HOSTS=onprem-ranger1
HBASE_MASTER=onprem-ranger1
HBASE_REGIONSERVERS=onprem-ranger2
SLAVE_HOSTS=onprem-ranger2
ZOOKEEPER_HOSTS=onprem-ranger1
KNOX_HOST=onprem-ranger2
STORM_SUPERVISORS=onprem-ranger2
STORM_NIMBUS=onprem-ranger1
SPARK_JOB_SERVER=onprem-ranger1
SPARK_HIVE_METASTORE=metastore
IS_SLIDER=
#Database host
DB_FLAVOR=mssql
DB_PORT=9433
DB_HOSTNAME=singlehcatms7.cloudapp.net
#Hive properties
HIVE_DB_NAME=onpremranger1hive
HIVE_DB_USERNAME=hive
HIVE_DB_PASSWORD=hive
HIVE_DR=YES
#Oozie properties
OOZIE_DB_NAME=onpremranger1oozie
OOZIE_DB_USERNAME=oozie
OOZIE_DB_PASSWORD=oozie
#ASV/HDFS properties
DEFAULT_FS=HDFS
RESOURCEMANAGER_HOST=onprem-ranger1
IS_TEZ=yes
ENABLE_LZO=yes
RANGER_HOST=onprem-ranger1
RANGER_ADMIN_DB_HOST=localhost
RANGER_ADMIN_DB_PORT=3306
RANGER_ADMIN_DB_ROOT_PASSWORD=hcattest
RANGER_ADMIN_DB_DBNAME=xasecure
RANGER_ADMIN_DB_USERNAME=xaadmin
RANGER_ADMIN_DB_PASSWORD=admin
RANGER_AUDIT_DB_HOST=localhost
RANGER_AUDIT_DB_PORT=3306
RANGER_AUDIT_DB_ROOT_PASSWORD=hcattest
RANGER_EXTERNAL_URL=http://localhost:6080
RANGER_AUDIT_DB_DBNAME=xasecure
RANGER_AUDIT_DB_USERNAME=xalogger
RANGER_AUDIT_DB_PASSWORD=xalogger
RANGER_AUTHENTICATION_METHOD=LDAP
RANGER_LDAP_URL=ldap://71.127.43.33:389
RANGER_LDAP_USERDNPATTERN=uid={0},ou=users,dc=xasecure,dc=net
RANGER_LDAP_GROUPSEARCHBASE=ou=groups,dc=xasecure,dc=net
RANGER_LDAP_GROUPSEARCHFILTER=(member=uid={0},ou=users,dc=xasecure,dc=net)
RANGER_LDAP_GROUPROLEATTRIBUTE=cn
RANGER_POLICY_ADMIN_URL=http://localhost:6080
RANGER_HDFS_REPO=hadoopdev
RANGER_HIVE_REPO=hivedev
RANGER_HBASE_REPO=hbasedev
RANGER_KNOX_REPO=knoxdev
RANGER_STORM_REPO=stormdev
RANGER_SYNC_INTERVAL=360
RANGER_SYNC_LDAP_URL=ldap://10.0.0.4:389
RANGER_SYNC_LDAP_BIND_DN=cn=Administrator,cn=users,dc=hwqe,dc=net
RANGER_SYNC_LDAP_BIND_PASSWORD=Horton!#%works
RANGER_SYNC_LDAP_USER_SEARCH_SCOPE=sub
RANGER_SYNC_LDAP_USER_OBJECT_CLASS=person
RANGER_SYNC_LDAP_USER_NAME_ATTRIBUTE=cn
RANGER_SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE=memberof,ismemberof
RANGER_SYNC_LDAP_USERNAME_CASE_CONVERSION=lower
RANGER_SYNC_LDAP_GROUPNAME_CASE_CONVERSION=lower
AUTHSERVICEHOSTNAME=localhost
AUTHSERVICEPORT=5151
RANGER_SYNC_LDAP_USER_SEARCH_BASE=cn=users,dc=hwqe,dc=net
POLICYMGR_HTTP_ENABLED=true
REMOTELOGINENABLED=true
SYNCSOURCE=LDAP
```