7. Manually Creating a Cluster Properties File

Use the following instructions to manually configure the cluster properties file for deploying HDP from the command-line interface or in a script.
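The cluster properties file is consumed by the HDP MSI at install time. The command line below is an illustrative sketch only: /qn, /i, and /lv are standard msiexec switches, but the MSI file name and the HDP_LAYOUT, HDP_DIR, and DESTROY_DATA properties are assumptions; check the installation guide for the exact names supported by your HDP version.

```
msiexec /qn /i "hdp-winpkg.msi" /lv "hdp.log" ^
  HDP_LAYOUT="d:\config\clusterproperties.txt" ^
  HDP_DIR="d:\hdp" DESTROY_DATA="no"
```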

  1. Create a file for the cluster properties, or use the sample clusterproperties.txt file extracted from the HDP Installation zip file. You'll pass the name of the cluster properties file to the msiexec call when you install HDP. The following examples use the file name clusterproperties.txt.

  2. Add the properties to the clusterproperties.txt file as described in the table below. As you add properties, keep in mind the following:

    • All properties in the cluster properties file must be separated by a newline character.

    • Directory paths cannot contain white space characters. (For example, c:\Program Files\Hadoop is an invalid directory path for HDP.)

    • Use Fully Qualified Domain Names (FQDN) to specify the network host name for each cluster host.

      The FQDN is a DNS name that uniquely identifies the computer on the network. By default, it is a concatenation of the host name, the primary DNS suffix, and a period.

    • When specifying the host lists in the cluster properties file, if the hosts are multi-homed or have multiple NICs, make sure that each name or IP address is the preferred name or IP address by which the hosts can communicate among themselves. In other words, these should be the addresses used internally within the cluster, not those used to address cluster nodes from outside the cluster.

    • To enable NameNode HA, you must include the HA properties and exclude the SECONDARY_NAMENODE_HOST definition.
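The formatting rules above (newline-separated KEY=value properties; no whitespace in directory paths) are easy to check before launching the installer. The following is a minimal sketch, not part of the HDP tooling; the set of directory-path keys it inspects is an assumption drawn from the table below.

```python
# Sketch: sanity-check clusterproperties.txt content against the rules above.
# Not part of HDP; DIR_KEYS is an illustrative assumption.

DIR_KEYS = {"HDP_LOG_DIR", "HDP_DATA_DIR",
            "HDFS_NAMENODE_DATA_DIR", "HDFS_DATANODE_DATA_DIR"}

def validate(text):
    """Return a list of problems found in the given properties text."""
    problems = []
    props = {}
    for n, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments are fine
        if "=" not in line:
            problems.append(f"line {n}: not a KEY=value pair")
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    # Directory paths must not contain whitespace (see the rule above).
    for key in DIR_KEYS & props.keys():
        for path in props[key].split(","):
            if " " in path:
                problems.append(f"{key}: path '{path}' contains whitespace")
    return problems
```

For example, a value such as c:\Program Files\hdp for HDP_DATA_DIR is reported as invalid because the path contains a space.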

 

Table 2.11. Configuration Values for Deploying HDP

HDP_LOG_DIR (Mandatory)
    HDP's operational logs are written to this directory on each cluster host. Ensure that you have sufficient disk space for storing these log files.
    Example: d:\hadoop\logs

HDP_DATA_DIR (Mandatory)
    HDP data is stored in this directory on each cluster node. You can specify multiple comma-separated locations for multiple data directories.
    Example: d:\hdp\data

HDFS_NAMENODE_DATA_DIR (Mandatory)
    Determines where on the local file system the HDFS NameNode stores the name table (fsimage). You can specify multiple comma-separated locations for multiple data directories.
    Example: d:\hadoop\data\hdfs\nn,c:\hdpdata,d:\hdpdatann

HDFS_DATANODE_DATA_DIR (Mandatory)
    Determines where on the local file system an HDFS DataNode stores its blocks. You can specify multiple comma-separated locations for multiple data directories.
    Example: d:\hadoop\data\hdfs\dn,c:\hdpdata,d:\hdpdatadn

NAMENODE_HOST (Mandatory)
    The FQDN of the cluster node that runs the NameNode master service.
    Example: NAMENODE-MASTER.acme.com

SECONDARY_NAMENODE_HOST (Mandatory when HA is not enabled)
    The FQDN of the cluster node that runs the Secondary NameNode master service.
    Example: SECONDARY-NN-MASTER.acme.com

RESOURCEMANAGER_HOST (Mandatory)
    The FQDN of the cluster node that runs the YARN ResourceManager master service.
    Example: RESOURCE-MANAGER.acme.com

HIVE_SERVER_HOST (Mandatory)
    The FQDN of the cluster node that runs the Hive Server master service.
    Example: HIVE-SERVER-MASTER.acme.com

OOZIE_SERVER_HOST (Mandatory)
    The FQDN of the cluster node that runs the Oozie Server master service.
    Example: OOZIE-SERVER-MASTER.acme.com

WEBHCAT_HOST (Mandatory)
    The FQDN of the cluster node that runs the WebHCat master service.
    Example: WEBHCAT-MASTER.acme.com

FLUME_HOSTS (Mandatory)
    A comma-separated list of FQDNs of the cluster nodes that run the Flume service.
    Example: FLUME-SERVICE1.acme.com,FLUME-SERVICE2.acme.com,FLUME-SERVICE3.acme.com

HBASE_MASTER (Mandatory)
    The FQDN of the cluster node that runs the HBase master.
    Example: HBASE-MASTER.acme.com

HBASE_REGIONSERVERS (Mandatory)
    A comma-separated list of FQDNs of the cluster nodes that run the HBase RegionServer services.
    Example: slave1.acme.com,slave2.acme.com,slave3.acme.com

SLAVE_HOSTS (Mandatory)
    A comma-separated list of FQDNs of the cluster nodes that run the DataNode and TaskTracker services.
    Example: slave1.acme.com,slave2.acme.com,slave3.acme.com

ZOOKEEPER_HOSTS (Optional)
    A comma-separated list of FQDNs of the cluster nodes that run the ZooKeeper service.
    Example: ZOOKEEPER-HOST.acme.com

FALCON_HOST (Optional)
    A comma-separated list of FQDNs of the cluster nodes that run the Falcon service.
    Example: falcon.acme.com,falcon1.acme.com,falcon2.acme.com

KNOX_HOST (Optional)
    The FQDN of the Knox Gateway host.
    Example: KNOX-HOST.acme.com

STORM_SUPERVISORS (Optional)
    A comma-separated list of FQDNs of the cluster nodes that run the Storm Supervisor service.
    Example: supervisor.acme.com,supervisor1.acme.com,supervisor2.acme.com

STORM_NIMBUS (Optional)
    The FQDN of the Storm Nimbus server.
    Example: STORM-HOST.acme.com

DB_FLAVOR (Mandatory)
    Database type for the Hive and Oozie metastores (the allowed databases are SQL Server and Derby). To use the default embedded Derby instance, set this property to derby. To use an existing SQL Server instance as the metastore database, set it to mssql.
    Example: mssql or derby

DB_PORT (Optional)
    Database port, required only if you are using SQL Server for the Hive and Oozie metastores.
    Example: 1433 (default)

DB_HOSTNAME (Mandatory)
    The FQDN of the node where the metastore database service is installed. If you are using SQL Server, set this to your SQL Server host name. If you are using Derby for the Hive metastore, set this to the value of HIVE_SERVER_HOST.
    Example: sqlserver1.acme.com

HIVE_DB_NAME (Mandatory)
    Database for the Hive metastore. If you are using SQL Server, ensure that you create the database on the SQL Server instance.
    Example: hivedb

HIVE_DB_USERNAME (Mandatory)
    User name for the Hive metastore database instance. Ensure that this user account has appropriate permissions.
    Example: hive_user

HIVE_DB_PASSWORD (Mandatory)
    Password for the Hive metastore database user.
    Example: hive_pass

OOZIE_DB_NAME (Mandatory)
    Database for the Oozie metastore. If you are using SQL Server, ensure that you create the database on the SQL Server instance.
    Example: ooziedb

OOZIE_DB_USERNAME (Mandatory)
    User name for the Oozie metastore database instance. Ensure that this user account has appropriate permissions.
    Example: oozie_user

OOZIE_DB_PASSWORD (Mandatory)
    Password for the Oozie metastore database user.
    Example: oozie_pass

DEFAULT_FS
    The default file system.
    Example: HDFS

IS_TEZ (Optional)
    Installs the Tez component on the Hive host.
    Example: YES or NO

ENABLE_LZO (Optional)
    Enables the LZO codec for compression in HBase cells.
    Example: YES or NO

IS_PHOENIX (Optional)
    Installs Phoenix on the HBase hosts.
    Example: YES or NO

IS_HDFS_HA (Mandatory)
    Specifies whether to enable High Availability for HDFS.
    Example: YES or NO

SPARK_JOB_SERVER (Optional)
    Specifies the Spark job history server.
    Example: onprem-ranger1

SPARK_HIVE_METASTORE (Optional)
    Specifies the Hive metastore for Spark.
    Example: metastore

HIVE_DR (Optional)
    Indicates whether you want to install HiveDR.
    Example: YES or NO


Configuration Values: High Availability

To ensure that a multi-node cluster remains available, configure and enable High Availability. Configuring High Availability includes defining the locations and names of the hosts in the cluster that are available to act as JournalNodes and as a standby NameNode in the event that the primary NameNode fails. To configure High Availability, add the following properties to your cluster properties file and set their values as follows:

Note: To enable High Availability, you must also run several HA-specific commands when you start cluster services.

 

Table 2.12. High Availability configuration information

HA (Optional)
    Whether to deploy a highly available NameNode.
    Example: yes or no

NN_HA_JOURNALNODE_HOSTS (Optional)
    A comma-separated list of FQDNs of the cluster nodes that run the JournalNode processes.
    Example: journalnode1.acme.com,journalnode2.acme.com,journalnode3.acme.com

NN_HA_CLUSTER_NAME (Optional)
    This name is used for both the configuration and the authority component of absolute HDFS paths in the cluster.
    Example: hdp2-ha

NN_HA_JOURNALNODE_EDITS_DIR (Optional)
    The absolute path on the JournalNode machines where the edits and other local state used by the JournalNodes (JNs) are stored. You can only use a single path for this configuration.
    Example: d:\hadoop\journal

NN_HA_STANDBY_NAMENODE_HOST (Optional)
    The host for the standby NameNode.
    Example: STANDBY_NAMENODE.acme.com

RM_HA_CLUSTER_NAME (Optional)
    A logical name for the ResourceManager cluster.
    Example: HA Resource Manager

RM_HA_STANDBY_RESOURCEMANAGER_HOST (Optional)
    The FQDN of the standby ResourceManager host.
    Example: rm-standby-host.acme.com
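Put together, a NameNode/ResourceManager HA block in the cluster properties file might look like the following; the host names, cluster names, and journal path are illustrative values based on the examples above. Remember to omit SECONDARY_NAMENODE_HOST when HA is enabled.

```
HA=yes
NN_HA_JOURNALNODE_HOSTS=journalnode1.acme.com,journalnode2.acme.com,journalnode3.acme.com
NN_HA_CLUSTER_NAME=hdp2-ha
NN_HA_JOURNALNODE_EDITS_DIR=d:\hadoop\journal
NN_HA_STANDBY_NAMENODE_HOST=STANDBY_NAMENODE.acme.com
RM_HA_CLUSTER_NAME=rm-ha-cluster
RM_HA_STANDBY_RESOURCEMANAGER_HOST=rm-standby-host.acme.com
```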


Configuration Values: Ranger

Note: "Mandatory" means that the property must be specified if Ranger is enabled.

 

Table 2.13. Ranger configuration information

RANGER_HOST (Mandatory)
    Host name of the host where the Ranger-Admin and Ranger-UserSync services will be installed.
    Example: WIN-Q0E0PEACTR

RANGER_ADMIN_DB_HOST (Mandatory)
    MySQL server instance for use by the Ranger-Admin database. (MySQL should be up and running at installation time.)
    Example: localhost

RANGER_ADMIN_DB_PORT (Mandatory)
    Port number for the Ranger-Admin database server.
    Example: 3306

RANGER_ADMIN_DB_ROOT_PASSWORD (Mandatory)
    Database root password (required for policy/audit database creation).
    Example: adm2

RANGER_ADMIN_DB_DBNAME (Mandatory)
    Ranger-Admin policy database name.
    Example: ranger (default)

RANGER_ADMIN_DB_USERNAME (Mandatory)
    Ranger-Admin policy database user name.
    Example: rangeradmin (default)

RANGER_ADMIN_DB_PASSWORD (Mandatory)
    Password for the RANGER_ADMIN_DB_USERNAME user.
    Example: RangerAdminPassW0Rd

RANGER_AUDIT_DB_HOST (Mandatory)
    Host for the Ranger audit database. (MySQL should be up and running at installation time.) This can be the same as RANGER_ADMIN_DB_HOST, or you can specify a different server.
    Example: localhost

RANGER_AUDIT_DB_PORT (Mandatory)
    Port number where Ranger-Admin runs the audit service.
    Example: 3306

RANGER_AUDIT_DB_ROOT_PASSWORD (Mandatory)
    Database root password (required for audit database creation).
    Example: RangerAuditPassW0Rd

RANGER_EXTERNAL_URL (Optional)
    URL used for Ranger.
    Example: localhost:8080

RANGER_AUDIT_DB_DBNAME (Mandatory)
    Ranger audit database name. This can be a different database in the same database server mentioned above.
    Example: ranger_audit (default)

RANGER_AUDIT_DB_USERNAME (Mandatory)
    Database user that performs all audit logging operations from the Ranger plugins.
    Example: rangerlogger (default)

RANGER_AUDIT_DB_PASSWORD (Mandatory)
    Password for the RANGER_AUDIT_DB_USERNAME user.
    Example: RangerAuditPassW0Rd

RANGER_AUTHENTICATION_METHOD (Mandatory)
    Authentication method used to log in to the Policy Admin Tool.
    Values: None (default) allows only users created within the Policy Admin Tool; LDAP allows users to be authenticated using corporate LDAP; AD allows users to be authenticated using Active Directory.

RANGER_LDAP_URL (Mandatory if the authentication method is LDAP)
    URL for the LDAP service.
    Example: ldap://71.127.43.33:389

RANGER_LDAP_USERDNPATTERN (Mandatory if the authentication method is LDAP)
    LDAP DN pattern used to uniquely locate the login user.
    Example: uid={0},ou=users,dc=ranger2,dc=net

RANGER_LDAP_GROUPSEARCHBASE (Mandatory if the authentication method is LDAP)
    Defines the part of the LDAP directory tree under which group searches are performed.
    Example: ou=groups,dc=ranger2,dc=net

RANGER_LDAP_GROUPSEARCHFILTER (Mandatory if the authentication method is LDAP)
    LDAP search filter used to retrieve groups for the login user.
    Example: (member=uid={0},ou=users,dc=ranger2,dc=net)

RANGER_LDAP_GROUPROLEATTRIBUTE (Mandatory if the authentication method is LDAP)
    The name of the authority attribute defined by the group entry, used to retrieve group names from the group search filter.
    Example: cn

RANGER_LDAP_AD_DOMAIN (Mandatory if the authentication method is Active Directory)
    Active Directory domain name used for AD login.
    Example: rangerad.net

RANGER_LDAP_AD_URL (Mandatory if the authentication method is Active Directory)
    Active Directory LDAP URL for user authentication.
    Example: ldap://ad.rangerad.net:389

RANGER_POLICY_ADMIN_URL (Optional)
    URL used within the Policy Admin Tool when a link to its own page is generated on the Policy Admin Tool website.
    Example: localhost:6080

RANGER_HDFS_REPO (Mandatory if using Ranger on HDFS)
    The repository name used in the Policy Admin Tool for defining HDFS policies.
    Example: hadoopdev

RANGER_HIVE_REPO (Mandatory if using Ranger on Hive)
    The repository name used in the Policy Admin Tool for defining Hive policies.
    Example: hivedev

RANGER_HBASE_REPO (Mandatory if using Ranger on HBase)
    The repository name used in the Policy Admin Tool for defining HBase policies.
    Example: hbasedev

RANGER_KNOX_REPO (Mandatory if using Ranger on Knox)
    The repository name used in the Policy Admin Tool for defining Knox policies.
    Example: knoxdev

RANGER_STORM_REPO (Mandatory if using Ranger on Storm)
    The repository name used in the Policy Admin Tool for defining Storm policies.
    Example: stormdev

RANGER_SYNC_INTERVAL (Mandatory)
    The interval (in minutes) between synchronization cycles. Note that the second sync cycle does not start until the first sync cycle is complete.
    Example: 5

RANGER_SYNC_LDAP_URL (Mandatory)
    LDAP URL for synchronizing users.
    Example: ldap://ldap.example.com:389

RANGER_SYNC_LDAP_BIND_DN (Mandatory)
    LDAP bind DN used to connect to LDAP and query for users and groups. This must be a user with administrative privileges to search the directory for users and groups.
    Example: cn=admin,ou=users,dc=hadoop,dc=apache,dc=org

RANGER_SYNC_LDAP_BIND_PASSWORD (Mandatory)
    Password for the LDAP bind DN.
    Example: LdapAdminPassW0Rd

RANGER_SYNC_LDAP_USER_SEARCH_SCOPE (Mandatory)
    Scope for user search.
    Values: base, one, and sub are supported.

RANGER_SYNC_LDAP_USER_OBJECT_CLASS (Mandatory)
    Object class that identifies user entries.
    Example: person (default)

RANGER_SYNC_LDAP_USER_NAME_ATTRIBUTE (Mandatory)
    Attribute from the user entry that is treated as the user name.
    Example: cn (default)

RANGER_SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE (Mandatory)
    Attribute from the user entry whose values are treated as group values to be pushed into the Policy Manager database.
    Example: one or more attribute names separated by commas, such as memberof,ismemberof

RANGER_SYNC_LDAP_USERNAME_CASE_CONVERSION (Mandatory)
    Converts all user names to lowercase or uppercase.
    Values: none keeps names as-is from the sync source; lower (default) converts to lowercase when saving user names to the Ranger database; upper converts to uppercase.

RANGER_SYNC_LDAP_GROUPNAME_CASE_CONVERSION (Mandatory)
    Converts all group names to lowercase or uppercase.
    Values: same as the user name case conversion property.

RANGER_SYNC_LDAP_USER_SEARCH_BASE (Mandatory)
    Search base for users.
    Example: ou=users,dc=hadoop,dc=apache,dc=org

AUTHSERVICEHOSTNAME (Mandatory)
    Server name (or IP address) where the Ranger-UserSync module is running (along with the Unix Authentication Service).
    Example: localhost (default)

AUTHSERVICEPORT (Mandatory)
    Port number where the Ranger-UserSync module runs the Unix Authentication Service.
    Example: 5151 (default)

POLICYMGR_HTTP_ENABLED (Mandatory)
    Flag to enable or disable the HTTP protocol for downloading policies by Ranger plugin modules.
    Example: true (default)

REMOTELOGINENABLED (Mandatory)
    Flag to enable or disable remote login via Unix Authentication Mode.
    Example: true (default)

SYNCSOURCE
    Specifies the source from which user/group information is extracted into the Ranger database.
    Example: LDAP
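The conditional requirements in the table above ("Mandatory if authentication method is ...") can be expressed as a small pre-flight check. This is a sketch, not part of the HDP tooling; it encodes only the LDAP and Active Directory conditions from Table 2.13.

```python
# Sketch: check Ranger conditional properties from Table 2.13.
# Only the authentication-method conditions are encoded here.

LDAP_REQUIRED = ["RANGER_LDAP_URL", "RANGER_LDAP_USERDNPATTERN",
                 "RANGER_LDAP_GROUPSEARCHBASE", "RANGER_LDAP_GROUPSEARCHFILTER",
                 "RANGER_LDAP_GROUPROLEATTRIBUTE"]
AD_REQUIRED = ["RANGER_LDAP_AD_DOMAIN", "RANGER_LDAP_AD_URL"]

def missing_ranger_props(props):
    """Return the conditionally required Ranger keys absent from props."""
    method = props.get("RANGER_AUTHENTICATION_METHOD", "None")
    required = {"LDAP": LDAP_REQUIRED, "AD": AD_REQUIRED}.get(method, [])
    return [k for k in required if not props.get(k)]
```

For example, with RANGER_AUTHENTICATION_METHOD=AD and no RANGER_LDAP_AD_URL set, the check reports that key as missing.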


Sample Cluster Properties File

The following example illustrates a sample cluster properties file:

A Typical Hadoop Cluster.
 #Log directory
 HDP_LOG_DIR=d:\hadoop\logs
 
 #Data directory
 HDP_DATA_DIR=d:\hadoop\data
 HDFS_NAMENODE_DATA_DIR=d:\hadoop\data\hdfs\nn,c:\hdpdata,d:\hdpdatann
 HDFS_DATANODE_DATA_DIR=d:\hadoop\data\hdfs\dn,c:\hdpdata,d:\hdpdatadn
 
 #Hosts
 NAMENODE_HOST=onprem-ranger1
 SECONDARY_NAMENODE_HOST=onprem-ranger1
 HIVE_SERVER_HOST=onprem-ranger1
 OOZIE_SERVER_HOST=onprem-ranger1
 WEBHCAT_HOST=onprem-ranger1
 FLUME_HOSTS=onprem-ranger1
 HBASE_MASTER=onprem-ranger1
 HBASE_REGIONSERVERS=onprem-ranger2
 SLAVE_HOSTS=onprem-ranger2
 ZOOKEEPER_HOSTS=onprem-ranger1
 KNOX_HOST=onprem-ranger2
 STORM_SUPERVISORS=onprem-ranger2
 STORM_NIMBUS=onprem-ranger1
 SPARK_JOB_SERVER=onprem-ranger1
 SPARK_HIVE_METASTORE=metastore
 IS_SLIDER=
 
 #Database host
 DB_FLAVOR=mssql
 DB_PORT=9433
 DB_HOSTNAME=singlehcatms7.cloudapp.net
 
 #Hive properties
 HIVE_DB_NAME=onpremranger1hive
 HIVE_DB_USERNAME=hive
 HIVE_DB_PASSWORD=hive
 HIVE_DR=YES
 
 #Oozie properties
 OOZIE_DB_NAME=onpremranger1oozie
 OOZIE_DB_USERNAME=oozie
 OOZIE_DB_PASSWORD=oozie
 
 #ASV/HDFS properties
 DEFAULT_FS=HDFS
 RESOURCEMANAGER_HOST=onprem-ranger1
 IS_TEZ=yes
 ENABLE_LZO=yes
 RANGER_HOST=onprem-ranger1
 RANGER_ADMIN_DB_HOST=localhost
 RANGER_ADMIN_DB_PORT=3306
 RANGER_ADMIN_DB_ROOT_PASSWORD=hcattest
 RANGER_ADMIN_DB_DBNAME=xasecure
 RANGER_ADMIN_DB_USERNAME=xaadmin
 RANGER_ADMIN_DB_PASSWORD=admin
 RANGER_AUDIT_DB_HOST=localhost
 RANGER_AUDIT_DB_PORT=3306
 RANGER_AUDIT_DB_ROOT_PASSWORD=hcattest
 RANGER_EXTERNAL_URL=http://localhost:6080
 RANGER_AUDIT_DB_DBNAME=xasecure
 RANGER_AUDIT_DB_USERNAME=xalogger
 RANGER_AUDIT_DB_PASSWORD=xalogger
 RANGER_AUTHENTICATION_METHOD=LDAP
 RANGER_LDAP_URL=ldap://71.127.43.33:389
 RANGER_LDAP_USERDNPATTERN=uid={0},ou=users,dc=xasecure,dc=net
 RANGER_LDAP_GROUPSEARCHBASE=ou=groups,dc=xasecure,dc=net
 RANGER_LDAP_GROUPSEARCHFILTER=(member=uid={0},ou=users,dc=xasecure,dc=net)
 RANGER_LDAP_GROUPROLEATTRIBUTE=cn
 RANGER_POLICY_ADMIN_URL=http://localhost:6080
 RANGER_HDFS_REPO=hadoopdev
 RANGER_HIVE_REPO=hivedev
 RANGER_HBASE_REPO=hbasedev
 RANGER_KNOX_REPO=knoxdev
 RANGER_STORM_REPO=stormdev
 RANGER_SYNC_INTERVAL=360
 RANGER_SYNC_LDAP_URL=ldap://10.0.0.4:389
 RANGER_SYNC_LDAP_BIND_DN=cn=Administrator,cn=users,dc=hwqe,dc=net
 RANGER_SYNC_LDAP_BIND_PASSWORD=Horton!#%works
 RANGER_SYNC_LDAP_USER_SEARCH_SCOPE=sub
 RANGER_SYNC_LDAP_USER_OBJECT_CLASS=person
 RANGER_SYNC_LDAP_USER_NAME_ATTRIBUTE=cn
 RANGER_SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE=memberof,ismemberof
 RANGER_SYNC_LDAP_USERNAME_CASE_CONVERSION=lower
 RANGER_SYNC_LDAP_GROUPNAME_CASE_CONVERSION=lower
 AUTHSERVICEHOSTNAME=localhost
 AUTHSERVICEPORT=5151
 RANGER_SYNC_LDAP_USER_SEARCH_BASE=cn=users,dc=hwqe,dc=net
 POLICYMGR_HTTP_ENABLED=true
 REMOTELOGINENABLED=true
 SYNCSOURCE=LDAP 
