Deploying HDFS on a Cluster
For instructions on configuring High Availability (HA) for the NameNode, see the CDH 5 High Availability Guide. For instructions on using HDFS Access Control Lists (ACLs), see Enabling HDFS Extended ACLs.
Proceed as follows to deploy HDFS on a cluster. Do this for all clusters, whether you are deploying MRv1 or YARN:
- Copy the Hadoop configuration
- Customize configuration files
- Configure Local Storage Directories
- Configure DataNodes to tolerate local storage directory failure
- Format the NameNode
- Configure a remote NameNode storage directory
- Configure the Secondary NameNode (if used)
- Optionally enable Trash
- Optionally configure DataNode storage balancing
- Optionally enable WebHDFS
- Optionally configure LZO
- Start HDFS
- Deploy MRv1 or YARN and start services
When starting, stopping and restarting CDH components, always use the service(8) command rather than running scripts in /etc/init.d directly. This is important because service sets the current working directory to / and removes most environment variables (passing only LANG and TERM) so as to create a predictable environment in which to administer the service. If you run the scripts in /etc/init.d, any environment variables you have set remain in force, and could produce unpredictable results. (If you install CDH from packages, service will be installed as part of the Linux Standard Base (LSB).)
Copying the Hadoop Configuration and Setting Alternatives
To customize the Hadoop configuration:
- Copy the default configuration to your custom directory:
$ sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
You can call this configuration anything you like; in this example, it's called my_cluster.
Important: When performing the configuration tasks in this section, and when you go on to deploy MRv1 or YARN, edit the configuration files in this custom directory. Do not create your custom configuration in the default directory /etc/hadoop/conf.empty.
- CDH uses the alternatives setting to determine which Hadoop configuration to use. Set alternatives to point to your custom directory, as follows.
To manually set the configuration on Red Hat-compatible systems:
$ sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
To manually set the configuration on Ubuntu and SLES systems:
$ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
This tells CDH to use the configuration in /etc/hadoop/conf.my_cluster.
You can display the current alternatives setting as follows.
On Red Hat-compatible systems:
$ sudo alternatives --display hadoop-conf
On Ubuntu and SLES systems:
$ sudo update-alternatives --display hadoop-conf
You should see output such as the following:
hadoop-conf - status is auto.
 link currently points to /etc/hadoop/conf.my_cluster
/etc/hadoop/conf.my_cluster - priority 50
/etc/hadoop/conf.empty - priority 10
Current `best' version is /etc/hadoop/conf.my_cluster.
Because the configuration in /etc/hadoop/conf.my_cluster has the highest priority (50), that is the one CDH will use. For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page on Red Hat-compatible systems.
Customizing Configuration Files
The following tables show the most important properties that you must configure for your cluster.
For information on other important configuration properties, and the configuration files, see the Apache Cluster Setup page.
Property | Configuration File | Description |
---|---|---|
fs.defaultFS | core-site.xml | Note: fs.default.name is deprecated. Specifies the NameNode and the default file system, in the form hdfs://<namenode host>:<namenode port>/. The default value is file:///. The default file system is used to resolve relative paths; for example, if fs.default.name or fs.defaultFS is set to hdfs://mynamenode/, the relative URI /mydir/myfile resolves to hdfs://mynamenode/mydir/myfile. Note: for the cluster to function correctly, the <namenode> part of the string must be the hostname (for example mynamenode), or the HA-enabled logical URI, not the IP address. |
dfs.permissions.superusergroup | hdfs-site.xml | Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of 'hadoop' or pick your own group depending on the security policies at your site. |
Sample Configuration
core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host.company.com:8020</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
Configuring Local Storage Directories
You need to specify, create, and assign the correct permissions to the local directories where you want the HDFS daemons to store data. You specify the directories by configuring the following two properties in the hdfs-site.xml file.
Property | Configuration File Location | Description |
---|---|---|
dfs.name.dir or dfs.namenode.name.dir | hdfs-site.xml on the NameNode | This property specifies the URIs of the directories where the NameNode stores its metadata and edit logs. Cloudera recommends that you specify at least two directories. One of these should be located on an NFS mount point, unless you will be using a High Availability (HA) configuration. |
dfs.data.dir or dfs.datanode.data.dir | hdfs-site.xml on each DataNode | This property specifies the URIs of the directories where the DataNode stores blocks. Cloudera recommends that you configure the disks on the DataNode in a JBOD configuration, mounted at /data/1/ through /data/N, and configure dfs.data.dir or dfs.datanode.data.dir to specify file:///data/1/dfs/dn through file:///data/N/dfs/dn/. |
dfs.data.dir and dfs.name.dir are deprecated; you should use dfs.datanode.data.dir and dfs.namenode.name.dir instead, though dfs.data.dir and dfs.name.dir will still work.
Sample configuration:
hdfs-site.xml on the NameNode:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>
hdfs-site.xml on each DataNode:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
After specifying these directories as shown above, you must create the directories and assign the correct file permissions to them on each node in your cluster.
In the following instructions, local path examples are used to represent Hadoop parameters. Change the path examples to match your configuration.
Local directories:
- The dfs.name.dir or dfs.namenode.name.dir parameter is represented by the /data/1/dfs/nn and /nfsmount/dfs/nn path examples.
- The dfs.data.dir or dfs.datanode.data.dir parameter is represented by the /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn, and /data/4/dfs/dn examples.
To configure local storage directories for use by HDFS:
- On a NameNode host: create the dfs.name.dir or dfs.namenode.name.dir local directories:
$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
Important: If you are using High Availability (HA), you should not configure these directories on an NFS mount; configure them on local storage.
- On all DataNode hosts: create the dfs.data.dir or dfs.datanode.data.dir local directories:
$ sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
- Configure the owner of the dfs.name.dir or dfs.namenode.name.dir directory, and of the dfs.data.dir or dfs.datanode.data.dir directory, to be the hdfs user:
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
Here is a summary of the correct owner and permissions of the local directories:
Directory | Owner | Permissions (see Footnote 1) |
---|---|---|
dfs.name.dir or dfs.namenode.name.dir | hdfs:hdfs | drwx------ |
dfs.data.dir or dfs.datanode.data.dir | hdfs:hdfs | drwx------ |
Footnote 1: The Hadoop daemons automatically set the correct permissions for you on dfs.data.dir or dfs.datanode.data.dir. But in the case of dfs.name.dir or dfs.namenode.name.dir, permissions are currently incorrectly set to the file-system default, usually drwxr-xr-x (755). Use the chmod command to reset permissions for these dfs.name.dir or dfs.namenode.name.dir directories to drwx------ (700); for example:
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
or
$ sudo chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
Note: If you specified nonexistent directories for the dfs.data.dir or dfs.datanode.data.dir property in the hdfs-site.xml file, CDH 5 will shut down. (In previous releases, CDH silently ignored nonexistent directories for dfs.data.dir.)
Configuring DataNodes to Tolerate Local Storage Directory Failure
By default, the failure of a single dfs.data.dir or dfs.datanode.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replications of blocks that reside on disks that have not failed.
To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir or dfs.datanode.data.dir directories; use the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml. For example, if the value for this parameter is 3, the DataNode will only shut down after four or more data directories have failed. This value is respected on DataNode startup; in this example the DataNode will start up as long as no more than three directories have failed.
It is important that dfs.datanode.failed.volumes.tolerated not be configured to tolerate too many directory failures, as the DataNode will perform poorly if it has few functioning data directories.
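As a minimal sketch, assuming a DataNode with the four data directories used in the earlier examples, the following hdfs-site.xml entry would let the DataNode keep running after one data directory fails. The value 1 is an illustrative assumption, not a recommendation; choose a value that suits the number of volumes on each DataNode.

```xml
<!-- hdfs-site.xml on each DataNode; the value 1 is an assumed example -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```

With this setting, the DataNode shuts down only when a second data directory fails.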
Formatting the NameNode
Before starting the NameNode for the first time you need to format the file system.
- Make sure you format the NameNode as user hdfs.
- If you are re-formatting the NameNode, keep in mind that this invalidates the DataNode storage locations, so you should remove the data under those locations after the NameNode is formatted.
$ sudo -u hdfs hdfs namenode -format
If Kerberos is enabled, do not use commands in the form sudo -u <user> hadoop <command>; they will fail with a security error. Instead, use the following commands:
If you are using a password:
$ kinit <user>
If you are using a keytab:
$ kinit -kt <keytab> <principal>
Then, for each command executed by this user:
$ <command>
You'll get a confirmation prompt; for example:
Re-format filesystem in /data/namedir ? (Y or N)
Configuring a Remote NameNode Storage Directory
You should configure the NameNode to write to multiple storage directories, including one remote NFS mount. To keep NameNode processes from hanging when the NFS server is unavailable, configure the NFS mount as a soft mount (so that I/O requests that time out fail rather than hang), and set other options as follows:
tcp,soft,intr,timeo=10,retrans=10
These options configure a soft mount over TCP; transactions will be retried ten times (retrans=10) at 1-second intervals (timeo=10) before being deemed to have failed.
Example:
mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10 <server>:<export> <mount_point>
where <server> is the remote host, <export> is the exported file system, and <mount_point> is the local mount point.
Cloudera recommends similar settings for shared HA mounts, as in the example that follows.
Example for HA:
mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12 <server>:<export> <mount_point>
Note that in the HA case timeo should be set to 50 (five seconds), rather than 10 (1 second), and retrans should be set to 12, giving an overall timeout of 60 seconds.
For more information, see the man pages for mount and nfs.
Configuring Remote Directory Recovery
You can enable the dfs.namenode.name.dir.restore option so that the NameNode will attempt to recover a previously failed NameNode storage directory on the next checkpoint. This is useful for restoring a remote storage directory mount that has failed because of a network outage or intermittent NFS failure.
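A minimal sketch of enabling this option, assuming it is set in hdfs-site.xml on the NameNode alongside the other dfs.namenode properties in this section:

```xml
<!-- hdfs-site.xml on the NameNode (file location is an assumption) -->
<property>
  <name>dfs.namenode.name.dir.restore</name>
  <value>true</value>
</property>
```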
Configuring the Secondary NameNode
The Secondary NameNode does not provide failover or High Availability (HA). If you intend to configure HA for the NameNode, skip this section: do not install or configure the Secondary NameNode (the Standby NameNode performs checkpointing). After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.
In non-HA deployments, configure a Secondary NameNode that will periodically merge the EditLog with the FSImage, creating a new FSImage which incorporates the changes which were in the EditLog. This reduces the amount of disk space consumed by the EditLog on the NameNode, and also reduces the restart time for the Primary NameNode.
A standard Hadoop cluster (not a Hadoop Federation or HA configuration) can have only one Primary NameNode plus one Secondary NameNode. On production systems, the Secondary NameNode should run on a different machine from the Primary NameNode to improve scalability (because the Secondary NameNode does not compete with the NameNode for memory and other resources to create the system snapshot) and durability (because the copy of the metadata is on a separate machine that is available if the NameNode hardware fails).
Configuring the Secondary NameNode on a Separate Machine
To configure the Secondary NameNode on a separate machine from the NameNode, proceed as follows.
- Add the name of the machine that will run the Secondary NameNode to the masters file.
- Add the following property to the hdfs-site.xml file:
<property>
  <name>dfs.namenode.http-address</name>
  <value><namenode.host.address>:50070</value>
  <description>
    The address and the base port on which the dfs NameNode Web UI will listen.
  </description>
</property>
Note:
- dfs.http.address is deprecated; use dfs.namenode.http-address.
- In most cases, you should set dfs.namenode.http-address to a routable IP address with port 50070. However, in some cases such as Amazon EC2, when the NameNode should bind to multiple local addresses, you may want to set dfs.namenode.http-address to 0.0.0.0:50070 on the NameNode machine only, and set it to a real, routable address on the Secondary NameNode machine. The different addresses are needed in this case because HDFS uses dfs.namenode.http-address for two different purposes: it defines both the address the NameNode binds to, and the address the Secondary NameNode connects to for checkpointing. Using 0.0.0.0 on the NameNode allows the NameNode to bind to all its local addresses, while using the externally-routable address on the Secondary NameNode provides the Secondary NameNode with a real address to connect to.
For more information, see Multi-host SecondaryNameNode Configuration.
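The split configuration described in the note can be sketched as two variants of the same property. The host name namenode-host.company.com is a placeholder borrowed from the sample configuration earlier in this document; substitute the routable address of your own NameNode.

```xml
<!-- hdfs-site.xml on the NameNode machine only: bind to all local addresses -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>

<!-- hdfs-site.xml on the Secondary NameNode machine: a routable NameNode
     address (namenode-host.company.com is a placeholder host name) -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>namenode-host.company.com:50070</value>
</property>
```

Each machine carries only one of these entries; they are shown together here for comparison.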
More about the Secondary NameNode
- The NameNode stores the HDFS metadata information in RAM to speed up interactive lookups and modifications of the metadata.
- For reliability, this information is flushed to disk periodically. To ensure that these writes are not a speed bottleneck, only the list of modifications is written to disk, not a full snapshot of the current filesystem. The list of modifications is appended to a file called edits.
- Over time, the edits log file can grow quite large and consume large amounts of disk space.
- When the NameNode is restarted, it takes the HDFS system state from the fsimage file, then applies the contents of the edits log to construct an accurate system state that can be loaded into the NameNode's RAM. If you restart a large cluster that has run for a long period with no Secondary NameNode, the edits log may be quite large, and so it can take some time to reconstruct the system state to be loaded into RAM.
When the Secondary NameNode is configured, it periodically constructs a checkpoint by compacting the information in the edits log and merging it with the most recent fsimage file; it then clears the edits log. So, when the NameNode restarts, it can use the latest checkpoint and apply the contents of the smaller edits log. The interval between checkpoints is determined by the checkpoint period (dfs.namenode.checkpoint.period) or the number of edit transactions (dfs.namenode.checkpoint.txns). The default checkpoint period is one hour, and the default number of edit transactions before a checkpoint is 1,000,000. The SecondaryNameNode will checkpoint in an hour if there have not been 1,000,000 edit transactions within the hour; it will checkpoint after 1,000,000 transactions have been committed if they were committed in under one hour.
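The two checkpoint triggers described above can be written out explicitly. This is a sketch that simply restates the defaults (one hour and 1,000,000 transactions); the period is expressed in seconds here, which you should confirm against hdfs-default.xml for your release.

```xml
<!-- hdfs-site.xml; the values restate the defaults described in the text -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <!-- seconds between checkpoints: 3600 = one hour -->
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <!-- edit transactions that trigger a checkpoint before the period elapses -->
  <value>1000000</value>
</property>
```

Lowering either value makes checkpoints more frequent, which keeps the edits log smaller at the cost of more checkpoint work.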
Secondary NameNode Parameters
The behavior of the Secondary NameNode is controlled by the following parameters in hdfs-site.xml.
- dfs.namenode.checkpoint.check.period
- dfs.namenode.checkpoint.txns
- dfs.namenode.checkpoint.dir
- dfs.namenode.checkpoint.edits.dir
- dfs.namenode.num.checkpoints.retained
See http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml for details.
Enabling Trash
The trash feature is disabled by default. Cloudera recommends that you enable it on all production clusters.
The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.
The trash feature works by default only for files and directories deleted using the Hadoop shell. Files or directories deleted programmatically using other interfaces (WebHDFS or the Java APIs, for example) are not moved to trash, even if trash is enabled, unless the program has implemented a call to the trash functionality. (Hue, for example, implements trash as of CDH 4.4.)
Users can bypass trash when deleting files using the shell by specifying the -skipTrash option to the hadoop fs -rm -r command. This can be useful when it is necessary to delete files that are too large for the user's quota.
Trash is configured with the following properties in the core-site.xml file:
CDH Parameter | Value | Description |
---|---|---|
fs.trash.interval | minutes or 0 | The number of minutes after which a trash checkpoint directory is deleted. This option can be configured both on the server and the client. |
fs.trash.checkpoint.interval | minutes or 0 | The number of minutes between trash checkpoints. Every time the checkpointer runs on the NameNode, it creates a new checkpoint of the "Current" directory and removes checkpoints older than fs.trash.interval minutes. This value should be smaller than or equal to fs.trash.interval. This option is configured on the server. If configured to zero (the default), then the value is set to the value of fs.trash.interval. |
The period during which a file remains in the trash starts when the file is moved to the trash, not when the file is last modified.
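As a minimal sketch, the following core-site.xml entries would enable trash with hourly checkpoints that are removed after one day. The values 1440 and 60 are assumed examples chosen for illustration; pick intervals that match your site's retention needs.

```xml
<!-- core-site.xml; 1440 and 60 are assumed example values, in minutes -->
<property>
  <name>fs.trash.interval</name>
  <!-- remove trash checkpoints older than 1440 minutes (24 hours) -->
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <!-- create a new trash checkpoint every 60 minutes -->
  <value>60</value>
</property>
```

The checkpoint interval here is smaller than fs.trash.interval, as the table above requires.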
Configuring Storage-Balancing for the DataNodes
You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.
By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.
You can configure:
- how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
- what percentage of new block allocations will be sent to volumes with more available disk space than others.
Property | Value | Description |
---|---|---|
dfs.datanode.fsdataset.volume.choosing.policy | org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy | Enables storage balancing among the DataNode's volumes. |
dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold | 10737418240 (default) | The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB). If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis. |
dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction | 0.75 (default) | What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5 - 1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations. |
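Put together, the three properties in the table might look like the following sketch. It assumes they are set in hdfs-site.xml on each DataNode, which the table does not state, and it uses the default threshold and preference fraction listed above.

```xml
<!-- hdfs-site.xml on each DataNode (file location is an assumption) -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <!-- 10737418240 bytes = 10 GB (the default) -->
  <value>10737418240</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <!-- 0.75 = send 75% of new block allocations to the emptier volumes (the default) -->
  <value>0.75</value>
</property>
```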
Enabling WebHDFS
To configure HttpFs instead, see HttpFS Installation.
If you want to use WebHDFS, you must first enable it.
To enable WebHDFS:
Set the following property in hdfs-site.xml:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
To enable numeric usernames in WebHDFS:
The default username pattern is:
^[A-Za-z_][A-Za-z0-9._-]*[$]?$
You can override the default username pattern by setting the dfs.webhdfs.user.provider.user.pattern property in hdfs-site.xml. For example, to allow numerical usernames, the property can be set as follows:
<property>
  <name>dfs.webhdfs.user.provider.user.pattern</name>
  <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
</property>
- To use WebHDFS in a secure cluster, you must set additional properties to configure secure WebHDFS. For instructions, see the CDH 5 Security Guide.
- When you use WebHDFS in a high-availability (HA) configuration, you must supply the value of dfs.nameservices in the WebHDFS URI, rather than the address of a particular NameNode; for example:
hdfs dfs -ls webhdfs://nameservice1/, not
hdfs dfs -ls webhdfs://server1.myent.myco.com:20101/
Configuring LZO
If you have installed LZO, configure it as follows.
To configure LZO:
If you copy and paste the value string, make sure you remove the line-breaks and carriage returns, which are included below because of page-width constraints.
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
For more information about LZO, see Using LZO Compression.
Start HDFS
Deploy the configuration
To deploy your configuration to your entire cluster:
- Push your custom directory (for example /etc/hadoop/conf.my_cluster) to each node in your cluster; for example:
$ scp -r /etc/hadoop/conf.my_cluster myuser@myCDHnode-<n>.mycompany.com:/etc/hadoop/conf.my_cluster
- Manually set alternatives on each node to point to that directory, as follows.
To manually set the configuration on Red Hat-compatible systems:
$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
To manually set the configuration on Ubuntu and SLES systems:
$ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page on Red Hat-compatible systems.
Start HDFS
$ for x in $(cd /etc/init.d ; ls hadoop-hdfs-*) ; do sudo service "$x" start ; done
This starts all the HDFS services installed on the node (every init script whose name begins with hadoop-hdfs-). This is normally what you want, but you can start services individually if you prefer.
Create the /tmp directory
If you do not create /tmp properly, with the right permissions as shown below, you may have problems with CDH components later. Specifically, if you don't create /tmp yourself, another process may create it automatically with restrictive permissions that will prevent your other applications from using it.
Create the /tmp directory after HDFS is up and running, and set its permissions to 1777 (drwxrwxrwt), as follows:
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
If Kerberos is enabled, do not use commands in the form sudo -u <user> hadoop <command>; they will fail with a security error. Instead, use the following commands:
If you are using a password:
$ kinit <user>
If you are using a keytab:
$ kinit -kt <keytab> <principal>
Then, for each command executed by this user:
$ <command>
Deploy YARN or MRv1
To deploy MRv1 or YARN, and start HDFS services if you have not already done so, see the deployment instructions for MRv1 or YARN.