Hortonworks Docs
Scaling Namespaces and Optimizing Data Storage
Introduction
Overview of Apache HDFS
Scaling namespaces
Scaling a cluster using HDFS federation
Federation terminology
Benefits of an HDFS Federation
Configure an HDFS federation
Format NameNodes
Add a NameNode to an existing HDFS cluster
Configure a federation with a cluster upgrade
Cluster management operations
Balance data in a federation
Decommission a DataNode from a federation
Using cluster web console to monitor a federation
Using ViewFs to manage multiple namespaces
Namespace view in a non-federated environment
Namespace view in a federation
Pathnames on clusters with federated and non-federated NameNodes
Considerations for working with ViewFs mount table entries
Example of ViewFs mount table entries
Optimizing data storage
Balancing data across disks of a DataNode
Plan the data movement across disks
Parameters to configure the Disk Balancer
Execute the Disk Balancer plan
Disk Balancer commands
Increasing storage capacity with HDFS erasure coding
Benefits of erasure coding
How the DataNode recovers failed erasure-coded blocks
Erasure coding policies
Limitations of erasure coding
Effect of erasure coding on existing data
Considerations for deploying erasure coding
Erasure coding CLI command
Erasure coding examples
Increasing storage capacity with HDFS compression
Enable GZipCodec as the default compression codec
Use GZipCodec with a one-time job
Setting archival storage policies
HDFS storage types
HDFS storage policies
Configure archival storage
Commands for configuring storage policies
The HDFS mover command
Balancing data across an HDFS cluster
Why HDFS data becomes unbalanced
Configurations and CLI options for the HDFS Balancer
Properties for configuring the Balancer
Balancer commands
Recommended configurations for the Balancer
Cluster balancing algorithm
Storage group classification
Storage group pairing
Block move scheduling
Block move execution
Exit statuses for the HDFS Balancer
Optimizing performance
Improving performance with centralized cache management
Benefits of centralized cache management in HDFS
Use cases for centralized cache management
Centralized cache management architecture
Caching terminology
Properties for configuring centralized caching
Commands for using cache pools and directives
Configuring HDFS rack awareness
Create a rack topology script
Add the topology script property to core-site.xml
Restart HDFS and MapReduce services
Verify rack awareness
Customizing HDFS
Customize the HDFS home directory
Properties to set the size of the NameNode edits directory
Optimizing NameNode disk space with Hadoop archives
Overview of Hadoop archives
Hadoop archive components
Create a Hadoop archive
List files in Hadoop archives
Format for using Hadoop archives with MapReduce
Detecting slow DataNodes
Enable disk IO statistics
Enable detection of slow DataNodes
Allocating DataNode memory as storage (Technical Preview)
HDFS storage types
LAZY_PERSIST memory storage policy
Configure DataNode memory as storage
Improving performance with short-circuit local reads
Prerequisites for configuring short-circuit local reads
Properties for configuring short-circuit local reads on HDFS
Using the NFS Gateway for accessing HDFS
Configure the NFS Gateway
Start and stop the NFS Gateway services
Verify validity of the NFS services
Access HDFS from the NFS Gateway
How NFS Gateway authenticates and maps users
Using the NFS Gateway with ViewFs
Export ViewFs mounts using the NFS Gateway
Data storage metrics
Using JMX for accessing HDFS metrics
Configure the G1GC garbage collector (Technical Preview)
Recommended settings for G1GC
Switching from CMS to G1GC
APIs for accessing HDFS
Set up WebHDFS on a secure cluster
© 2012-2019, Hortonworks, Inc.
Document licensed under the Creative Commons Attribution ShareAlike 4.0 License.