Running the balancer
Learn how to run the HDFS Balancer in Cloudera Manager.
1. Go to the HDFS service.
2. Ensure that the service has a Balancer role.
3. Select Actions > Rebalance.
4. Click Rebalance to confirm.
If you see a Finished status, the Balancer ran successfully.
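Outside Cloudera Manager, the Balancer can also be started from the command line with the hdfs balancer utility. The following is a minimal sketch: the -threshold value of 10 percent is illustrative only, and the command is typically run as the hdfs user on a cluster host.

    # Run the HDFS Balancer manually (illustrative example).
    # -threshold sets how far a DataNode's utilization may deviate from the
    # cluster average (in percent) before blocks are moved to or from it.
    sudo -u hdfs hdfs balancer -threshold 10

In Cloudera Manager, the equivalent threshold is configured on the Balancer role; see Configuring the balancer threshold.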
Parent topic: Configuring and running the HDFS balancer using Cloudera Manager