Performance and scalability limitations
HDFS replication has some performance and scalability limitations.
Before you create a replication policy, consider the following:
- Maximum number of files for a single replication job: 100 million.
- Maximum number of files for a replication policy that runs more frequently than once every 8 hours: 10 million.
- The throughput of the replication job depends on the absolute read and write throughput of the source and destination clusters.
- Regular rebalancing of your HDFS clusters is required for replications to run efficiently.
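To check a planned policy against the file-count limits above, you might count the files under the source path and compare the count with the policy's schedule. The following is a minimal sketch, not a Cloudera tool or API. The helper names `parse_hdfs_count`, `check_policy`, and `estimate_copy_hours` are hypothetical. It assumes that the input is one line of plain `hdfs dfs -count <path>` output, run without `-q` or `-h`, with the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. The throughput estimate is also an assumption: it treats the slower of the source read rate and the destination write rate as the bottleneck, which gives only a rough lower bound on copy time.

```python
"""Hypothetical pre-flight check for an HDFS replication policy.

The file-count limits come from the documented values. The helper names and
the throughput model are illustrative assumptions, not a Cloudera API.
"""

MAX_FILES_PER_JOB = 100_000_000          # limit for any single replication job
MAX_FILES_FREQUENT_POLICY = 10_000_000   # limit for policies that run more often than every 8 hours
FREQUENT_INTERVAL_HOURS = 8


def parse_hdfs_count(line: str) -> int:
    """Return FILE_COUNT from one line of plain `hdfs dfs -count <path>` output.

    Assumed columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.
    """
    fields = line.split()
    return int(fields[1])


def check_policy(file_count: int, interval_hours: float) -> list[str]:
    """Return a list of limit violations. An empty list means no limit is exceeded."""
    problems = []
    if file_count > MAX_FILES_PER_JOB:
        problems.append(
            f"{file_count} files exceeds the {MAX_FILES_PER_JOB} per-job limit"
        )
    # A policy counts as "frequent" if it runs more often than once every 8 hours.
    if interval_hours < FREQUENT_INTERVAL_HOURS and file_count > MAX_FILES_FREQUENT_POLICY:
        problems.append(
            f"{file_count} files exceeds the {MAX_FILES_FREQUENT_POLICY} limit "
            f"for a policy that runs every {interval_hours} h"
        )
    return problems


def estimate_copy_hours(bytes_to_copy: int, source_read_mbps: float,
                        dest_write_mbps: float) -> float:
    """Rough lower bound on copy time, in hours.

    Assumes aggregate cluster throughputs in megabytes per second and treats
    the slower of source reads and destination writes as the bottleneck.
    """
    bottleneck_bytes_per_s = min(source_read_mbps, dest_write_mbps) * 1_000_000
    return bytes_to_copy / bottleneck_bytes_per_s / 3600


if __name__ == "__main__":
    # Example line in the shape printed by `hdfs dfs -count /data/warehouse`.
    sample = "        1234     15000000   987654321 /data/warehouse"
    files = parse_hdfs_count(sample)
    for problem in check_policy(files, interval_hours=4):
        print("WARNING:", problem)
    print(f"~{estimate_copy_hours(3_600_000_000_000, 1000, 500):.1f} h at best")
```

The example uses a strict `<` comparison against 8 hours. A policy that runs exactly every 8 hours therefore falls under the 100 million per-job limit and not the 10 million limit, which matches the wording "more frequently than once every 8 hours".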