
Performance and scalability limitations for HDFS replication policies

HDFS replication policies are subject to file-count limits, and their throughput depends on the condition of the source and destination clusters.

Before you create an HDFS replication policy, consider the following:
  • The maximum number of files for a single replication job is 100 million.
  • The maximum number of files for a replication policy that runs more often than once every 8 hours is 10 million.
  • The throughput of the replication job depends on the absolute read and write throughput of the source and destination clusters.
  • Rebalance your HDFS clusters regularly so that replication jobs run efficiently.
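
The file-count limits above can be checked before you create a policy. The following is a minimal sketch, not part of Replication Manager: the function names `parse_hdfs_count` and `check_policy` are hypothetical, and the sketch assumes that you obtain the file count yourself, for example from the second column (FILE_COUNT) of `hdfs dfs -count <path>` output.

```python
# Limits taken from the list above.
MAX_FILES_PER_JOB = 100_000_000          # single replication job
MAX_FILES_FREQUENT_POLICY = 10_000_000   # policy running more often than every 8 hours
FREQUENT_INTERVAL_HOURS = 8


def parse_hdfs_count(line: str) -> int:
    """Return FILE_COUNT from one line of `hdfs dfs -count <path>` output.

    The columns are: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    """
    fields = line.split()
    return int(fields[1])


def check_policy(file_count: int, interval_hours: float) -> list[str]:
    """Return a list of limit violations (hypothetical helper); empty if none."""
    problems = []
    if file_count > MAX_FILES_PER_JOB:
        problems.append(
            f"{file_count} files exceeds the per-job limit of {MAX_FILES_PER_JOB}"
        )
    if (interval_hours < FREQUENT_INTERVAL_HOURS
            and file_count > MAX_FILES_FREQUENT_POLICY):
        problems.append(
            f"{file_count} files exceeds the limit of {MAX_FILES_FREQUENT_POLICY} "
            f"for policies that run more often than every {FREQUENT_INTERVAL_HOURS} hours"
        )
    return problems


if __name__ == "__main__":
    # Example values only: 20 million files replicated every 4 hours.
    sample = "  12  20000000  1099511627776 /data/warehouse"
    for problem in check_policy(parse_hdfs_count(sample), interval_hours=4):
        print("WARNING:", problem)
```

If the check reports a violation, consider splitting the dataset across several policies or lowering the run frequency. The sketch covers only the two file-count limits; throughput and cluster balance must be assessed separately.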