Monitoring the Performance of HDFS Replications

You can monitor the progress of an HDFS replication schedule using performance data that you download as a CSV file from the Cloudera Manager Admin console. This file contains information about the files being replicated, the average throughput, and other details that can help diagnose performance issues during HDFS replications. You can view this performance data for running HDFS replication schedules and for completed schedules.

To view the performance data for a running HDFS replication schedule:
  1. Go to Backup > Replication Schedules.
  2. Locate the row for the schedule.
  3. Click the Performance Report link to download the CSV file.
  4. To view the data, import the file into a spreadsheet program such as Microsoft Excel.

To view the performance data for a completed HDFS replication schedule:
  1. Go to Backup > Replication Schedules.
  2. In the row for the schedule, click Actions > Show History.

    The Replication History page for the replication schedule displays.

  3. Click to expand the display.
  4. Click the Download Performance CSV link to download the CSV file.
  5. To view the data, import the file into a spreadsheet program such as Microsoft Excel.

The performance data is collected every few minutes. Therefore, no data is available during the initial execution of a replication job because not enough samples are available to estimate throughput and other reported data.

The information provided by the CSV link is a summary that contains the last report of a given MapReduce job. The data returned by the CSV files downloaded from the Cloudera Manager Admin console has the following structure:
Performance Data Columns Description
Timestamp Time when the performance data was collected
Host Name of the host where the MapReduce job was running.
SrcFile Name of the source file being copied by the MapReduce job.
TgtFile Name of the file to which the source file was being copied on the target.
BytesCopied Number of bytes copied so far.
Time Total time elapsed for this copy operation.
CurrThroughput Current throughput in bytes per second.
AvgThroughput Average throughput in bytes per second since the start of this file transfer.
TotalSleepTime Number of seconds the transfer was stalled due to throughput throttling. This is expected to be zero unless the throughput was throttled using the Maximum Bandwidth parameter for the replication schedule. (You configure his parameter on the Advanced tab when creating or editing a replication schedule.)

A sample CSV file, as presented in Excel, is shown here:

Note the following limitations and known issues:
  • If you click the CSV download too soon after the replication job starts, Cloudera Manager returns an empty file or a CSV file that has columns headers only and a message to try later when performance data has actually been collected.

  • If you employ a proxy user with the form user@domain, performance data is not via available through the links.
  • If the replication job only replicates small files that can be transferred in less than a few minutes, no performance statistics are collected.
  • For replication schedules that specify the Dynamic Replication Strategy, statistics regarding the last file transferred by a MapReduce job hide previous transfers performed by that MapReduce job.
  • Only the last trace per MapReduce job is reported in the CSV file.