Monitoring the Performance of HDFS Replications

You can monitor the progress of an HDFS replication schedule using performance data that you download as a CSV file from the Cloudera Manager Admin console.

This file contains information about the files being replicated, the average throughput, and other details that can help diagnose performance issues during HDFS replications. You can view this performance data for running HDFS replication jobs and for completed jobs.

View the performance data for a running HDFS replication schedule

  1. Go to Backup > Replication Schedules.
  2. Locate the schedule.
  3. Click Performance Report and select one of the following options:
    • HDFS Performance Summary – Download a summary report of the performance of the running replication job. An HDFS Performance Summary report includes the last performance sample for each mapper that is working on the replication job.
    • HDFS Performance Full – Download a full report of the performance of the running replication job. An HDFS Performance Full report includes all samples taken for all mappers during the full execution of the replication job.
  4. To view the data, import the file into a spreadsheet program such as Microsoft Excel.

Table 1. HDFS Performance Report Columns

Performance Data Column – Description
Timestamp – Time when the performance data was collected.
Host – Name of the host where the YARN or MapReduce job was running.
SrcFile – Name of the source file being copied by the MapReduce job.
TgtFile – Name of the file to which the source file was being copied on the target.
BytesCopiedPerFile – Number of bytes copied for the file currently being copied.
TimeElapsedPerFile – Total time elapsed for this copy operation of the file currently being copied.
CurrThroughput – Current throughput in bytes per second.
AvgFileThroughput – Average throughput in bytes per second since the start of the file currently being copied.
TotalSleepTime – Number of seconds the transfer was stalled due to throughput throttling. This is expected to be zero unless the throughput was throttled using the Maximum Bandwidth parameter for the replication schedule. (You configure this parameter on the Advanced tab when creating or editing a replication schedule.)
AvgMapperThroughput – Average throughput for the current mapper. This can include samples of throughput taken for various files copied by this mapper.
BytesCopiedPerMapper – Total bytes copied by this MapReduce job. This can include multiple files.
TimeElapsedPerMapper – Total time elapsed since this MapReduce job started copying files.

View the performance data for a completed HDFS replication schedule

  1. Go to Backup > Replication Schedules.
  2. Locate the schedule and click Actions > Show History.

    The Replication History page for the replication schedule displays.

  3. Click the arrow to expand the display for this schedule.
  4. Click the Download CSV link and select one of the following options:
    • Listing – a list of files and directories copied during the replication job.
    • Status – full status report of files where the status of the replication is one of the following:
      • ERROR – An error occurred and the file was not copied.
      • DELETED – A deleted file.
      • SKIPPED – A file where the replication was skipped because it was up-to-date.
    • Error Status Only – full status report, filtered to show files with errors only.
    • Deleted Status Only – full status report, filtered to show deleted files only.
    • Skipped Status Only – full status report, filtered to show skipped files only.
    • Performance – summary performance report.
    • Full Performance – full performance report.
  5. To view the data, import the file into a spreadsheet program such as Microsoft Excel.

    The performance data is collected every two minutes. Therefore, no data is available during the initial execution of a replication job because not enough samples are available to estimate throughput and other reported data.
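
As an alternative to a spreadsheet, a downloaded performance report can be summarized with a short script. The following is a minimal sketch, not a Cloudera-provided tool. It assumes that the CSV header row uses the column names listed in Table 1 exactly as written and that CurrThroughput and TotalSleepTime are plain numbers. The function name summarize_by_host and the file name performance.csv are hypothetical. Because the report identifies the host but not the individual mapper, the samples are grouped by host, and one host may have run several mappers.

```python
import csv
from collections import defaultdict


def summarize_by_host(lines):
    """Group performance samples by the Host column.

    `lines` is any iterable of CSV lines, for example an open file.
    Assumes the header uses the column names from Table 1.
    """
    totals = defaultdict(lambda: {"samples": 0, "bytes_per_s": 0.0, "throttled": False})
    for row in csv.DictReader(lines):
        entry = totals[row["Host"]]
        entry["samples"] += 1
        entry["bytes_per_s"] += float(row["CurrThroughput"])
        # A nonzero TotalSleepTime means the transfer was stalled by throttling.
        if float(row["TotalSleepTime"]) > 0:
            entry["throttled"] = True

    return {
        host: {
            "samples": e["samples"],
            # CurrThroughput is in bytes per second; report megabytes per second.
            "mean_mb_per_s": e["bytes_per_s"] / e["samples"] / 1_000_000,
            "throttled": e["throttled"],
        }
        for host, e in totals.items()
    }


if __name__ == "__main__":
    # "performance.csv" is a hypothetical name for the downloaded report.
    with open("performance.csv", newline="") as report:
        for host, stats in summarize_by_host(report).items():
            print(host, stats)
```

A host flagged as throttled had at least one sample with a nonzero TotalSleepTime, which is expected only when the Maximum Bandwidth parameter is set for the schedule.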

[Image: a sample CSV file, as presented in Excel]

Note the following limitations and known issues:
  • If you download the CSV file too soon after the replication job starts, Cloudera Manager returns either an empty file or a CSV file that contains only column headers, along with a message to try again later, after performance data has been collected.
  • If you use a proxy user of the form user@domain, performance data is not available through the links.
  • If the replication job only replicates small files that can be transferred in less than a few minutes, no performance statistics are collected.
  • For replication schedules that specify the Dynamic Replication Strategy, statistics regarding the last file transferred by a MapReduce job hide previous transfers performed by that MapReduce job.
  • Only the last trace per MapReduce job is reported in the CSV file.
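
Because a download made too early can be empty or contain only column headers, a script that processes these reports may want to check for data rows first. The sketch below shows one way to do that. It makes an assumption that the source does not confirm: if the "try later" message is written into the file at all, it does not have the same number of fields as the header row. The function name has_performance_samples is hypothetical.

```python
import csv


def has_performance_samples(lines):
    """Return True if the CSV contains at least one data row.

    `lines` is any iterable of CSV lines, for example an open file.
    A row counts as data only if it has as many fields as the header,
    so blank lines and single-field messages are ignored (an assumption).
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return False  # completely empty file
    return any(len(row) == len(header) for row in reader)
```

If the function returns False, wait for at least one more collection interval and download the report again.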