HDFS Dashboard
The HDFS Dashboard helps operators understand how HDFS is being used and which users and jobs consume the most file system resources.
This dashboard includes the following paragraphs:
File Size Distribution
Top N Users with Small Files
Top N Largest HDFS Users
Average File Size
HDFS File Size Distribution Trend
HDFS Utilization Trend
HDFS File Size Distribution Trend by User
Jobs With High Number of HDFS Operations
HDP 2.5: Jobs Creating Many HDFS Files
Jobs With Large Amount of Data Written
Most of these paragraph titles are self-explanatory. A few are described below to provide more context:
| Paragraph | Description |
|---|---|
| File Size Distribution | For any large multi-tenant cluster, it is important to identify small files and keep their proliferation in check. This paragraph displays a pie chart showing the relative distribution of files by size, categorized as Tiny (0-10K), Mini (10K-1M), Small (1M-30M), Medium (30M-128M), and Large (128M+). The goal is to show how dominant each file size category is within HDFS. If there are many small files, the next paragraph makes it easy to identify who is contributing them. |
| Top N Users with Small Files | Understanding how prevalent files of specific sizes are is helpful, but the next step is understanding who is creating them. This paragraph shows which users are responsible for creating the majority of small files within HDFS. |
| Top N Largest HDFS Users | This paragraph shows where HDFS capacity is being consumed and by whom, so you can quickly identify which user or users are storing the most data in HDFS. |
| HDFS File Size Distribution Trend by User | Each "by User" paragraph shows how an individual user's file sizes are trending. It helps answer questions about when large or small files start becoming more or less prevalent for specific users, and can help measure the success of coaching users on Hadoop best practices. |
| HDP 2.5: Jobs Creating Many HDFS Files | When troubleshooting HDFS NameNode performance issues, it is helpful to understand which jobs create the most files and therefore put the most load on the NameNode. HDP 2.5 adds new counters that track how many files each YARN application creates. This helps identify erroneous jobs that unintentionally create hundreds of thousands, or even millions, of files within HDFS. |
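The file size distribution and per-user small-file counts described above can be approximated offline from a recursive directory listing. The sketch below parses `hdfs dfs -ls -R` output lines; the bucket boundaries and the choice of which buckets count as "small" are assumptions for illustration, not SmartSense's exact implementation:

```python
from collections import Counter

# Assumed size buckets (upper bounds in bytes, exclusive), mirroring the
# dashboard's categories: Tiny, Mini, Small, Medium, Large.
BUCKETS = [
    ("Tiny", 10 * 1024),
    ("Mini", 1024 * 1024),
    ("Small", 30 * 1024 * 1024),
    ("Medium", 128 * 1024 * 1024),
    ("Large", float("inf")),
]

def bucket_for(size):
    """Return the name of the first bucket whose upper bound exceeds size."""
    for name, upper in BUCKETS:
        if size < upper:
            return name

def summarize(ls_lines, small_buckets=("Tiny", "Mini")):
    """Build (file-size distribution, small-file count per owner) from
    `hdfs dfs -ls -R` output lines. Directories are skipped."""
    dist = Counter()
    small_by_user = Counter()
    for line in ls_lines:
        parts = line.split()
        # ls format: perms, replication, owner, group, size, date, time, path
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        owner, size = parts[2], int(parts[4])
        b = bucket_for(size)
        dist[b] += 1
        if b in small_buckets:
            small_by_user[owner] += 1
    return dist, small_by_user

# Example with hypothetical listing output:
lines = [
    "-rw-r--r--   3 alice hadoop        512 2025-01-01 00:00 /data/a",
    "-rw-r--r--   3 alice hadoop    5000000 2025-01-01 00:00 /data/b",
    "-rw-r--r--   3 bob   hadoop  200000000 2025-01-01 00:00 /data/c",
    "drwxr-xr-x   - bob   hadoop          0 2025-01-01 00:00 /data/dir",
]
dist, small = summarize(lines)
```

On a real cluster, the dashboard derives this information from the NameNode's fsimage rather than a live listing, so a listing-based sketch like this is only suitable for small directory trees.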