Enabling File Size Reporting

File size reporting helps you identify tables in HDFS whose data is stored in many small files or partitions. Data stored this inefficiently can cause performance problems such as long query times. Enable file size reporting in Cloudera Manager before you view the report in WXM.
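To make the idea of "small files" concrete, the following minimal sketch counts how many files in a table directory fall well below the HDFS block size. It is an illustration only, not how WXM computes its report. It assumes a 128 MiB block size (the HDFS default for dfs.blocksize) and a hypothetical cutoff of one quarter of a block, and it parses the output of `hdfs dfs -ls -R`. The sample listing is made up.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (the dfs.blocksize default)
SMALL_FRACTION = 0.25           # hypothetical cutoff: files under 1/4 of a block count as "small"


@dataclass
class SmallFileSummary:
    total_files: int
    small_files: int
    average_size: float

    @property
    def small_ratio(self) -> float:
        # Fraction of files below the cutoff; 0.0 for an empty directory.
        return self.small_files / self.total_files if self.total_files else 0.0


def parse_ls_recursive(output: str) -> list:
    """Return file sizes in bytes from `hdfs dfs -ls -R` output, skipping directories."""
    sizes = []
    for line in output.splitlines():
        # File lines have 8 fields: perms, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps paths that contain spaces in one field.
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        sizes.append(int(fields[4]))
    return sizes


def summarize(sizes: list, block_size: int = BLOCK_SIZE) -> SmallFileSummary:
    cutoff = block_size * SMALL_FRACTION
    small = sum(1 for size in sizes if size < cutoff)
    average = sum(sizes) / len(sizes) if sizes else 0.0
    return SmallFileSummary(len(sizes), small, average)


# Made-up sample listing for one partition of a hypothetical table.
listing = """\
drwxr-xr-x   - hive hive          0 2024-01-01 10:00 /warehouse/sales/dt=2024-01-01
-rw-r--r--   3 hive hive       2048 2024-01-01 10:00 /warehouse/sales/dt=2024-01-01/part-0000
-rw-r--r--   3 hive hive  268435456 2024-01-01 10:00 /warehouse/sales/dt=2024-01-01/part-0001
"""
summary = summarize(parse_ls_recursive(listing))
print(summary.total_files, summary.small_files)  # 2 1
```

A partition where most files fall under the cutoff is the kind of layout the report is meant to surface; the cutoff here is arbitrary and only serves the illustration.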

First, verify the following prerequisites:

  • You must have Cloudera Manager 6.2 or later installed.
  • You must have Navigator installed. WXM uses the Navigator Metadata Server to gather file size information.

Complete the following steps:

  1. In Cloudera Manager, open the Cloudera Management Service.
  2. Open the Configuration tab and search for Small Files. The search displays the following properties:

      • Small Files Reporting: Enable Data Collection
      • Small Files Reporting: HDFS Service for Data Staging (for example, HDFS-1)
      • Small Files Reporting: HDFS Staging Location (for example, /user/cloudera/navigator/smallfiles)

  3. Enable the Small Files Reporting: Enable Data Collection property. This property allows Navigator to pass metadata to the Telemetry Publisher. WXM uses this data to build the Table File Size Report.
  4. In the Small Files Reporting: HDFS Service for Data Staging property, select the HDFS service that you want to use for the staging area for file size analysis.
  5. The Small Files Reporting: HDFS Staging Location property sets the location of the staging area. You can keep the default, /user/cloudera/navigator/smallfiles.
  6. Restart the Navigator Metadata Server.
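If you check this configuration on many deployments, you might script the search from step 2 instead of using the UI. The following sketch reads one Cloudera Management Service role config group through the Cloudera Manager REST API with view=full and keeps only the properties whose display name contains "Small Files". It only reads configuration. The host, credentials, port, API version (v32) and role config group name (mgmt-NAVIGATORMETASERVER-BASE) are assumptions; the properties may live in a different role config group in your release, so verify these against the Cloudera Manager API documentation. The property names in the offline sample are made up.

```python
import base64
import json
import urllib.request


def small_files_settings(config: dict) -> dict:
    """Map display name -> current value for every property whose display name
    mentions "Small Files". `config` is a parsed config list: {"items": [...]}."""
    result = {}
    for item in config.get("items", []):
        display = item.get("displayName", "")
        if "small files" in display.lower():
            # When no value is set, fall back to the reported default.
            result[display] = item.get("value", item.get("default"))
    return result


def fetch_role_group_config(host, user, password,
                            group="mgmt-NAVIGATORMETASERVER-BASE",  # assumed group name
                            api_version="v32",                       # assumed; GET /api/version reports yours
                            port=7180):                              # default Cloudera Manager HTTP port
    """Read-only GET of a Cloudera Management Service role config group's configuration."""
    url = (f"http://{host}:{port}/api/{api_version}"
           f"/cm/service/roleConfigGroups/{group}/config?view=full")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline example with a hand-written response (property names are hypothetical).
sample = {"items": [
    {"name": "example_small_files_enabled",
     "displayName": "Small Files Reporting: Enable Data Collection", "value": "true"},
    {"name": "example_small_files_staging",
     "displayName": "Small Files Reporting: HDFS Staging Location",
     "default": "/user/cloudera/navigator/smallfiles"},
    {"name": "example_heap", "displayName": "Java Heap Size", "value": "1024"},
]}
print(small_files_settings(sample))
```

Against a live server you would pass the result of fetch_role_group_config(...) to small_files_settings instead of the sample. Keeping the filter separate from the HTTP call lets you test it without a Cloudera Manager instance.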

For more information about viewing the file size report in WXM, see File Size Reporting.