Configuring YARN resources for MapReduce conversion jobs

Cloudera Storage Optimizer uses MapReduce jobs for data conversion. Learn how to configure sufficient YARN resources for MapReduce conversion jobs.

Table 1. Recommended YARN resources by data volume
Daily conversion volume | Concurrent mappers | YARN memory | YARN vCores | Estimated conversion time
Less than 10 TB         | 10                 | 10 GB       | 10          | 2 to 4 hours
10 to 50 TB             | 20                 | 20 GB       | 20          | 3 to 6 hours
50 to 100 TB            | 30                 | 30 GB       | 30          | 4 to 8 hours
Greater than 100 TB     | 50                 | 50 GB       | 50          | 6 to 12 hours
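The tiers in Table 1 can be expressed as a small lookup, which may help when sizing a cluster from a script. This is a minimal sketch, not part of Cloudera Storage Optimizer: the function name `recommend_yarn_resources` and the handling of boundary values (10 TB falls in the second tier, 50 TB in the second, 100 TB in the third) are assumptions, because the table does not say which tier a boundary value belongs to.

```python
# Tiers transcribed from Table 1.
# Each entry: (inclusive upper bound in TB, mappers, memory GB, vCores, (min h, max h)).
# Boundary handling is an assumption; the table leaves it unspecified.
TIERS = [
    (50, 20, 20, 20, (3, 6)),
    (100, 30, 30, 30, (4, 8)),
]


def recommend_yarn_resources(daily_tb):
    """Return the Table 1 recommendation for a daily conversion volume in TB."""
    if daily_tb < 0:
        raise ValueError("daily_tb must be non-negative")
    if daily_tb < 10:  # "Less than 10 TB"
        row = (10, 10, 10, (2, 4))
    else:
        # "Greater than 100 TB" is the fallback when no bounded tier matches.
        row = (50, 50, 50, (6, 12))
        for upper, mappers, memory_gb, vcores, hours in TIERS:
            if daily_tb <= upper:
                row = (mappers, memory_gb, vcores, hours)
                break
    mappers, memory_gb, vcores, hours = row
    return {
        "mappers": mappers,
        "memory_gb": memory_gb,
        "vcores": vcores,
        "estimated_hours": hours,
    }


if __name__ == "__main__":
    print(recommend_yarn_resources(75))
```
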
Minimum requirements are as follows:
  • Container Memory: 1 GB per mapper
  • Container vCores: 1 vCore per mapper
  • Concurrent Mappers: The default value is 10 (configurable through the Key Conversion Concurrent Mappers setting in the UI)
  • Total Minimum: 10 GB RAM and 10 vCores available in YARN
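The total minimum follows from multiplying the per-mapper requirements by the number of concurrent mappers. The sketch below works through that arithmetic. The function names are hypothetical, and the calculation covers mapper containers only; it does not account for the MapReduce ApplicationMaster container, which also consumes some YARN resources.

```python
def minimum_yarn_resources(concurrent_mappers=10,
                           memory_gb_per_mapper=1,
                           vcores_per_mapper=1):
    """Return (memory_gb, vcores) needed to run all mappers at once.

    Defaults mirror the minimum requirements listed above:
    1 GB and 1 vCore per mapper, 10 concurrent mappers.
    """
    if concurrent_mappers < 1:
        raise ValueError("concurrent_mappers must be at least 1")
    memory_gb = concurrent_mappers * memory_gb_per_mapper
    vcores = concurrent_mappers * vcores_per_mapper
    return memory_gb, vcores


def fits_in_yarn(available_memory_gb, available_vcores, concurrent_mappers=10):
    """Check whether the available YARN capacity covers the mapper containers."""
    need_memory, need_vcores = minimum_yarn_resources(concurrent_mappers)
    return available_memory_gb >= need_memory and available_vcores >= need_vcores


if __name__ == "__main__":
    print(minimum_yarn_resources())      # default of 10 mappers
    print(fits_in_yarn(8, 16))           # memory is the limiting resource here
```

With the default of 10 mappers this reproduces the 10 GB and 10 vCores stated above; raising the mapper count to 30 raises the requirement to 30 GB and 30 vCores, matching the third row of Table 1.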
To change the mapper settings, perform the following steps:
  1. Sign in to Cloudera Manager.
  2. In the left navigation, click Clusters and select the Ozone cluster.
  3. Click the Instances tab, and then click Ozone Tiering in the Role Type column. The Cloudera Storage Optimizer UI page opens.
  4. Click the Configuration tab.
  5. Click Ozone Key Settings.
  6. Update the following configurations:
    • Key Conversion Batches Per Mapper: Specify the number of batches that can be converted per mapper to control the split size. Default value is 10.
    • Key Conversion Concurrent Mappers: Specify the number of key conversion mappers that can run concurrently. The default value is 10. Increase the value for a faster conversion rate. Cloudera recommends using 20 to 50 mappers for large buckets. Conversion jobs run at standard priority in the default queue.
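To get a feel for how the two settings interact, the following sketch estimates how many map tasks a job needs and how many rounds of tasks it runs. It rests on an assumption that the documentation does not confirm: that each map task processes exactly one split of Key Conversion Batches Per Mapper batches. The function name and the `total_batches` input are hypothetical.

```python
import math


def estimate_map_tasks(total_batches, batches_per_mapper=10, concurrent_mappers=10):
    """Estimate (map_tasks, waves) for a conversion job.

    Assumption: one map task handles one split of `batches_per_mapper`
    batches, and at most `concurrent_mappers` tasks run at the same time.
    """
    if total_batches < 0:
        raise ValueError("total_batches must be non-negative")
    if batches_per_mapper < 1 or concurrent_mappers < 1:
        raise ValueError("settings must be at least 1")
    map_tasks = math.ceil(total_batches / batches_per_mapper)
    waves = math.ceil(map_tasks / concurrent_mappers)
    return map_tasks, waves


if __name__ == "__main__":
    # Same workload with the default and with a larger mapper count.
    print(estimate_map_tasks(1000))
    print(estimate_map_tasks(1000, concurrent_mappers=50))
```

Under this assumption, raising the concurrent mapper count reduces the number of waves without changing the number of tasks, which is consistent with the guidance to increase it for a faster conversion rate.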