Advanced topics

Advanced topics and configuration options for the manifest committer.

Advanced configuration

Some advanced options are intended for development and testing rather than production use.

| Option | Meaning | Default Value |
|--------|---------|---------------|
| mapreduce.manifest.committer.store.operations.classname | Classname for Manifest Store Operations | "" |
| mapreduce.manifest.committer.validate.output | Perform output validation? | false |

Validating output

The option mapreduce.manifest.committer.validate.output triggers a check of every renamed file to verify it has the expected length.

This adds the overhead of a HEAD request per file, and so is recommended for testing only.

There is no verification of the actual contents.
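
As a minimal sketch, validation could be switched on for a test run with a property such as the one below. The option name and its default of false come from the table above; placing it in a site configuration file rather than setting it per job is an assumption made for illustration.

<property>
  <name>mapreduce.manifest.committer.validate.output</name>
  <value>true</value>
</property>

Because each renamed file costs one extra HEAD request, it is best to leave this at false outside test runs.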

Controlling storage integration

The manifest committer interacts with filesystems through implementations of the interface ManifestStoreOperations. Custom implementations can be provided to support store-specific features. One such implementation exists for ABFS; it is set automatically when the ABFS-specific committer factory is used.

The ABFS implementation can also be set explicitly:

<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.fs.azurebfs.commit.AbfsManifestStoreOperations</value>
</property>

The default implementation may also be configured:

<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem</value>
</property>

There is no need to alter these values except when writing a new implementation for another store, which is only necessary if that store provides extra integration support for the committer.