Advanced topics and configuration options for the manifest committer.
Some advanced options are intended for development and testing rather than production use.
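| mapreduce.manifest.committer.store.operations.classname | Classname for Manifest Store Operations |
| mapreduce.manifest.committer.validate.output | Perform output validation? |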
Output validation, when enabled, triggers a check of every renamed file to verify that it has the expected length.
This adds the overhead of a HEAD request per file, so it is recommended for testing only.
There is no verification of the actual contents.
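The length-only nature of this check can be illustrated with a minimal Java sketch. It is not the committer's own code: the class name, the method name and the `lengthProbe` function (which stands in for the per-file HEAD request) are all hypothetical, and the "store" is just an in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch only; names are hypothetical, not Hadoop APIs. */
final class LengthValidationSketch {

  /**
   * Compares the expected length of each file with the length reported by the store.
   * lengthProbe returns the stored length, or null if the file does not exist.
   * File contents are never read, so corrupt data of the right length goes unnoticed.
   */
  static List<String> findLengthMismatches(
      Map<String, Long> expectedLengths,
      Function<String, Long> lengthProbe) {
    List<String> failures = new ArrayList<>();
    for (Map.Entry<String, Long> entry : expectedLengths.entrySet()) {
      String path = entry.getKey();
      long expected = entry.getValue();
      // One probe per file: this is where the per-file request overhead comes from.
      Long actual = lengthProbe.apply(path);
      if (actual == null) {
        failures.add(path + ": file not found");
      } else if (actual != expected) {
        failures.add(path + ": expected " + expected + " bytes, found " + actual);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    // Pretend store: path -> length of the renamed file.
    Map<String, Long> store = new LinkedHashMap<>();
    store.put("dest/part-0000", 16L);
    store.put("dest/part-0001", 4L);

    // Lengths recorded before the rename.
    Map<String, Long> expected = new LinkedHashMap<>();
    expected.put("dest/part-0000", 16L);
    expected.put("dest/part-0001", 8L);

    findLengthMismatches(expected, store::get).forEach(System.out::println);
  }
}
```

Because the cost grows with one probe per committed file, a check of this shape is cheap on a small test job and expensive on a large production one.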
Controlling storage integration
The manifest committer interacts with filesystems through implementations of
ManifestStoreOperations. Custom implementations can be provided to support
store-specific features. One such implementation exists for ABFS; it is set
automatically when the ABFS-specific committer factory is used.
It can also be set explicitly:
<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.fs.azurebfs.commit.AbfsManifestStoreOperations</value>
</property>
The default implementation may also be configured:
<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem</value>
</property>
There is no need to alter these values except when writing a new implementation for another store, which is only necessary if that store provides extra integration support for the committer.
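To make the idea of a classname-selected implementation concrete, here is a minimal, self-contained Java sketch of the general pattern. It does not use Hadoop classes and is not how the committer itself loads its store operations: `StoreOps`, `DefaultStoreOps`, `CustomStoreOps` and `load` are hypothetical names, a plain `Map` stands in for the job configuration, and falling back to a default when the option is unset is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; these types are hypothetical, not Hadoop APIs. */
final class StoreOperationsLoaderSketch {

  /** Hypothetical stand-in for the store operations contract. */
  interface StoreOps {
    String describe();
  }

  /** Hypothetical default, used when no classname is configured (an assumption). */
  static final class DefaultStoreOps implements StoreOps {
    public String describe() {
      return "default";
    }
  }

  /** Hypothetical store-specific implementation. */
  static final class CustomStoreOps implements StoreOps {
    public String describe() {
      return "custom";
    }
  }

  /** The option name is taken from the configuration examples above. */
  static final String KEY = "mapreduce.manifest.committer.store.operations.classname";

  /** Instantiates the configured class, or the default if the option is unset or empty. */
  static StoreOps load(Map<String, String> conf) {
    String classname = conf.getOrDefault(KEY, "");
    if (classname.isEmpty()) {
      return new DefaultStoreOps();
    }
    try {
      // Reject classes that do not implement the contract before instantiating them.
      Class<? extends StoreOps> clazz = Class.forName(classname).asSubclass(StoreOps.class);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load store operations class " + classname, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(load(conf).describe());
    conf.put(KEY, CustomStoreOps.class.getName());
    System.out.println(load(conf).describe());
  }
}
```

The point of the pattern is that a new store only needs its own implementation class and a one-line configuration change; nothing in the calling code has to know which implementation is in use.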