Advanced topics
Advanced topics and configuration options for the manifest committer.
Advanced configuration
There are some advanced options which are intended for development and testing, rather than production use.
Option | Meaning | Default Value |
---|---|---|
mapreduce.manifest.committer.store.operations.classname | Classname for Manifest Store Operations | "" |
mapreduce.manifest.committer.validate.output | Perform output validation? | false |
Validating output
The option mapreduce.manifest.committer.validate.output triggers a check of every renamed file to verify it has the expected length.
This adds the overhead of a HEAD request per file, and so is recommended for testing only.
There is no verification of the actual contents.
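As a minimal sketch, validation could be switched on for a test run with a property entry like the one below. The option name is the one listed in the table above; the value true and the choice of which configuration file to put it in are assumptions about the test setup.

```xml
<!-- Sketch only: enable per-file length checks after rename (testing use). -->
<property>
  <name>mapreduce.manifest.committer.validate.output</name>
  <value>true</value>
</property>
```

Leaving the option at its default of false avoids the extra HEAD request per file.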
Controlling storage integration
The manifest committer interacts with filesystems through implementations of the interface ManifestStoreOperations.
It is possible to provide custom implementations for store-specific features.
One such implementation exists for ABFS; it is set automatically when the ABFS-specific committer factory is used.
It can be explicitly set:
<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.fs.azurebfs.commit.AbfsManifestStoreOperations</value>
</property>
The default implementation may also be configured:
<property>
  <name>mapreduce.manifest.committer.store.operations.classname</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem</value>
</property>
There is no need to alter these values except when writing a new implementation for another store, which is only necessary if that store provides extra integration support for the committer.