Hive replication policy definition JSON file
The policy definition JSON file contains all the parameters required to create a Hive replication policy. When you edit the file to define a Hive replication policy, remove the parameters that are not required for the replication policy.
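If you generate the policy definition file from a script, removing the parameters you do not need can be automated. The following Python sketch assumes that you build the policy as a dictionary and set every unneeded parameter to None. The helper name prune_unset and the None convention are hypothetical and not part of any replication tool. The helper drops those keys, along with any object or list that becomes empty, so that only the parameters you set remain.

```python
import json
from typing import Any


def prune_unset(value: Any) -> Any:
    """Recursively drop keys whose value is None, plus objects/lists left empty.

    Hypothetical helper: it only mirrors the advice to remove parameters that
    the replication policy does not need before you submit the file.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            item = prune_unset(item)
            # Skip unset values and containers that became empty after pruning.
            if item is None or item == {} or item == []:
                continue
            cleaned[key] = item
        return cleaned
    if isinstance(value, list):
        items = [prune_unset(item) for item in value]
        return [item for item in items if item is not None and item != {} and item != []]
    return value


# Hypothetical draft policy; None marks parameters that should be removed.
draft = {
    "name": "hive-sales-policy",
    "type": "HIVE",
    "logPath": None,
    "description": None,
    "tdeSameKey": False,  # False is a real value, so it is kept
    "sourceDataset": {
        "hiveArguments": {"numThreads": 4, "skipUrlPermissions": None},
        "hdfsArguments": {"logPath": None},  # becomes empty, so it is removed
    },
}
print(json.dumps(prune_unset(draft), indent=2))
```

Treating None as "not set" keeps false and 0 intact, because those are valid values for many parameters in this file.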
Parameters in Hive replication policy definition JSON file
The following table describes the parameters in the policy definition JSON file for a Hive replication policy:
Parameter | Description |
---|---|
name | Provide the unique name for the policy. |
type | Provide HIVE to create a Hive replication policy. |
mapReduceService | Provide the MapReduce or YARN service for the replication policy to use. |
logPath | Provide an alternate path for the logs, if required. |
replicationStrategy | Provide STATIC or DYNAMIC to determine whether the file replication tasks are distributed among the mappers statically or dynamically. Default is |
skipChecksumChecks | Provide true to skip checksum checks. Default is Checksums are used to perform the following tasks: |
skipListingChecksumChecks | Provide true to skip the checksum check when comparing two files to determine whether they are the same. If the check is skipped, the file size and last modified time are used to determine whether the files are the same. Skipping the check improves performance during the mapper phase. |
abortOnError | Provide true to stop the policy job when an error occurs. This ensures that the files copied up to that point remain on the destination, but no additional files are copied. Default is |
abortOnSnapshotDiffFailures | Provide true to stop the replication job if a snapshot diff fails during replication. |
preserve | Set blockSize, replicationCount, permissions, and extendedAttributes to true to preserve the block size, replication count, permissions (including ACLs), and extended attributes (XAttrs) as they exist on the source file system. Set them to false to use the settings as configured on the destination file system. By default, the source system settings are preserved. |
deletePolicy | Provide one of the following options: KEEP_DELETED_FILES, DELETE_TO_TRASH, or DELETE_PERMANENTLY. Default is |
alert | Configure the following parameters as required: onFailure, onStart, onSuccess, and onAbort. |
exclusionFilters | Provide one or more directory paths to exclude from replication. |
databasesAndTables | Configure the following parameters for each database as required: database, tablesIncludeRegex, and tablesExcludeRegex. |
sentryPermissions | Provide INCLUDE to import both Hive object and URL permissions. |
skipUrlPermissions | Provide true to import only the Hive object permissions. |
numThreads | Provide the number of threads to use during replication. |
frequencyInSec | Auto-populated after the policy runs successfully. Shows the time duration between two replication jobs in seconds. |
targetDataset | Auto-populated after the policy runs successfully. Shows the target location where the replicated files are available on the target cluster. |
cloudCredential | Provide the cloud credentials. |
sourceCluster | Shows the source cluster name. |
targetCluster | Shows the target cluster name in the dataCenterName$clustername format. For example, "DC-US$My Destination 17". |
startTime | Shows the start time of the replication job in the YYYY-MM-DDTHH:MM:SSZ format. |
endTime | Shows the end time of the replication job in the YYYY-MM-DDTHH:MM:SSZ format. |
distcpMaxMaps | Provide the maximum map slots to limit the number of map slots per mapper. Default is |
distcpMapBandwidth | Provide the maximum bandwidth to limit the bandwidth per mapper. Default is |
queueName | Provide a YARN queue name, if necessary. Default queue name is Default. |
tdeSameKey | Provide true if the source and destination are encrypted with the same TDE key. |
description | Provide a description for the policy. |
enableSnapshotBasedReplication | Provide true to enable snapshot-based replication. |
cloudEncryptionAlgorithm | Provide the cloud encryption algorithm. |
cloudEncryptionKey | Provide the cloud encryption key. |
plugins | Provide the plugins to deploy on all the nodes in the cluster if you have multiple repositories configured in your environment. |
hiveExternalTableBaseDirectory | Provide the Hive external table base directory path. |
cmPolicySubmitUser | Provide the following parameters: userName and sourceUser. |
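Several parameters in the table accept only a fixed set of values or a fixed format. The following Python sketch checks a few of them before you submit the file. The enumerated values are copied from the sample definition file, startTime and endTime are checked against YYYY-MM-DDTHH:MM:SSZ, and targetCluster is checked against dataCenterName$clustername. The function names check_policy and lookup, and the choice of fields to check, are hypothetical. The replication service performs its own validation, which may be stricter.

```python
from datetime import datetime
from typing import Any

# Accepted values copied from the sample Hive policy definition file.
ALLOWED_VALUES = {
    ("sourceDataset", "hdfsArguments", "replicationStrategy"): {"DYNAMIC", "STATIC"},
    ("sourceDataset", "hdfsArguments", "deletePolicy"): {
        "KEEP_DELETED_FILES",
        "DELETE_TO_TRASH",
        "DELETE_PERMANENTLY",
    },
    ("sourceDataset", "hiveArguments", "sentryPermissions"): {"INCLUDE", "EXCLUDE"},
}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDTHH:MM:SSZ, with a literal Z


def lookup(policy: dict, path: tuple) -> Any:
    """Follow a key path through nested objects; return None if any key is missing."""
    node: Any = policy
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_policy(policy: dict) -> list[str]:
    """Hypothetical pre-submission check; returns a list of problems (empty if none)."""
    problems = []
    if policy.get("type") != "HIVE":
        problems.append("type: must be HIVE for a Hive replication policy")
    # Enumerated parameters: validate only the ones that are present.
    for path, allowed in ALLOWED_VALUES.items():
        value = lookup(policy, path)
        if value is not None and (not isinstance(value, str) or value not in allowed):
            problems.append(f"{'.'.join(path)}: {value!r} is not one of {sorted(allowed)}")
    # startTime and endTime must use the YYYY-MM-DDTHH:MM:SSZ format.
    for field in ("startTime", "endTime"):
        value = policy.get(field)
        if value is None:
            continue
        try:
            datetime.strptime(value, TIMESTAMP_FORMAT)
        except (TypeError, ValueError):
            problems.append(f"{field}: {value!r} does not match YYYY-MM-DDTHH:MM:SSZ")
    # targetCluster must look like dataCenterName$clustername.
    target = policy.get("targetCluster")
    if target is not None:
        data_center, sep, cluster = str(target).partition("$")
        if not (sep and data_center and cluster):
            problems.append(f"targetCluster: {target!r} is not in dataCenterName$clustername format")
    return problems


# Hypothetical policy fragment used to exercise the checks.
example = {
    "name": "hive-sales-policy",
    "type": "HIVE",
    "targetCluster": "DC-US$My Destination 17",
    "startTime": "2024-01-15T02:00:00Z",
    "sourceDataset": {"hdfsArguments": {"replicationStrategy": "DYNAMIC"}},
}
print(check_policy(example))  # prints []
```

Parameters that are absent are skipped rather than reported, so the check also works after you remove the parameters that your policy does not need.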
Sample Hive replication policy definition JSON file
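The following snippet shows the contents of the Hive replication policy definition JSON file. Placeholders such as "string" and integer, and alternatives such as true|false or "DYNAMIC"|"STATIC", must be replaced with a single actual value. While editing the file, ensure that you remove the key-value pairs that are not required for the Hive replication policy.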
```
{
  "name": "string",
  "type": "HIVE",
  "sourceDataset": {
    "hdfsArguments": {
      "path": "string",
      "mapReduceService": "string",
      "logPath": "string",
      "replicationStrategy": "DYNAMIC"|"STATIC",
      "errorHandling": {
        "skipChecksumChecks": true|false,
        "skipListingChecksumChecks": true|false,
        "abortOnError": true|false,
        "abortOnSnapshotDiffFailures": true|false
      },
      "preserve": {
        "blockSize": true|false,
        "replicationCount": true|false,
        "permissions": true|false,
        "extendedAttributes": true|false
      },
      "deletePolicy": "KEEP_DELETED_FILES"|"DELETE_TO_TRASH"|"DELETE_PERMANENTLY",
      "alert": {
        "onFailure": true|false,
        "onStart": true|false,
        "onSuccess": true|false,
        "onAbort": true|false
      },
      "exclusionFilters": ["string", ...]
    },
    "hiveArguments": {
      "databasesAndTables": [
        {
          "database": "string",
          "tablesIncludeRegex": "string",
          "tablesExcludeRegex": "string"
        },
        ...
      ],
      "sentryPermissions": "INCLUDE"|"EXCLUDE",
      "skipUrlPermissions": true|false,
      "numThreads": integer
    }
  },
  "frequencyInSec": integer,
  "targetDataset": "string",
  "cloudCredential": "string",
  "sourceCluster": "string",
  "targetCluster": "string",
  "startTime": "string",
  "endTime": "string",
  "distcpMaxMaps": integer,
  "distcpMapBandwidth": integer,
  "queueName": "string",
  "tdeSameKey": true|false,
  "description": "string",
  "enableSnapshotBasedReplication": true|false,
  "cloudEncryptionAlgorithm": "string",
  "cloudEncryptionKey": "string",
  "plugins": ["string", ...],
  "hiveExternalTableBaseDirectory": "string",
  "cmPolicySubmitUser": {
    "userName": "string",
    "sourceUser": "string"
  }
}
```
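Because the sample uses placeholders such as true|false and "...", it is not valid JSON as written, and hand-editing it can easily leave a missing or extra comma. One way to avoid such syntax errors is to build the policy in a script and let a JSON library serialize it. The following Python sketch assumes a policy that replicates a single database. The policy name, service name, cluster name, database name, and regular expressions are hypothetical values chosen for illustration, and the set of parameters your environment requires may differ.

```python
import json

# Hypothetical values; replace them with the names used in your environment.
policy = {
    "name": "hive-sales-policy",
    "type": "HIVE",
    "sourceDataset": {
        "hdfsArguments": {
            "mapReduceService": "YARN-1",  # hypothetical service name
            "replicationStrategy": "DYNAMIC",
            "errorHandling": {"abortOnError": False},
            "preserve": {
                "blockSize": True,
                "replicationCount": True,
                "permissions": True,
                "extendedAttributes": True,
            },
            "deletePolicy": "KEEP_DELETED_FILES",
        },
        "hiveArguments": {
            "databasesAndTables": [
                {
                    "database": "sales",  # hypothetical database
                    "tablesIncludeRegex": ".*",
                    "tablesExcludeRegex": "tmp_.*",
                }
            ],
            "numThreads": 4,
        },
    },
    "sourceCluster": "source-cluster",  # hypothetical cluster name
    "targetCluster": "DC-US$My Destination 17",
    "description": "Replicates the sales database",
}

# json.dumps emits valid JSON: commas, quoting, and true/false are handled for you.
text = json.dumps(policy, indent=2)
print(text)
```

You can redirect the printed output to a file and use that file as the policy definition.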