Creating a Hive replication policy
To replicate Hive metadata from on-premises to cloud, you must set the Ranger policy in Ranger, and then create the Hive replication policy in Replication Manager.
The Apache Ranger access policy model consists of the following components:
- Specification of the resources that you can apply to a replication policy, which include HDFS files and directories; Hive databases, tables, and columns; and HBase tables, column-families, and columns.
- Specification of access conditions for specific users and groups.
On the target cluster, the hive user must have Ranger admin privileges. The same hive user performs the metadata import operation.
- Log in to the Ranger Admin UI.
- In the Hadoop_SQL section, provide the hdfs user permission to "all-database, table, column" in hdfs.
Option | Description |
YARN Queue Name | Enter the name of the YARN queue for the cluster to which the replication job is submitted if you are using Capacity Scheduler queues to limit resource consumption. The default value for this field is default. |
Maximum Map Slots | Set the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20. |
Maximum Bandwidth | Adjust this setting so that each map task (simultaneous copy) is throttled to consume only the specified bandwidth. The limit is not always exact: the map task throttles back its bandwidth consumption during a copy so that the net bandwidth used tends towards the specified value. The default value is 100 MB per second for each mapper. |
Path Exclusion | Enter a regular expression-based path. When you add an exclusion, include the snapshotted relative path for the regex. For example, to exclude the /user/bdr directory, use the .*/user/\.snapshot/.+/bdr.* regular expression, which includes the snapshots for the bdr directory. To exclude top-level directories from replication in a globbed source path, you can specify the relative path for the regex without including .snapshot in the path. For example, to exclude the bdr directory from replication, use the .*/user+/bdr.* regular expression. You can add more than one regular expression to exclude. |
Replication Strategy | Choose a replication strategy to determine whether the file replication tasks should be distributed among the mappers statically or dynamically. |
MapReduce Service | Choose the MapReduce or YARN service to use. |
Log Path | Enter an alternate path for the logs, if required. |
Error Handling | Select the error-handling options as necessary. |
Preserve | Choose the required options to preserve the block size, replication count, permissions (including ACLs), and extended attributes (XAttrs) as they exist on the source file system, or to use the settings as configured on the destination file system. By default, source system settings are preserved. When Permission is selected, and both the source and destination clusters support ACLs, replication preserves ACLs. Otherwise, ACLs are not replicated. When Extended attributes is selected, and both the source and destination clusters support extended attributes, replication preserves them. (This option is displayed only when both source and destination clusters support extended attributes.) If you select one or more of the Preserve options and you are replicating to S3 or ADLS, the values of all these items are saved in metadata files on S3 or ADLS. When you replicate from S3 or ADLS to HDFS, you can select which of these options you want to preserve. |
Delete Policy | Choose the required options to determine whether the files that were deleted on the source should also be deleted from the destination directory. This policy also determines the handling of files in the destination location that are unrelated to the source. |
Alerts | Choose the required options to generate alerts for various state changes in the replication workflow. You can alert on failure, on start, on success, or when the replication workflow is aborted. |
Sentry Permissions | Choose the Sentry permissions as necessary. |
Replication Option | Specify metadata and data, or metadata only. |
Directory for metadata file | The folder path in the destination cluster to save the metadata file. If the folder does not exist, Replication Manager creates a new folder. |
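
Before saving a policy, it can help to check the Path Exclusion expressions and the default throttling values locally. The following Python sketch is illustrative only and is not part of Replication Manager. It assumes that an exclusion applies when the expression matches the whole path, and it uses Python's `re` module, whose behavior may differ in edge cases from the regex engine that the replication job uses. The sample paths, the snapshot name `s1`, and the helper names `is_excluded` and `max_aggregate_bandwidth_mb` are hypothetical.

```python
import re

# Regular expressions quoted from the Path Exclusion row of the table.
SNAPSHOT_EXCLUSION = r".*/user/\.snapshot/.+/bdr.*"
GLOBBED_EXCLUSION = r".*/user+/bdr.*"


def is_excluded(path, patterns):
    """Return True if any pattern matches the whole path.

    Whole-path matching is an assumption made for this sketch.
    """
    return any(re.fullmatch(pattern, path) for pattern in patterns)


def max_aggregate_bandwidth_mb(map_slots=20, per_mapper_mb=100):
    """Upper bound on bandwidth for one job, in MB per second.

    Defaults come from the table: 20 map slots, 100 MB per second per mapper.
    The real figure is approximate because throttling is not always exact.
    """
    return map_slots * per_mapper_mb


if __name__ == "__main__":
    patterns = [SNAPSHOT_EXCLUSION, GLOBBED_EXCLUSION]
    # Hypothetical paths; "s1" is a made-up snapshot name.
    samples = [
        "/user/.snapshot/s1/bdr/part-0",
        "/user/bdr/part-0",
        "/user/alice/part-0",
    ]
    for path in samples:
        print(path, "excluded" if is_excluded(path, patterns) else "replicated")
    print("Bandwidth ceiling (MB/s):", max_aggregate_bandwidth_mb())
```

Under these assumptions, the first two sample paths are excluded and the third is replicated. With the default values, a single job is capped at roughly 20 × 100 = 2000 MB per second across all mappers.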