Creating a default directory for managed tables
You can specify a top-level directory for managed tables when creating a Hive database.
Create a default directory for managed tables only after limiting CREATE
DATABASE and ALTER DATABASE statements to users having the Admin role, which has
hive service user permissions. Permissions to the managed
directory must be limited to the hive service user. In addition to
restricting permissions to the hive user, you can further secure
managed tables using Ranger fine-grained permissions, such as row-level filtering
and column masking.
hive.metastore.warehouse.dir configuration
property to give managed tables a common location for governance policies. The managed
location designates a single root directory for all tenant tables, managed and external.
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION external_table_path]
[MANAGEDLOCATION managed_table_directory_path]
[WITH DBPROPERTIES (property_name=property_value, ...)]; Do
not set LOCATION and MANAGEDLOCATION to the same file system
path.ALTER (DATABASE|SCHEMA) database_name SET MANAGEDLOCATION [managed_table_directory_path];
Generating surrogate keys
You can use the built-in SURROGATE_KEY user-defined function (UDF) to automatically generate numerical Ids for rows as you enter data into a table. The generated surrogate keys can replace wide, multiple composite keys.
| Table Type | ACID | Surrogate Keys | File Format | INSERT | UPDATE/DELETE |
|---|---|---|---|---|---|
| Managed: CRUD transactional | Yes | Yes | ORC | Yes | Yes |
| Managed: Insert-only transactional | Yes | Yes | Any | Yes | No |
| Managed: Temporary | No | No | Any | Yes | No |
| External | No | No | Any | Yes | No |
Joins using the generated keys are faster than joins using strings. Using generated keys does not force data into a single node by a row number. You can generate keys as abstractions of natural keys. Surrogate keys have an advantage over UUIDs, which are slower and probabilistic.
The SURROGATE_KEY UDF generates a unique Id for every row that you insert into a table. It generates keys based on the execution environment in a distributed system, which includes a number of factors, such as internal data structures, the state of a table, and the last transaction id. Surrogate key generation does not require any coordination between compute tasks.
The UDF takes either no arguments or two arguments:
- Write Id bits
- Task Id bits
