Changed Behavior after HDFS Encryption is Enabled
You must consider various factors when working with Hive tables after enabling HDFS transparent encryption for Hive.
- Loading data from one encryption zone to another results in a copy of the data. Distcp is used to speed up the process if the size of the files being copied is higher than the value specified by HIVE_EXEC_COPYFILE_MAXSIZE. The default value of HIVE_EXEC_COPYFILE_MAXSIZE is 32 MB, which you can modify by changing the value of the hive.exec.copyfile.maxsize configuration property.
- When loading data to encrypted tables, Cloudera strongly recommends using a landing zone inside the same encryption zone as the table.
  - Example 1: Loading unencrypted data to an encrypted table. Use one of the following methods:
    - If you are loading new unencrypted data to an encrypted table, use the LOAD DATA ... statement. Because the source data is not inside the encryption zone, the LOAD statement results in a copy. For this reason, Cloudera recommends landing data that you need to encrypt inside the destination encryption zone. You can use distcp to speed up the copying process if your data is inside HDFS.
    - If the data to be loaded is already inside a Hive table, you can create a new table with a LOCATION inside an encryption zone as follows:
        CREATE TABLE encrypted_table [STORED AS] LOCATION ... AS SELECT * FROM <unencrypted_table>
      The location specified in the CREATE TABLE statement must be inside an encryption zone. Creating a table pointing LOCATION to an unencrypted directory does not encrypt your source data. You must copy your data to an encryption zone, and then point LOCATION to that zone.
  - Example 2: Loading encrypted data to an encrypted table
    - If the data is already encrypted, use the CREATE TABLE statement pointing LOCATION to the encrypted source directory containing the data. This is the fastest way to create encrypted tables.
        CREATE TABLE encrypted_table [STORED AS] LOCATION ... AS SELECT * FROM <encrypted_source_directory>
- Users reading data from read-only encrypted tables must have access to a temporary directory that is encrypted at least as strongly as the table.
- Temporary data is now written to a directory named .hive-staging in each table or partition.
- Previously, an INSERT OVERWRITE on a partitioned table inherited permissions for new data from the existing partition directory. With encryption enabled, permissions are inherited from the table.
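
The loading patterns described above can be sketched in HiveQL as follows. This is a minimal sketch under stated assumptions, not a definitive procedure: the paths under /zone1, the table names, and the ORC storage format are hypothetical, /zone1 is assumed to be an existing encryption zone, and the SET statement assumes your deployment permits changing hive.exec.copyfile.maxsize per session.

```sql
-- All paths and table names below are hypothetical; adjust them to your cluster.

-- Optionally change the file size (in bytes) above which distcp is used
-- for copies between encryption zones. 67108864 bytes = 64 MB.
-- Assumption: this property can be set at the session level in your deployment.
SET hive.exec.copyfile.maxsize=67108864;

-- Example 1, method 1: load new data from a landing directory.
-- If /zone1/landing is in the same encryption zone as encrypted_sales,
-- this follows the recommendation to land data inside the destination zone.
LOAD DATA INPATH '/zone1/landing/sales' INTO TABLE encrypted_sales;

-- Example 1, method 2: the data is already in an unencrypted Hive table.
-- The LOCATION must be inside an encryption zone for the new data to be encrypted.
CREATE TABLE encrypted_sales_copy
STORED AS ORC
LOCATION '/zone1/warehouse/encrypted_sales_copy'
AS SELECT * FROM unencrypted_sales;
```

If the landing directory were outside /zone1, the LOAD DATA statement would still succeed, but it would copy the data into the zone.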