Transparent Encryption Recommendations for Hive
HDFS encryption has been designed so that files cannot be moved from one encryption
zone to another or from encryption zones to unencrypted directories. Therefore, the landing zone
for data when using the LOAD DATA INPATH command must always be inside the
destination encryption zone.
To use HDFS encryption with Hive, ensure you are using one of the following configurations:
Single Encryption Zone
With this configuration, you can use HDFS encryption by having all
Hive data inside the same encryption zone. In Cloudera Manager,
configure the Hive Scratch Directory
(hive.exec.scratchdir) to be inside the encryption
zone.
Recommended HDFS Path:
/warehouse/tablespace/
To use the auto-generated KMS ACL, make sure you name the encryption key
hive-key.
For example, to configure a single encryption zone for the entire Hive warehouse, you can
rename /warehouse/tablespace/ to
/warehouse/tablespace-old, create an encryption zone at
/warehouse/tablespace/, and then distcp all the data
from /warehouse/tablespace-old to
/warehouse/tablespace/.
In Cloudera Manager, configure the Hive Scratch Directory
(hive.exec.scratchdir) to be inside the encryption zone by setting it to
/warehouse/tablespace/tmp, ensuring that permissions are
1777 on /warehouse/tablespace/tmp.
Multiple Encryption Zones
With this configuration, you can use encrypted databases or tables with different encryption keys. To read data from read-only encrypted tables, users must have access to a temporary directory that is encrypted at least as strongly as the table.
For example:
- Configure two encrypted tables,
ezTbl1andezTbl2. - Create two new encryption zones,
/data/ezTbl1and/data/ezTbl2. - Load data to the tables in Hive using
LOADstatements.
Other Encrypted Directories
LOCALSCRATCHDIR: The MapJoin optimization in Hive writes HDFS tables to a local directory and then uploads them to the distributed cache. To ensure these files are encrypted, either disable MapJoin by settinghive.auto.convert.jointofalse, or encrypt the local Hive Scratch directory (hive.exec.local.scratchdir) using Cloudera Navigator Encrypt.DOWNLOADED_RESOURCES_DIR: JARs that are added to a user session and stored in HDFS are downloaded tohive.downloaded.resources.diron the HiveServer2 local filesystem. To encrypt these JAR files, configure Cloudera Navigator Encrypt to encrypt the directory specified byhive.downloaded.resources.dir.- NodeManager Local Directory List: Hive stores JARs and MapJoin files in the
distributed cache. To use MapJoin or encrypt JARs and other resource files, the
yarn.nodemanager.local-dirsYARN configuration property must be configured to a set of encrypted local directories on all nodes.
