Transparent Encryption Recommendations for Hive
HDFS encryption has been designed so that files cannot be moved from one encryption
zone to another or from encryption zones to unencrypted directories. Therefore, the landing zone
for data when using the LOAD DATA INPATH
command must always be inside the
destination encryption zone.
To use HDFS encryption with Hive, ensure you are using one of the following configurations:
Single Encryption Zone
With this configuration, you can use HDFS encryption by having all
Hive data inside the same encryption zone. In Cloudera Manager,
configure the Hive Scratch Directory
(hive.exec.scratchdir
) to be inside the encryption
zone.
Recommended HDFS Path:
/warehouse/tablespace/
To use the auto-generated KMS ACL, make sure you name the encryption key
hive-key
.
For example, to configure a single encryption zone for the entire Hive warehouse, you can
rename /warehouse/tablespace/
to
/warehouse/tablespace-old
, create an encryption zone at
/warehouse/tablespace/
, and then distcp
all the data
from /warehouse/tablespace-old
to
/warehouse/tablespace/
.
In Cloudera Manager, configure the Hive Scratch Directory
(hive.exec.scratchdir
) to be inside the encryption zone by setting it to
/warehouse/tablespace/tmp
, ensuring that permissions are
1777
on /warehouse/tablespace/tmp
.
Multiple Encryption Zones
With this configuration, you can use encrypted databases or tables with different encryption keys. To read data from read-only encrypted tables, users must have access to a temporary directory that is encrypted at least as strongly as the table.
For example:
- Configure two encrypted tables,
ezTbl1
andezTbl2
. - Create two new encryption zones,
/data/ezTbl1
and/data/ezTbl2
. - Load data to the tables in Hive using
LOAD
statements.
Other Encrypted Directories
LOCALSCRATCHDIR
: The MapJoin optimization in Hive writes HDFS tables to a local directory and then uploads them to the distributed cache. To ensure these files are encrypted, either disable MapJoin by settinghive.auto.convert.join
tofalse
, or encrypt the local Hive Scratch directory (hive.exec.local.scratchdir
) using Cloudera Navigator Encrypt.DOWNLOADED_RESOURCES_DIR
: JARs that are added to a user session and stored in HDFS are downloaded tohive.downloaded.resources.dir
on the HiveServer2 local filesystem. To encrypt these JAR files, configure Cloudera Navigator Encrypt to encrypt the directory specified byhive.downloaded.resources.dir
.- NodeManager Local Directory List: Hive stores JARs and MapJoin files in the
distributed cache. To use MapJoin or encrypt JARs and other resource files, the
yarn.nodemanager.local-dirs
YARN configuration property must be configured to a set of encrypted local directories on all nodes.