Configuring Encryption for Data Spills
Some CDH services can encrypt data that is stored temporarily on the local filesystem, outside HDFS. This is typically data that spills to disk when a memory-intensive operation pushes the service past its allotted memory limit on a host. You can enable on-disk spill encryption for the following services:
MapReduce v2 (YARN)
Property | Description | Default
mapreduce.job.encrypted-intermediate-data | Enable or disable encryption for intermediate MapReduce spills. | false
mapreduce.job.encrypted-intermediate-data-key-size-bits | The key length, in bits, used to encrypt data spilled to disk. | 128
mapreduce.job.encrypted-intermediate-data.buffer.kb | The buffer size, in KB, for the stream written to disk after encryption. | 128
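As an illustration of how these properties might be set together, the following minimal Python sketch renders them as a Hadoop-style configuration fragment. It assumes the properties are supplied through a standard `<configuration>` XML file such as mapred-site.xml; the helper name `render_properties` and the chosen values (encryption turned on, the other two left at their defaults) are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Property names and defaults come from the table above. Turning encryption
# on while keeping the other defaults is an illustrative choice.
SPILL_ENCRYPTION_PROPERTIES = {
    "mapreduce.job.encrypted-intermediate-data": "true",
    "mapreduce.job.encrypted-intermediate-data-key-size-bits": "128",
    "mapreduce.job.encrypted-intermediate-data.buffer.kb": "128",
}


def render_properties(properties):
    """Return a Hadoop-style <configuration> XML string for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent is available from Python 3.9; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SPILL_ENCRYPTION_PROPERTIES))
```

The generated fragment is only a starting point. On a cluster managed by Cloudera Manager, the equivalent settings would normally be applied through the service configuration rather than by editing files by hand.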
HBase
HBase does not write data outside HDFS, and does not require spill encryption.
Impala
Impala lets certain memory-intensive operations write temporary data to disk when they come close to exceeding their memory limit on a host. For details, see SQL Operations that Spill to Disk. To enable disk spill encryption in Impala:
- Go to the Cloudera Manager Admin Console.
- Click the Configuration tab.
- Select .
- Select .
- Select the checkbox for the Disk Spill Encryption property.
- Click Save Changes to commit the changes.
Hive
Hive jobs occasionally write data temporarily to local directories. If you enable HDFS encryption, then you must ensure that the following intermediate local directories are also protected:
- LOCALSCRATCHDIR: The MapJoin optimization in Hive writes HDFS tables to a local directory and then uploads them to the distributed cache. To ensure these files are encrypted, either disable MapJoin by setting hive.auto.convert.join to false, or encrypt the local Hive Scratch directory (hive.exec.local.scratchdir) using Cloudera Navigator Encrypt.
- DOWNLOADED_RESOURCES_DIR: JARs that are added to a user session and stored in HDFS are downloaded to hive.downloaded.resources.dir on the HiveServer2 local filesystem. To encrypt these JAR files, configure Cloudera Navigator Encrypt to encrypt the directory specified by hive.downloaded.resources.dir.
- NodeManager Local Directory List: Hive stores JARs and MapJoin files in the distributed cache. To use MapJoin or encrypt JARs and other resource files, the yarn.nodemanager.local-dirs YARN configuration property must point to a set of encrypted local directories on all nodes.
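The directory requirements above can be sanity-checked with a small script. The following Python sketch, under stated assumptions, reports which configured directories do not sit under a known encrypted mount point. The function name `unprotected_dirs`, the example paths, and the `/secure` mount point are all hypothetical. The check is purely lexical: it compares paths and does not confirm that Cloudera Navigator Encrypt is actually protecting a mount.

```python
from pathlib import PurePosixPath


def unprotected_dirs(configured_dirs, encrypted_mounts):
    """Return the configured directories that are not under any encrypted mount point."""
    mounts = [PurePosixPath(m) for m in encrypted_mounts]
    missing = []
    for d in configured_dirs:
        path = PurePosixPath(d)
        # A directory counts as covered if it is a mount point itself
        # or has a mount point among its ancestors.
        if not any(path == m or m in path.parents for m in mounts):
            missing.append(d)
    return missing


# Hypothetical values for illustration only.
hive_dirs = {
    "hive.exec.local.scratchdir": "/secure/hive/scratch",
    "hive.downloaded.resources.dir": "/tmp/hive/resources",
}
# yarn.nodemanager.local-dirs holds a comma-separated list of directories.
yarn_local_dirs = "/secure/yarn/nm1,/data/yarn/nm2"
encrypted_mounts = ["/secure"]

to_check = list(hive_dirs.values()) + yarn_local_dirs.split(",")

if __name__ == "__main__":
    print(unprotected_dirs(to_check, encrypted_mounts))
    # prints ['/tmp/hive/resources', '/data/yarn/nm2']
```

Any directory the script reports would need to be moved under an encrypted mount point or, in the case of the local scratch directory, made unnecessary by setting hive.auto.convert.join to false.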
For more information on Hive behavior with HDFS encryption enabled, see Using HDFS Encryption with Hive.
Flume
Flume supports on-disk encryption for log files written by Flume file channels. See Using an On-disk Encrypted File Channel.