HBase
HBase stores all of its data under its root directory in HDFS, configured with hbase.rootdir. The only other directory that the HBase service will read or write is hbase.bulkload.staging.dir.
On HDP clusters, hbase.rootdir is typically configured as /apps/hbase/data, and hbase.bulkload.staging.dir is configured as /apps/hbase/staging. HBase data, including the root directory and staging directory, can reside in an encryption zone on HDFS.
The HBase service user needs to be granted access to the encryption key in the Ranger KMS, because it performs tasks that require access to HBase data (unlike Hive or HDFS).
By design, HDFS-encrypted files cannot be bulk-loaded from one encryption zone into another encryption zone, or from an encryption zone into an unencrypted directory. Encrypted files can only be copied. An attempt to load data from one encryption zone into another will result in a copy operation. Within an encryption zone, files can be copied, moved, bulk-loaded, and renamed.
Recommendations
Make the parent directory for the HBase root directory and bulk load staging directory an encryption zone, instead of just the HBase root directory. This is because HBase bulk load operations need to move files from the staging directory into the root directory.
In typical deployments, /apps/hbase can be made an encryption zone.
Do not create encryption zones as subdirectories under /apps/hbase, because HBase may need to rename files across those subdirectories.
The landing zone for unencrypted data should always be within the destination encryption zone.
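The recommendation that both HBase directories share one encryption zone can be sanity-checked with a small path test. The sketch below is illustrative only: `is_under` is a hypothetical helper, and the paths are the typical HDP values named above rather than values read from a live cluster.

```shell
#!/bin/sh
# Succeeds if path $1 is equal to, or nested under, directory $2.
# is_under is a hypothetical helper written for this example.
is_under() {
  path="${1%/}"
  parent="${2%/}"
  case "$path" in
    "$parent"|"$parent"/*) return 0 ;;
    *) return 1 ;;
  esac
}

ZONE=/apps/hbase   # intended encryption zone (typical HDP layout)

# Both the root directory and the staging directory should fall inside the zone.
for dir in /apps/hbase/data /apps/hbase/staging; do
  if is_under "$dir" "$ZONE"; then
    echo "$dir is inside $ZONE"
  else
    echo "$dir is OUTSIDE $ZONE"
  fi
done

# On a live cluster, the configured value and the existing zones can be listed with:
#   hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.rootdir
#   hdfs crypto -listZones        # requires HDFS superuser privileges
```

The comparison is done on whole path components, so a sibling directory such as /apps/hbase-tmp is not mistaken for a child of /apps/hbase.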
Steps
On a cluster without HBase currently installed:
Create the /apps/hbase directory, and make it an encryption zone.
Configure hbase.rootdir=/apps/hbase/data.
Configure hbase.bulkload.staging.dir=/apps/hbase/staging.
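The fresh-install steps above might be scripted roughly as follows. This is a sketch under stated assumptions, not a vetted procedure: the key name `hbase-key` and the `hbase` directory owner are assumptions, and on clusters that use Ranger KMS the key may instead be created through the Ranger UI. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

KEY_NAME="${KEY_NAME:-hbase-key}"   # hypothetical key name
ZONE=/apps/hbase

# 1. Create the key (run as a key administrator).
# 2. Create the directory; it must be empty to become an encryption zone.
# 3. Turn the directory into an encryption zone (run as the HDFS superuser).
# 4. Hand the directory to the HBase service user (assumed to be "hbase").
# 5. List zones to confirm the result.
create_hbase_zone() {
  run hadoop key create "$KEY_NAME" &&
  run hdfs dfs -mkdir -p "$ZONE" &&
  run hdfs crypto -createZone -keyName "$KEY_NAME" -path "$ZONE" &&
  run hdfs dfs -chown hbase "$ZONE" &&
  run hdfs crypto -listZones
}

create_hbase_zone
```

The two configuration properties are then set in hbase-site.xml (or through the cluster manager) before HBase is started for the first time.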
On a cluster with HBase already installed, perform the following steps:
Stop the HBase service.
Rename the /apps/hbase directory to /apps/hbase-tmp.
Create an empty /apps/hbase directory, and make it an encryption zone.
Use DistCp with the -skipcrccheck and -update options to copy all data from /apps/hbase-tmp to /apps/hbase, preserving user-group permissions and extended attributes.
Start the HBase service and verify that it is working as expected.
Remove the /apps/hbase-tmp directory.
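The migration steps for an existing installation could look something like the sketch below. It assumes that the HBase service has already been stopped, that a key named `hbase-key` (a hypothetical name) already exists in the KMS, and that `-pugpx` is the appropriate DistCp preserve setting for user, group, permissions, and extended attributes. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

KEY_NAME="${KEY_NAME:-hbase-key}"   # hypothetical; the key must already exist
SRC=/apps/hbase-tmp
ZONE=/apps/hbase

# Run only after the HBase service has been stopped.
# Moves the old data aside, creates the zone, then copies the data back in.
# -skipcrccheck is needed because checksums of encrypted blocks differ from the source.
migrate_hbase_to_zone() {
  run hdfs dfs -mv "$ZONE" "$SRC" &&
  run hdfs dfs -mkdir "$ZONE" &&
  run hdfs crypto -createZone -keyName "$KEY_NAME" -path "$ZONE" &&
  run hadoop distcp -skipcrccheck -update -pugpx "$SRC" "$ZONE"
}

# Run only after HBase has been restarted and verified.
remove_old_copy() {
  run hdfs dfs -rm -r "$SRC"
}

migrate_hbase_to_zone
```

Cleanup is kept in a separate function so that the unencrypted copy is removed only after HBase has been verified. It is also worth confirming that the ownership of the new /apps/hbase directory itself matches the original before restarting the service.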
Changes in Behavior after HDFS Encryption is Enabled
The HBase bulk load process is a MapReduce job that typically runs under the user who owns the source data.
The HBase data files created by the job are then bulk-loaded into HBase RegionServers.
During this process, the RegionServers move (rename) the bulk-loaded files from the user's directory into the HBase root directory (/apps/hbase/data).
When data-at-rest encryption is used, HDFS cannot rename files across encryption zones that use different keys.
Workaround: run the MapReduce job as the hbase user, and specify an output directory that resides in the same encryption zone as the HBase root directory.
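As an illustration of the workaround, the sketch below runs an ImportTsv job as the hbase user and writes the HFiles to a directory inside /apps/hbase before loading them. The table name, column mapping, and input and output paths are all hypothetical placeholders. The class names are those shipped with HBase 1.x; newer releases may locate the bulk load tool elsewhere. On a Kerberos-enabled cluster the hbase user would also need a valid ticket. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# All names below are hypothetical placeholders.
TABLE=mytable
INPUT=/apps/hbase/landing/input.tsv        # landing zone inside the encryption zone
HFILE_OUT=/apps/hbase/bulk-output/$TABLE   # job output in the same zone as hbase.rootdir

# Step 1 generates HFiles with MapReduce; step 2 hands them to the RegionServers.
# Both run as the hbase user so the final rename stays within one encryption zone.
bulk_load_as_hbase() {
  run sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:value \
    -Dimporttsv.bulk.output="$HFILE_OUT" \
    "$TABLE" "$INPUT" &&
  run sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    "$HFILE_OUT" "$TABLE"
}

bulk_load_as_hbase
```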