Configuring Apache HDFS Encryption
Also available as:
PDF
loading table of contents...

Loading Data into an Encrypted Table

By design, HDFS-encrypted files cannot be moved or loaded from one encryption zone into another encryption zone, or from an encryption zone into an unencrypted directory. Encrypted files can only be copied.

Within an encryption zone, files can be copied, moved, loaded, and renamed.

Recommendations:

  • When loading unencrypted data into encrypted tables (e.g., LOAD DATA INPATH), we recommend placing the source data (to be encrypted) into a landing zone within the destination encryption zone.

  • An attempt to load data from one encryption zone into another will result in a copy operation. Distcp will be used to speed up the process if the size of the files being copied is higher than the value specified by the hive.exec.copyfile.maxsize property. The default limit is 32 MB.

Here are two approaches for loading unencrypted data into an encrypted table:
  • To load unencrypted data into an encrypted table, use the LOAD DATA ... statement.

    If the source data does not reside inside the encryption zone, the LOAD statement will result in a copy. If your data is already inside HDFS, though, you can use distcp to speed up the copying process.

  • If the data is already inside a Hive table, create a new table with a LOCATION inside an encryption zone, as follows:

    CREATE TABLE encrypted_table [STORED AS] LOCATION ... AS SELECT * FROM <unencrypted_table>

    Note
    Note

    The location specified in the CREATE TABLE statement must be within an encryption zone. If you create a table that points LOCATION to an unencrypted directory, your data will not be encrypted. You must copy your data to an encryption zone, and then point LOCATION to that encryption zone.

If your source data is already encrypted, use the CREATE TABLE statement. Point LOCATION to the encrypted source directory where your data resides:

CREATE TABLE encrypted_table [STORED AS] LOCATION ... AS SELECT * FROM <encrypted_source_directory>

This is the fastest way to create encrypted tables.