Import Data into an External Hive Table Backed by S3
To import data from an RDBMS into an external Hive table backed by S3, the AWS credentials must be set in the Hive configuration file (hive-site.xml). The configuration file can be edited manually or through the advanced configuration snippets.
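With the S3A connector, the credentials are supplied through the standard Hadoop S3A properties. A minimal hive-site.xml sketch (the values are placeholders for your own keys):
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>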
Both the --target-dir and --external-table-dir options have to be set, and --external-table-dir has to point to the Hive table location in the S3 bucket, as the example commands below demonstrate.
Parquet import into an external Hive table backed by S3 is supported if the Parquet Hadoop API-based implementation is used, meaning that the --parquet-configurator-implementation option is set to hadoop.
Example Commands: Create an External Hive Table Backed by S3
Create an external Hive table backed by S3 using HiveServer2:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --create-hive-table --hs2-url $HS2_URL --hs2-user $HS2_USER --hs2-keytab $HS2_KEYTAB --hive-table $HIVE_TABLE_NAME --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory
Create an external Hive table backed by S3 using the Hive CLI:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --create-hive-table --hive-table $HIVE_TABLE_NAME --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory
Create an external Hive table backed by S3 as a Parquet file using the Hive CLI:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --create-hive-table --hive-table $HIVE_TABLE_NAME --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory --as-parquetfile --parquet-configurator-implementation hadoop
Example Commands: Import Data into an External Hive Table Backed by S3
Import data into an external Hive table backed by S3 using HiveServer2:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --hs2-url $HS2_URL --hs2-user $HS2_USER --hs2-keytab $HS2_KEYTAB --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory
Import data into an external Hive table backed by S3 using the Hive CLI:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory
Import data into an external Hive table backed by S3 as a Parquet file using the Hive CLI:
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --hive-import --target-dir s3a://example-bucket/target-directory --external-table-dir s3a://example-bucket/external-directory --as-parquetfile --parquet-configurator-implementation hadoop
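After an import, you can confirm that the table is external and points at the expected S3 location with a standard Hive statement (a verification sketch; $HIVE_TABLE_NAME matches the variable used in the examples above):
hive -e "DESCRIBE FORMATTED $HIVE_TABLE_NAME"
In the output, Table Type should read EXTERNAL_TABLE and Location should show the s3a:// path that was passed to --external-table-dir.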