Import Data into S3 Bucket in Incremental Mode
The --temporary-rootdir
option must be set to point to a location in the
S3 bucket to import data into an S3 bucket in incremental mode.
Append Mode
When importing data into a target directory in an Amazon S3 bucket in incremental
append mode, the location of the temporary root directory must be in the same bucket as
the directory. For example: s3a://example-bucket/temporary-rootdir
or
s3a://example-bucket/target-directory/temporary-rootdir
.
Example command: Import data into a target directory in an Amazon S3 bucket in incremental append mode.
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --target-dir s3a://example-bucket/target-directory --incremental append --check-column $CHECK_COLUMN --last-value $LAST_VALUE --temporary-rootdir s3a://example-bucket/temporary-rootdir
Data from RDBMS can be imported into S3 in incremental append mode as Sequence or Avro file format. too
Parquet import into S3 in incremental append mode is also supported
if the Parquet Hadoop API based implementation is used, meaning that the --parquet-configurator-implementation
option is
set to hadoop
.
Example command: Import data into a target directory in an Amazon S3 bucket in incremental append mode as Parquet file.
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --target-dir s3a://example-bucket/target-directory --incremental append --check-column $CHECK_COLUMN --last-value $LAST_VALUE --temporary-rootdir s3a://example-bucket/temporary-rootdir --as-parquetfile --parquet-configurator-implementation hadoop
Lastmodified Mode
When importing data into a target directory in an Amazon S3 bucket in incremental
lastmodified mode, the location of the temporary root directory must be in the same
bucket and in the same directory as the target directory. For example:
s3a://example-bucket/temporary-rootdir
in case of
s3a://example-bucket/target-directory
.
Example command: Import data into a target directory in an Amazon S3 bucket in incremental lastmodified mode.
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --target-dir s3a://example-bucket/target-directory --incremental lastmodified --check-column $CHECK_COLUMN --merge-key $MERGE_KEY --last-value $LAST_VALUE --temporary-rootdir s3a://example-bucket/temporary-rootdir
Parquet import into S3 in incremental lastmodified mode is supported
if the Parquet Hadoop API based implementation is used, meaning that the --parquet-configurator-implementation
option is
set to hadoop
.
Example command: Import data into a target directory in an Amazon S3 bucket in incremental lastmodified mode as Parquet file.
sqoop import --connect $CONN --username $USER --password $PWD --table $TABLE_NAME --target-dir s3a://example-bucket/target-directory --incremental lastmodified --check-column $CHECK_COLUMN --merge-key $MERGE_KEY --last-value $LAST_VALUE --temporary-rootdir s3a://example-bucket/temporary-rootdir
--as-parquetfile --parquet-configurator-implementation hadoop