Merge process stops during Sqoop incremental imports
During Sqoop incremental import operations, if the target directory is located outside of Hadoop Distributed File System (HDFS), such as in Amazon S3 or Azure Blob Storage, the merge phase of the import process does not take effect.
Condition
Sqoop, by default, creates temporary directories within HDFS. However, you must be aware of certain considerations when choosing the target directory location for Sqoop's incremental import modes. Sqoop operates seamlessly when the target directory resides within HDFS, but the merge phase of the import process does not work out of the box if the target directory is located outside of HDFS.
Cause
During an import operation, Sqoop imports data to a target directory. If this target directory is a non-HDFS location, the merge process tries to acquire the temporary directory it needs on the same non-HDFS file system. Because Sqoop creates the temporary directory in HDFS by default, the merge process checks whether the temporary directory exists on the target directory's file system; when it does not find it, the merge process simply stops.
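The failure mode described above can be sketched as a simple scheme comparison. This is an illustration of the behavior, not Sqoop's actual source code; the function name and default temporary path are hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical default temporary root; Sqoop places its temporary
# directory in HDFS unless told otherwise.
DEFAULT_TEMPORARY_ROOTDIR = "hdfs:///tmp/sqoop"

def merge_can_proceed(target_dir: str,
                      temporary_rootdir: str = DEFAULT_TEMPORARY_ROOTDIR) -> bool:
    """Illustrative check: the merge phase needs its temporary directory
    on the same file system as the target directory."""
    target_scheme = urlparse(target_dir).scheme or "hdfs"
    temp_scheme = urlparse(temporary_rootdir).scheme or "hdfs"
    return target_scheme == temp_scheme

# Target in HDFS: the default temporary directory is on the same
# file system, so the merge proceeds.
print(merge_can_proceed("hdfs:///user/foo/targetdir"))   # True

# Target in Azure Blob Storage: the default HDFS temporary directory
# is on a different file system, so the merge stops.
print(merge_can_proceed("abfs://foo@bar/targetdir"))     # False

# Pointing the temporary root at the target's file system resolves it.
print(merge_can_proceed("abfs://foo@bar/targetdir",
                        "abfs://foo@bar/_sqoop"))        # True
```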
Solution
Use the --temporary-rootdir Sqoop option and point it to a path on the same file system where the target directory is located. By aligning the temporary directory path with the file system of the target directory, Sqoop can effectively complete the import process.

Example: specify the --temporary-rootdir Sqoop option as shown below:

sqoop-import --connect jdbc:mysql://.../transaction --username [***USER NAME***] --table [***TABLE NAME***] --password [***PASSWORD***] --target-dir abfs://foo@bar/targetdir -m 1 --temporary-rootdir abfs://foo@bar/_sqoop