Data Movement and Integration
Also available as:
PDF

Netezza Connector

Netezza connector for Sqoop is an implementation of the Sqoop connector interface for accessing a Netezza data warehouse appliance, so that data can be exported and imported to a Hadoop environment from Netezza data warehousing environments.

The HDP 2 Sqoop distribution includes Netezza connector software. To deploy it, the only requirement is that you acquire the JDBC jar file (named nzjdbc.jar) from IBM and copy it to the /usr/local/nz/lib directory.

Extra Arguments

This table describes extra arguments supported by the Netezza connector:

ArgumentDescription

--partitioned-access

Whether each mapper acts on a subset of data slices of a table or all.

-max-errors

Applicable only in direct mode. This option specifies the error threshold per mapper while transferring data. If the number of errors encountered exceeds this threshold, the job fails.

--log-dir

Applicable only in direct mode. Specifies the directory where Netezza external table operation logs are stored

Direct Mode

Netezza connector supports an optimized data transfer facility using the Netezza external tables feature. Each map task of Netezza connector’s import job works on a subset of the Netezza partitions and transparently creates and uses an external table to transport data.

Similarly, export jobs use the external table to push data quickly onto the NZ system. Direct mode does not support staging tables and upsert options.

Direct mode is specified by the --direct Sqoop option.

Here is an example of a complete command line for import using the Netezza external table feature:

$ sqoop import \
 --direct \
 --connect jdbc:netezza://nzhost:5480/sqoop \
 --table nztable \
 --username nzuser \
 --password nzpass \
 --target-dir hdfsdir 

Here is an example of a complete command line for export with tab (\t) as the field terminator character:

$ sqoop export \
 --direct \
 --connect jdbc:netezza://nzhost:5480/sqoop \
 --table nztable \
 --username nzuser \
 --password nzpass \
 --export-dir hdfsdir \
 --input-fields-terminated-by "\t"

Null String Handling

In direct mode the Netezza connector supports the null-string features of Sqoop. Null string values are converted to appropriate external table options during export and import operations.

ArgumentDescription

--input-null-non-string <null-string>

The string to be interpreted as null for non-string columns.

--input-null-non-string <null-string>

The string to be interpreted as null for non-string columns.

In direct mode, both the arguments must either be left to the default values or explicitly set to the same value. The null string value is restricted to 0-4 UTF-8 characters.

On export, for non-string columns, if the chosen null value is a valid representation in the column domain, then the column might not be loaded as null. For example, if the null string value is specified as "1", then on export, any occurrence of "1" in the input file will be loaded as value 1 instead of NULL for int columns.

For performance and consistency, specify the null value as an empty string.

Supported Import Control Arguments

ArgumentDescription

--null-string <null-string>

The string to be interpreted as null for string columns

--null-non-string <null-string>

The string to be interpreted as null for non-string columns.

In direct mode, both the arguments must either be left to the default values or explicitly set to the same value. The null string value is restricted to 0-4 UTF-8 characters.

On import, for non-string columns in the current implementation, the chosen null value representation is ignored for non-character columns. For example, if the null string value is specified as "\N", then on import, any occurrence of NULL for non-char columns in the table will be imported as an empty string instead of \N, the chosen null string representation.

For performance and consistency, specify the null value as an empty string.