Executes a HiveQL DDL/DML command (for example, UPDATE or INSERT). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use ? as a placeholder for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type attribute is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.
sql, hive, put, database, update, insert
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Hive Database Connection Pooling Service | hive3-dbcp-service | | Controller Service API: ClouderaHiveDBCPService Implementation: ClouderaHiveConnectionPool | The Hive Controller Service that is used to obtain connection(s) to the Hive database |
Batch Size | hive-batch-size | 100 | | The preferred number of FlowFiles to put to the database in a single transaction |
Query timeout | hive3-query-timeout | 0 | | Sets the number of seconds the driver will wait for a query to execute. A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Character Set | hive3-charset | UTF-8 | | Specifies the character set of the record data. |
Statement Delimiter | statement-delimiter | ; | | Statement Delimiter used to separate SQL statements in a multiple statement script |
Rollback On Failure | rollback-on-failure | false | | Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile is routed to the 'failure' or 'retry' relationship based on the error type, and the processor continues with the next FlowFile. Alternatively, you may want to roll back the FlowFiles currently being processed and stop further processing immediately. To do so, enable this 'Rollback On Failure' property. If enabled, failed FlowFiles stay in the input relationship without being penalized and are processed repeatedly until they are processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently. |
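
To illustrate what a multiple statement script looks like with the default Statement Delimiter of `;`, here is a minimal sketch. It is a naive split for illustration only and is not the processor's actual parsing logic: the helper name `split_statements` and the table names are hypothetical, and the sketch does not handle a delimiter that appears inside a string literal.

```python
def split_statements(script, delimiter=";"):
    """Naively split a script on the delimiter and drop empty fragments.

    Illustration only: a delimiter inside a quoted string would be split too.
    """
    return [part.strip() for part in script.split(delimiter) if part.strip()]


# Hypothetical FlowFile content holding two HiveQL statements.
script = """
INSERT INTO staging_events VALUES (1, 'start');
INSERT INTO staging_events VALUES (2, 'stop');
"""

statements = split_statements(script)
for statement in statements:
    print(statement)
```

If a script needs to contain the delimiter character inside a statement, choosing a different Statement Delimiter value is one way to avoid ambiguity.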
Name | Description |
---|---|
retry | A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed |
success | A FlowFile is routed to this relationship after the database is successfully updated |
failure | A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation |
Name | Description |
---|---|
hiveql.args.N.type | Incoming FlowFiles are expected to be parametrized HiveQL statements. The type of each Parameter is specified as an integer that represents the JDBC Type of the parameter. |
hiveql.args.N.value | Incoming FlowFiles are expected to be parametrized HiveQL statements. The values of the Parameters are specified as hiveql.args.1.value, hiveql.args.2.value, hiveql.args.3.value, and so on. The type of the hiveql.args.1.value Parameter is specified by the hiveql.args.1.type attribute. |
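
The attribute naming convention above can be sketched as follows. This is a minimal illustration, assuming the attributes are prepared upstream (for example by a scripted step); the helper `build_hiveql_attributes`, the `users` table, and the sample values are hypothetical. The numeric codes are the standard `java.sql.Types` values for INTEGER (4) and VARCHAR (12).

```python
# JDBC type codes as defined by java.sql.Types (small subset).
JDBC_INTEGER = 4
JDBC_VARCHAR = 12


def build_hiveql_attributes(params):
    """Map an ordered list of (jdbc_type, value) pairs to FlowFile attributes.

    The first pair fills the first ? in the statement, the second pair the
    second ?, and so on. N starts at 1.
    """
    attributes = {}
    for n, (jdbc_type, value) in enumerate(params, start=1):
        attributes[f"hiveql.args.{n}.type"] = str(jdbc_type)
        attributes[f"hiveql.args.{n}.value"] = str(value)
    return attributes


# Hypothetical FlowFile content: a statement with two ? placeholders.
statement = "INSERT INTO users (id, name) VALUES (?, ?)"

attrs = build_hiveql_attributes([(JDBC_INTEGER, 42), (JDBC_VARCHAR, "alice")])
for name in sorted(attrs):
    print(name, "=", attrs[name])
```

Each `?` in the statement should have a matching pair of type and value attributes with the same N.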
Name | Description |
---|---|
query.input.tables | This attribute is written on the flow files routed to the 'success' relationship, and contains input table names (if any) in comma-delimited 'databaseName.tableName' format. |
query.output.tables | This attribute is written on the flow files routed to the 'success' relationship, and contains the target table names in 'databaseName.tableName' format. |
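
A downstream step that consumes these attributes might parse them as in the sketch below. It assumes a comma-delimited value of 'databaseName.tableName' entries as described above; the helper `parse_table_list` and the sample table names are hypothetical.

```python
def parse_table_list(attribute_value):
    """Parse a comma-delimited 'databaseName.tableName' list into tuples."""
    tables = []
    for entry in attribute_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty attribute or a stray comma
        # Split on the first dot only: database on the left, table on the right.
        database, _, table = entry.partition(".")
        tables.append((database, table))
    return tables


# Hypothetical value of the query.input.tables attribute.
input_tables = parse_table_list("default.orders, sales.customers")
print(input_tables)
```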