Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Hive Configuration Resources | hive-config-resources | | | A file or comma-separated list of files which contains the Hive configuration (e.g., hive-site.xml). Without this, Hadoop will search the classpath for a 'hive-site.xml' file or will revert to a default configuration. Note that to enable authentication with Kerberos, for example, the appropriate properties must be set in the configuration files. Also note that if Max Concurrent Tasks is set to a number greater than one, the 'hcatalog.hive.client.cache.disabled' property will be forced to 'true' to avoid concurrency issues. Please see the Hive documentation for more details.
This property expects a comma-separated list of file resources.
Supports Expression Language: true (will be evaluated using variable registry only) |
Database Name | hive-stream-database-name | | | The name of the database in which to put the data. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Table Name | hive-stream-table-name | | | The name of the database table in which to put the data. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Partition Columns | hive-stream-partition-cols | | | A comma-delimited list of column names on which the table has been partitioned. The order of values in this list must correspond exactly to the order of partition columns specified during the table creation. Supports Expression Language: true (will be evaluated using variable registry only) |
Auto-Create Partitions | hive-stream-autocreate-partition | true | | Flag indicating whether partitions should be automatically created |
Max Open Connections | hive-stream-max-open-connections | 8 | | The maximum number of open connections that can be allocated from this pool at the same time, or negative for no limit. |
Heartbeat Interval | hive-stream-heartbeat-interval | 60 | | Indicates that a heartbeat should be sent when the specified number of seconds has elapsed. A value of 0 indicates that no heartbeat should be sent. Note that although this property supports Expression Language, it will not be evaluated against incoming FlowFile attributes. Supports Expression Language: true (will be evaluated using variable registry only) |
Transactions per Batch | hive-stream-transactions-per-batch | 100 | | A hint to Hive Streaming indicating how many transactions the processor task will need. This value must be greater than 1. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Records per Transaction | hive-stream-records-per-transaction | 10000 | | The number of records to process before committing the transaction. This value must be greater than 1. See the sketch following this property list for how this interacts with 'Transactions per Batch'. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Call Timeout | hive-stream-call-timeout | 0 | | The number of seconds allowed for a Hive Streaming operation to complete. A value of 0 indicates the processor should wait indefinitely on operations. Note that although this property supports Expression Language, it will not be evaluated against incoming FlowFile attributes. Supports Expression Language: true (will be evaluated using variable registry only) |
Rollback On Failure | rollback-on-failure | false | | Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile will be routed to the 'failure' or 'retry' relationship based on the error type, and the processor can continue with the next FlowFile. Instead, you may want to roll back the FlowFile currently being processed and stop further processing immediately. In that case, you can do so by enabling this 'Rollback On Failure' property. If enabled, a failed FlowFile will stay in the incoming queue without being penalized and will be processed repeatedly until it is processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently. NOTE: When an error occurs after a Hive Streaming transaction derived from the same input FlowFile has already been committed (i.e., the FlowFile contains more records than 'Records per Transaction' and the failure occurs in the second or a later transaction), the records that succeeded will be transferred to the 'success' relationship while the original input FlowFile stays in the incoming queue. Duplicate records can be created for those succeeded records when the same FlowFile is processed again. |
Kerberos Credentials Service | kerberos-credentials-service | | Controller Service API: KerberosCredentialsService Implementation: KeytabCredentialsService | Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos |
Kerberos Principal | Kerberos Principal | | | Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Keytab | Kerberos Keytab | | | Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
This property requires exactly one file to be provided.
Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Password | Kerberos Password | | | Kerberos password associated with the principal. Sensitive Property: true |
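To make the transaction-related and security-related properties above more concrete, the following is a minimal Java sketch written directly against the Hive Streaming (hcatalog-streaming) API on which this processor is built. It is not the processor's implementation: the metastore URI, hive-site.xml path, Kerberos principal and keytab, database, table, partition value, and column names are all placeholder assumptions, and error handling is reduced to a single throws clause.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {

    public static void main(String[] args) throws Exception {
        // 'Hive Configuration Resources': load hive-site.xml explicitly rather than
        // relying on the classpath (the path is a placeholder).
        HiveConf conf = new HiveConf();
        conf.addResource(new Path("/etc/hive/conf/hive-site.xml"));

        // 'Kerberos Principal' / 'Kerberos Keytab': authenticate before connecting
        // (principal and keytab path are placeholders).
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("nifi@EXAMPLE.COM",
                "/etc/security/keytabs/nifi.keytab");

        // 'Database Name', 'Table Name', 'Partition Columns': the endpoint identifies
        // the target table and partition (all values are placeholders).
        List<String> partitionValues = Arrays.asList("2024-01-01");
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "events", partitionValues);

        // 'Auto-Create Partitions' corresponds to the boolean passed to newConnection().
        StreamingConnection connection = endPoint.newConnection(true);
        DelimitedInputWriter writer =
                new DelimitedInputWriter(new String[] {"id", "msg"}, ",", endPoint);

        // 'Transactions per Batch' is the hint passed to fetchTransactionBatch();
        // 'Records per Transaction' is how many writes happen before each commit.
        int transactionsPerBatch = 100;
        int recordsPerTransaction = 10_000;

        TransactionBatch txnBatch = connection.fetchTransactionBatch(transactionsPerBatch, writer);
        try {
            while (txnBatch.remainingTransactions() > 0) {
                txnBatch.beginNextTransaction();
                // Synthetic records, purely to illustrate the commit cadence.
                for (int i = 0; i < recordsPerTransaction; i++) {
                    txnBatch.write(("row-" + i + ",hello").getBytes(StandardCharsets.UTF_8));
                }
                // 'Heartbeat Interval' corresponds to periodic heartbeat() calls that keep
                // the transaction's locks alive; here it is called once per transaction
                // for illustration only.
                txnBatch.heartbeat();
                txnBatch.commit();
            }
        } finally {
            txnBatch.close();
            connection.close();
        }
    }
}
```

In the processor, 'Call Timeout' bounds how long each of these streaming calls may block, and 'Max Open Connections' caps how many such connections are pooled at once.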
Relationships:
Name | Description |
---|---|
retry | The incoming FlowFile is routed to this relationship if its records cannot be transmitted to Hive. Note that some records may have been processed successfully; those records will be routed (as Avro flow files) to the 'success' relationship. The combination of the retry, success, and failure relationships indicates how many records succeeded and/or failed. This can be used to provide a retry capability, since full rollback is not possible. |
success | A FlowFile containing Avro records is routed to this relationship after the records have been successfully transmitted to Hive. |
failure | A FlowFile containing Avro records is routed to this relationship if the records could not be transmitted to Hive. |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
---|---|
hivestreaming.record.count | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the number of records from the incoming flow file written successfully and unsuccessfully, respectively. |
query.output.tables | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the target table name in 'databaseName.tableName' format. |
State management:
This component does not store state.
Restricted:
This component is not restricted.
System Resource Considerations:
None specified.
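For completeness, the sketch below shows the property API names from this page being applied with NiFi's nifi-mock TestRunner, which is one way to sanity-check a configuration before deploying it in a flow. The processor's class and package name and the 'hive-stream-metastore-uri' property name are assumptions not documented in this section, and all values are placeholders; actually running the processor additionally requires Avro-formatted input FlowFiles and a reachable Hive installation.

```java
import org.apache.nifi.processors.hive.PutHiveStreaming; // class/package name assumed
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

public class PutHiveStreamingConfigSketch {

    public static void main(String[] args) {
        TestRunner runner = TestRunners.newTestRunner(PutHiveStreaming.class);

        // Property API names below are taken from the table above, except the
        // metastore URI name, which is assumed; all values are placeholders.
        runner.setProperty("hive-stream-metastore-uri", "thrift://metastore-host:9083");
        runner.setProperty("hive-config-resources", "/etc/hive/conf/hive-site.xml");
        runner.setProperty("hive-stream-database-name", "default");
        runner.setProperty("hive-stream-table-name", "events");
        runner.setProperty("hive-stream-partition-cols", "dt");
        runner.setProperty("hive-stream-autocreate-partition", "true");
        runner.setProperty("hive-stream-transactions-per-batch", "100");
        runner.setProperty("hive-stream-records-per-transaction", "10000");

        // Validation checks that required properties are present and well formed;
        // it should not require a live Hive instance.
        runner.assertValid();
    }
}
```

On success, each output FlowFile carries the 'hivestreaming.record.count' and 'query.output.tables' attributes described above, which downstream processors can use for auditing or routing.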