Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Hive Configuration Resources | hive-config-resources | | | A file or comma-separated list of files which contains the Hive configuration (e.g., hive-site.xml). Without this, Hadoop will search the classpath for a 'hive-site.xml' file or will revert to a default configuration. Note that to enable authentication with Kerberos, for example, the appropriate properties must be set in the configuration files. Also note that if Max Concurrent Tasks is set to a number greater than one, the 'hcatalog.hive.client.cache.disabled' property will be forced to 'true' to avoid concurrency issues. Please see the Hive documentation for more details.
This property expects a comma-separated list of file resources.
Supports Expression Language: true (will be evaluated using variable registry only) |
Database Name | hive-stream-database-name | | | The name of the database in which to put the data. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Table Name | hive-stream-table-name | | | The name of the database table in which to put the data. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Partition Columns | hive-stream-partition-cols | | | A comma-delimited list of column names on which the table has been partitioned. The order of values in this list must correspond exactly to the order of partition columns specified during the table creation. Supports Expression Language: true (will be evaluated using variable registry only) |
Auto-Create Partitions | hive-stream-autocreate-partition | true | | Flag indicating whether partitions should be automatically created |
Max Open Connections | hive-stream-max-open-connections | 8 | | The maximum number of open connections that can be allocated from this pool at the same time, or negative for no limit. |
Heartbeat Interval | hive-stream-heartbeat-interval | 60 | | Indicates that a heartbeat should be sent when the specified number of seconds has elapsed. A value of 0 indicates that no heartbeat should be sent. Note that although this property supports Expression Language, it will not be evaluated against incoming FlowFile attributes. Supports Expression Language: true (will be evaluated using variable registry only) |
Transactions per Batch | hive-stream-transactions-per-batch | 100 | | A hint to Hive Streaming indicating how many transactions the processor task will need. This value must be greater than 1. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Records per Transaction | hive-stream-records-per-transaction | 10000 | | The number of records to process before committing the transaction. This value must be greater than 1. See the sketch following this property list for how this interacts with 'Transactions per Batch'. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Call Timeout | hive-stream-call-timeout | 0 | | The number of seconds allowed for a Hive Streaming operation to complete. A value of 0 indicates the processor should wait indefinitely on operations. Note that although this property supports Expression Language, it will not be evaluated against incoming FlowFile attributes. Supports Expression Language: true (will be evaluated using variable registry only) |
Rollback On Failure | rollback-on-failure | false | | Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile will be routed to the 'failure' or 'retry' relationship based on the error type, and the processor can continue with the next FlowFile. Instead, you may want to roll back the FlowFile currently being processed and stop further processing immediately. In that case, you can do so by enabling this 'Rollback On Failure' property. If enabled, a failed FlowFile will stay in the incoming queue without being penalized and will be processed repeatedly until it is processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently. NOTE: When an error occurs after a Hive Streaming transaction derived from the same input FlowFile has already been committed (i.e., the FlowFile contains more records than 'Records per Transaction' and the failure occurs in the second or a later transaction), the records that succeeded will be transferred to the 'success' relationship while the original input FlowFile stays in the incoming queue. Duplicate records can be created for those succeeded records when the same FlowFile is processed again. |
Kerberos Credentials Service | kerberos-credentials-service | | Controller Service API: KerberosCredentialsService Implementation: KeytabCredentialsService | Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos |
Kerberos Principal | Kerberos Principal | | | Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Keytab | Kerberos Keytab | | | Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
This property requires exactly one file to be provided.
Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Password | Kerberos Password | | | Kerberos password associated with the principal. Sensitive Property: true |
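To make the transaction-related and security-related properties above more concrete, the following is a minimal Java sketch written directly against the Hive Streaming (hcatalog-streaming) API on which this processor is built. It is not the processor's implementation: the metastore URI, hive-site.xml path, Kerberos principal and keytab, database, table, partition value, and column names are all placeholder assumptions, and error handling is reduced to a single throws clause.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {

    public static void main(String[] args) throws Exception {
        // 'Hive Configuration Resources': load hive-site.xml explicitly rather than
        // relying on the classpath (the path is a placeholder).
        HiveConf conf = new HiveConf();
        conf.addResource(new Path("/etc/hive/conf/hive-site.xml"));

        // 'Kerberos Principal' / 'Kerberos Keytab': authenticate before connecting
        // (principal and keytab path are placeholders).
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("nifi@EXAMPLE.COM",
                "/etc/security/keytabs/nifi.keytab");

        // 'Database Name', 'Table Name', 'Partition Columns': the endpoint identifies
        // the target table and partition (all values are placeholders).
        List<String> partitionValues = Arrays.asList("2024-01-01");
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "events", partitionValues);

        // 'Auto-Create Partitions' corresponds to the boolean passed to newConnection().
        StreamingConnection connection = endPoint.newConnection(true);
        DelimitedInputWriter writer =
                new DelimitedInputWriter(new String[] {"id", "msg"}, ",", endPoint);

        // 'Transactions per Batch' is the hint passed to fetchTransactionBatch();
        // 'Records per Transaction' is how many writes happen before each commit.
        int transactionsPerBatch = 100;
        int recordsPerTransaction = 10_000;

        TransactionBatch txnBatch = connection.fetchTransactionBatch(transactionsPerBatch, writer);
        try {
            while (txnBatch.remainingTransactions() > 0) {
                txnBatch.beginNextTransaction();
                // Synthetic records, purely to illustrate the commit cadence.
                for (int i = 0; i < recordsPerTransaction; i++) {
                    txnBatch.write(("row-" + i + ",hello").getBytes(StandardCharsets.UTF_8));
                }
                // 'Heartbeat Interval' corresponds to periodic heartbeat() calls that keep
                // the transaction's locks alive; here it is called once per transaction
                // for illustration only.
                txnBatch.heartbeat();
                txnBatch.commit();
            }
        } finally {
            txnBatch.close();
            connection.close();
        }
    }
}
```

In the processor, 'Call Timeout' bounds how long each of these streaming calls may block, and 'Max Open Connections' caps how many such connections are pooled at once.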
Relationships:
Name | Description |
---|---|
retry | The incoming FlowFile is routed to this relationship if its records cannot be transmitted to Hive. Note that some records may have been processed successfully; those records will be routed (as Avro flow files) to the 'success' relationship. The combination of the retry, success, and failure relationships indicates how many records succeeded and/or failed. This can be used to provide a retry capability, since full rollback is not possible. |
success | A FlowFile containing Avro records is routed to this relationship after the records have been successfully transmitted to Hive. |
failure | A FlowFile containing Avro records is routed to this relationship if the records could not be transmitted to Hive. |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
---|---|
hivestreaming.record.count | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the number of records from the incoming flow file written successfully and unsuccessfully, respectively. |
query.output.tables | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the target table name in 'databaseName.tableName' format. |
State management:
This component does not store state.
Restricted:
This component is not restricted.
System Resource Considerations:
None specified.
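For completeness, the sketch below shows the property API names from this page being applied with NiFi's nifi-mock TestRunner, which is one way to sanity-check a configuration before deploying it in a flow. The processor's class and package name and the 'hive-stream-metastore-uri' property name are assumptions not documented in this section, and all values are placeholders; actually running the processor additionally requires Avro-formatted input FlowFiles and a reachable Hive installation.

```java
import org.apache.nifi.processors.hive.PutHiveStreaming; // class/package name assumed
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

public class PutHiveStreamingConfigSketch {

    public static void main(String[] args) {
        TestRunner runner = TestRunners.newTestRunner(PutHiveStreaming.class);

        // Property API names below are taken from the table above, except the
        // metastore URI name, which is assumed; all values are placeholders.
        runner.setProperty("hive-stream-metastore-uri", "thrift://metastore-host:9083");
        runner.setProperty("hive-config-resources", "/etc/hive/conf/hive-site.xml");
        runner.setProperty("hive-stream-database-name", "default");
        runner.setProperty("hive-stream-table-name", "events");
        runner.setProperty("hive-stream-partition-cols", "dt");
        runner.setProperty("hive-stream-autocreate-partition", "true");
        runner.setProperty("hive-stream-transactions-per-batch", "100");
        runner.setProperty("hive-stream-records-per-transaction", "10000");

        // Validation checks that required properties are present and well formed;
        // it should not require a live Hive instance.
        runner.assertValid();
    }
}
```

On success, each output FlowFile carries the 'hivestreaming.record.count' and 'query.output.tables' attributes described above, which downstream processors can use for auditing or routing.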