SelectClouderaHiveQL 2.3.0.4.10.0.0-147

Bundle
com.cloudera | nifi-cdf-hive-nar
Description
This component uses the Hive client version 3.1.3000.7.3.1.400-76. Executes the provided HiveQL SELECT query against a Hive database connection. The query result is converted to Avro or CSV format. Results are streamed, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query. The FlowFile attribute 'selecthiveql.row.count' indicates how many rows were selected.
Tags
database, hive, jdbc, query, select, sql
Input Requirement
ALLOWED
Supports Sensitive Dynamic Properties
false
Properties
Relationships
Name Description
success Successfully created FlowFile from HiveQL query result set.
failure HiveQL query execution failed. Incoming FlowFile will be penalized and routed to this relationship.
Writes Attributes
Name Description
mime.type Sets the MIME type for the outgoing FlowFile to application/avro-binary for Avro or text/csv for CSV.
filename Adds .avro or .csv to the filename attribute depending on which output format is selected.
selecthiveql.row.count Indicates how many rows were selected/returned by the query.
selecthiveql.query.duration Combined duration of the query execution time and fetch time in milliseconds. If 'Max Rows Per Flow File' is set, then the fetch portion of this number reflects only the rows in this FlowFile instead of the entire result set.
selecthiveql.query.executiontime Duration of the query execution time in milliseconds. This number will reflect the query execution time regardless of the 'Max Rows Per Flow File' setting.
selecthiveql.query.fetchtime Duration of the result set fetch time in milliseconds. If 'Max Rows Per Flow File' is set, then this number reflects only the fetch time for the rows in this FlowFile instead of the entire result set.
fragment.identifier If 'Max Rows Per Flow File' is set, then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.count If 'Max Rows Per Flow File' is set, then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute to know how many FlowFiles belong to the same ResultSet.
fragment.index If 'Max Rows Per Flow File' is set, then this is the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order the FlowFiles were produced.
query.input.tables Contains the input table names in comma-delimited 'databaseName.tableName' format.
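
The fragment.* and query.input.tables attributes described above can be consumed downstream, for example in a script that receives FlowFile attributes as string-valued dictionaries. The following is a minimal Python sketch of that idea, not part of the processor itself. The function names (group_fragments, is_complete, parse_input_tables) and the dictionary representation of attributes are assumptions made for illustration; only the attribute names come from the table above.

```python
from collections import defaultdict


def group_fragments(flowfile_attrs):
    """Group attribute dicts by fragment.identifier, ordered by fragment.index.

    flowfile_attrs is a hypothetical list of dicts, one per FlowFile,
    holding attribute names mapped to string values.
    """
    groups = defaultdict(list)
    for attrs in flowfile_attrs:
        groups[attrs["fragment.identifier"]].append(attrs)
    for fragments in groups.values():
        # Attribute values are strings, so convert before sorting numerically.
        fragments.sort(key=lambda a: int(a["fragment.index"]))
    return dict(groups)


def is_complete(fragments):
    """Return True when every FlowFile of one result set has arrived.

    Compares the number of collected fragments with fragment.count.
    """
    expected = int(fragments[0]["fragment.count"])
    return len(fragments) == expected


def parse_input_tables(value):
    """Split a query.input.tables value into (database, table) tuples."""
    tables = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        database, _, table = item.partition(".")
        tables.append((database, table))
    return tables
```

A caller could hold back the fragments of one fragment.identifier until is_complete returns True and then process them in fragment.index order. The sketch compares the number of collected fragments with fragment.count rather than assuming a particular starting index.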