RecordReaderFactory
Implementations: Syslog5424Reader
CEFReader
ReaderLookup
CiscoEmblemSyslogMessageReader
CSVReader
GrokReader
SyslogReader
JsonTreeReader
JsonPathReader
XMLReader
AvroReader
JASN1Reader
ExcelReader
ParquetReader
EBCDICRecordReader
WindowsEventLogReader
IPFIXReader
ScriptedReader
The service for reading incoming flow files. The reader is used only to determine the schema of the records; the records themselves are not processed. |
Hive Database Connection Pooling Service | hive-dbcp-service | | Controller Service API: HiveDBCPService Implementations: Hive3ConnectionPool HiveConnectionPool | The Hive Controller Service that is used to obtain connection(s) to the Hive database |
Table Name | hive-table-name | | | The name of the database table to update. If the table does not exist, then it will either be created or an error thrown, depending on the value of the Create Table property. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Partition Clause | hive-partition-clause | | | Specifies a comma-separated list of attribute names and optional data types corresponding to the partition columns of the target table. If the table is partitioned or is to be created with partitions, each partition name should be an attribute on the FlowFile and listed in this property. This assumes all incoming records belong to the same partition and the partition columns are not fields in the record. For example, if PartitionRecord is upstream and two partition columns 'name' (of type string) and 'age' (of type integer) are used, this property can be set to 'name string, age int'. The data types are optional; if partitions are to be created, any column without a specified type defaults to string. For non-string primitive types, specifying the data type for existing partition columns is helpful for interpreting the partition value(s). If the table exists, the data types need not be specified (and are ignored in that case). This property must be set if the table is partitioned, and there must be an attribute for each partition column in the table. The values of the attributes will be used as the partition values, and the resulting output.path attribute value will reflect the location of the partition in the filesystem (for use downstream in processors such as PutHDFS). Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Create Table Strategy | hive-create-table | Fail If Not Exists | - Create If Not Exists
- Fail If Not Exists
| Specifies how to process the target table when it does not exist (for example, create it or fail). |
Create Table Management Strategy | hive-create-table-management | Managed | - Managed
- External
- Use 'hive.table.management.strategy' Attribute
| Specifies (when a table is to be created) whether the table is a managed table or an external table. Note that when External is specified, the 'External Table Location' property must be specified. If the 'hive.table.management.strategy' value is selected, 'External Table Location' must still be specified, but can contain Expression Language or be set to the empty string, and is ignored when the attribute evaluates to 'Managed'.
This Property is only considered if the [Create Table Strategy] Property has a value of "Create If Not Exists". |
External Table Location | hive-external-table-location | | | Specifies (when an external table is to be created) the file path (for example, in HDFS) where table data is stored. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
This Property is only considered if the [Create Table Management Strategy] Property is set to one of the following values: [Use 'hive.table.management.strategy' Attribute], [External] |
Create Table Storage Format | hive-storage-format | TEXTFILE | - TEXTFILE
- SEQUENCEFILE
- ORC
- PARQUET
- AVRO
- RCFILE
| If a table is to be created, the specified storage format will be used.
This Property is only considered if the [Create Table Strategy] Property has a value of "Create If Not Exists". |
Update Field Names | hive-update-field-names | false | | This property indicates whether to update the output schema such that the field names are set to the exact column names from the specified table. This should be used if the incoming record field names may not match the table's column names in terms of upper- and lower-case. For example, this property should be set to true if the output FlowFile (and target table storage) is Avro format, as Hive/Impala expects the field names to match the column names exactly. |
Record Writer | hive-record-writer | | Controller Service API: RecordSetWriterFactory Implementations: JsonRecordSetWriter ParquetRecordSetWriter CSVRecordSetWriter ScriptedRecordSetWriter XMLRecordSetWriter FreeFormTextRecordSetWriter AvroRecordSetWriter RecordSetWriterLookup | Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer should use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. If Create Table Strategy is set to 'Create If Not Exists', the Record Writer's output format must match the Record Reader's format in order for the data to be placed in the created table location. Note that this property is only used if 'Update Field Names' is set to true and the field names do not all match the column names exactly. If no update is needed for any field names (or 'Update Field Names' is false), the Record Writer is not used and instead the input FlowFile is routed to success or failure without modification.
This Property is only considered if the [Update Field Names] Property has a value of "true". |
Query Timeout | hive-query-timeout | 0 | | Sets the number of seconds the driver will wait for a query to execute. A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
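The Partition Clause format described above ('name string, age int', with the type defaulting to string) can be illustrated with a minimal Python sketch. This is not the processor's implementation: the function names `parse_partition_clause` and `partition_path` are hypothetical, the `column=value` directory layout is assumed from Hive's usual partition convention, and only simple primitive types are handled (a type such as `decimal(10,2)` would need a smarter split).

```python
from typing import Dict, List, Tuple


def parse_partition_clause(clause: str) -> List[Tuple[str, str]]:
    """Split 'name string, age int' into (column, type) pairs.

    A column listed without a type falls back to 'string'.
    """
    columns = []
    for entry in clause.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        name = parts[0]
        data_type = parts[1] if len(parts) > 1 else "string"
        columns.append((name, data_type))
    return columns


def partition_path(table_location: str, clause: str,
                   attributes: Dict[str, str]) -> str:
    """Build a partition directory (col=value/...) from FlowFile attributes.

    Assumption: Hive-style 'column=value' directory names under the table
    location. Every partition column must have a matching attribute.
    """
    segments = []
    for name, _ in parse_partition_clause(clause):
        if name not in attributes:
            raise KeyError(
                f"missing FlowFile attribute for partition column '{name}'")
        segments.append(f"{name}={attributes[name]}")
    return "/".join([table_location.rstrip("/")] + segments)
```

In this sketch, a FlowFile with attributes `name` and `age` and the clause 'name string, age int' yields one directory level per partition column, which mirrors how output.path is described as reflecting the partition location.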
Relationships:
Name | Description |
---|---|
success | A FlowFile containing records is routed to this relationship after the records have been successfully transmitted to Hive. |
failure | A FlowFile containing records is routed to this relationship if the records could not be transmitted to Hive. |
Reads Attributes:
Name | Description |
---|---|
hive.table.management.strategy | This attribute is read if the 'Create Table Management Strategy' property is configured to use the value of this attribute. The value of this attribute should correspond (ignoring case) to a valid option of the 'Create Table Management Strategy' property. |
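The way the Create Table Management Strategy property, the hive.table.management.strategy attribute, and External Table Location interact can be sketched as follows. This is a hedged illustration of the documented rules only, not the processor's code; the function name `resolve_management_strategy` and the error messages are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Option label copied from the property's allowable values.
ATTRIBUTE_OPTION = "Use 'hive.table.management.strategy' Attribute"
STRATEGY_ATTRIBUTE = "hive.table.management.strategy"
VALID_STRATEGIES = {"managed": "Managed", "external": "External"}


def resolve_management_strategy(
    property_value: str,
    attributes: Dict[str, str],
    external_location: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (strategy, location) for a table that is about to be created.

    The attribute value is matched ignoring case. The location is ignored
    (returned as None) when the strategy resolves to 'Managed'.
    """
    if property_value == ATTRIBUTE_OPTION:
        raw = attributes.get(STRATEGY_ATTRIBUTE, "").strip().lower()
        if raw not in VALID_STRATEGIES:
            raise ValueError(
                f"'{STRATEGY_ATTRIBUTE}' must be 'Managed' or 'External'")
        strategy = VALID_STRATEGIES[raw]
    else:
        strategy = property_value

    if strategy == "Managed":
        return strategy, None
    if not external_location:
        raise ValueError("External tables require an External Table Location")
    return strategy, external_location
```

Returning `None` for the location in the managed case reflects the documented behavior that External Table Location may be empty and is ignored when the attribute evaluates to 'Managed'.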
Writes Attributes:
Name | Description |
---|---|
output.table | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the target table name. |
output.path | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the path on the file system to the table (or partition location if the table is partitioned). |
mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer, only if a Record Writer is specified and Update Field Names is 'true'. |
record.count | Sets the number of records in the FlowFile, only if a Record Writer is specified and Update Field Names is 'true'. |
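The effect of Update Field Names, and the condition under which the Record Writer (and therefore mime.type and record.count) comes into play, can be sketched in Python. This is an assumption-laden illustration of the documented behavior: the helper names `align_field_names` and `needs_rewrite` are hypothetical, and fields with no matching column are simply left unchanged here.

```python
from typing import Iterable, List


def align_field_names(field_names: Iterable[str],
                      column_names: Iterable[str]) -> List[str]:
    """Rename record fields to the table's exact column names.

    Matching ignores upper/lower case; fields without a matching column
    keep their original name (an assumption made for this sketch).
    """
    by_lower = {column.lower(): column for column in column_names}
    return [by_lower.get(field.lower(), field) for field in field_names]


def needs_rewrite(field_names: Iterable[str],
                  column_names: Iterable[str]) -> bool:
    """True when at least one field name differs from its column name.

    Only in that case would the Record Writer be used; otherwise the
    input FlowFile passes through without modification.
    """
    fields = list(field_names)
    return align_field_names(fields, column_names) != fields
```

For instance, fields `ID` and `Name` checked against columns `id` and `name` would require a rewrite, whereas fields that already match exactly would not, so no writer-related attributes would be set.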
State management:
This component does not store state.
Restricted:
This component is not restricted.
Input requirement:
This component requires an incoming relationship.
System Resource Considerations:
None specified.