PutKudu

Description:

Reads records from an incoming FlowFile using the provided Record Reader, and writes those records to the specified Kudu table. The schema for the Kudu table is inferred from the schema of the Record Reader. If any error occurs while reading records from the input, or writing records to Kudu, the FlowFile will be routed to failure.

Tags:

put, database, NoSQL, kudu, HDFS, record

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Kudu Masters | Kudu Masters | | | Comma-separated addresses of the Kudu masters to connect to. Supports Expression Language: true (will be evaluated using Environment variables only)
Table Name | Table Name | | | The name of the Kudu Table to put data into. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Failure Strategy | Failure Strategy | Route to Failure | • Route to Failure: The FlowFile containing the Records that failed to insert will be routed to the 'failure' relationship. • Rollback Session: If any Record cannot be inserted, all FlowFiles in the session will be rolled back to their input queue. This means that if data cannot be pushed, it will block any subsequent data from being pushed to Kudu as well until the issue is resolved. However, this may be advantageous if a strict ordering is required. | If one or more Records in a batch cannot be transferred to Kudu, specifies how to handle the failure.
Kerberos User Service | kerberos-user-service | | Controller Service API: KerberosUserService. Implementations: KerberosTicketCacheUserService, KerberosKeytabUserService, KerberosPasswordUserService | Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos
Skip head line | Skip head line | false | • true • false | Deprecated. Used to ignore header lines, but this should be handled by a RecordReader (e.g. "Treat First Line as Header" property of CSVReader)
Lowercase Field Names | Lowercase Field Names | false | | Convert column names to lowercase when looking up the index of Kudu table columns. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Handle Schema Drift | Handle Schema Drift | false | | If set to true, when fields with names that are not in the target Kudu table are encountered, the Kudu table will be altered to include new columns for those fields. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Record Reader | record-reader | | Controller Service API: RecordReaderFactory. Implementations: JASN1Reader, JsonTreeReader, GrokReader, Syslog5424Reader, CiscoEmblemSyslogMessageReader, AvroReader, JsonPathReader, CEFReader, IPFIXReader, WindowsEventLogReader, XMLReader, ScriptedReader, ReaderLookup, YamlTreeReader, ParquetReader, CSVReader, EBCDICRecordReader, ExcelReader, SyslogReader | The service for reading records from incoming flow files.
Data RecordPath | Data RecordPath | | | If specified, this property denotes a RecordPath that will be evaluated against each incoming Record and the Record that results from evaluating the RecordPath will be sent to Kudu instead of sending the entire incoming Record. If not specified, the entire incoming Record will be published to Kudu.
Operation RecordPath | Operation RecordPath | | | If specified, this property denotes a RecordPath that will be evaluated against each incoming Record in order to determine the Kudu Operation Type. When evaluated, the RecordPath must evaluate to one of the valid Kudu Operation Types (Debezium style operation types are also supported: "r" and "c" for INSERT, "u" for UPDATE, and "d" for DELETE), or the incoming FlowFile will be routed to failure. If this property is specified, the <Kudu Operation Type> property will be ignored.
Kudu Operation Type | Insert Operation | INSERT | | Specifies the operation type for this processor. Valid values are: INSERT, INSERT_IGNORE, UPSERT, UPDATE, DELETE, UPDATE_IGNORE, DELETE_IGNORE. This property will be ignored if the <Operation RecordPath> property is set. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Flush Mode | Flush Mode | AUTO_FLUSH_BACKGROUND | • AUTO_FLUSH_SYNC • AUTO_FLUSH_BACKGROUND • MANUAL_FLUSH | Sets the flush mode for a Kudu session. AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception. AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory operations, but it may have to wait when the buffer is full and there is another buffer being flushed. MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.
FlowFiles per Batch | FlowFiles per Batch | 1 | | The maximum number of FlowFiles to process in a single execution, between 1 and 100,000. Depending on your memory size and the data size per row, set an appropriate batch size for the number of FlowFiles to process per client connection setup. Gradually increase this number only if your FlowFiles typically contain a few records. Supports Expression Language: true (will be evaluated using Environment variables only)
Max Records per Batch | Batch Size | 100 | | The maximum number of Records to process in a single Kudu-client batch, between 1 and 100,000. Depending on your memory size and the data size per row, set an appropriate batch size. Gradually increase this number to find the value that gives the best performance. Supports Expression Language: true (will be evaluated using Environment variables only)
Ignore NULL | Ignore NULL | false | | Ignore NULL values on Kudu put operations. If set to true, only non-null columns are updated. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Kudu Operation Timeout | kudu-operations-timeout-ms | 30000ms | | Default timeout used for user operations (using sessions and scanners). Supports Expression Language: true (will be evaluated using Environment variables only)
Kudu Keep Alive Period Timeout | kudu-keep-alive-period-timeout-ms | 15000ms | | Default timeout used for user operations. Supports Expression Language: true (will be evaluated using Environment variables only)
Kudu Client Worker Count | worker-count | 2 | | The maximum number of worker threads handling Kudu client read and write operations. Defaults to the number of available processors.
Kudu SASL Protocol Name | kudu-sasl-protocol-name | kudu | | The SASL protocol name to use for authenticating via Kerberos. Must match the service principal name. Supports Expression Language: true (will be evaluated using Environment variables only)
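
The interaction between Operation RecordPath and Kudu Operation Type can be illustrated with a minimal Python sketch. This is not the processor's actual code: the function name `resolve_operation_type`, the use of `None` to mean "Operation RecordPath is not set", the `ValueError` standing in for routing to failure, and the case normalization are all assumptions made for illustration. Only the list of valid operation types and the Debezium mappings come from the property descriptions above.

```python
# Illustrative sketch only; names and error handling are hypothetical.

# Valid values listed for the Kudu Operation Type property.
VALID_OPERATION_TYPES = {
    "INSERT", "INSERT_IGNORE", "UPSERT", "UPDATE",
    "DELETE", "UPDATE_IGNORE", "DELETE_IGNORE",
}

# Debezium-style codes listed for the Operation RecordPath property.
DEBEZIUM_ALIASES = {"r": "INSERT", "c": "INSERT", "u": "UPDATE", "d": "DELETE"}


def resolve_operation_type(record_path_value, configured_type="INSERT"):
    """Return the operation type to apply to one record.

    record_path_value: the value Operation RecordPath evaluated to, or
        None when that property is not set (assumption of this sketch).
    configured_type: the value of the Kudu Operation Type property.
    """
    if record_path_value is None:
        # Operation RecordPath not set: fall back to Kudu Operation Type.
        candidate = configured_type
    else:
        # Operation RecordPath set: Kudu Operation Type is ignored.
        candidate = DEBEZIUM_ALIASES.get(record_path_value, record_path_value)

    # Upper-casing is an assumption; the source does not state case rules.
    normalized = candidate.strip().upper()
    if normalized not in VALID_OPERATION_TYPES:
        # Stands in for "the incoming FlowFile will be routed to failure".
        raise ValueError(f"Unsupported Kudu operation type: {candidate!r}")
    return normalized
```

For example, under these assumptions `resolve_operation_type("c")` yields `INSERT` regardless of the configured type, while `resolve_operation_type(None, "UPSERT")` yields `UPSERT`.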

Relationships:

Name | Description
success | A FlowFile is routed to this relationship after it has been successfully stored in Kudu
failure | A FlowFile is routed to this relationship if it cannot be sent to Kudu
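
How the two relationships combine with the Failure Strategy property can be sketched roughly as follows. This is a simplified Python model, not the processor's implementation: the function `route_flowfiles`, the `(flowfile_id, succeeded)` input pairs, and the `rolled_back` bucket (which stands for FlowFiles returned to their input queue, not a real relationship) are hypothetical.

```python
# Simplified model of the documented routing behavior; names are hypothetical.

def route_flowfiles(results, failure_strategy="Route to Failure"):
    """Decide where each FlowFile in one session ends up.

    results: list of (flowfile_id, succeeded) pairs.
    failure_strategy: "Route to Failure" or "Rollback Session".
    """
    routed = {"success": [], "failure": [], "rolled_back": []}

    any_failed = any(not succeeded for _, succeeded in results)
    if failure_strategy == "Rollback Session" and any_failed:
        # Every FlowFile in the session goes back to its input queue,
        # which also holds back subsequent data until the issue is fixed.
        routed["rolled_back"] = [flowfile_id for flowfile_id, _ in results]
        return routed

    for flowfile_id, succeeded in results:
        routed["success" if succeeded else "failure"].append(flowfile_id)
    return routed
```

With `Route to Failure`, only the failed FlowFiles leave through `failure`; with `Rollback Session`, a single failure keeps the whole session in the input queue, which preserves ordering at the cost of blocking.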

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
record.count | Number of records written to Kudu

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

Resource | Description
MEMORY | An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in a degradation of performance.
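
One way to picture why the Max Records per Batch property bounds memory use is a generator that never holds more than one batch of records at a time. The sketch below is only an illustration in Python, with a hypothetical function name `record_batches`; it does not reproduce how the processor or the Kudu client buffers data. The default of 100 and the range of 1 to 100,000 come from the property description.

```python
from itertools import islice


def record_batches(records, max_records_per_batch=100):
    """Yield lists of at most max_records_per_batch records.

    Only one batch is materialized at a time, so peak memory scales
    with the batch size rather than with the whole input.
    """
    if not 1 <= max_records_per_batch <= 100_000:
        raise ValueError("max_records_per_batch must be between 1 and 100,000")

    iterator = iter(records)
    while True:
        batch = list(islice(iterator, max_records_per_batch))
        if not batch:
            return
        yield batch
```

A larger batch size means fewer round trips but more rows held in memory per batch, which is the trade-off the property description asks you to tune gradually.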