This processor is for accessing the Elasticsearch Bulk API. It provides the ability to configure bulk operations on a per-FlowFile basis, which is what separates it from PutElasticsearchRecord.
As part of the Elasticsearch REST API bundle, it uses a controller service to manage connection information; that controller service is built on top of the official Elasticsearch client APIs. This provides features such as automatic master detection against the cluster, which are missing from the other bundles.
This processor builds one Elasticsearch Bulk API body per (batch of) FlowFiles. Care should be taken to batch FlowFiles into appropriately-sized chunks so that NiFi does not run out of memory and the requests sent to Elasticsearch are not too large for it to handle. When failures do occur, this processor is capable of attempting to route the FlowFiles that failed to an errors queue so that only failed FlowFiles can be processed downstream or replayed.
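As a rough illustration of size-bounded batching, the sketch below splits documents into batches that stay under a byte budget. It is not how NiFi itself groups FlowFiles, which is controlled by the flow and the processor configuration. The function name batch_by_size and the max_bytes parameter are hypothetical.

```python
import json


def batch_by_size(docs, max_bytes):
    """Yield lists of documents whose serialized size stays within max_bytes.

    Hypothetical helper: a single document larger than max_bytes is still
    emitted, alone in its own batch.
    """
    batch, size = [], 0
    for doc in docs:
        # +1 accounts for the newline that ends each NDJSON line.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch
```

A smaller budget gives more, smaller requests; a larger one gives fewer requests at the cost of more memory per request.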
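The index, operation and (optional) type fields are configured with default values. The ID can be set as an attribute on the FlowFile(s); it is optional for the "index" operation, where Elasticsearch can generate an ID, and needed by operations that target an existing document, such as "update".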
Index and Create operations can use Dynamic Templates. The Dynamic Templates property must be parsable as a JSON object.
For example, given the following FlowFile content:
{ "message": "Hello, world" }
The Dynamic Templates property below would be parsable:
{"message": "keyword_lower"}
This would create the Elasticsearch action:
{ "index" : {"_id" : "1", "_index" : "test", "dynamic_templates" : {"message" : "keyword_lower"}} }
{ "message" : "Hello, world" }
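A minimal sketch of how such an action could be assembled, assuming the standard Bulk API layout in which an index or create action is a metadata line followed by the document itself. The function name index_action and its parameters are hypothetical and do not reflect the processor's internal code.

```python
import json


def index_action(index, doc, doc_id=None, dynamic_templates=None, operation="index"):
    """Return the two NDJSON lines for an index or create Bulk action."""
    if operation not in ("index", "create"):
        raise ValueError("dynamic templates apply to index and create only")
    meta = {"_index": index}
    if doc_id is not None:
        meta["_id"] = doc_id
    if dynamic_templates:
        # Maps a field name to the name of a dynamic template in the index mapping.
        meta["dynamic_templates"] = dynamic_templates
    # Line 1: action metadata; line 2: the document source.
    return json.dumps({operation: meta}) + "\n" + json.dumps(doc) + "\n"
```

Calling index_action("test", {"message": "Hello, world"}, doc_id="1", dynamic_templates={"message": "keyword_lower"}) yields the metadata line and the source line for the example above.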
Update and Upsert operations can use a script. Scripts must contain all the elements required by Elasticsearch, e.g. source and lang. The Script property must be parsable as a JSON object.
If a script is defined for an upsert, the FlowFile content will be used as the upsert fields in the Elasticsearch action. If no script is defined, the FlowFile content will be used as the update doc (or doc_as_upsert for upsert operations).
For example, given the following FlowFile content and no script defined:
{ "message": "Hello, world", "from": "john.smith" }
This would create the Elasticsearch action:
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"message" : "Hello, world", "from" : "john.smith"} }
As a second example, given the following FlowFile content for an upsert operation:
{ "counter": 1 }
The Script property below would be parsable:
{"source": "ctx._source.counter += params.param1", "lang": "painless", "params": {"param1": 1}}
This would create the Elasticsearch action:
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "script" : {"source" : "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1} }
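The rules above for choosing between doc, doc_as_upsert, script and upsert can be sketched as follows. This is an illustration only; update_action and its parameters are hypothetical names, and the handling of a scripted plain update (FlowFile content ignored) is an assumption, since the text only describes the scripted upsert case.

```python
import json


def update_action(index, doc_id, content, script=None, upsert=False):
    """Return the two NDJSON lines for an update or upsert Bulk action."""
    meta = {"update": {"_id": doc_id, "_index": index}}
    if script is not None:
        body = {"script": script}
        if upsert:
            # With a script, the content seeds the document if it does not exist yet.
            body["upsert"] = content
        # Assumption: for a scripted plain update, the content is not sent.
    else:
        body = {"doc": content}
        if upsert:
            # Without a script, the same content is used to create a missing document.
            body["doc_as_upsert"] = True
    return json.dumps(meta) + "\n" + json.dumps(body) + "\n"
```

With the counter example, update_action("test", "1", {"counter": 1}, script={...}, upsert=True) produces a body holding both the script and the upsert fields.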
Dynamic Properties can be defined on the processor with a BULK: prefix. Users must ensure that only known Bulk action fields are sent to Elasticsearch for the index operation defined for the FlowFile, because Elasticsearch will reject invalid combinations of index operation and Bulk action fields.
For example, given the following FlowFile content:
{ "message": "Hello, world", "from": "john.smith" }
With the Dynamic Property BULK:retry_on_conflict set to 3:
This would create the Elasticsearch action:
{ "update" : {"_id" : "1", "_index" : "test", "retry_on_conflict" : "3"} }
{ "doc" : {"message" : "Hello, world", "from" : "john.smith"} }
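A small sketch of how BULK:-prefixed properties could be folded into the action metadata line. The helper names bulk_header_fields and action_header are hypothetical, and values are passed through as strings, matching the example above; nothing here validates the fields against the chosen operation.

```python
import json

BULK_PREFIX = "BULK:"


def bulk_header_fields(properties):
    """Keep only BULK:-prefixed properties and strip the prefix from their names."""
    return {
        name[len(BULK_PREFIX):]: value
        for name, value in properties.items()
        if name.startswith(BULK_PREFIX)
    }


def action_header(operation, index, doc_id, properties):
    """Build the metadata line of a Bulk action, including any BULK: fields."""
    meta = {"_id": doc_id, "_index": index}
    meta.update(bulk_header_fields(properties))
    return json.dumps({operation: meta})
```

Properties without the prefix are ignored by this sketch, so unrelated dynamic properties do not leak into the request.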
Valid values for "operation" are: