QueryCassandra

Description:

Executes the provided Cassandra Query Language (CQL) select query on a Cassandra 1.x, 2.x, or 3.0.x cluster. The query result may be converted to Avro or JSON format. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query. The FlowFile attribute 'executecql.row.count' indicates how many rows were selected.

Tags:

cassandra, cql, select

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Cassandra Connection Provider | cassandra-connection-provider | | Controller Service API: CassandraSessionProviderService; Implementation: CassandraSessionProvider | Specifies the Cassandra connection providing controller service to be used to connect to the Cassandra cluster.
Cassandra Contact Points | Cassandra Contact Points | | | Contact points are addresses of Cassandra nodes. The list of contact points should be comma-separated and in hostname:port format. Example: node1:port,node2:port,.... The default client port for Cassandra is 9042, but the port(s) must be explicitly specified. Supports Expression Language: true (will be evaluated using Environment variables only)
Keyspace | Keyspace | | | The Cassandra Keyspace to connect to. If no keyspace is specified, the query must include the keyspace name before any table reference (in the case of 'query' native processors); if the processor exposes the 'Table' property, the keyspace name must be provided with the table name in the form <KEYSPACE>.<TABLE>. Supports Expression Language: true (will be evaluated using Environment variables only)
SSL Context Service | SSL Context Service | | Controller Service API: SSLContextService; Implementations: StandardRestrictedSSLContextService, StandardSSLContextService | The SSL Context Service used to provide client certificate information for TLS/SSL connections.
Client Auth | Client Auth | REQUIRED | WANT, REQUIRED, NONE | Client authentication policy when connecting to a secure (TLS/SSL) cluster. Possible values are REQUIRED, WANT, NONE. This property is only used when an SSL Context has been defined and enabled.
Username | Username | | | Username to access the Cassandra cluster. Supports Expression Language: true (will be evaluated using Environment variables only)
Password | Password | | | Password to access the Cassandra cluster. Sensitive Property: true. Supports Expression Language: true (will be evaluated using Environment variables only)
Consistency Level | Consistency Level | ONE | ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE | The strategy for how many replicas must respond before results are returned.
Compression Type | Compression Type | NONE | NONE, SNAPPY, LZ4 | Enable compression at transport-level requests and responses.
Character Set | Character Set | UTF-8 | | Specifies the character set of the record data. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
CQL select query | CQL select query | | | CQL select query. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Max Wait Time | Max Wait Time | 0 seconds | | The maximum amount of time allowed for a running CQL select query. Must be of format <duration> <TimeUnit> where <duration> is a non-negative integer and TimeUnit is a supported Time Unit, such as: nanos, millis, secs, mins, hrs, days. A value of zero means there is no limit. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Fetch size | Fetch size | 0 | | The number of result rows to be fetched from the result set at a time. Zero is the default and means there is no limit. Supports Expression Language: true (will be evaluated using Environment variables only)
Max Rows Per Flow File | Max Rows Per Flow File | 0 | | The maximum number of result rows that will be included in a single FlowFile. This allows very large result sets to be broken up into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. Supports Expression Language: true (will be evaluated using Environment variables only)
Output Batch Size | qdbt-output-batch-size | 0 | | The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, the session is committed once the specified number of FlowFiles are ready for transfer, thus releasing the FlowFiles to the downstream relationship. NOTE: The maxvalue.* and fragment.count attributes will not be set on FlowFiles when this property is set. Supports Expression Language: true (will be evaluated using Environment variables only)
Output Format | Output Format | Avro | Avro, JSON | The format to which the result rows will be converted. If JSON is selected, the output will contain an object with field 'results' containing an array of result rows. Each row in the array is a map of the named column to its value. For example: { "results": [{"userid":1, "name":"Joe Smith"}]}
Timestamp Format Pattern for JSON output | timestamp-format-pattern | yyyy-MM-dd HH:mm:ssZ | | Pattern to use when converting timestamp fields to JSON. Note: the formatted timestamp will be in UTC timezone.
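
The JSON shape described for the Output Format property can be read downstream with any JSON parser. The following is a minimal sketch in Python, not part of the processor itself. The payload is hand-written for illustration; the "created" column, its value, and the helper name parse_rows are hypothetical. The strptime pattern is an assumed Python equivalent of the default yyyy-MM-dd HH:mm:ssZ pattern.

```python
import json
from datetime import datetime

# Hand-written sample in the documented shape: an object whose 'results'
# field holds an array of rows. The "created" column is hypothetical.
payload = (
    '{"results": [{"userid": 1, "name": "Joe Smith", '
    '"created": "2020-01-02 03:04:05+0000"}]}'
)

# Assumed Python counterpart of the default pattern yyyy-MM-dd HH:mm:ssZ.
TIMESTAMP_PATTERN = "%Y-%m-%d %H:%M:%S%z"


def parse_rows(text):
    """Return the list of row dicts stored under the 'results' field."""
    return json.loads(text)["results"]


rows = parse_rows(payload)

# Each row maps column names to values.
first = rows[0]

# Timestamp columns arrive as formatted strings; parse them explicitly.
created = datetime.strptime(first["created"], TIMESTAMP_PATTERN)
```

If a custom Timestamp Format Pattern is configured, the parsing pattern on the consumer side would need to be adjusted to match.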

Relationships:

Name | Description
retry | A FlowFile is transferred to this relationship if the operation cannot be completed but attempting it again may succeed.
success | A FlowFile is transferred to this relationship if the operation completed successfully.
failure | A FlowFile is transferred to this relationship if the operation failed.

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
executecql.row.count | The number of rows returned by the CQL query.
fragment.identifier | If 'Max Rows Per Flow File' is set, then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.count | If 'Max Rows Per Flow File' is set, then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belong to the same ResultSet. If Output Batch Size is set, then this attribute will not be populated.
fragment.index | If 'Max Rows Per Flow File' is set, then this is the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order the FlowFiles were produced.
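
The correlation described by the fragment.* attributes above can be sketched as follows. This is an illustrative Python example, not NiFi code: FlowFiles are modeled as plain dicts, and the identifiers, index values, and the function names group_fragments and is_complete are hypothetical. Attribute values are treated as strings and converted where needed.

```python
from collections import defaultdict


def group_fragments(flowfiles):
    """Group FlowFile-like dicts by fragment.identifier, sorted by fragment.index."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["attributes"]["fragment.identifier"]].append(ff)
    for frags in groups.values():
        frags.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
    return dict(groups)


def is_complete(frags):
    """True only when fragment.count is present and equals the number of fragments.

    fragment.count is not populated when Output Batch Size is set, so a
    missing value is reported as 'not known to be complete'.
    """
    count = frags[0]["attributes"].get("fragment.count")
    return count is not None and int(count) == len(frags)


# Hypothetical FlowFiles from two result sets, received out of order.
flowfiles = [
    {"attributes": {"fragment.identifier": "a", "fragment.index": "1",
                    "fragment.count": "2"}, "content": "rows 3-4"},
    {"attributes": {"fragment.identifier": "b", "fragment.index": "0"},
     "content": "rows 1-2"},
    {"attributes": {"fragment.identifier": "a", "fragment.index": "0",
                    "fragment.count": "2"}, "content": "rows 1-2"},
]

groups = group_fragments(flowfiles)
ordered_a = [ff["content"] for ff in groups["a"]]
```

Sorting on fragment.index rather than arrival order is the point of the sketch, since downstream components may receive fragments in any order.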

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component allows an incoming relationship.

System Resource Considerations:

None specified.