Sink Configuration Values
Table 6.10. Apache Cassandra
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Enables users to send events into a given Cassandra table. |
Table Name | Name of the table into which events should be written |
Column Name | Name of the column to which a respective field is mapped. |
Field Name | Name of the field to be mapped as a respective column |
Cassandra Configurations:
User Name | User name to connect to the Cassandra cluster. |
Password | Password to connect to the Cassandra cluster. |
Keyspace | Keyspace in which the table exists |
Nodes | Cassandra nodes configuration to be passed |
Port | Port number for the Cassandra cluster |
Row Batch Size | Maximum number of rows to include in a batch |
Retry Policy | Class name of the retry policy to be applied. Valid options are "DowngradingConsistencyRetryPolicy", "FallthroughRetryPolicy", and "DefaultRetryPolicy". Default value is "DefaultRetryPolicy". |
Consistency Level | Consistency level at which data is inserted. Valid values are "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL", "LOCAL_QUORUM", "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", and "LOCAL_ONE". Default value is "QUORUM". |
Reconnection Base Delay | Base delay (in milliseconds) when reconnecting to the target. |
Reconnection Maximum Delay | Maximum delay (in milliseconds) when reconnecting to the target. |
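
The following is a minimal, illustrative sketch of how several of these fields map onto a Cassandra client, assuming the DataStax Java driver 3.x; the host name, credentials, keyspace, and table are hypothetical values, not part of the product configuration.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DefaultRetryPolicy;
    import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;

    public class CassandraSinkSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("cassandra-node-1")          // Nodes (hypothetical host)
                    .withPort(9042)                               // Port
                    .withCredentials("sam_user", "sam_password")  // User Name / Password
                    .withRetryPolicy(DefaultRetryPolicy.INSTANCE) // Retry Policy
                    // Reconnection Base Delay / Reconnection Maximum Delay (ms)
                    .withReconnectionPolicy(new ExponentialReconnectionPolicy(1000, 60000))
                    // Consistency Level
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.QUORUM))
                    .build();
            Session session = cluster.connect("sam_keyspace");    // Keyspace
            // One event written as one row (Table Name / Column Names / Field Names)
            session.execute("INSERT INTO events (id, value) VALUES (?, ?)", "e1", "42");
            cluster.close();
        }
    }
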
Table 6.11. Druid
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes events to a Druid data store, using the Druid Tranquility library to push the data. For more details, see http://druid.io/docs/latest/ingestion/stream-push.html |
Name of the Indexing Service | The druid.service name of the indexing service overlord node. This is a mandatory parameter. |
Service Discovery Path | Curator service discovery path. This is a mandatory parameter. |
ZooKeeper Connect String | ZooKeeper connect string. This is a mandatory parameter. |
Datasource Name | The name of the ingested data source. Data sources can be thought of as tables. This is a mandatory parameter. |
Dimensions | Specifies the dimensions (columns) of the data. This is a mandatory parameter. |
TimeStamp Field Name | Specifies the column and format of the timestamp. This is a mandatory parameter. |
Window Period | The window period, expressed in ISO 8601 period format (https://en.wikipedia.org/wiki/ISO_8601). This is a mandatory parameter. |
Index Retry Period | Period during which to retry a failed indexing service overlord call, expressed in ISO 8601 period format (https://en.wikipedia.org/wiki/ISO_8601). This is a mandatory parameter. |
Segment Granularity | The granularity at which to create segments. |
Query Granularity | The minimum granularity at which to query results, and the granularity of the data inside the segment. |
Batch Size | Maximum number of messages to send simultaneously |
Max Pending Batches | Maximum number of batches that might be in flight |
Linger millis | Number of milliseconds to wait for batches to collect more messages (up to maxBatchSize) before sending them. |
Block On Full | Whether a send operation blocks (true) or throws an exception (false) when called on a full outgoing queue |
Druid partitions | Number of Druid partitions to create. |
Partition Replication | Number of instances of each Druid partition to create. |
Aggregator Info | A list of aggregators. Currently supported aggregators are Count Aggregator, Double Sum Aggregator, Double Max Aggregator, Double Min Aggregator, Long Sum Aggregator, Long Max Aggregator, and Long Min Aggregator. |
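
Because Window Period and Index Retry Period both expect ISO 8601 period strings, the short sketch below (plain Java, no Druid dependency) shows what valid values such as "PT10M" and "P1D" look like when parsed; the specific values are illustrative only.

    import java.time.Duration;
    import java.time.Period;

    public class Iso8601PeriodSketch {
        public static void main(String[] args) {
            // Time-based period: ten minutes, a typical window period value
            System.out.println(Duration.parse("PT10M")); // prints PT10M
            // Date-based period: one day
            System.out.println(Period.parse("P1D"));     // prints P1D
        }
    }
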
Table 6.12. Apache Hive
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Hive sink is used to write data to Hive tables. |
Metastore URI | URI of the metastore to connect to: for example, thrift://localhost:9083 |
Database Name | Name of the Hive database |
Table name | Name of table to stream to |
Fields | The event fields to stream to Hive |
Partition fields | The event fields on which to partition the data |
Flush Interval | The interval (in seconds) at which a transaction batch is committed |
Transactions per batch | The number of transactions per batch |
Max open connections | The maximum number of open connections to Hive |
Batch size | The number of events per batch |
Idle timeout | The idle timeout |
Call timeout | The call timeout |
Heartbeat Interval | The heartbeat interval |
Auto create partitions | If true, the partition specified in the endpoint is automatically created if it does not already exist |
Kerberos keytab | Kerberos keytab file path |
Kerberos principal | Kerberos principal name |
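
As an illustration of how these fields fit together, here is a minimal sketch built on the Hive Streaming API (hive-hcatalog-streaming), which this style of transactional sink relies on; the metastore URI, database, table, field names, and partition value are hypothetical.

    import java.util.Arrays;
    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class HiveSinkSketch {
        public static void main(String[] args) throws Exception {
            HiveEndPoint endPoint = new HiveEndPoint(
                    "thrift://localhost:9083",        // Metastore URI
                    "default",                        // Database Name
                    "events",                         // Table name
                    Arrays.asList("2024-01-01"));     // Partition fields (one value)
            // "true" corresponds to Auto create partitions
            StreamingConnection conn = endPoint.newConnection(true);
            DelimitedInputWriter writer =
                    new DelimitedInputWriter(new String[]{"id", "value"}, ",", endPoint);
            // Transactions per batch
            TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
            batch.beginNextTransaction();
            batch.write("e1,42".getBytes());          // Fields, delimited
            batch.commit();   // Flush Interval drives how often this happens
            batch.close();
            conn.close();
        }
    }
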
Table 6.13. Apache HBase
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes events to HBase |
HBase Table | HBase table to write to |
Column Family | HBase table column family |
Batch Size | Number of records in the batch that triggers a flush. Note that every batch must be full before it can be flushed, because tick tuples are not currently supported. |
Row Key Field | Field to be used as a row key for the table |
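
For orientation, the sketch below shows the standard HBase client calls that the table, column family, and row key fields correspond to; the table name, row key, and column names are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSinkSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("events"))) { // HBase Table
                Put put = new Put(Bytes.toBytes("event-001"));     // value of Row Key Field
                put.addColumn(Bytes.toBytes("cf"),                 // Column Family
                        Bytes.toBytes("value"), Bytes.toBytes("42"));
                table.put(put); // the sink accumulates Batch Size puts before flushing
            }
        }
    }
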
Table 6.14. Hadoop Distributed File System (HDFS)
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes events to HDFS |
Hdfs URL | HDFS NameNode URL |
Path | Directory to which the files are written |
Flush Count | Number of records to wait for before flushing to HDFS |
Rotation Policy | Strategy to rotate files in HDFS |
Rotation Interval Multiplier | Rotation interval multiplier for timed rotation policy |
Rotation Interval Unit | Rotation interval unit for timed rotation policy |
Output fields | Specifies the output fields, in the desired order |
Prefix | Prefix for default file name format |
Extension | Extension for default file name format |
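
As a rough illustration, the sketch below configures the storm-hdfs HdfsBolt, whose options mirror these fields; the NameNode URL, path, and rotation settings are hypothetical values.

    import org.apache.storm.hdfs.bolt.HdfsBolt;
    import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
    import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
    import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
    import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;

    public class HdfsSinkSketch {
        public static HdfsBolt build() {
            return new HdfsBolt()
                    .withFsUrl("hdfs://namenode:8020")            // Hdfs URL
                    .withFileNameFormat(new DefaultFileNameFormat()
                            .withPath("/data/events")             // Path
                            .withPrefix("events")                 // Prefix
                            .withExtension(".csv"))               // Extension
                    .withRecordFormat(new DelimitedRecordFormat()
                            .withFieldDelimiter(","))             // Output fields, in order
                    .withSyncPolicy(new CountSyncPolicy(1000))    // Flush Count
                    // Rotation Interval Multiplier (30) and Unit (MINUTES)
                    .withRotationPolicy(new TimedRotationPolicy(30.0f,
                            TimedRotationPolicy.TimeUnit.MINUTES));
        }
    }
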
Table 6.15. Java Database Connectivity (JDBC)
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes events to a database using JDBC. |
Driver Class Name | The driver class name: for example, com.mysql.jdbc.Driver |
JDBC URL | JDBC URL: for example, jdbc:mysql://localhost:3306/test |
User Name | Database user name |
Password | Database password |
Table Name | Table to write to |
Column Names | Names of the database columns |
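
The sketch below illustrates what the sink does for each event, using plain JDBC with the example driver and URL from the table above; the credentials, table, and column names are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class JdbcSinkSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");        // Driver Class Name
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test",    // JDBC URL
                    "sam_user", "sam_password");           // User Name / Password
                 PreparedStatement ps = conn.prepareStatement(
                    // Table Name and Column Names drive the generated INSERT
                    "INSERT INTO events (id, value) VALUES (?, ?)")) {
                ps.setString(1, "e1");
                ps.setInt(2, 42);
                ps.executeUpdate();
            }
        }
    }
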
Table 6.16. Apache Kafka
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes SAM events to a Kafka topic |
Cluster Name | Mandatory. Service pool defined in SAM to get metadata information about the Kafka cluster |
Kafka Topic | Mandatory. Kafka topic to write data to. The schema for the corresponding topic must exist in the Schema Registry, and the incoming SAM event must adhere to the selected version of the schema. |
Security Protocol | Mandatory. Protocol used to communicate with Kafka brokers: for example, PLAINTEXT. Auto-suggested with a list of protocols supported by the Kafka service, based on the cluster name selected. If you select a protocol with SSL or SASL, make sure to fill out the related configuration fields. |
Bootstrap Servers | Mandatory. A comma-separated string of host:port pairs representing the Kafka broker listeners. Auto-suggested with a list of options based on the security protocol selected above |
Fire And Forget? | Optional. A flag to indicate whether the Kafka producer should wait for an acknowledgement. Default value is false. |
Async? | Optional. Indicates whether to use an asynchronous Kafka producer. Default value is true. |
Key serializer | Optional. Type of key serializer to use. Options are String, Integer, Long, and ByteArray. Default value is ByteArray. Note that this field does not save any key in the Kafka message. The incoming SAM event is stored as a value in the Kafka message, with a null key. |
Key field | Optional. Name of the key field. One of the fields from the incoming event schema. |
Writer schema version | Optional. Version of schema for topic to use for serializing the message. Default is the latest version for the schema. |
Ack mode | Optional. Ack mode used in producer requests for records sent to the server. Options are "None", "Leader", and "All" (minimum in-sync replicas). Default value is "Leader" |
Buffer memory | Optional. The total bytes of memory the producer can use to buffer records waiting to be sent to the server. Default value is 33554432 |
Compression type | Optional. The compression type for all data generated by the producer. Options are ["none", "gzip", "snappy", "lz4"]. Default value is "none" |
Retries | Optional. Number of retry attempts for a record send failure. Default value is 0 |
Batch size | Optional. Producer batch size in bytes for records sent to same partition. Default value is 16384 |
Client id | Optional. ID sent to the server in producer requests, for tracking in server logs |
Max connection idle | Optional. Time in milliseconds for which connections can be idle before getting closed. Default value is 540000 |
Linger time | Optional. Time in milliseconds to wait before sending a record out when batch is not full. Default value is 0 |
Max block | Optional. Time in milliseconds that send and partitionsFor methods will block for. Default value is 60000 |
Max request size | Optional. Maximum size of a request in bytes. Default value is 1048576 |
Receive buffer size | Optional. Size in bytes of TCP receive buffer (SO_RCVBUF) to use when reading data. Default value is 32768 |
Request timeout | Optional. Maximum amount of time in milliseconds the producer will wait for the response of a request. Default value is 30000 |
Kerberos client principal | Optional (mandatory for SASL). Client principal to use to connect to brokers when using the SASL GSSAPI mechanism for Kerberos (that is, when the security protocol is SASL_PLAINTEXT or SASL_SSL) |
Kerberos keytab file | Optional (mandatory for SASL). Location on the worker node of the keytab file containing the secret key for the client principal, when using the SASL GSSAPI mechanism for Kerberos (security protocol SASL_PLAINTEXT or SASL_SSL) |
Kafka service name | Optional (mandatory for SASL). Service name that the Kafka broker runs as (used when the security protocol is SASL_PLAINTEXT or SASL_SSL) |
Send buffer size | Optional. Size in bytes of the TCP send buffer (SO_SNDBUF) to use when sending data. Default value is 131072 |
Timeout | Optional. Maximum amount of time in milliseconds the server will wait for acks from followers. Default value is 30000 |
Block on buffer full? | Optional. Boolean to indicate whether to block on a full buffer or throw an exception. Default value is true |
Max in-flight requests | Optional. Maximum number of unacknowledged requests producer will send per connection before blocking. Default value is 5 |
Metadata fetch timeout | Optional. Timeout in milliseconds for a topic metadata fetch request. Default value is 60000 |
Metadata max age | Optional. Time in milliseconds after which a metadata fetch request is forced. Default value is 300000 |
Reconnect backoff | Optional. Amount of time in milliseconds to wait before attempting to reconnect to a host. Default value is 50 |
Retry backoff | Optional. Amount of time in milliseconds to wait before attempting to retry a failed fetch request. Default value is 100 |
SSL keystore location | Optional. The location of the key store file. Used when Kafka client connectivity is over SSL |
SSL keystore password | Optional. The store password for the key store file |
SSL key password | Optional. The password of the private key in the key store file |
SSL truststore location | Optional(Mandatory for SSL). The location of the trust store file |
SSL truststore password | Optional(Mandatory for SSL). The password for the trust store file |
SSL enabled protocols | Optional. Comma-separated list of protocols enabled for SSL connections |
SSL keystore type | Optional. File format of keystore file. Default value is JKS |
SSL truststore type | Optional. File format of truststore file. Default value is JKS |
SSL protocol | Optional. SSL protocol used to generate SSLContext. Default value is TLS |
SSL provider | Optional. Security provider used for SSL connections. Default value is the default security provider of the JVM |
SSL cipher suites | Optional. Comma-separated list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using the TLS or SSL network protocol. By default, all available cipher suites are supported |
SSL endpoint identification algorithm | Optional. The endpoint identification algorithm to validate server hostname using server certificate |
SSL key manager algorithm | Optional. The algorithm used by the key manager factory for SSL connections. Default value is SunX509 |
SSL secure random implementation | Optional. The SecureRandom PRNG implementation to use for SSL cryptographic operations |
SSL trust manager algorithm | Optional. The algorithm used by the trust manager factory for SSL connections. Default value is PKIX, the trust manager factory algorithm configured for the Java Virtual Machine |
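
For reference, the sketch below maps a number of the fields above onto standard Kafka producer properties; the broker addresses, topic, and truststore path are hypothetical, and most of the property values shown are the defaults listed in the table.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaSinkSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:6667,broker2:6667"); // Bootstrap Servers
            props.put("security.protocol", "SSL");                      // Security Protocol
            props.put("acks", "1");                                     // Ack mode: Leader
            props.put("batch.size", "16384");                           // Batch size
            props.put("linger.ms", "0");                                // Linger time
            props.put("retries", "0");                                  // Retries
            props.put("compression.type", "none");                      // Compression type
            props.put("buffer.memory", "33554432");                     // Buffer memory
            props.put("ssl.truststore.location", "/etc/ssl/truststore.jks"); // SSL truststore
            props.put("ssl.truststore.password", "changeit");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // The SAM event is stored as the value, with a null key by default
                producer.send(new ProducerRecord<>("sam-topic", null, "event".getBytes()));
            }
        }
    }
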
Table 6.17. Notification
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Can be used to send email notifications |
Username | The user name for the mail server |
Password | The password for the mail server |
Host | Mail server host name |
Port | Mail server port |
SSL? | Whether the connection should be over SSL |
Start TLS | Flag to indicate whether STARTTLS is enabled |
Debug? | Whether to log debug messages |
Email Server Protocol | The email server protocol: for example, SMTP |
Authenticate | Flag to indicate whether authentication should be performed |
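
The fields above correspond closely to standard JavaMail SMTP settings; the following is a minimal sketch with hypothetical server details, credentials, and addresses.

    import java.util.Properties;
    import javax.mail.Message;
    import javax.mail.PasswordAuthentication;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class EmailSinkSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("mail.smtp.host", "mail.example.com");   // Host
            props.put("mail.smtp.port", "587");                // Port
            props.put("mail.smtp.auth", "true");               // Authenticate
            props.put("mail.smtp.starttls.enable", "true");    // Start TLS
            Session session = Session.getInstance(props, new javax.mail.Authenticator() {
                protected PasswordAuthentication getPasswordAuthentication() {
                    // Username / Password for the mail server
                    return new PasswordAuthentication("user", "password");
                }
            });
            session.setDebug(false);                           // Debug?
            MimeMessage msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("sam@example.com"));
            msg.setRecipients(Message.RecipientType.TO, "ops@example.com");
            msg.setSubject("SAM notification");
            msg.setText("An event matched the notification rule.");
            Transport.send(msg);
        }
    }
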
Table 6.18. OpenTSDB
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Writes events to a given OpenTSDB cluster. |
REST API URL | The URL of the REST API: for example, http://localhost:4242 |
Metric Field Name | Field name of the metric |
Timestamp Field Name | Field name of the timestamp |
Tags Field Name | Field name of the tags |
Value Field Name | Field name of the value |
Fail Tuple for Failed Metrics? | Whether to fail the tuple when any metrics fail to be written to OpenTSDB |
Sync? | Whether to write to OpenTSDB synchronously |
Sync Timeout | Sync timeout (in milliseconds), used when Sync is true |
Return Summary? | Whether to return summary |
Return Details? | Whether to return details |
Enable Chunked Encoding? | Whether to enable chunked encoding for REST API calls to OpenTSDB |
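
As an illustration, the sketch below posts one data point to the OpenTSDB /api/put REST endpoint using the Java 11 HttpClient; the metric, timestamp, and tag values are hypothetical, and the details, sync, and sync_timeout query parameters correspond to the flags above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class OpenTsdbSinkSketch {
        public static void main(String[] args) throws Exception {
            // Metric Field Name, Timestamp Field Name, Value Field Name, Tags Field Name
            String json = "{\"metric\":\"sys.cpu.user\",\"timestamp\":1420070400,"
                    + "\"value\":42.5,\"tags\":{\"host\":\"web01\"}}";
            HttpRequest request = HttpRequest.newBuilder()
                    // REST API URL plus Return Details? / Sync? / Sync Timeout parameters
                    .uri(URI.create(
                        "http://localhost:4242/api/put?details&sync&sync_timeout=60000"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
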
Table 6.19. Solr
Configuration Field | Description, requirements, tips for configuration |
General Sink Description | Enables indexing of live input data into Apache Solr collections |
Apache Solr ZooKeeper Host String | Information about the Apache ZooKeeper ensemble used to coordinate the Solr cluster. The string is specified as comma-separated values, as follows: zk1.host.com:2181,zk2.host.com:2181,zk3.example.com:2181 |
Apache Solr Collection Name | The name of the Apache Solr collection to which to index live data |
Commit Batch Size | Defines how often indexed data is committed to Apache Solr, specified as an integer. For instance, if set to 100, the data is committed every 100 tuples. |
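
For orientation, here is a minimal sketch assuming SolrJ 6.x, whose CloudSolrClient takes the same ZooKeeper host string and collection name described above; the collection and document fields are hypothetical.

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrSinkSketch {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    // Apache Solr ZooKeeper Host String
                    .withZkHost("zk1.host.com:2181,zk2.host.com:2181,zk3.example.com:2181")
                    .build()) {
                client.setDefaultCollection("events");  // Apache Solr Collection Name
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "e1");
                doc.addField("value", "42");
                client.add(doc);
                client.commit();  // the sink commits every Commit Batch Size tuples
            }
        }
    }
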