Hortonworks Streaming Analytics Manager User Guide

Sink Configuration Values

Table 6.10. Apache Cassandra

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Enables users to send events into a given Cassandra table.
Table Name: Name of the table into which events should be written.
Column Name: Name of the column to which a respective field is mapped.
Field Name: Name of the field to be mapped to the respective column.
Cassandra Configurations - User Name: User name used to connect to the Cassandra cluster.
Password: Password used to connect to the Cassandra cluster.
Keyspace: Keyspace in which the table exists.
Nodes: Cassandra node configuration to be passed.
Port: Port number for the Cassandra cluster.
Row Batch Size: Maximum number of rows to be taken in a batch.
Retry Policy: Class name of the retry policy to be applied. Valid values are "DefaultRetryPolicy", "DowngradingConsistencyRetryPolicy", and "FallthroughRetryPolicy". Default value is "DefaultRetryPolicy".
Consistency Level: Consistency level at which data is inserted. Valid values are "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL", "LOCAL_QUORUM", "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", and "LOCAL_ONE". Default value is "QUORUM".
Reconnection Base Delay: Base delay (in milliseconds) when reconnecting to the target.
Reconnection Maximum Delay: Maximum delay (in milliseconds) when reconnecting to the target.
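The field-to-column mapping above determines the CQL statement the sink issues per event. As a rough sketch of that translation (the field names, keyspace, and table below are illustrative assumptions, not part of the product):

```python
# Sketch: how a field-to-column mapping plus keyspace/table settings could
# translate into a parameterized CQL INSERT. All names are illustrative.
field_to_column = {"deviceId": "device_id", "temp": "temperature"}

keyspace, table = "sensor_ks", "readings"
columns = ", ".join(field_to_column.values())
placeholders = ", ".join("?" for _ in field_to_column)
cql = f"INSERT INTO {keyspace}.{table} ({columns}) VALUES ({placeholders})"
print(cql)  # INSERT INTO sensor_ks.readings (device_id, temperature) VALUES (?, ?)
```

Each incoming event then supplies the values bound to the placeholders, in the order of the mapped fields.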

Table 6.11. Druid

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Pushes events into a Druid data store. This sink uses the Druid Tranquility library to push data. More details: http://druid.io/docs/latest/ingestion/stream-push.html
Name of the Indexing Service: The druid.service name of the indexing service overlord node. Mandatory.
Service Discovery Path: Curator service discovery path. Mandatory.
ZooKeeper Connect String: ZooKeeper connect string. Mandatory.
Datasource Name: Name of the ingested data source. Data sources can be thought of as tables. Mandatory.
Dimensions: Specifies the dimensions (columns) of the data. Mandatory.
Timestamp Field Name: Specifies the column and format of the timestamp. Mandatory.
Window Period: Window period, in ISO 8601 period format (https://en.wikipedia.org/wiki/ISO_8601). Mandatory.
Index Retry Period: Period during which to retry a failed indexing service overlord call, in ISO 8601 period format. Mandatory.
Segment Granularity: The granularity at which to create segments.
Query Granularity: The minimum granularity at which to query results, and the granularity of the data inside the segment.
Batch Size: Maximum number of messages to send simultaneously.
Max Pending Batches: Maximum number of batches that may be in flight.
Linger Millis: Number of milliseconds to wait for batches to collect more messages (up to the batch size) before sending them.
Block On Full: Whether a send operation blocks (true) or throws an exception (false) when called on a full outgoing queue.
Druid Partitions: Number of Druid partitions to create.
Partition Replication: Number of instances of each Druid partition to create.
Aggregator Info: A list of aggregators. Currently supported aggregators are Count Aggregator, Double Sum Aggregator, Double Max Aggregator, Double Min Aggregator, Long Sum Aggregator, Long Max Aggregator, and Long Min Aggregator.
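Window Period and Index Retry Period take ISO 8601 period values such as PT10M (ten minutes). To illustrate the format, here is a minimal parser for the time-only form (a sketch only: Python's standard library has no ISO period parser, and date components such as P1D are deliberately not handled):

```python
import re

def iso8601_period_seconds(period: str) -> int:
    """Convert a simple time-only ISO 8601 period (e.g. 'PT10M', 'PT1H30M')
    into seconds. Illustrative sketch; date components are not supported."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", period)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported period: {period}")
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mi * 60 + s

print(iso8601_period_seconds("PT10M"))  # 600
```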

Table 6.12. Apache Hive

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes data to Hive tables.
Metastore URI: URI of the metastore to connect to: for example, thrift://localhost:9083
Database Name: Name of the Hive database.
Table Name: Name of the table to stream to.
Fields: The event fields to stream to Hive.
Partition Fields: The event fields on which to partition the data.
Flush Interval: The interval (in seconds) at which a transaction batch is committed.
Transactions Per Batch: The number of transactions per batch.
Max Open Connections: The maximum number of open connections to Hive.
Batch Size: The number of events per batch.
Idle Timeout: The idle timeout.
Call Timeout: The call timeout.
Heartbeat Interval: The heartbeat interval.
Auto Create Partitions: If true, the partition specified in the endpoint is automatically created if it does not already exist.
Kerberos Keytab: Kerberos keytab file path.
Kerberos Principal: Kerberos principal name.
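The metastore URI from the table (thrift://localhost:9083) is an ordinary scheme://host:port URL, so it can be sanity-checked before deployment; a quick sketch of pulling the pieces apart:

```python
from urllib.parse import urlparse

# Example metastore URI from the table above.
uri = urlparse("thrift://localhost:9083")
print(uri.scheme, uri.hostname, uri.port)  # thrift localhost 9083
```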

Table 6.13. Apache HBase

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes events to HBase.
HBase Table: HBase table to write to.
Column Family: HBase table column family.
Batch Size: Number of records in the batch that triggers flushing. Note that every batch must be full before it can be flushed, because tick tuples are not currently supported.
Row Key Field: Field to be used as the row key for the table.
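Because tick tuples are not supported, a partial batch is never flushed on a timer; records accumulate until the batch is full. A toy model of that flush-on-full behavior (a sketch, not the sink's actual implementation):

```python
class FlushOnFullBuffer:
    """Toy model of flush-on-full batching: records accumulate until the
    batch size is reached; partial batches are never flushed on a timer."""
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending: list = []
        self.flushed: list = []

    def add(self, record) -> None:
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flushed.append(list(self.pending))  # stands in for the HBase write
            self.pending.clear()

buf = FlushOnFullBuffer(batch_size=3)
for i in range(7):
    buf.add(i)
print(len(buf.flushed), buf.pending)  # 2 [6]
```

The last record stays pending: with a batch size of 3 and 7 records, two full batches are written and one record waits for the next batch to fill.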

Table 6.14. Hadoop Distributed File System (HDFS)

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes events to HDFS.
HDFS URL: HDFS NameNode URL.
Path: Directory to which the files are written.
Flush Count: Number of records to wait for before flushing to HDFS.
Rotation Policy: Strategy to rotate files in HDFS.
Rotation Interval Multiplier: Rotation interval multiplier for the timed rotation policy.
Rotation Interval Unit: Rotation interval unit for the timed rotation policy.
Output Fields: Specifies the output fields, in the desired order.
Prefix: Prefix for the default file name format.
Extension: Extension for the default file name format.
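The Prefix and Extension settings bracket the default file name. A hypothetical sketch of how such a name might be assembled (the identifier layout in the middle is an assumption for illustration, not the documented format):

```python
def default_file_name(prefix: str, component_id: str, rotation: int, extension: str) -> str:
    # Hypothetical layout: prefix, component id, rotation counter, extension.
    return f"{prefix}{component_id}-{rotation}{extension}"

print(default_file_name("events-", "hdfs-sink", 3, ".txt"))  # events-hdfs-sink-3.txt
```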

Table 6.15. Java Database Connectivity (JDBC)

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes events to a database using JDBC.
Driver Class Name: The driver class name: for example, com.mysql.jdbc.Driver
JDBC URL: JDBC URL: for example, jdbc:mysql://localhost:3306/test
User Name: Database user name.
Password: Database password.
Table Name: Table to write to.
Column Names: Names of the database columns.
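The table name and column names map naturally onto a parameterized INSERT statement, one placeholder per column; a sketch (the table and column names are illustrative assumptions):

```python
def insert_sql(table: str, columns: list) -> str:
    """Build a parameterized INSERT for the given table and columns."""
    cols = ", ".join(columns)
    params = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"

print(insert_sql("test.events", ["id", "ts", "value"]))
# INSERT INTO test.events (id, ts, value) VALUES (?, ?, ?)
```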

Table 6.16. Apache Kafka

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes SAM events to a Kafka topic.
Cluster Name: Mandatory. Service pool defined in SAM from which to get metadata information about the Kafka cluster.
Kafka Topic: Mandatory. Kafka topic to write data to. The schema for the corresponding topic must exist in the Schema Registry, and the incoming SAM event must adhere to the version of the schema selected.
Security Protocol: Mandatory. Protocol used to communicate with Kafka brokers: for example, PLAINTEXT. Auto-suggests a list of protocols supported by the Kafka service based on the cluster name selected. If you select a protocol with SSL or SASL, make sure to fill out the related configuration fields.
Bootstrap Servers: Mandatory. A comma-separated string of host:port pairs representing Kafka broker listeners. Auto-suggests a list of options based on the security protocol selected above.
Fire And Forget?: Optional. A flag to indicate whether the Kafka producer should wait for an acknowledgement. Default value is false.
Async?: Optional. Indicates whether to use an asynchronous Kafka producer. Default value is true.
Key Serializer: Optional. Type of key serializer to use. Options are String, Integer, Long, and ByteArray. Default value is ByteArray. Note that this field does not save any key in the Kafka message. The incoming SAM event is stored as a value in the Kafka message, with a null key.
Key Field: Optional. Name of the key field: one of the fields from the incoming event schema.
Writer Schema Version: Optional. Version of the topic's schema to use for serializing the message. Default is the latest version of the schema.
Ack Mode: Optional. Ack mode used in the producer request for a record sent to the server (none, leader only, or min in-sync replicas). Options are "None", "Leader", and "All". Default value is "Leader".
Buffer Memory: Optional. The total bytes of memory the producer can use to buffer records waiting to be sent to the server. Default value is 33554432.
Compression Type: Optional. The compression type for all data generated by the producer. Options are "none", "gzip", "snappy", and "lz4". Default value is "none".
Retries: Optional. Number of retry attempts for a record send failure. Default value is 0.
Batch Size: Optional. Producer batch size in bytes for records sent to the same partition. Default value is 16384.
Client Id: Optional. Id sent to the server in the producer request, for tracking in server logs.
Max Connection Idle: Optional. Time in milliseconds for which connections can be idle before being closed. Default value is 540000.
Linger Time: Optional. Time in milliseconds to wait before sending a record out when the batch is not full. Default value is 0.
Max Block: Optional. Time in milliseconds for which the send and partitionsFor methods will block. Default value is 60000.
Max Request Size: Optional. Maximum size of a request in bytes. Default value is 1048576.
Receive Buffer Size: Optional. Size in bytes of the TCP receive buffer (SO_RCVBUF) to use when reading data. Default value is 32768.
Request Timeout: Optional. Maximum amount of time in milliseconds the producer will wait for the response to a request. Default value is 30000.
Kerberos Client Principal: Optional (mandatory for SASL). Client principal used to connect to brokers with the SASL GSSAPI mechanism for Kerberos (used when the security protocol is SASL_PLAINTEXT or SASL_SSL).
Kerberos Keytab File: Optional (mandatory for SASL). Keytab file location on the worker node containing the secret key for the client principal with the SASL GSSAPI mechanism for Kerberos (used when the security protocol is SASL_PLAINTEXT or SASL_SSL).
Kafka Service Name: Optional (mandatory for SASL). Service name that the Kafka broker is running as (used when the security protocol is SASL_PLAINTEXT or SASL_SSL).
Send Buffer Size: Optional. Size in bytes of the TCP send buffer (SO_SNDBUF) to use when sending data. Default value is 131072.
Timeout: Optional. Maximum amount of time in milliseconds the server will wait for acks from followers. Default value is 30000.
Block On Buffer Full?: Optional. Boolean to indicate whether to block on a full buffer or throw an exception. Default value is true.
Max In-flight Requests: Optional. Maximum number of unacknowledged requests the producer will send per connection before blocking. Default value is 5.
Metadata Fetch Timeout: Optional. Timeout in milliseconds for a topic metadata fetch request. Default value is 60000.
Metadata Max Age: Optional. Time in milliseconds after which a metadata fetch request is forced. Default value is 300000.
Reconnect Backoff: Optional. Amount of time in milliseconds to wait before attempting to reconnect to a host. Default value is 50.
Retry Backoff: Optional. Amount of time in milliseconds to wait before attempting to retry a failed fetch request. Default value is 100.
SSL Keystore Location: Optional. The location of the key store file. Used when Kafka client connectivity is over SSL.
SSL Keystore Password: Optional. The store password for the key store file.
SSL Key Password: Optional. The password of the private key in the key store file.
SSL Truststore Location: Optional (mandatory for SSL). The location of the trust store file.
SSL Truststore Password: Optional (mandatory for SSL). The password for the trust store file.
SSL Enabled Protocols: Optional. Comma-separated list of protocols enabled for SSL connections.
SSL Keystore Type: Optional. File format of the key store file. Default value is JKS.
SSL Truststore Type: Optional. File format of the trust store file. Default value is JKS.
SSL Protocol: Optional. SSL protocol used to generate the SSLContext. Default value is TLS.
SSL Provider: Optional. Security provider used for SSL connections. Default value is the default security provider for the JVM.
SSL Cipher Suites: Optional. Comma-separated list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using the TLS or SSL network protocol. By default, all available cipher suites are supported.
SSL Endpoint Identification Algorithm: Optional. The endpoint identification algorithm used to validate the server hostname against the server certificate.
SSL Key Manager Algorithm: Optional. The algorithm used by the key manager factory for SSL connections. Default value is SunX509.
SSL Secure Random Implementation: Optional. The SecureRandom PRNG implementation to use for SSL cryptographic operations.
SSL Trust Manager Algorithm: Optional. The algorithm used by the trust manager factory for SSL connections. Default value is PKIX, the trust manager factory algorithm configured for the Java Virtual Machine.
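Most of the optional fields above correspond one-to-one to standard Kafka producer properties. A sketch of that mapping with the table's stated defaults (the broker addresses are illustrative assumptions; note that an Ack mode of "Leader" corresponds to acks=1, "None" to acks=0, and "All" to acks=all):

```python
# Sketch: the table's defaults expressed as standard Kafka producer
# properties. Broker addresses are illustrative assumptions.
producer_props = {
    "bootstrap.servers": "broker1.example.com:6667,broker2.example.com:6667",
    "security.protocol": "PLAINTEXT",
    "acks": "1",                     # Ack mode "Leader"; "None" -> "0", "All" -> "all"
    "buffer.memory": "33554432",     # Buffer memory
    "compression.type": "none",      # Compression type
    "retries": "0",                  # Retries
    "batch.size": "16384",           # Batch size
    "linger.ms": "0",                # Linger time
    "max.block.ms": "60000",         # Max block
    "max.request.size": "1048576",   # Max request size
    "receive.buffer.bytes": "32768", # Receive buffer size
    "send.buffer.bytes": "131072",   # Send buffer size
    "request.timeout.ms": "30000",   # Request timeout
    "max.in.flight.requests.per.connection": "5",  # Max in-flight requests
    "metadata.max.age.ms": "300000", # Metadata max age
    "reconnect.backoff.ms": "50",    # Reconnect backoff
    "retry.backoff.ms": "100",       # Retry backoff
}
print(producer_props["acks"])  # 1
```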

Table 6.17. Notification

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Sends email notifications.
Username: The user name for the mail server.
Password: The password for the mail server.
Host: Mail server host name.
Port: Mail server port.
SSL?: Whether the connection should be over SSL.
Start TLS: Flag to indicate the TLS setting.
Debug?: Whether to log debug messages.
Email Server Protocol: The email server protocol: for example, SMTP.
Authenticate: Flag to indicate whether authentication is to be performed.
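These settings line up closely with standard JavaMail SMTP session properties; a hedged sketch of that correspondence (the host and port values are assumptions, and the exact property names the sink uses internally are not documented here):

```python
# Sketch: the sink's fields expressed as JavaMail-style SMTP session
# properties. Host and port values are illustrative assumptions.
mail_props = {
    "mail.smtp.host": "smtp.example.com",  # Host
    "mail.smtp.port": "587",               # Port
    "mail.smtp.auth": "true",              # Authenticate
    "mail.smtp.starttls.enable": "true",   # Start TLS
    "mail.smtp.ssl.enable": "false",       # SSL?
    "mail.debug": "false",                 # Debug?
    "mail.transport.protocol": "smtp",     # Email Server Protocol
}
print(mail_props["mail.smtp.starttls.enable"])  # true
```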

Table 6.18. OpenTSDB

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Writes events to a given OpenTSDB cluster.
REST API URL: The URL of the REST API: for example, http://localhost:4242
Metric Field Name: Field name of the metric.
Timestamp Field Name: Field name of the timestamp.
Tags Field Name: Field name of the tags.
Value Field Name: Field name of the value.
Fail Tuple for Failed Metrics?: Whether to fail the tuple for any metrics that fail to write to OpenTSDB.
Sync?: Whether to sync.
Sync Timeout: Sync timeout (in milliseconds) when the Sync value is true.
Return Summary?: Whether to return a summary.
Return Details?: Whether to return details.
Enable Chunked Encoding?: Whether to enable chunked encoding for REST API calls to OpenTSDB.
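The four field-name settings select which parts of each incoming event become the OpenTSDB datapoint sent to the REST API's /api/put endpoint. A sketch of that selection (the event field names on the left are illustrative assumptions):

```python
import json

# Incoming SAM event; these field names are illustrative assumptions.
event = {"metric_name": "sys.cpu.user", "event_ts": 1530000000,
         "reading": 42.5, "labels": {"host": "web01"}}

# The Metric/Timestamp/Value/Tags Field Name settings pick out these fields.
datapoint = {
    "metric": event["metric_name"],
    "timestamp": event["event_ts"],
    "value": event["reading"],
    "tags": event["labels"],
}
body = json.dumps(datapoint)  # POSTed to <REST API URL>/api/put
print(body)
```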

Table 6.19. Solr

Configuration Field: Description, requirements, tips for configuration
General Sink Description: Enables indexing of live input data into Apache Solr collections.
Apache Solr ZooKeeper Host String: Information about the Apache ZooKeeper ensemble used to coordinate the Solr cluster, specified as a comma-separated list as follows: zk1.host.com:2181,zk2.host.com:2181,zk3.example.com:2181
Apache Solr Collection Name: The name of the Apache Solr collection into which to index live data.
Commit Batch Size: Defines how often the indexed data is committed into Apache Solr, specified as an integer. For instance, if set to 100, Apache Solr commits the data every 100 tuples.
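The ZooKeeper host string is a plain comma-separated host:port list, and the commit batch size simply means one commit per N tuples; a quick sketch of both, using the example values from the table:

```python
# Parse the example ZooKeeper ensemble string from the table above.
zk_hosts = "zk1.host.com:2181,zk2.host.com:2181,zk3.example.com:2181"
ensemble = [tuple(h.rsplit(":", 1)) for h in zk_hosts.split(",")]
print(ensemble[0])  # ('zk1.host.com', '2181')

# A commit batch size of 100 means one Solr commit per 100 tuples.
commit_batch_size = 100
commits = sum(1 for n in range(1, 251) if n % commit_batch_size == 0)
print(commits)  # 2 commits over 250 tuples
```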