Hortonworks Streaming Analytics Manager User Guide

Source Configuration Values

Table 6.1. Kafka

Configuration Field | Description, requirements, tips for configuration
Cluster Name | Mandatory. Service pool defined in SAM to get metadata information about the Kafka cluster.
Security Protocol | Mandatory. Protocol used to communicate with Kafka brokers, e.g. PLAINTEXT. Auto-suggested from the list of protocols supported by the Kafka service, based on the cluster name selected. If you select a protocol with SSL or SASL, make sure to fill out the related configuration fields.
Bootstrap Servers | Mandatory. A comma-separated string of host:port pairs representing Kafka broker listeners. Auto-suggested with a list of options based on the security protocol selected above.
Kafka topic | Mandatory. Kafka topic to read data from. Make sure that the corresponding schema for the topic is defined in Schema Registry.
Consumer Group Id | Mandatory. A unique string that identifies the consumer group this consumer belongs to. Used to keep track of consumer offsets.
Reader schema version | Optional. Version of the topic schema to read with. Default value is the version used by the producer to write data to the topic.
Kerberos client principal | Optional (mandatory for SASL). Client principal used to connect to brokers with the SASL GSSAPI mechanism for Kerberos; applies when the security protocol is SASL_PLAINTEXT or SASL_SSL.
Kerberos keytab file | Optional (mandatory for SASL). Keytab file location on the worker node containing the secret key for the client principal when using the SASL GSSAPI mechanism for Kerberos; applies when the security protocol is SASL_PLAINTEXT or SASL_SSL.
Kafka service name | Optional (mandatory for SASL). Service name that the Kafka broker runs as; applies when the security protocol is SASL_PLAINTEXT or SASL_SSL.
Fetch minimum bytes | Optional. The minimum number of bytes the broker should return for a fetch request. Default value is 1.
Maximum fetch bytes per partition | Optional. The maximum amount of data per partition the broker will return. Default value is 1048576.
Maximum records per poll | Optional. The maximum number of records a poll will return. Default value is 500.
Poll timeout (ms) | Optional. Time in milliseconds spent waiting in poll if data is not available. Default value is 200.
Offset commit period (ms) | Optional. Period in milliseconds at which offsets are committed. Default value is 30000.
Maximum uncommitted offsets | Optional. The maximum number of polled records that can be pending commit before another poll can take place. Default value is 10000000. This value should depend on the size of each message in Kafka and the memory available to the worker JVM process.
First poll offset strategy | Optional. Offset used by the Kafka spout in the first poll to the Kafka broker. Pick one of the enum values EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST. Default value is UNCOMMITTED_EARLIEST, meaning that by default the spout starts from the earliest uncommitted offset for the consumer group ID provided above.
Partition refresh period (ms) | Optional. Period in milliseconds at which Kafka is polled for new topics and/or partitions. Default value is 2000.
Emit null tuples? | Optional. A flag indicating whether null tuples should be emitted to downstream components. Default value is false.
First retry delay (ms) | Optional. Delay in milliseconds before the first retry of a failed Kafka spout message. Default value is 0.
Retry delay period (ms) | Optional. Retry delay period (geometric progression) in milliseconds for the second and subsequent retries of a failed Kafka spout message. Default value is 2.
Maximum retries | Optional. Maximum number of times a failed message is retried before it is acked and committed. Default value is 2147483647.
Maximum retry delay (ms) | Optional. Maximum interval in milliseconds to wait between successive retries of a failed Kafka spout message. Default value is 10000.
Consumer startup delay (ms) | Optional. Delay in milliseconds after which Kafka is first polled for records. This gives all executors time to come up before the first poll from each executor, so that partitions are well balanced among executors and onPartitionsRevoked and onPartitionsAssigned are not called later, which would cause duplicate tuples to be emitted. Default value is 60000.
SSL keystore location | Optional. The location of the key store file. Used when Kafka client connectivity is over SSL.
SSL keystore password | Optional. The store password for the key store file.
SSL key password | Optional. The password of the private key in the key store file.
SSL truststore location | Optional (mandatory for SSL). The location of the trust store file.
SSL truststore password | Optional (mandatory for SSL). The password for the trust store file.
SSL enabled protocols | Optional. Comma-separated list of protocols enabled for SSL connections.
SSL keystore type | Optional. File format of the keystore file. Default value is JKS.
SSL truststore type | Optional. File format of the truststore file. Default value is JKS.
SSL protocol | Optional. SSL protocol used to generate the SSLContext. Default value is TLS.
SSL provider | Optional. Security provider used for SSL connections. Default value is the JVM's default security provider.
SSL cipher suites | Optional. Comma-separated list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using the TLS or SSL network protocol. By default all available cipher suites are supported.
SSL endpoint identification algorithm | Optional. The endpoint identification algorithm used to validate the server hostname against the server certificate.
SSL key manager algorithm | Optional. The algorithm used by the key manager factory for SSL connections. Default value is SunX509.
SSL secure random implementation | Optional. The SecureRandom PRNG implementation to use for SSL cryptographic operations.
SSL trust manager algorithm | Optional. The algorithm used by the trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine (PKIX).
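
SAM collects the values above through its UI; under the covers they configure a Kafka consumer. As a point of reference only, the sketch below shows how several of the fields (Bootstrap Servers, Consumer Group Id, Security Protocol, the Kerberos fields, and the fetch/poll tuning fields) line up with standard Kafka consumer properties. The broker hosts, topic, group ID, principal, and keytab path are placeholders, the snippet assumes a recent kafka-clients library, and it is not something SAM requires you to write.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaSourceConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap Servers, Consumer Group Id and Security Protocol (placeholder values)
        props.put("bootstrap.servers", "broker1.example.com:6667,broker2.example.com:6667");
        props.put("group.id", "sam-kafka-source");
        props.put("security.protocol", "SASL_PLAINTEXT");
        // Kafka service name plus Kerberos client principal and keytab file (SASL_PLAINTEXT/SASL_SSL only)
        props.put("sasl.kerberos.service.name", "kafka");
        props.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required "
            + "useKeyTab=true keyTab=\"/etc/security/keytabs/client.keytab\" "
            + "principal=\"client@EXAMPLE.COM\";");
        // Fetch minimum bytes, Maximum fetch bytes per partition, Maximum records per poll (the defaults above)
        props.put("fetch.min.bytes", "1");
        props.put("max.partition.fetch.bytes", "1048576");
        props.put("max.poll.records", "500");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Kafka topic (placeholder) and Poll timeout (ms)
            consumer.subscribe(Collections.singletonList("truck-events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}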

Table 6.2. Event Hubs

Configuration Field | Description, requirements, tips for configuration
Username | The Event Hubs user name (policy name in the Event Hubs portal).
Password | The Event Hubs password (shared access key in the Event Hubs portal).
Namespace | The Event Hubs namespace.
Entity Path | The Event Hubs entity path.
Partition Count | The number of partitions in the Event Hub.
ZooKeeper Connection String | The ZooKeeper connection string.
Checkpoint Interval | The frequency at which offsets are checkpointed.
Receiver Credits | The receiver credits.
Max Pending Messages Per Partition | The maximum number of pending messages per partition.
Enqueue Time Filter | The enqueue time filter.
Consumer Group Name | The consumer group name.
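
The first four fields correspond to the pieces of a standard Event Hubs connection string, which can help when locating the right values in the Azure portal. The snippet below is purely illustrative; the namespace, entity path, policy name, and key are placeholders, and SAM itself only asks for the individual fields listed above.

public class EventHubsFieldsSketch {
    public static void main(String[] args) {
        // Placeholders mirroring the Username, Password, Namespace and Entity Path fields above.
        String namespace  = "my-namespace";                // Namespace
        String entityPath = "my-event-hub";                // Entity Path
        String policyName = "RootManageSharedAccessKey";   // Username (policy name in the Event Hubs portal)
        String sharedKey  = "<shared-access-key>";         // Password (shared access key)

        // How those four values combine into a standard Event Hubs connection string.
        String connectionString = String.format(
            "Endpoint=sb://%s.servicebus.windows.net/;SharedAccessKeyName=%s;SharedAccessKey=%s;EntityPath=%s",
            namespace, policyName, sharedKey, entityPath);
        System.out.println(connectionString);
    }
}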

Table 6.3. HDFS

Configuration Field | Description, requirements, tips for configuration
Cluster Name | Service pool defined in SAM to get metadata information about the HDFS cluster.
HDFS URL | HDFS NameNode URL.
Input File Format | The format of the file being consumed dictates the type of reader used to read the file. Currently only com.hortonworks.streamline.streams.runtime.storm.spout.JsonFileReader is supported.
Source Dir | The HDFS directory from which to read files.
Archive Dir | Files from the source dir are moved to this HDFS location after they have been completely read.
Bad Files Dir | Files from the source dir are moved to this HDFS location if an error is encountered while processing them.
Lock Dir | Lock files (used to synchronize multiple reader instances) are created in this location. Defaults to a '.lock' subdirectory under the source directory.
Commit Frequency Count | Records progress in the lock file after the specified number of records have been processed. Setting it to 0 disables this.
Commit Frequency Secs | Records progress in the lock file after the specified number of seconds has elapsed. Must be greater than 0.
Max Outstanding | Limits the number of unacked tuples by pausing tuple generation (if ackers are used in the topology).
Lock Timeout Seconds | Duration of inactivity after which a lock file is considered abandoned and ready for another spout to take ownership.
Ignore Suffix | File names with this suffix in the source dir will not be processed.
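
To illustrate the directory fields above, the sketch below uses the Hadoop FileSystem API to create placeholder source, archive, and bad-files directories and to drop a small file into the source directory. The NameNode URL, paths, and record fields are invented for the example, and it assumes the JsonFileReader consumes newline-delimited JSON records whose fields match the schema registered for the stream.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSourceDirSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder HDFS URL

        // Placeholder layout mirroring the Source Dir, Archive Dir and Bad Files Dir fields above.
        Path sourceDir  = new Path("/sam/source");
        Path archiveDir = new Path("/sam/archive");
        Path badDir     = new Path("/sam/bad");

        try (FileSystem fs = FileSystem.get(conf)) {
            fs.mkdirs(sourceDir);
            fs.mkdirs(archiveDir);
            fs.mkdirs(badDir);

            // Assumption: JsonFileReader reads newline-delimited JSON records; the fields here are invented.
            String records = "{\"driverId\": 1, \"speed\": 65}\n"
                           + "{\"driverId\": 2, \"speed\": 80}\n";
            try (FSDataOutputStream out = fs.create(new Path(sourceDir, "events-0001.json"))) {
                out.write(records.getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}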