Hortonworks Streaming Analytics Manager User Guide

Source Configuration Values

Table 6.1. Apache Kafka

| Configuration Field | Description, requirements, tips for configuration |
| --- | --- |
| Cluster Name | Mandatory. Specifies the service pool defined in SAM from which to get metadata about the Kafka cluster. |
| Security Protocol | Mandatory. Specifies the protocol used to communicate with Kafka brokers, such as PLAINTEXT. SAM automatically suggests a list of protocols supported by the Kafka service, based on the selected cluster name. If you select a protocol with SSL or SASL, you must complete the related configuration fields. |
| Bootstrap Servers | Mandatory. A comma-separated string of host:port values representing Kafka broker listeners. SAM automatically suggests a list of options based on the selected security protocol. |
| Kafka topic | Mandatory. The Kafka topic from which to read data. You must ensure that the corresponding topic schema is defined in Schema Registry. |
| Consumer Group Id | Mandatory. A unique string that identifies the consumer group to which this consumer belongs. Used to keep track of consumer offsets. |
| Reader schema version | Optional. The version of the schema for the topic to read from. The default value is the version used by the producer to write data to the topic. |
| Kerberos client principal | Optional (Mandatory for SASL). Client principal used to connect to brokers with the SASL GSSAPI mechanism for Kerberos (applies when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Kerberos keytab file | Optional (Mandatory for SASL). Location, on the worker node, of the keytab file containing the secret key for the client principal used with the SASL GSSAPI mechanism for Kerberos (applies when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Kafka service name | Optional (Mandatory for SASL). Service name under which the Kafka broker is running (applies when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Fetch minimum bytes | Optional. The minimum number of bytes the broker should return for a fetch request. Default value is 1. |
| Maximum fetch bytes per partition | Optional. The maximum amount of data, in bytes, per partition that the broker can return. Default value is 1048576. |
| Maximum records per poll | Optional. The maximum number of records a poll can return. Default value is 500. |
| Poll timeout (ms) | Optional. Time, in milliseconds, spent waiting in poll if data is not available. Default value is 200. |
| Offset commit period (ms) | Optional. Period, in milliseconds, after which offsets are committed. Default value is 30000. |
| Maximum uncommitted offsets | Optional. Defines the maximum number of polled records that can be pending commit before another poll can take place. Default value is 10000000. The appropriate value depends on the size of each message in Kafka and the memory available to the worker JVM process. |
| First poll offset strategy | Optional. Offset used by the Kafka spout in the first poll to the Kafka broker. You must choose one of EARLIEST, LATEST, UNCOMMITTED_EARLIEST, or UNCOMMITTED_LATEST. Default value is UNCOMMITTED_EARLIEST, which means that, by default, the spout starts from the earliest uncommitted offset for the consumer group ID. |
| Partition refresh period (ms) | Optional. Period, in milliseconds, after which Kafka is polled for new topics or partitions. Default value is 2000. |
| Emit null tuples? | Optional. A flag that indicates whether null tuples should be emitted to downstream components. Default value is false. |
| First retry delay (ms) | Optional. Delay, in milliseconds, before the first retry of a failed Kafka spout message. Default value is 0. |
| Retry delay period (ms) | Optional. Retry delay period (geometric progression), in milliseconds, for the second and subsequent retries of a failed Kafka spout message. Default value is 2. |
| Maximum retries | Optional. Maximum number of times a failed message is retried before it is acked and committed. Default value is 2147483647. |
| Maximum retry delay (ms) | Optional. Maximum interval, in milliseconds, to wait between successive retries of a failed Kafka spout message. Default value is 10000. |
| Consumer startup delay (ms) | Optional. Delay, in milliseconds, after which Kafka is polled for records. This delay is intended to ensure that all executors are active before polling begins, so that partitions are well balanced among executors. It also helps avoid onPartitionsRevoked and onPartitionsAssigned events, which can cause duplicate tuples. Default value is 60000. |
| SSL keystore location | Optional. The location of the key store file. Used when Kafka client connectivity is over SSL. |
| SSL keystore password | Optional. The store password for the key store file. |
| SSL key password | Optional. The password of the private key in the key store file. |
| SSL truststore location | Optional (Mandatory for SSL). The location of the trust store file. |
| SSL truststore password | Optional (Mandatory for SSL). The password for the trust store file. |
| SSL enabled protocols | Optional. Comma-separated list of protocols enabled for SSL connections. |
| SSL keystore type | Optional. File format of the keystore file. Default value is JKS. |
| SSL truststore type | Optional. File format of the truststore file. Default value is JKS. |
| SSL protocol | Optional. SSL protocol used to generate the SSLContext. Default value is TLS. |
| SSL provider | Optional. Security provider used for SSL connections. Default value is the default security provider of the JVM. |
| SSL cipher suites | Optional. Comma-separated list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection that uses the TLS or SSL network protocol. By default, all available cipher suites are supported. |
| SSL endpoint identification algorithm | Optional. The endpoint identification algorithm used to validate the server host name against the server certificate. |
| SSL key manager algorithm | Optional. The algorithm used by the key manager factory for SSL connections. Default value is SunX509. |
| SSL secure random implementation | Optional. The SecureRandom PRNG implementation to use for SSL cryptographic operations. |
| SSL trust manager algorithm | Optional. The algorithm used by the trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine (PKIX). |
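
The conditional requirements in the Kafka table (fields that become mandatory only for SASL or SSL) can be easier to follow as a small check. The following is a minimal sketch, not SAM's own validation logic: the field keys such as `kerberosClientPrincipal` and the function name `missing_required_fields` are hypothetical, and it assumes that the SSL truststore fields are also required when the protocol is SASL_SSL.

```python
# Hypothetical field keys for illustration; SAM's internal names may differ.
ALWAYS_REQUIRED = (
    "clusterName",
    "securityProtocol",
    "bootstrapServers",
    "topic",
    "consumerGroupId",
)
SASL_FIELDS = ("kerberosClientPrincipal", "kerberosKeytabFile", "kafkaServiceName")
SSL_FIELDS = ("sslTruststoreLocation", "sslTruststorePassword")


def missing_required_fields(config):
    """Return the mandatory field keys that are absent or empty in config."""
    required = list(ALWAYS_REQUIRED)
    protocol = config.get("securityProtocol") or ""
    # SASL_PLAINTEXT and SASL_SSL both need the Kerberos-related fields.
    if protocol.startswith("SASL"):
        required.extend(SASL_FIELDS)
    # Assumption: SSL and SASL_SSL both need the truststore fields.
    if protocol.endswith("SSL"):
        required.extend(SSL_FIELDS)
    return [name for name in required if not config.get(name)]
```

Calling `missing_required_fields` with a dictionary of the values entered for the source returns an empty list when every field required for the chosen security protocol is present.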

Table 6.2. Event Hubs

| Configuration Field | Description, requirements, tips for configuration |
| --- | --- |
| Username | The Event Hubs user name (policy name in Event Hubs Portal). |
| Password | The Event Hubs password (shared access key in Event Hubs Portal). |
| Namespace | The Event Hubs namespace. |
| Entity Path | The Event Hubs entity path. |
| Partition Count | The number of partitions in the Event Hubs. |
| ZooKeeper Connection String | The ZooKeeper connection string. |
| Checkpoint Interval | The frequency at which offsets are checkpointed. |
| Receiver Credits | Receiver credits. |
| Max Pending Messages Per Partition | The maximum number of pending messages per partition. |
| Enqueue Time Filter | The enqueue time filter. |
| Consumer Group Name | The consumer group name. |

Table 6.3. HDFS

| Configuration Field | Description, requirements, tips for configuration |
| --- | --- |
| Cluster Name | The service pool defined in SAM from which to get metadata about the HDFS cluster. |
| HDFS URL | The HDFS NameNode URL. |
| Input File Format | The format of the file being consumed, which dictates the type of reader used to read the file. Currently, only com.hortonworks.streamline.streams.runtime.storm.spout.JsonFileReader is supported. |
| Source Dir | The HDFS directory from which to read the files. |
| Archive Dir | The HDFS location to which files from the source directory are moved after being completely read. |
| Bad Files Dir | The HDFS location to which files from the source directory are moved if an error is encountered while processing them. |
| Lock Dir | Location in which lock files (used to synchronize multiple reader instances) are created. Defaults to a '.lock' subdirectory under the source directory. |
| Commit Frequency Count | If not set to 0, records progress in the lock file after the specified number of records are processed. |
| Commit Frequency Secs | The number of seconds after which progress in the lock file is recorded. |
| Max Outstanding | Limits the number of unACKed tuples by pausing tuple generation (if ACKers are used in the topology). |
| Lock Timeout Seconds | Duration of inactivity after which a lock file is considered abandoned and ready for another spout to take ownership. |
| Ignore Suffix | File names with this suffix in the source directory are not processed. |
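
The commit, lock timeout, and ignore suffix settings in the HDFS table interact in ways that a short sketch may clarify. The following is an illustration only, not the spout's actual implementation: all function and parameter names are hypothetical, and it assumes that progress is recorded when either the record count or the elapsed time threshold is reached.

```python
def should_record_progress(records_since_commit, secs_since_commit,
                           commit_frequency_count, commit_frequency_secs):
    """Decide whether progress should be written to the lock file."""
    # A count of 0 disables the count-based trigger, as the table describes.
    count_due = (commit_frequency_count != 0
                 and records_since_commit >= commit_frequency_count)
    # Assumption: the time-based trigger applies independently of the count.
    time_due = secs_since_commit >= commit_frequency_secs
    return count_due or time_due


def is_lock_abandoned(last_activity_secs, now_secs, lock_timeout_secs):
    """A lock is treated as abandoned after lock_timeout_secs of inactivity."""
    return now_secs - last_activity_secs > lock_timeout_secs


def files_to_process(file_names, ignore_suffix):
    """Drop file names that end with the configured ignore suffix."""
    if not ignore_suffix:
        return list(file_names)
    return [name for name in file_names if not name.endswith(ignore_suffix)]
```

For example, with Commit Frequency Count set to 0, `should_record_progress` depends only on the elapsed seconds, and `files_to_process` leaves the listing unchanged when no ignore suffix is configured.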