Trident APIs
The following example shows construction of a Hive state using the Trident APIs, followed by details about the code:
DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
    .withColumnFields(new Fields(colNames))
    .withTimeAsPartitionField("YYYY/MM/DD");

HiveOptions hiveOptions = new HiveOptions(metaStoreURI, dbName, tblName, mapper)
    .withTxnsPerBatch(10)
    .withBatchSize(1000)
    .withIdleTimeout(10);

StateFactory factory = new HiveStateFactory().withOptions(hiveOptions);
TridentState state = stream.partitionPersist(factory, hiveFields, new HiveUpdater(), new Fields());
- Instantiate an Implementation of the HiveMapper Interface
The storm-hive streaming bolt uses the HiveMapper interface to map the names of tuple fields to the names of Hive table columns. Storm provides two implementations: DelimitedRecordHiveMapper and JsonRecordHiveMapper. Both implementations take the same arguments.
Table 1. HiveMapper Arguments
| Argument | Data Type | Description |
| --- | --- | --- |
| withColumnFields | org.apache.storm.tuple.Fields | The names of the tuple fields that you want to map to table column names. |
| withPartitionFields | org.apache.storm.tuple.Fields | The names of the tuple fields that you want to map to table partitions. |
| withTimeAsPartitionField | String | Requests that table partitions be created with names set to system time. Developers can specify any Java-supported date format, such as "YYYY/MM/DD". |
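Because withTimeAsPartitionField accepts a Java date format, the letter case of the pattern matters. The following is a minimal sketch, assuming the format string is interpreted with java.text.SimpleDateFormat rules (an assumption; this section does not say which formatter the mapper uses). The class PartitionDatePatternCheck and its format helper are hypothetical names used only for illustration. In SimpleDateFormat, lowercase "yyyy" and "dd" mean calendar year and day of month, while uppercase "YYYY" and "DD" mean week year and day of year.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of storm-hive.
class PartitionDatePatternCheck {

    // Formats a date with a fixed locale and time zone so results are reproducible.
    static String format(String pattern, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // 31 December 2015, midnight UTC: the 365th day of that year.
        Date when = Date.from(
            LocalDate.of(2015, 12, 31).atStartOfDay(ZoneOffset.UTC).toInstant());

        // Calendar year / month / day of month.
        System.out.println(format("yyyy/MM/dd", when)); // prints 2015/12/31

        // Week year / month / day of year: the last component is 365, not 31,
        // and the week year can differ from the calendar year near year boundaries.
        System.out.println(format("YYYY/MM/DD", when));
    }
}
```

If calendar-style partition names are intended, checking the chosen pattern this way before deploying the topology shows exactly which partition names it would produce.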
The following sample code illustrates how to use DelimitedRecordHiveMapper:
...
// Alternative 1: map tuple fields to table partitions
DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
    .withColumnFields(new Fields(colNames))
    .withPartitionFields(new Fields(partNames));

// Alternative 2: name table partitions after the system time
DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
    .withColumnFields(new Fields(colNames))
    .withTimeAsPartitionField("YYYY/MM/DD");
...
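To make the field-to-column mapping concrete, the sketch below imitates in plain Java what a delimited-record mapper does conceptually: it picks the configured column fields out of a tuple, in order, and joins their values with a delimiter. The class DelimitedRecordSketch, its mapRecord method, the comma delimiter, and the Map-based tuple are all hypothetical stand-ins for illustration, not the storm-hive implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the storm-hive DelimitedRecordHiveMapper.
class DelimitedRecordSketch {
    private final List<String> columnFields;
    private final String delimiter;

    DelimitedRecordSketch(List<String> columnFields, String delimiter) {
        this.columnFields = columnFields;
        this.delimiter = delimiter;
    }

    // Builds one delimited record from the tuple, using only the configured
    // column fields and preserving their configured order.
    String mapRecord(Map<String, Object> tuple) {
        List<String> values = new ArrayList<>();
        for (String field : columnFields) {
            if (!tuple.containsKey(field)) {
                throw new IllegalArgumentException("tuple has no field: " + field);
            }
            values.add(String.valueOf(tuple.get(field)));
        }
        return String.join(delimiter, values);
    }

    public static void main(String[] args) {
        // A tuple modeled as field name -> value; "city" is not a mapped column.
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 7);
        tuple.put("name", "alice");
        tuple.put("city", "Oslo");

        DelimitedRecordSketch mapper =
            new DelimitedRecordSketch(Arrays.asList("id", "name"), ",");
        System.out.println(mapper.mapRecord(tuple)); // prints 7,alice
    }
}
```

The point of the sketch is that the order of the column fields determines the order of values in the record, so the names passed to withColumnFields need to line up with the Hive table columns.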
- Instantiate a HiveOptions class with the HiveMapper Implementation
Use the HiveOptions class to configure the transactions used by Hive to ingest the streaming data, as illustrated in the following code sample.
...
HiveOptions hiveOptions = new HiveOptions(metaStoreURI, dbName, tblName, mapper)
    .withTxnsPerBatch(10)
    .withBatchSize(1000)
    .withIdleTimeout(10);
...
See "HiveOptions Class Configuration Properties" for a list of configuration properties for the HiveOptions class.
- Instantiate the HiveStateFactory with the HiveOptions class, and persist the stream to Hive:
...
StateFactory factory = new HiveStateFactory().withOptions(hiveOptions);
TridentState state = stream.partitionPersist(factory, hiveFields, new HiveUpdater(), new Fields());
...
- Before building your topology code, add the following dependency to your topology pom.xml file:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.3</version>
</dependency>