Streaming Analytics Data Hub cluster definitons
There are four Streaming Analytics cluster definitions available to deploy Streaming Analytics in CDP Public Cloud. You can choose from Light and Heavy duty options, and you can further select the cluster definitions depending on your cloud provider.
- Streaming Analytics Light Duty for AWS
- Streaming Analytics Light Duty for Azure
- Streaming Analytics Light Duty for GCP
- Streaming Analytics Heavy Duty for AWS
- Streaming Analytics Heavy Duty for Azure
- Streaming Analytics Heavy Duty for GCP
Streaming Analytics offers real-time stream processing and stream analytics with low-latency and high scaling capabilities powered by Apache Flink.
Streaming Analytics templates include Apache Flink that works out of the box in stateless or heavy state environments. Beside Flink, the template includes its supporting services namely YARN, Zookeeper and HDFS. The Heavy Duty template comes preconfigured with RocksDB as state backend, while Light Duty clusters use the default Heap state backend. You can create your streaming application by choosing between Kafka, Kudu, and HBase as datastream connectors.
You can also use SQL to query real-time data with SQL Stream Builder (SSB) in the Streaming Analytics template. By supporting the SSB service in CDP Public Cloud, you can simply and easily declare expressions that filter, aggregate, route, and otherwise mutate streams of data. SSB is a job management interface that you can use to compose and run SQL on streams, as well as to create durable data APIs for the results.
In CDP Public Cloud, you can use the predefined cluster definitions within your environment to connect Flink and SSB with the supported service connectors.
Cluster Definition | SSB Connector1 | Flink Connector |
---|---|---|
Streams Messaging | Kafka, Schema Registry | Kafka, Schema Registry |
Real-Time Data Mart | Kudu | Kudu |
Data Engineering | Hive2 | Hive |
Operational Database | - | HBase |