Introduction to SQL Stream Builder

Cloudera Streaming Analytics offers SQL Stream Builder, an easy-to-use, interactive service for creating queries on streams of data through SQL.

The SQL Stream Builder (SSB) is a comprehensive interactive user interface for creating stateful stream processing jobs using SQL. By using SQL, you can simply and easily declare expressions that filter, aggregate, route, and otherwise mutate streams of data. SSB is a job management interface that you can use to compose and run SQL on streams, as well as to create durable data APIs for the results.

What is Continuous SQL?

SSB runs Structured Query Language (SQL) statements continuously; this is called Continuous SQL or Streaming SQL. Continuous SQL can run against both bounded and unbounded streams of data. The results are sent to a sink of some type, and can be connected to other applications through a Materialized View interface. Compared to traditional SQL, in Continuous SQL the data has a start but no end. This means that queries continuously process results. When you define your job in SQL, the SQL statement is interpreted and validated against a schema. After the statement is executed, the results that match the criteria are continuously returned.
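For example, the following Continuous SQL statement counts payments per card over one-minute tumbling event-time windows, emitting a new result row for each card and window as events arrive rather than terminating. This is a minimal sketch: the payments table, its columns, and the window size are hypothetical, not values from this document.

    -- Hypothetical stream: count payments per card over one-minute
    -- tumbling windows; results are emitted continuously as events arrive.
    SELECT
      card_id,
      TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
      COUNT(*) AS payment_count
    FROM payments
    GROUP BY
      card_id,
      TUMBLE(event_time, INTERVAL '1' MINUTE);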

Integration with Flink

SSB runs in an interactive fashion where you can quickly see the results of your query and iterate on your SQL syntax. The executed SQL queries run as jobs on the Flink cluster, operating on unbounded streams of data until cancelled. This allows you to author, launch, and monitor stream processing jobs within SSB, as every SQL query is a Flink job. You can use Flink and submit Flink jobs without writing Java code, as SSB automatically builds and runs the Flink job in the background.

Through the Flink integration, you can use the core capabilities offered by Flink. You can choose exactly-once processing, process your data stream using event time, save your jobs with savepoints, and use Flink DDL to create tables and custom connectors based on your requirements. With customizable connectors, you can also enrich your streaming data with data from slowly changing sources.
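As an illustration, the following Flink DDL registers a Kafka-backed table with an event-time column and a watermark, making it usable as a source in Continuous SQL queries. This is a minimal sketch: the table name, topic, broker address, and schema are placeholders rather than values from this document.

    -- Hypothetical Kafka-backed table with event time and a watermark;
    -- the topic, broker address, and columns are placeholders.
    CREATE TABLE payments (
      card_id    STRING,
      amount     DOUBLE,
      event_time TIMESTAMP(3),
      WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'payments',
      'properties.bootstrap.servers' = 'kafka-broker:9092',
      'scan.startup.mode' = 'earliest-offset',
      'format' = 'json'
    );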

The following table summarizes the supported connectors and how they can be used in SSB:
Connector          Type          Description
Kafka              source/sink   Supported as an exactly-once sink
Hive               source/sink   Can be used as a catalog
Kudu               source/sink   Can be used as a catalog
Schema Registry    source/sink   Can be used as a catalog
JDBC               source/sink   Can be used with Flink DDL; PostgreSQL, MySQL, and Hive are supported
Filesystems        source/sink   Filesystems such as HDFS, S3, and so on; can be used with Flink DDL
Webhook            sink          Can be used as HTTP POST/PUT with templates and headers
PostgreSQL         sink          Materialized View connection for reading views; can be used with anything that speaks the PostgreSQL wire protocol
REST               sink          Materialized View connection for reading views; can be used with anything that reads REST (such as notebooks, applications, and so on)
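To illustrate the JDBC connector row above, the following Flink DDL maps a PostgreSQL table that a streaming query can join against, for example to enrich events with slowly changing reference data. This is a sketch: the database URL, table name, and columns are placeholders.

    -- Hypothetical JDBC-backed table over PostgreSQL; the URL and
    -- column layout are placeholders. A streaming query can join
    -- against it to enrich events with reference data.
    CREATE TABLE customers (
      customer_id   STRING,
      customer_name STRING,
      PRIMARY KEY (customer_id) NOT ENFORCED
    ) WITH (
      'connector' = 'jdbc',
      'url' = 'jdbc:postgresql://dbhost:5432/reference',
      'table-name' = 'customers'
    );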