Getting started with SQL Stream Builder

Get started with SQL Stream Builder in CSP Community Edition.

The Getting Started guide walks you through the following steps to learn how to use SQL Stream Builder (SSB):
  • How to access Streaming SQL Console
  • How to create a SQL job
  • How to generate data in SSB
  • How to create tables and views
  • How to query data
  • How to output the queried results using different solutions

1. Creating a job and table

As a first step in using SQL Stream Builder, access the Streaming SQL Console, then learn how to create a job and a table that generates random data using the included Faker connector.

After accessing the Streaming SQL Console at http://localhost:18121 from your browser, you need to create a job on the Getting Started page to be redirected to the Console page.

You have the following options to name your SQL job:
  • Use the predefined name in the Job Name field when opening the Streaming SQL Console
  • Use the Generate Job Name button to generate a new random name for your job
  • Manually provide the name of the job in the Job Name field

As a first step in executing a SQL query, you need to create a table. At this point, you can use the predefined templates under the SQL Editor.

In the following CREATE TABLE statement, the faker template is used, with the properties already filled out. You can copy and paste it to the SQL Editor.

DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
    order_id INTEGER,
    city STRING,
    street_address STRING,
    amount INTEGER,
    order_time TIMESTAMP(3),
    order_status STRING,
    WATERMARK FOR `order_time` AS order_time - INTERVAL '15' SECOND
) WITH (
    'connector' = 'faker',
    'rows-per-second' = '1',
    'fields.order_id.expression' = '#{number.numberBetween ''0'',''99999999''}',
    'fields.city.expression' = '#{Address.city}',
    'fields.street_address.expression' = '#{Address.street_address}',
    'fields.amount.expression' = '#{number.numberBetween ''0'',''100''}',
    'fields.order_time.expression' = '#{date.past ''15'',''SECONDS''}',
    'fields.order_status.expression' = '#{regexify ''(RECEIVED|PREPARING|DELIVERING|DELIVERED|CANCELED){1}''}'
);

To create the table, you need to click on the Execute button under the SQL Editor.

After executing the CREATE TABLE statement, there are two ways to view the created table:
  • You can view the table under the Virtual Tables tab by clicking the arrow to expand the table properties.
  • In the SQL Editor, you can replace the earlier statement with DESCRIBE orders;, and click Execute again.
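
For example, the following short script is a minimal sketch of checking the created table from the SQL Editor; you can run it just like any other statement:

-- List the tables available in the current catalog and database
SHOW TABLES;

-- Show the columns, types, and watermark definition of the orders table
DESCRIBE orders;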

2. Running a SELECT query

After creating a table, learn how to run a SELECT query that outputs a sample of the resulting data in Streaming SQL Console.

You have created your table using the Faker template that continuously generates data. As a next step, you can run a SELECT query and output the results on the Console page under the Results tab.

Copy and paste the following SQL query to the SQL window:
SELECT * FROM orders;

To execute the query, you need to click on the Execute button under the SQL window.

After a short amount of time, samples of the result continuously show up under the Results tab.
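
You can also narrow down the sampled output with a WHERE clause. For example, the following query, an illustrative variation rather than a required step of this guide, returns only the delivered orders:

SELECT * FROM orders WHERE order_status = 'DELIVERED';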

When you execute a query, a Flink job is created in the background. You can see more details about the created Flink job by navigating to the SQL Jobs page, and accessing the Flink Dashboard for the running job. You can also reload the SQL job into the SQL Editor to edit the job, or stop the job from the SQL Jobs tab.

3. Creating and querying a view

Now that you know the basics of creating a table and querying data from a table, learn how to create a more complex query using views.

In the following example, you will use the columns from the created Faker table. In this derived computation, which can be saved as a view, the orders are aggregated in 20-second intervals based on their status.

Copy and paste the following CREATE VIEW statement to the SQL Editor:

DROP VIEW IF EXISTS summaries;
CREATE VIEW summaries AS
SELECT window_start, window_end, order_status, COUNT(*) as order_count, SUM(amount) as total_amount
FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '20' SECONDS))
GROUP BY window_start, window_end, order_status;

If your previous query is still running, click on the Stop button under the SQL Editor to stop the job. Afterwards, to create the view, you need to click on the Execute button under the SQL Editor.

When you execute a query, not all the records produced by the query are shown in the Results window. By default, 1 record is sampled every second to be displayed. Before selecting data from this view, you can change the Sample Behavior to Sample all messages on the Job Settings window. This way the Results window will contain all messages, up to the specified count.

Add the following SQL query to the SQL window to run the query:
SELECT * FROM summaries

To execute the query, you need to click on the Execute button under the SQL Editor.

After a short amount of time, samples of the aggregated result continuously show up under the Results tab. As it is an aggregated view, the results can appear more slowly, one after another.

4. Outputting results of SQL query

As you already know how to query tables and views, the next step is to write the queried results into a sink or create a Materialized View.

Adding a sink using a Local Kafka template

So far you have sampled the results in the Console window, so the next step is to learn how to create a sink using one of the predefined Templates to output the results of the executed SQL queries.

Previously, you created the summaries view, executed the SELECT * FROM summaries query, and examined its results on the Streaming SQL Console.

Now, the next step is to stream the aggregated records to an external system, such as Kafka. The Community Edition comes with a preconfigured Kafka container that can be selected from the predefined Templates.

The created view has a complex schema. It would be tedious to describe it, and manually create a Kafka table with a matching schema. Luckily, SQL Stream Builder can do this task automatically.

As you can add more statements in one SQL window, you can create the Kafka table and add a SELECT query at the same time.

Add the following SQL query to the SQL window, but do not execute it. At this point the statement is only used to match the schema of the summaries table with the schema of the output_table that will be created in the next step:
INSERT INTO output_table SELECT * FROM summaries;

As the output_table does not exist yet, you need to import the CREATE TABLE statement to the SQL window by selecting Templates > local-kafka > json. SSB automatically inserts the appropriate CREATE TABLE statement with the schema retrieved from the summaries table. Before executing the statement, you must fill out the topic option of the connector. This means you need to replace '...' in the template with 'mytopic', as shown in the following example:
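
The generated statement looks similar to the following sketch. The exact column types and connector properties depend on your SSB version and the local Kafka setup, so treat this only as an illustration of the imported template with the topic filled out:

CREATE TABLE output_table (
    `window_start` TIMESTAMP(3),
    `window_end` TIMESTAMP(3),
    `order_status` STRING,
    `order_count` BIGINT,
    `total_amount` INTEGER
) WITH (
    'connector' = 'kafka',
    -- the template pre-fills the connection details of the local Kafka container
    'properties.bootstrap.servers' = '<provided by the local-kafka template>',
    'topic' = 'mytopic',
    'format' = 'json'
);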

Add the SELECT query after the CREATE TABLE statement:
INSERT INTO output_table SELECT * FROM summaries;

Click on the Execute button to run the script; both statements will be executed.

Adding a sink using a Schema Registry catalog

You can use Schema Registry as a catalog to output results based on a defined schema in Schema Registry.

SQL Stream Builder (SSB) in the CSP Community Edition comes with a preconfigured Schema Registry catalog. This means that you do not have to register the local Schema Registry service running in the Community Edition as it is already set up.

To be able to use Schema Registry as a sink, you need to create a schema and a topic that will be used as the output sink.

Navigate to the Schema Registry user interface by opening http://localhost:7788 from your browser. Click on the Add New Schema button, and provide the schema and the necessary information to the window:
Setting          Value
NAME             sr_summaries
DESCRIPTION      Provide a description based on your requirements.
TYPE             Avro schema provider
SCHEMA GROUP     Kafka
COMPATIBILITY    BACKWARD


The following schema was used in the example:
{
  "doc": "Default schema - modify as necessary",
  "namespace": "com.cloudera.exampleschema",
  "type": "record",
  "name": "exampleSchema",
  "fields": [
    {
      "type": "string",
      "name": "status"
    },
    {
      "type": "int",
      "name": "total"
    }
  ]
}

After you have created the schema, you need to create a new Kafka topic with the same name. You can easily create a Kafka topic using Streams Messaging Manager (SMM). Navigate to the SMM user interface by opening http://localhost:9991 from your browser. Select Topics from the main menu, and click on the Add New button. Fill out the fields, and click Save to create the topic:
Setting          Value
TOPIC NAME       sr_summaries
PARTITIONS       3
AVAILABILITY     LOW
CLEANUP POLICY   delete


Navigate back to the Streaming SQL Console using http://localhost:18121 or refresh the browser window, and the table will appear under Virtual Tables. When you describe the created table, you can see the added schema and also that it is automatically connected to the local Kafka and Kafka topic.

After setting up the schema and topic, you can submit a SELECT query where the order_status and total_amount are inserted into the sr_summaries table:

INSERT INTO `schema_reg`.`default_database`.`sr_summaries`
SELECT order_status, total_amount FROM summaries

A sample of the results is displayed in the browser under the Results tab; however, you can also view the messages generated to the Kafka topic using SMM.

When you navigate to the SMM user interface, search for the sr_summaries topic, and click on the Data Explorer view to see the received messages. Set the Values deserializer type to Avro for the messages to be shown correctly. SMM will automatically fetch the schema from Schema Registry to deserialize the messages.

Adding a Materialized View

One of the features of SQL Stream Builder is creating Materialized Views. Materialized Views let you create a persistent snapshot of the stream that can be queried through a REST endpoint. This endpoint can then be queried by external applications and tools. Now that you know how to query simple and aggregated data, learn how to create Materialized Views.

In the following example, you will create a Materialized View that shows the latest order count and total amount for each order status. The key will be the status, and the other fields will be updated with the value from the latest 20 second interval.

First, you need a New Job with a SELECT query. Add the following SQL query to the SQL Editor to run the query:
SELECT * FROM summaries
Before executing the query, you need to enable the Materialized View feature. On the Materialized View tab, set the Materialized View configuration to Enabled, and select the order_status as the primary key.

You also must add an API key to the configuration that will be a part of the generated URL. Click on the Add API Key button, provide a name for the API key, for example orders_api_key, and click Save. After the API key is created, select it as the API Key of the current Materialized View.

After configuring the Materialized View, you can add a query by clicking on the Add New Query button. On the Query Configuration page, specify the URL Pattern and select the fields that you want to include in the Materialized View. You can also add filters to further narrow down the results. In the following example, summary_query is the URL Pattern, and the order_status, order_count, total_amount, and window_start columns are selected as fields. You can add these fields by clicking Select All and removing window_end with the red bin icon under Actions, or you can select the fields individually from the drop-down menu under Query Builder. A filter is also added to exclude the orders with a CANCELED status.
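
The configured query corresponds roughly to the following SELECT statement. This is an illustrative sketch only, assuming the fields and filter described above; the actual query is assembled for you on the Query Configuration page:

SELECT order_status, order_count, total_amount, window_start
FROM summaries
WHERE order_status <> 'CANCELED';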

Click on the Apply and Save Job button to add the query to the Materialized View. The URL will appear under the Materialized View Queries.

When you click on the URL, you are redirected to the http://localhost:18131 address. You will receive an error message that no data is generated, because the SELECT query has not been executed yet. Go back to the SQL tab on the Streaming SQL Console, and click on the Execute button to start the SQL job. After a couple of seconds, the endpoint becomes active and the results show up at the localhost address.

Next step

You have completed the Getting Started guide of Streaming Analytics. Now that you know the basic steps of using SSB, feel free to experiment further and get more familiar with its features by trying out the PostgreSQL CDC and Filesystem connectors, or by learning how to manage and monitor your SQL jobs.