Running SQL Stream jobs

Every time you run a SQL statement in the Streaming SQL Console, it becomes a job and runs on the deployment as a Flink job. You can manage the running jobs using the Jobs tab on the UI.

There are two logical phases to running a job:
  1. Parse: The SQL is parsed and checked for validity, then compared against the virtual table schema(s) for correct typing and keys/columns.
  2. Execution: If the parse phase is successful, a job is dynamically created and runs in an open slot on your cluster. The job is a valid Flink job.
Before you begin:
  • Make sure that you have registered a Data Provider if you use the Kafka service on your cluster.
  • Make sure that you have added Kudu, Hive, or Schema Registry as a catalog if you use them for your SQL job. Tables from a registered catalog can be referenced with fully qualified names, as shown in the sketch after this list.
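
  The following is a minimal sketch of how a catalog table could be referenced in a query. The catalog name (hive_catalog), database (default), and table (orders) are hypothetical placeholders for names in your own environment:

    -- "hive_catalog" is a hypothetical registered Hive catalog;
    -- "orders" is an existing table in its "default" database.
    SELECT * FROM `hive_catalog`.`default`.`orders`;
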
  1. Navigate to the Streaming SQL Console.
    1. Go to your cluster in Cloudera Manager.
    2. Select SQL Stream Builder from the list of services.
    3. Click SQLStreamBuilder Console.
    The Streaming SQL Console opens in a new window.
  2. Provide a name for the SQL job.
    1. Optionally, you can click Random Name to generate a name for the SQL job.
  3. Create a table in the SQL window.
    You have the option to create a table in the following ways:
    • Using the Console wizard under the Tables tab to add Kafka and Webhook tables.
    • Using the Templates under the SQL window.
    • Adding your custom CREATE TABLE statement to the SQL window, as in the sketch after this list.
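
    For example, a custom CREATE TABLE statement for a Kafka-backed table might look like the following sketch. The table name, columns, topic, broker address, and format are hypothetical placeholders, and the options assume the standard Flink Kafka connector:

      -- Minimal sketch of a Kafka-backed table; all names and
      -- connection details below are placeholders.
      CREATE TABLE orders (
        order_id   BIGINT,
        amount     DOUBLE,
        order_time TIMESTAMP(3),
        -- Declares order_time as an event-time attribute so it can
        -- be used in time windows later.
        WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
      ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = '<broker_host>:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
      );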
  4. Add a SQL query to the SQL window. An example query is shown after the note below.

    When starting a job, the number of slots consumed on the specified cluster is equal to the parallelism setting. The default is one slot. To change the parallelism setting and other job-related configurations, click Settings.
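
    For example, a simple continuous query over the hypothetical orders table sketched in the previous step could count orders per one-minute window:

      -- Illustrative continuous aggregation: counts orders in
      -- one-minute tumbling event-time windows.
      SELECT
        TUMBLE_START(order_time, INTERVAL '1' MINUTE) AS window_start,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY TUMBLE(order_time, INTERVAL '1' MINUTE);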

  5. Click Execute.
    The Logs window displays status updates from SSB.
  6. Click Results to check the sampled data.
    These results are only samples, not the entire result of the new stream being created from the output of the query. The entire result set is written out based on what you define in the INSERT INTO statement, as in the sketch below. You can also output the results to a Materialized View.
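
    For example, to write the full result set to a sink table rather than only sampling it in the Console, the query above could be wrapped in an INSERT INTO statement. The sink table order_counts is hypothetical and would need its own CREATE TABLE statement with a matching schema:

      -- Writes the complete, continuous query output to a
      -- previously defined sink table (hypothetical name).
      INSERT INTO order_counts
      SELECT
        TUMBLE_START(order_time, INTERVAL '1' MINUTE) AS window_start,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY TUMBLE(order_time, INTERVAL '1' MINUTE);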
A job is generated that runs the SQL continuously on the stream of data from a table, and pushes the results to a table, to the Results tab of the Console, or to a Materialized View.