Running SQL Stream jobs

Every time you run a SQL statement in the Streaming SQL Console, it becomes a job and runs on the deployment as a Flink job. You can manage the running jobs using the Jobs tab on the UI.

Running a job has two logical phases:
  1. Parse: The SQL is parsed and checked for validity, and then compared against the virtual table schema(s) to verify correct typing of keys and columns.
  2. Execution: If the Parse phase is successful, a job is dynamically created and runs on an open slot on your cluster. The job is a valid Flink job.
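For example, a minimal sketch of how the two phases behave, assuming a hypothetical orders table with columns order_id, amount, and order_time:

  -- Fails in the Parse phase: customer_id is not a column of orders,
  -- so the statement is rejected and no Flink job is created.
  SELECT customer_id, amount FROM orders;

  -- Passes the Parse phase and is submitted as a Flink job in the Execution phase.
  SELECT * FROM orders;
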
  • Make sure that you have registered a Data Source if you use the Kafka service on your cluster.
  • Make sure that you have added Kudu, Hive or Schema Registry as a catalog if you use them for your SQL job.
  1. Navigate to the Streaming SQL Console.
    1. Navigate to Management Console > Environments, and select the environment where you have created your cluster.
    2. Select the Streaming Analytics cluster from the list of Data Hub clusters.
    3. Select Streaming SQL Console from the list of services.
      The Streaming SQL Console opens in a new window.
  2. Open a project from the Projects page of Streaming SQL Console.
    1. Select an existing project from the list by clicking the Open or Switch button.
    2. Create a new project by clicking the New Project button.
    3. Import a project by clicking the Import button.
    You are redirected to the Explorer view of the project.
  3. Click the icon next to Jobs in the Explorer view.
  4. Select New Job.
    The SQL Editor for the created job opens in a tab.
  5. Provide a name for the SQL job.
    1. Optionally, you can click Generate Random Name to generate a name for the SQL job.
  6. Click Create.
  7. Create a table using one of the following options:
    • Using the Add Kafka or Webhook table wizard.
    • Using the Templates.
    • Adding your own CREATE TABLE statement in the SQL Editor (see the sketch after this list).
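
    If you add your own statement, the following is a minimal sketch of a CREATE TABLE statement for a Kafka-backed table, assuming the Flink Kafka connector is used; the table name, columns, topic, broker address, and format are placeholders that you need to adjust to your registered Data Source:

      CREATE TABLE orders (
        order_id   STRING,
        amount     DOUBLE,
        order_time TIMESTAMP(3),
        -- Event-time watermark allowing 5 seconds of out-of-order data.
        WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
      ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',                               -- placeholder topic name
        'properties.bootstrap.servers' = '<broker>:9092', -- placeholder broker address
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
      );
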
  8. Add a SQL query to the SQL Editor.

    When starting a job, the number of slots consumed on the specified cluster equals the parallelism setting; the default is one slot. To change the parallelism setting and other job-related configurations, click Job Settings.
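
    For example, a minimal continuous query over the hypothetical orders table defined above; it keeps running on the stream until the job is stopped:

      -- Continuously filters incoming records; the column names are placeholders.
      SELECT order_id, amount, order_time
      FROM orders
      WHERE amount > 100;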

  9. Click Execute.
    The Logs window shows status updates from SSB as the job starts.
  10. Click Results to check the sampled data.

    You can pause or restart the sampling of a SQL job with the Pause and Refresh buttons; the status of the polling is shown next to the Pause button. The Results tab also lets you change the order of the samples and filter them using different types of conditions, which you can combine. To change the order, click the name of a column; arrows next to the column name indicate the direction of the ordering.

    To filter, click either the button next to the column name or the field under the column name. You can change the filtering value by typing directly into the filtering row, and change the conditions using the filtering arrow.

  11. Click Events to check the job-related information.
A job is generated that continuously runs the SQL on the stream of data from a table, and pushes the results to a table, to the Results tab, or to a Materialized View.
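
For example, a minimal sketch of a statement that pushes the results of a continuous aggregation into another table; order_counts is a hypothetical sink table that you create beforehand:

  INSERT INTO order_counts
  SELECT TUMBLE_START(order_time, INTERVAL '1' MINUTE) AS window_start,
         COUNT(*) AS order_count
  FROM orders
  GROUP BY TUMBLE(order_time, INTERVAL '1' MINUTE);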