Running SQL Stream jobs

Every time you execute a SQL statement in the Streaming SQL Console, it becomes a job and runs on the deployment as a Flink job. You can manage the running jobs using the Jobs tab in the UI.

There are two logical phases when running a job:
  1. Parse: The SQL is parsed and checked for validity, then compared against the virtual table schema(s) for correct typing of keys and columns.
  2. Execution: If the parse phase succeeds, a job is dynamically created and runs on an open slot on your cluster. The job is a valid Flink job.
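
For instance, the parse phase would accept a statement like the one below only if the referenced table and columns exist in a registered virtual table schema with compatible types. The table payments and its columns are hypothetical names used purely for illustration:

  -- Hypothetical Virtual Table Source `payments` with columns
  -- `card` (STRING) and `amount` (a numeric type); the parse phase
  -- succeeds only if these columns exist in the schema and `amount`
  -- supports the > comparison.
  SELECT card, amount
  FROM payments
  WHERE amount > 100
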
Before you begin:
  • Make sure that you have registered a data source.
  • Make sure that you have created a Virtual Table Source.
  1. Go to your cluster in Cloudera Manager.
  2. Click on SQL Stream Builder from the list of Services.
  3. Click on SQLStreamBuilder Console.
    The Streaming SQL Console opens in a new window.
  4. Provide a name for the SQL job.
    1. Optionally, you can click on the Random Name button to generate a name for the SQL job.
  5. Select a Virtual Table Sink.
    1. Optionally, you can leave the sink set to None.
  6. Add a SQL query to the SQL window, as shown in the example query after this procedure.

    When starting a job, the number of slots consumed on the specified cluster is equal to the Parallelism setting. The default is 1 slot; for example, a job with a Parallelism of 4 consumes 4 slots. To change the parallelism setting, click Advanced settings.

  7. Click Execute.
    The Logs window updates with the status of the job.
  8. Click Results to check the sampled data.
    These results are only a sample, not the entire result of the new stream created from the output of the query. The entire result set is sent to the Sink Virtual Table and/or a Materialized View.
A job is generated that continuously executes the SQL on the stream of data from the Source Virtual Table and pushes the results to the Sink Virtual Table.
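
As a sketch of such a continuous job, the query below aggregates a hypothetical Kafka-backed Virtual Table Source named payments over ten-second tumbling windows; the table name, the column names, and the requirement that eventTimestamp is an event-time attribute are all assumptions made for illustration:

  -- Illustrative only: `payments`, `card`, `amount`, and `eventTimestamp`
  -- are assumed names, not part of any default installation.
  -- Emits one row per card for every ten-second window of the stream.
  SELECT card,
         MAX(amount) AS max_amount,
         TUMBLE_END(eventTimestamp, INTERVAL '10' SECOND) AS window_end
  FROM payments
  GROUP BY card,
           TUMBLE(eventTimestamp, INTERVAL '10' SECOND)

Because the GROUP BY uses a tumbling window, the job keeps emitting results for as long as the stream runs, and the full output, not only the sampled rows shown under Results, flows to the selected Sink Virtual Table.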