Running SQL Stream jobs

Every time you run a SQL statement in the Streaming SQL Console, it becomes a job and runs on the deployment as a Flink job. You can manage the running jobs using the Jobs tab of the Console.

There are two logical phases to running a job:
  1. Parse: The SQL is parsed and checked for validity, then checked against the schema(s) of the virtual tables for correct typing of keys and columns.
  2. Execution: If the parse phase succeeds, a job is dynamically created and runs on an open slot on your cluster. The created job is a valid Flink job.
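For example, the following minimal statement would pass both phases, assuming a virtual table named sensors with matching columns has been registered (the table and column names here are hypothetical):

    SELECT sensor_id, temperature
    FROM sensors
    WHERE temperature > 100

The parse phase verifies that sensors exists and that sensor_id and temperature match its schema; the execution phase then submits the statement as a continuously running Flink job.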
Before you begin, make sure that you have:
  • Registered a Data Provider.
  • Created a table that can be used as a source in the SQL query.
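If you still need to create a source table, the following Flink SQL DDL is a minimal sketch of a Kafka-backed virtual table; the topic, broker address, columns, and format are assumptions for illustration, and depending on your SSB version you can also create tables through the Tables tab of the Console:

    -- Hypothetical Kafka-backed source table; adjust names, topic, and brokers
    CREATE TABLE sensors (
      sensor_id   STRING,
      temperature DOUBLE,
      event_time  TIMESTAMP(3),
      WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'sensors',
      'properties.bootstrap.servers' = '<broker_host>:9092',
      'format' = 'json',
      'scan.startup.mode' = 'earliest-offset'
    );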
  1. Go to your cluster in Cloudera Manager.
  2. Click SQL Stream Builder from the list of services.
  3. Click SQLStreamBuilder Console.
    The Streaming SQL Console opens in a new window.
  4. Provide a name for the SQL job.
    1. Optionally, you can click Random Name to generate a name for the SQL job.
  5. Select a Sink Table.
    1. Optionally, you can leave the sink as None.
  6. Add a SQL query to the SQL window. For a sample query, see the example after this procedure.

    When starting a job, the number of slots consumed on the specified cluster is equal to the parallelism setting. The default is one slot. To change the parallelism setting, click Advanced settings.

  7. Click Execute.
    The Logs window shows the status of the job as it is parsed and submitted to the cluster.
  8. Click Results to check the sampled data.
    These results are only a sample, not the entire output of the new stream created by the query. The complete result set is sent to the selected sink table and/or to a Materialized View.
A job is generated that runs the SQL statement continuously on the stream of data from the source table, and pushes the results to the sink table, to the Results tab of the Console, or to a Materialized View.
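As an example of a complete job for step 6, the following query computes a per-sensor average over one-minute tumbling windows; it assumes the hypothetical sensors table sketched above:

    -- Continuous windowed aggregation over the hypothetical sensors table
    SELECT
      sensor_id,
      TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
      AVG(temperature) AS avg_temperature
    FROM sensors
    GROUP BY sensor_id, TUMBLE(event_time, INTERVAL '1' MINUTE)

While the job runs, sampled rows of this aggregation appear under the Results tab, and the complete result stream is delivered to the selected sink table or Materialized View.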