Enriching streaming data with join
In SQL Stream builder, you can enrich your streaming data with values from a slowly changing dataset using join statements.
JOINsyntax of SQL. The following example shows a regular
INNER JOINwhere the Orders table is joined with Product table based on the productId:
SELECT * FROM Orders INNER JOIN Product ON Orders.productId = Product.id
A regular join can only be used with bounded tables. In a streaming context data is produced continuously, and with a regular join both sides of the join would need to be buffered indefinitely to store all of the events that would match with the result of the SQL query. To get results from a given amount of time and to join streaming tables, a time boundary needs to be specified. This means the tables not only need to be joined by a key or column, but also on a time attribute.
BETWEENand an interval value:
SELECT * FROM Orders o, Shipments s WHERE o.id = s.order_id AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time
In this case, Orders table is joined with the Shipments table and the results are going to be generated based on the id column as long as the order time and shipment time is within four hours of each other. The condition of this scenario is that the events in the streams happen almost at the same time with minimal delay, so the time boundary can be defined between an approximate interval.
When you want to join a streaming table with a slowly changing table, time attributes can differ as one of the tables stores data over a long period of time, while the streaming table receives new data continuiously. To join these types of tables, a time needs to be defined that can serve as a reference point for both types of tables.