Enriching streaming data with join
In SQL Stream builder, you can enrich your streaming data with values from a slowly changing dataset using join statements.
Regular join
JOIN
syntax of SQL. The following
example shows a regular INNER JOIN
where the Orders
table is joined with Product table based on the
productId:SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id
A regular join can only be used with bounded tables. In a streaming context data is produced continuously, and with a regular join both sides of the join would need to be buffered indefinitely to store all of the events that would match with the result of the SQL query. To get results from a given amount of time and to join streaming tables, a time boundary needs to be specified. This means the tables not only need to be joined by a key or column, but also on a time attribute.
Interval join
BETWEEN
and an interval
value:SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.order_id
AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time
In this case, Orders table is joined with the Shipments table and the results are going to be generated based on the id column as long as the order time and shipment time is within four hours of each other. The condition of this scenario is that the events in the streams happen almost at the same time with minimal delay, so the time boundary can be defined between an approximate interval.
When you want to join a streaming table with a slowly changing table, time attributes can differ as one of the tables stores data over a long period of time, while the streaming table receives new data continuously. To join these types of tables, a time needs to be defined that can serve as a reference point for both types of tables.