As we saw in the previous post, Azure Stream Analytics is designed to handle real-time, high-velocity streaming data. To analyze streaming data efficiently, we need to group the incoming data items into batches. This is where windowing functions come in.
In the context of Azure Stream Analytics, a window is a block of time-stamped event data (e.g. IoT telemetry or web clickstream data) on which users can perform statistical operations, most commonly aggregations.
This post covers four types of windowing functions that Azure Stream Analytics provides for partitioning a stream into windows and analyzing them:
- Tumbling Window: This is the easiest of the Azure Stream Analytics windowing functions to understand. A tumbling window segments the data stream into distinct, fixed-length time segments that do not overlap, so each event belongs to exactly one window. The diagram below illustrates this.
- Hopping Window: Hopping windows are like tumbling windows in that both use fixed-duration segments, but hopping windows can overlap. Defining a hopping window therefore takes two parameters: the window size (the length of each segment) and the hop size (how far each window moves forward in time from the previous one). Consecutive windows overlap whenever the hop size is smaller than the window size. In the example below, the window size is 10 seconds and the hop size is 5 seconds, so each window overlaps the previous one by 5 seconds.
- Sliding Window: Unlike the tumbling and hopping window functions, this function does not produce an aggregation at fixed time intervals. Instead, an aggregation is produced every time a new event arrives or an existing event falls out of the time window.
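Before looking at sliding windows in more detail, the tumbling and hopping behaviour described above can be sketched in plain Python. This is only an illustration of the window arithmetic, not Stream Analytics query syntax. The function names are hypothetical, timestamps are integer seconds, and two things are assumptions of this sketch: windows are aligned to time 0, and a window includes its end time but not its start time.

```python
def hopping_windows(ts, size, hop):
    """Return every (start, end] window of length `size` that contains `ts`.

    Assumption of this sketch: window ends fall on multiples of `hop`,
    and the start is exclusive while the end is inclusive.
    """
    # Smallest multiple of `hop` that is >= ts (integer ceiling division).
    end = -(-ts // hop) * hop
    windows = []
    # Keep stepping forward by one hop while the window still covers ts.
    while end - size < ts:
        windows.append((end - size, end))
        end += hop
    return windows


def tumbling_window(ts, size):
    """A tumbling window behaves like a hopping window whose hop equals its size."""
    return hopping_windows(ts, size, size)[0]


# An event at second 7 with 10-second windows:
print(tumbling_window(7, 10))      # (0, 10)
print(hopping_windows(7, 10, 5))   # [(0, 10), (5, 15)]
```

With a hop of 5 seconds and a size of 10 seconds, every event lands in two windows, which is why aggregates from a hopping window can count the same event more than once.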
To understand how a sliding window behaves, let's look at the diagram below, which uses a 10-second window:
- Window a: Aggregation occurs when the first event (1) arrives at the 10th second.
- Window b: When the next event (5) arrives at the 12th second, another aggregation occurs, covering the 2nd to the 12th second.
- Window c: When the next two events (9 and 7) arrive at the 15th second, aggregation occurs over the 5th to the 15th second.
- Window d: At the 20th second, event 1 (which arrived at the 10th second) “drops out” of the 10-second aggregation window, triggering a new aggregation, and the process continues.
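The walkthrough above can be reproduced with a small Python simulation. It is a sketch under stated assumptions rather than the service's implementation: the function name is hypothetical, the numbers in the walkthrough are treated as event values, SUM is used as the aggregate, each window covers (end - size, end], and windows that contain no events are assumed to produce no output.

```python
def sliding_window_sums(events, size):
    """events: list of (timestamp_seconds, value) pairs.

    Returns (window_end, sum) for every point in time at which the
    contents of the window change.
    """
    # The contents can only change when an event enters (at ts)
    # or drops out (at ts + size).
    change_points = sorted(
        {ts for ts, _ in events} | {ts + size for ts, _ in events}
    )
    results = []
    for end in change_points:
        in_window = [v for ts, v in events if end - size < ts <= end]
        if in_window:  # assumption: an empty window emits nothing
            results.append((end, sum(in_window)))
    return results


# Events from the walkthrough: (arrival second, value)
events = [(10, 1), (12, 5), (15, 9), (15, 7)]
for end, total in sliding_window_sums(events, 10):
    print(f"window ending at {end}s: sum = {total}")
```

The first four results correspond to windows a to d: the aggregate is recomputed at seconds 10, 12 and 15 as events arrive, and again at second 20 when the event from second 10 drops out.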
- Session Window: This function groups events based on their time of arrival, so there is no fixed window size. Instead, there are three parameters: timeout, maximum duration, and an optional partitioning key. The purpose of a session window is to eliminate quiet periods in the data stream, i.e. time periods when no events arrive.
A new window begins when the first event arrives. The window is kept open until the specified timeout period has elapsed, counted from the arrival time of the preceding event. This works like a countdown for closing the window: if a new event arrives within the timeout period, the countdown is reset to the full timeout period; otherwise, the window is closed.
If events keep arriving before the countdown hits 0, the window keeps growing until the maximum duration (specified when the window is defined) is reached. Another important point to note is that the check for the maximum duration is done at intervals equal to the maximum duration itself. For example, if the maximum duration is set to 10 minutes, the check for whether the current window has reached the maximum duration happens at the 10th, 20th, 30th minute, and so on.
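The timeout and maximum-duration rules can be sketched as a small Python function. This is a simplified model of the description above, not the service's exact algorithm. The function name is hypothetical and the optional partitioning key is left out. The sketch also makes three assumptions: timestamps are integers in a single unit (minutes, say); an event that arrives exactly at the timeout still extends the window; and a window is force-closed at the first checkpoint (a multiple of the maximum duration) at which it is at least the maximum duration old.

```python
def session_windows(timestamps, timeout, max_duration):
    """Group integer timestamps into sessions (lists of timestamps)."""
    sessions = []
    current = []      # timestamps in the session that is currently open
    deadline = None   # checkpoint at which the open session is force-closed
    for ts in sorted(timestamps):
        if current:
            within_timeout = ts - current[-1] <= timeout
            within_max = ts <= deadline
            if within_timeout and within_max:
                current.append(ts)  # the countdown is reset by this event
                continue
            sessions.append(current)  # timed out or hit the maximum duration
        # Start a new session with this event.
        current = [ts]
        # First multiple of max_duration that is >= ts + max_duration
        # (integer ceiling division).
        deadline = -(-(ts + max_duration) // max_duration) * max_duration
    if current:
        sessions.append(current)
    return sessions


# A quiet period between minutes 6 and 20 splits the stream into two sessions.
print(session_windows([1, 3, 6, 20, 22], timeout=5, max_duration=10))
# A steady stream is cut once the maximum-duration checkpoint has passed.
print(session_windows([1, 4, 7, 10, 13, 16, 19, 22, 25], timeout=5, max_duration=10))
```

In the second call the events never pause for longer than the timeout, so only the maximum-duration check ends the first session. Because that check is aligned to multiples of the maximum duration, the session can run somewhat longer than the maximum duration before it is closed.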