Azure Stream Analytics is designed to let users create jobs that run 24×7, ingesting and processing a continuous stream of events. Microsoft guarantees 99.9% availability for Azure Stream Analytics. To help analyze the root cause when an error does occur, Azure Stream Analytics comes integrated with performance metrics, logs, and job states, all of which can be accessed through Azure Monitor.
Let’s have a look at the two most important metrics that need to be monitored in Azure Stream Analytics:
Job State: The most important thing to monitor is the status of the Azure Stream Analytics job itself. We have discussed Azure Stream Analytics job states in a previous post. A job that is not in the running state cannot generate logs or metrics. A job can be in the degraded state for some time and then recover automatically, so degradation alone is not necessarily a cause for alarm. The state to watch out for is failed. A job can enter the failed state for various reasons, such as runtime errors, including running out of streaming units (SUs), that is, running out of resources.
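The reasoning above can be turned into a simple alerting rule. The following Python sketch is illustrative only: the function name, the severity labels, and the five-minute grace period for the degraded state are assumptions made for this example and are not part of Azure Stream Analytics or Azure Monitor. It assumes the job state has already been retrieved as a string by whatever monitoring tooling you use.

```python
def classify_job_state(state, seconds_in_state=0, degraded_grace_seconds=300):
    """Map a job state string to a hypothetical alert severity.

    The severity labels and the grace period are assumptions for this
    sketch, not values defined by Azure Stream Analytics.
    """
    normalized = state.strip().lower()
    if normalized == "failed":
        # The failed state always needs attention.
        return "critical"
    if normalized == "degraded":
        # A degraded job may recover on its own, so only warn once it
        # has stayed degraded longer than the grace period.
        if seconds_in_state > degraded_grace_seconds:
            return "warning"
        return "ok"
    if normalized == "running":
        return "ok"
    # Any other state means the job is not running and therefore is
    # not producing logs or metrics.
    return "not_running"


if __name__ == "__main__":
    for state, elapsed in [("Running", 0), ("Degraded", 60),
                           ("Degraded", 600), ("Failed", 0)]:
        print(state, elapsed, classify_job_state(state, elapsed))
```

The grace period reflects the point made above that a degraded job can recover automatically; tune it to how long your pipeline can tolerate degraded processing.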
Watermark Delay: This metric measures the lag between event arrival and event processing in the pipeline, that is, how many seconds of delay are being introduced by factors such as the processing logic. The more complex the processing logic, the bigger the lag. Ideally, watermark delay should be stable once the job is in the running state. A steadily increasing trend indicates a problem in the processing pipeline: the job is falling further and further behind its input. To address watermark delay issues, analyze the processing logic of the Azure Stream Analytics job and refactor it if required.
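To make "stable" versus "increasing" concrete, one option is to fit a straight line through recent watermark delay samples and look at its slope. The sketch below is a minimal illustration in plain Python and does not call any Azure API. It assumes the samples (in seconds, at evenly spaced intervals) have already been exported from your monitoring tooling, and the function names and the slope threshold are hypothetical values chosen for this example.

```python
from statistics import mean


def delay_slope(samples):
    """Least-squares slope of watermark delay, in seconds per sample interval."""
    n = len(samples)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def is_delay_increasing(samples, slope_threshold=0.5):
    """Flag a sustained upward trend; the threshold is an assumed value."""
    return delay_slope(samples) > slope_threshold


if __name__ == "__main__":
    stable = [5, 6, 5, 6, 5, 6]    # fluctuates around a constant delay
    growing = [5, 7, 9, 11, 13]    # delay grows by 2 seconds per sample
    print(is_delay_increasing(stable))   # False
    print(is_delay_increasing(growing))  # True
```

A slope near zero corresponds to the stable case described above, while a persistently positive slope suggests the job cannot keep up and the processing logic deserves a closer look.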