A primer on AIMS predictive anomaly detection



January 25, 2022

I was so excited about the anomaly detection feature in AIMS that I decided to experiment with it. Check my previous post for more details about the AIMS monitoring platform.

I had already defined my environment topology in AIMS, as per the online guide, to include the three platforms I have: Windows Server, SQL Server, and BizTalk Server.

 

For my first black-box test, I decided to focus on just one factor: message throughput. This way I could control and follow the anomaly detection outcomes easily, without too much noise and complexity from other factors. So I implemented a simple BizTalk solution with one SQL receive port that polls records from a custom database table, and one send port that subscribes to it and discards the incoming messages.


The polling statement and interval act as the control knob for my test: in the SQL receive location configuration, I can change the polling throughput, that is, the number of messages (m) polled every (x) seconds.

 

During the first two weeks, the AIMS platform used machine learning to analyze different metrics across the platforms I had assigned to it. This gave AIMS an initial understanding of the weekly behavior pattern in my environment, based on these ingested metrics.

To make the messaging throughput a bit more realistic during this initial period, I randomized the messaging load across different days and different times of the day, with values ranging from 100 messages per 30 seconds all the way to 3000 messages per 10 seconds. Keep in mind that I was changing only this one dimension, the message throughput of my solution.
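A randomized load plan for the learning period could be generated with a small script like the one below. This is only a sketch: the function name `random_load_plan`, the fixed seed, the step of 100 messages, and the candidate polling intervals are assumptions made for illustration. Only the two range endpoints (100 and 3000 messages, 10 and 30 seconds) come from the test described above.

```python
import random

# Range endpoints taken from the text; everything else is illustrative.
MIN_MESSAGES, MAX_MESSAGES = 100, 3000
INTERVALS_SECONDS = (10, 20, 30)  # assumed candidate polling intervals


def random_load_plan(slots: int, seed: int = 42) -> list[tuple[int, int]]:
    """Return one (messages, interval_seconds) polling setting per time slot."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    plan = []
    for _ in range(slots):
        # Pick a message count in steps of 100 between the two endpoints.
        messages = rng.randrange(MIN_MESSAGES, MAX_MESSAGES + 1, 100)
        interval = rng.choice(INTERVALS_SECONDS)
        plan.append((messages, interval))
    return plan


if __name__ == "__main__":
    for messages, interval in random_load_plan(slots=5):
        print(f"{messages} messages every {interval} s")
```

Each generated pair would then be applied by hand to the polling statement and interval of the SQL receive location for that time slot.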

In the following weeks, I started introducing unprecedented spikes with much higher message throughput (4000 messages per 10 seconds and above), and indeed AIMS started to report anomalies on my SQL receive location component.
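To put these settings on a common scale, it helps to convert them to messages per second. The short sketch below does that arithmetic for the figures quoted above; the helper name `rate_per_second` is an assumption for illustration.

```python
def rate_per_second(messages: int, interval_seconds: int) -> float:
    """Convert a polling setting into messages per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return messages / interval_seconds


learned_low = rate_per_second(100, 30)    # lowest load in the learning period
learned_high = rate_per_second(3000, 10)  # highest load in the learning period
spike = rate_per_second(4000, 10)         # smallest spike introduced afterwards

print(f"learned range: {learned_low:.2f} to {learned_high:.0f} msg/s")
print(f"spike: {spike:.0f} msg/s ({spike / learned_high:.2f}x the learned peak)")
```

In other words, the learning period covered roughly 3.33 to 300 messages per second, and the smallest spike (400 messages per second) was about 1.33 times the highest load AIMS had seen before.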

 

The key part here is that it didn’t wait for my systems to encounter critical errors and then report them; it acted proactively, based on the previously analyzed patterns. This proactive monitoring approach is what differentiates the AIMS monitoring platform. It relies on machine learning capabilities and a proprietary set of algorithms in its back-end engine that detect such unexpected patterns.
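The AIMS algorithms are proprietary, so the following is not how AIMS works internally. It is a deliberately simple stand-in for the general idea of learning a weekly pattern and flagging what falls outside it: a normal range is learned per weekday-and-hour slot as the mean plus or minus `k` standard deviations. The function names, the choice of `k = 3`, and the sample values are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_baseline(history, k=3.0):
    """Learn a normal range per (weekday, hour) slot.

    history: iterable of (weekday, hour, value) samples.
    Returns {(weekday, hour): (low, high)} with mean +/- k standard deviations.
    """
    by_slot = defaultdict(list)
    for weekday, hour, value in history:
        by_slot[(weekday, hour)].append(value)
    baseline = {}
    for slot, values in by_slot.items():
        centre, spread = mean(values), pstdev(values)
        baseline[slot] = (centre - k * spread, centre + k * spread)
    return baseline


def is_anomaly(baseline, weekday, hour, value):
    """True when the value falls outside the learned range for its slot."""
    low, high = baseline[(weekday, hour)]
    return value < low or value > high


# A few made-up throughput samples (msg/s) for Monday (0) at 09:00.
history = [(0, 9, 280.0), (0, 9, 300.0), (0, 9, 290.0), (0, 9, 310.0)]
baseline = learn_baseline(history)
print(is_anomaly(baseline, 0, 9, 295.0))  # inside the learned range
print(is_anomaly(baseline, 0, 9, 400.0))  # far above the learned range
```

With these samples, a reading of 295 is treated as normal while 400 is flagged, even though nothing has failed yet. That is the sense in which pattern-based detection is proactive rather than error-driven.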

Inspecting the anomaly details in AIMS, you will find that the anomaly manifests itself as a fine-grained set of parameters and the corresponding impacted nodes. For example, in my test the BizTalk Message Box (node) was affected in terms of CPU time and Physical Writes (parameters). You will also find how much those parameters deviated from the expected normal behavior range, and for how long.
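The two figures mentioned here, how far a parameter deviated from its normal range and for how long, can be computed from raw readings along the lines of the sketch below. It is an illustration, not the AIMS calculation: the function name, the readings, the normal range of 20 to 60, and the choice to express the deviation as a percentage of the range width are all assumptions.

```python
def summarize_deviation(samples, low, high, sample_seconds):
    """Report the worst excursion outside [low, high] and the longest time outside it.

    samples: evenly spaced metric readings.
    sample_seconds: spacing between consecutive readings.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    worst = 0.0
    longest = current = 0
    for value in samples:
        # Distance outside the range; 0 when the reading is inside it.
        excess = max(low - value, value - high, 0.0)
        worst = max(worst, excess)
        # Count consecutive out-of-range readings.
        current = current + 1 if excess > 0 else 0
        longest = max(longest, current)
    return {
        "max_deviation": worst,
        "max_deviation_pct": 100.0 * worst / (high - low),
        "longest_duration_seconds": longest * sample_seconds,
    }


# Made-up CPU time readings (%) taken every 60 s, assumed normal range 20 to 60.
cpu = [35, 50, 72, 90, 81, 55, 40]
print(summarize_deviation(cpu, low=20, high=60, sample_seconds=60))
```

For these made-up readings, the worst excursion is 30 points above the range (75% of the range width), and the parameter stayed outside the range for three consecutive readings, or 180 seconds.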

 

One thing to note here is that those deviations will not necessarily be linear with the rate of increase in message count (i.e. the root cause I induced), or with other parameters. It is actually interesting to witness how my throughput change affected the other system components: it triggered a ripple effect that impacted several components and systems, and it is this ripple effect that AIMS analyzes and whose corresponding anomalies it detects.

You don’t really need to inspect every one of the fine-grained parameters displayed, but it is a good chance to start recognizing the correlation between the highly impacted parameters, and maybe to select a few of them for your custom dashboard so you can monitor them more closely in the future.

 

 

This test scenario translates into a real-world scenario where incoming business-related requests start to climb much higher on your on-ramp receive port. You may consequently need to scale up or scale out your integration solution, or maybe conduct performance tuning for your platforms, all of which require adequate time to plan and execute.

Receiving an anomaly notification in near real-time gives you that much-needed time to properly assess and act on it in the early stages, instead of having to react to critical issues after they have already impacted your integration solution and delayed or disrupted the business you serve.

On a different note, you can give feedback on AIMS anomaly events to contribute to the engine’s learning process, by taking actions such as ignoring, resolving, or re-opening the reported events.

You can also force the platform to re-learn the normal behavior patterns. This is particularly useful if you have behavior changes in your environment that are now considered normal: it allows AIMS to reset and start recognizing these changes as part of the new normal pattern, and whatever deviates from it as the new anomalies. Moreover, the re-learn action can be performed selectively on specific components, such as my SQL port.

 
One final note: AIMS in all its glory will still need us humans for the final assessment, and for pinpointing and fixing the root cause. We are not there yet, where we can rely entirely on AI to do 100% of our jobs.

This article merely scratches the surface of anomaly detection in AIMS and what it can provide for monitoring your integration solutions. I believe that the potential for this kind of proactive, intelligent monitoring is huge, and it certainly has room to grow much more in the future.

This blog post was originally published on Ahmed Taha's technical blog, To Integration and Beyond, on 18 June 2018.


Author

Technical Architect at AIMS Partner Link Development and #aimsperformancepro. More than 10 years of hands-on experience with Microsoft Integration Stack.

Ahmed Taha

