AIMS teams up with the Norwegian Computing Center to predict problems and alert before they arise



October 11, 2021

At AIMS, we strive to alert as early as possible when a problem is arising in a business-critical system. We want to go further and predict a possible problem, raising an alert even before the problem itself arises. However, this is an incredibly complicated task, especially considering that AIMS builds monitoring tools that are technology agnostic and that simultaneously monitor a very large number of parameters.

For this reason, we decided that this was the time to venture into research and team up with the experts of the Norwegian Computing Center (Norsk Regnesentral). The Norwegian Research Council will support our research project for the next three years. The project is called PReVENT (prediction + events), and it will focus on finding new ways of predicting problems by combining the analysis of time series data (data that contain a numerical value and a timestamp) with natural language processing on what we call events. "Events" are all the messages and logs, collected by AIMS in text form, that indicate that something occurred in an IT system.

General manager and one of the founders of AIMS Innovation Ivar Sagemo (left), PhD in theoretical physics Alessandra Cagnazzo, and researcher at the Norwegian Computing Center Annabelle Redelmeier. Photo: Odd Richard Valmot

What is the difference between reacting and predicting?

Monitoring tools are, for the most part, designed to alert when something out of the ordinary happens. As precious as this information is, it only allows us to react once something is already happening. But what if, by looking simultaneously at different factors that are perfectly normal when considered individually, we could find patterns across events and data, predict that things are heading in the wrong direction, and pre-alert a problem?

Let’s take, for example, a situation in which I notice, while using a browser, that my laptop slows down to an annoying point. It does not always happen: it happens only if I have more than 5 tabs open, two of them are a news website and a social media page, and the social media page has been open for more than 5 minutes. Having more than 5 tabs open does not create any problem by itself, and neither does having the news website or the social media page open. Also, if I close the social media page soon enough, I don’t get any performance problems. I only have a problem when all the factors are present at the same time. In this scenario, making a prediction would mean, for example, pre-alerting that something might go wrong as soon as I open a social media page while having 5 tabs open with the news site among them, before the 5 minutes have passed. I might not keep the social media page open for 5 minutes, or I might close some of the other tabs in the meantime, but that is what a pre-alert based on a prediction gives us: a warning that something might go wrong, not that it necessarily will.
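To make the difference between reacting and pre-alerting concrete, the laptop example can be written down as a small hand-coded rule. This is only an illustrative sketch: the names `BrowserState` and `classify` and the two thresholds are assumptions taken from the example above, not part of any AIMS product, and the goal of the project is to discover such patterns from data rather than to write them by hand.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds taken from the laptop example, not AIMS parameters.
MAX_SAFE_TABS = 5
SOCIAL_LIMIT_MINUTES = 5.0


@dataclass
class BrowserState:
    open_tabs: int                          # total number of tabs currently open
    news_open: bool                         # True if the news website is among them
    social_open_minutes: Optional[float]    # None if the social media page is closed


def classify(state: BrowserState) -> str:
    """Return "ok", "pre-alert" or "problem" for the toy laptop scenario."""
    risky_context = state.open_tabs > MAX_SAFE_TABS and state.news_open
    if not risky_context or state.social_open_minutes is None:
        # At least one factor is missing, so nothing is expected to go wrong.
        return "ok"
    if state.social_open_minutes > SOCIAL_LIMIT_MINUTES:
        # All factors are present: this is where a reactive alert would fire.
        return "problem"
    # All factors are present except the elapsed time: warn before it is too late.
    return "pre-alert"


print(classify(BrowserState(open_tabs=6, news_open=True, social_open_minutes=1.0)))
```

A purely reactive monitor corresponds to the "problem" branch only. The "pre-alert" branch is the prediction: it can turn out to be a false alarm if the social media page is closed in time.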

This example gives me the chance to bring up another problem: how to prevent something from going wrong. Predicting that something might go wrong does not automatically mean proposing a solution. In the example, the solution was clear: close something before the laptop slows down. In real life, however, the interplaying factors can number in the thousands, and pinpointing the one on which we have to intervene to prevent the problem is not an easy task.

Even thinking only about a laptop, numerous combinations of factors might slow it down, and finding all of them is a gargantuan task, even in such a small system. Imagine the struggle for a business that has hundreds or thousands of machines.


Our project will focus on unveiling patterns in order to make predictions and alert that something might go wrong, or that a problem is on its way to resolving itself if nothing major happens in the meantime. Proposing solutions is out of the scope of the project, but a better understanding of the patterns within data and events will also guide us on the path to resolution.

Why combine time series and events?

Many monitoring solutions focus either on time series/numerical data or on events (text data that describe that something has happened). But if we want to unveil the full picture, we need to combine both types of information. In the example of the browser, to know what is going on, we need to keep track of numerical factors, like the number of tabs open and how many minutes the social media page has been open, and of events, like the fact that we opened a news web page or a social media page.
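As a rough illustration of what combining the two types of information can look like, the sketch below attaches to each event the most recent numerical sample recorded at or before it. The function name `attach_latest_sample`, the tuple layout and the sample data are all made up for this example and say nothing about how AIMS or PReVENT actually store or join data.

```python
from bisect import bisect_right


def attach_latest_sample(samples, events):
    """Pair each event with the latest numeric sample at or before its timestamp.

    samples: list of (timestamp, value) tuples, sorted by timestamp
    events:  list of (timestamp, message) tuples
    """
    times = [t for t, _ in samples]
    combined = []
    for ts, message in events:
        # Number of samples whose timestamp is <= the event timestamp.
        i = bisect_right(times, ts)
        value = samples[i - 1][1] if i > 0 else None  # None: no sample yet
        combined.append((ts, message, value))
    return combined


# Hypothetical data: number of open tabs sampled every 60 seconds,
# plus two text events with their own timestamps (in seconds).
tab_counts = [(0, 3), (60, 5), (120, 6)]
browser_events = [(30, "opened news site"), (125, "opened social media page")]

for row in attach_latest_sample(tab_counts, browser_events):
    print(row)
```

Once events and numbers sit on the same timeline, a pattern such as "social media page opened while many tabs are open" becomes something a model can look for.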

Why research?

Alessandra Cagnazzo & Annabelle Redelmeier

There are many techniques and studies on time series prediction, and a growing number of successful applications of natural language processing (NLP), but when we started to look into how to apply them we faced many challenges. When prediction is included in monitoring solutions, those solutions are often highly customized to a single technology or company. Models of this type are costly to develop and maintain, and they require extensive and constant training. How can we predict efficiently in a technology-agnostic setting with thousands of variables? How can we apply NLP to text that is not a human language? How can we combine time series and events in an automated way and with a close-to-real-time response?

None of these questions are standard problems with ready-made solutions in the literature, so we need to research answers, explore limits and push the boundaries!
