AIOps is an emerging buzzword, short for "Artificial Intelligence for IT Operations". As with Cloud, Edge Computing, or any other buzzword you can think of, every vendor has its own definition. But when you cut through all the marketing, what really is AIOps?
As AIMS is a vendor selling AIOps technology, it naturally has its own definition. Like every other vendor, the Very Official Marketing Definition of AIOps can be summed up as "AIOps is what AIMS sells. Now please shovel sacks of cash💰 in the general direction of the AIMS sales team."
So let us dispense with marketing definitions, attempts to position AIMS favorably against competitors, and all the rest of what you'd expect to see on a vendor blog. This blog isn't about selling you AIMS's AIOpsy goodness. This is about the philosophy behind AIOps, and a cold, hard examination of what the state of the market for AIOps is really like.
More importantly, this blog aims to answer what value AIOps has to real-world IT practitioners, and will even address the omnipresent boogeyman of the robots coming for our jobs.
What is AIOps?
Artificial Intelligence for IT Operations is a term coined by Gartner. AIOps is powered by two core technologies that continue to develop at a rapid pace: Big Data and Machine Learning (ML). Boiled down to a single phrase, effective AIOps is a digital assistant for systems administrators. Think Siri, or Alexa. Instead of turning lightbulbs on, randomly buying stuff for you from Amazon, or playing the wrong song, AIOps is about being a digital assistant for your data center.
In most cases, "Artificial Intelligence" is anything but intelligent. This is not exactly a secret. Apple Maps still randomly tells people to do crazy things like drive off bridges, or get lost in the desert, even 8 years after the first snarky news reports emerged. Letting AI anywhere near our precious mission-critical workloads sounds like pure lunacy!
This is both true and false at the same time. Unlike a consumer-focused AI assistant, or the AI underpinning maps applications, the adoption of AIOps can be thought of as more quantum than binary. AIOps also occupies a superposition of possible levels of utility that don't collapse into an actual outcome until the humans interfacing with it decide how much effort they're going to put into learning from it.
Let's consider Google Search for a moment. Not Google Search in 2022, but Google Search in the year 2000. In the year 2000, there were many search engines that had mainstream mindshare, and they were all crap. Google started gaining market share – and mind share – for the simple reason that it was less crap than the competition.
Google was founded in September of 1998. By the year 2000, its competitors were three and four years older than it. Google began indexing the web in 1996, but its prominent competitors had started in 1994, giving them at least a two-year head start in indexing the web, and another two-year head start in gathering data about user search behavior. So how did this tiny startup end up being less awful than the competition?
The answer is that Google wasn't just about indexing the web. Google ranked sites with a proprietary algorithm. Google used machine learning algorithms to study everything from how to tell whether or not sites were malicious, to what drove user attention and clicks. On top of this knowledge, they built an empire.
This is the kind of AI that underpins AIOps. Every vendor's approach is different, but most boil down to some form of software that comes pre-loaded with some basic knowledge, a learning engine that learns about your data center, applications, and/or microservices and analyzes their data, and tools for IT practitioners to actually solve problems.
The above is rather vague. Most people who work in IT could guess that AI-anything involves some flavor of analytics and machine learning. Isn't there something more specific that can be said?
Again, the answer is both yes and no. Beyond the generalization provided above, the AIOps technology offered by vendors really starts to differ. Some AIOps platforms are heavily infrastructure-focused. Others are aimed at helping developers. AIMS is an AIOps provider focused on application administration and application insight. AIOps comes in flavors.
The broad strokes do serve a purpose, however, and there are some common threads to AIOps, regardless of the vendor. Perhaps the most important thread woven through AIOps offerings is that they can almost be thought of as a next-generation ticketing system, with added AI and ML sauce.
Put aside for a moment how awful the customer experience of most ticketing systems is, and think about what a ticketing system is supposed to be. In a perfect world, ticketing systems not only serve as a means to track what needs to be done, but they are also a knowledge base. If the humans involved do their jobs, every problem that IT encounters is logged in the ticketing system, along with a root cause analysis, and the eventual solution.
On day 1, the ticketing system is nothing more than a glorified to-do list. Ten years in, however, the knowledge contained in that ticketing system is absolutely invaluable. Hypothetically, of course.
Ticketing systems have a lot of problems. The search capabilities of ticketing systems are almost universally panned by practitioners. Humans are lazy and don't enter all relevant information. A lot of the time tickets are closed simply by rebooting something, and root cause analysis is never performed.
AIOps tries to automate as much of this as possible, and provide a search that's actually useful. Again, while every vendor in this space has a different area of interest, and a different approach, all of them are, on some level, trying to do the following things:
- Learn what "good" looks like
- Identify when things are not "good"
- Find an adult if things are not "good"
- Tell the adult what the AI thinks the solution is to get back to "good"
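The four steps above can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how AIMS or any other vendor implements it: the function names, the three-standard-deviation tolerance, the `known_fixes` lookup, and the sample numbers are all assumptions made up for the example.

```python
from statistics import mean, stdev

def learn_good(history):
    """Step 1: summarize what "good" looks like as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_good(value, baseline, tolerance=3.0):
    """Step 2: "good" means within `tolerance` standard deviations of the mean."""
    mu, sigma = baseline
    return abs(value - mu) <= tolerance * sigma

def check(value, baseline, known_fixes, symptom):
    """Steps 3 and 4: produce an alert for a human, with a suggestion if one is known."""
    if is_good(value, baseline):
        return None  # nothing to report
    suggestion = known_fixes.get(symptom, "no known fix; investigate manually")
    return f"ALERT {symptom}: value={value}, suggestion: {suggestion}"

# Hypothetical history of a queue depth metric and a hypothetical knowledge base.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = learn_good(history)
known_fixes = {"queue_depth": "restart the consumer service"}

print(check(101, baseline, known_fixes, "queue_depth"))  # within the baseline
print(check(250, baseline, known_fixes, "queue_depth"))  # far outside the baseline
```

A real product replaces each of these functions with something far more sophisticated, but the shape of the loop stays the same.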
Why would you need AIOps?
The traditional IT Operations setup is broken. The use of AI and machine learning is becoming a necessity for DevOps teams to optimize IT operations. As businesses go digital, their use of more complex technologies and cloud platforms can put them at risk of failing to meet crucial governance and monitoring requirements for large volumes of data. Failure will impact the corporate Profit & Loss through lost revenue and productivity.
Here are a number of benefits of AIOps tools:
- Effective AIOps can help you identify situations you would otherwise not capture and leverage insights that you did not have access to before.
- Address performance issues automatically, preventing the need for middle-of-the-night emergency wake-up calls, accelerate incident response in case of an outage, and easily get to the root cause of problems.
- AIOps platforms provide prioritization of notifications that helps you focus only on those that matter.
- Reduce detection time and accelerate remediation.
- AIOps can also help you obtain improved operational observability.
- AIOps enables advanced analytics in real-time including predictive analytics to address problems faster.
- AIOps brings recommendations that are data-driven, making use of both real-time and historical data.
- Using AIOps you can reduce or eliminate costly and time-consuming human errors.
- Automate IT operations and free up IT capacity from mundane, repetitive tasks.
- AIOps systems provide improved visibility to deploy code faster.
- The ability to get topology context with anomalies allows a drastic reduction in MTTK (Mean Time To Know) and the MTTR (Mean Time To Respond) beyond anything that humans are capable of doing alone.
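If you want to check whether any tool actually moves MTTK and MTTR for you, the arithmetic is simple enough to do yourself. The sketch below assumes both intervals are measured from the moment the incident started, and the incident records are invented for illustration; your own ticketing data may define the start and end points differently.

```python
from datetime import datetime, timedelta

def mean_delta(deltas):
    """Average a list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)

# Hypothetical incident records: when it started, when IT knew, when IT responded.
incidents = [
    {"started": datetime(2022, 1, 10, 2, 0),
     "known": datetime(2022, 1, 10, 2, 30),
     "responded": datetime(2022, 1, 10, 3, 0)},
    {"started": datetime(2022, 1, 12, 14, 0),
     "known": datetime(2022, 1, 12, 14, 10),
     "responded": datetime(2022, 1, 12, 14, 20)},
]

# Assumption: both metrics are measured from the start of the incident.
mttk = mean_delta([i["known"] - i["started"] for i in incidents])
mttr = mean_delta([i["responded"] - i["started"] for i in incidents])

print("MTTK:", mttk)
print("MTTR:", mttr)
```

Run this over incidents from before and after adopting a tool and you have a number to bring to the next budget meeting.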
A practical example: AIMS
In order to usefully discuss AIOps in more detail, one must pick a vendor and explore that specific solution. This being the AIMS blog, dissecting AIMS makes the most sense, so for the purposes of this blog, AIMS will serve as the standard candle against which other AIOps solutions are compared.
As mentioned above, AIMS is an AIOps solution focusing on application administration and application insight. Although the AIMS platform is generic, AIMS started out focusing on Microsoft integration technologies, in large part because this is where its founders had the most experience.
AIMS uses OS agents and APIs to gather information. Windows Server is the supported operating system. AIMS also supports BizTalk, SQL Server, all Microsoft Azure infrastructure and services, IIS, generic file monitoring, and HTTP/S endpoint monitoring. The next step is an extension beyond Microsoft to AWS and other non-Microsoft technologies commonly used by enterprises in their core application integration scenarios.
Like other AIOps solutions, AIMS monitors the various technologies and products it supports. The AI builds and continuously updates baselines for the supported technologies and products, while continuously looking for anomalies based on the event correlation of metric deviations from the current baseline. The current baseline of thousands of metrics is cyclical with the nature of the business supported and becomes a dynamic, self-enhancing digital fingerprint or DNA of the system.
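To make the idea of a cyclical baseline concrete, here is a minimal sketch that keeps a separate "normal" for each hour of the day and measures how far a new value deviates from it. It is an illustration of the general technique only, not AIMS's actual algorithm: the hourly bucketing, the use of a z-score, and the sample message counts are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baseline(samples):
    """samples: iterable of (hour_of_day, value). Returns {hour: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}

def deviation(baseline, hour, value):
    """Number of standard deviations between `value` and the baseline for `hour`."""
    mu, sigma = baseline[hour]
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

# Hypothetical message counts: quiet at 03:00, busy at 14:00.
samples = [(3, 10), (3, 12), (3, 8), (3, 10),
           (14, 500), (14, 520), (14, 480), (14, 500)]
baseline = build_baseline(samples)

print(deviation(baseline, 14, 500))  # ordinary for mid-afternoon
print(deviation(baseline, 3, 500))   # the same value is wildly abnormal at 03:00
```

The point is that the same raw number can be perfectly normal or deeply alarming depending on where it falls in the business cycle, which is exactly what a single fixed limit cannot express.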
The metrics are primarily performance data, ranging from message count on ports and orchestrations, CPU load, and execution count on stored procedures in a database to in/out transfer rates to/from cloud storage. All this data (most often north of 10,000 metrics) is relevant for business processes that are often critical for driving revenue or productivity for the business. With this data, AIMS provides Business Insights or Business Signals directly from the underlying technologies that are the building blocks for digital transformation.
AIMS monitors as deep into the technologies and products it supports as possible and looks for any irregularities.
A traditional, non-AI monitoring tool would be functionally useless at this task. Consider an application that relies on a database residing on shared storage. Anything else that uses that shared storage is going to cause deviations in performance. In order to prevent a flood of meaningless alerts, pre-AI monitoring software would have to use thresholding.
Thresholding is complicated. It requires a great deal of effort and expertise to determine useful thresholds for various performance counters, and in highly dynamic, shared environments (for example, public clouds), the reality is that thresholds should really change on a regular basis.
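A small sketch shows why a hand-picked threshold ages badly. It compares a fixed limit with one recomputed from recent samples. The 50 ms limit, the "twice the recent median" rule, and the latency figures are all invented for illustration; a real tool would use something more careful.

```python
from statistics import median

STATIC_THRESHOLD_MS = 50  # hypothetical limit, chosen once by a human

def static_alert(value):
    """Alert whenever the value exceeds the fixed limit."""
    return value > STATIC_THRESHOLD_MS

def dynamic_alert(value, recent, factor=2.0):
    """Alert when the value exceeds `factor` times the median of recent samples."""
    return value > factor * median(recent)

# Hypothetical latencies (ms) before and after a noisy neighbor moved in.
quiet = [20, 22, 21, 19, 20]
busy = [80, 85, 82, 79, 81]

# In the busy period, 81 ms is the new normal: the static limit cries wolf.
print(static_alert(81), dynamic_alert(81, busy))
# In the quiet period, 45 ms is more than double normal: the static limit misses it.
print(static_alert(45), dynamic_alert(45, quiet))
```

The fixed limit is wrong in both directions at once: too sensitive when the environment gets busier, and not sensitive enough when it is quiet.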
The two uses of thresholding
Thresholds have two uses, depending on who you talk to. For one group of people, application performance is what really matters: if the application becomes too slow, then IT should be alerted, and they should do something about it.
IT operations teams, however, tend to have a more nuanced view. Yes, performance thresholds are useful, but applications operating on shared infrastructure dip below their target thresholds all the time. In the overwhelming majority of cases, these performance excursions are highly transient and unnoticed.
The pattern of threshold violations, however, could help someone determine if there was a service management problem, assuming anyone was willing to stare at the monitoring output for long enough. Staring at the monitoring output is required because thresholding and alerting are, despite decades of work, still a hot mess of false positives. There are vendors applying AI to try to stem the flood, but something better is called for.
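One crude, pre-AI way to separate transient excursions from a real pattern is to ignore violations unless they persist. The sketch below assumes a run of three consecutive samples is what counts as "sustained"; that number, the threshold, and the latency series are arbitrary choices for the example.

```python
def sustained_violations(values, threshold, min_run=3):
    """Return (start_index, length) for each run of at least `min_run`
    consecutive samples above `threshold`."""
    runs = []
    start = None
    for i, value in enumerate(values):
        if value > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that lasts until the end of the series.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - start))
    return runs

# Hypothetical latency samples (ms): two one-off blips and one sustained problem.
latency = [40, 120, 45, 42, 130, 135, 140, 150, 41, 110]
print(sustained_violations(latency, threshold=100))  # [(4, 4)]
```

This cuts the noise, but someone still has to pick `threshold` and `min_run` for every counter, which is the same tuning burden in a different coat.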
This brings us back around to AIMS.
The sysadmin's digital assistant
AIMS Innovation learns what normal operations look like. AIMS also keeps track of thousands of metrics, analyzes data, and performs correlation, so it has a much better chance of determining whether a deviation in performance is a legitimate problem, or whether it is just an irrelevant transient, and can be ignored.
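As a rough illustration of why correlation helps, the sketch below only flags a moment in time when several metrics deviate from their baselines together. It is not AIMS's correlation engine: the metric names, the z-score limit of 3, and the "at least two metrics" rule are assumptions chosen to keep the example short.

```python
def correlated_anomalies(deviations, z_limit=3.0, min_metrics=2):
    """deviations: {metric_name: [z-score per time step]}.
    Returns (time_step, [metric names]) wherever at least `min_metrics`
    metrics exceed `z_limit` at the same time step."""
    steps = len(next(iter(deviations.values())))
    flagged = []
    for t in range(steps):
        hot = sorted(name for name, zs in deviations.items() if abs(zs[t]) > z_limit)
        if len(hot) >= min_metrics:
            flagged.append((t, hot))
    return flagged

# Hypothetical z-scores for three metrics over four time steps.
deviations = {
    "cpu_load":       [0.2, 3.5, 0.1, 4.2],
    "message_count":  [0.1, 0.3, 0.2, -3.8],
    "sql_exec_count": [0.0, 0.4, 0.1, 5.0],
}

# Step 1 is a lone CPU blip and is ignored; step 3 involves all three metrics.
print(correlated_anomalies(deviations))
```

A lone spike on one counter is probably a transient. Three related counters moving at once is probably a problem.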
AIMS can also be taught about the interconnected nature of applications and can use this knowledge to perform automated root cause analysis. Before AIMS, operations teams trying to dig themselves out from under a crashed application would have to walk back through the cascade of failed components, workflows, and services to determine what actually went boom💥. AIMS tracks all of that and can surface the root cause to administrators quickly, and efficiently.
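The walk back through the cascade can be sketched as a search over a dependency map: a failed component whose own dependencies are all healthy is a likely root cause. The component names and the map below are hypothetical, and real root cause analysis has to cope with incomplete maps and partial failures that this toy version ignores.

```python
def root_causes(depends_on, failed):
    """Return failed components that have no failed dependency of their own."""
    return sorted(
        component
        for component in failed
        if not any(dep in failed for dep in depends_on.get(component, []))
    )

# Hypothetical dependency map: each component lists what it relies on.
depends_on = {
    "order_portal": ["biztalk_orchestration"],
    "biztalk_orchestration": ["sql_database"],
    "sql_database": ["shared_storage"],
    "shared_storage": [],
}

# Three components are reporting failures; shared storage is healthy.
failed = {"order_portal", "biztalk_orchestration", "sql_database"}
print(root_causes(depends_on, failed))  # ['sql_database']
```

The portal is what users complain about, but the database is where the adult should start looking.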
AIMS doesn't do anything a systems administrator can't do. And it doesn't really replace a ticketing system. What it does, however, is analyze both real-time monitoring data and its own historic archives to identify deviations in application behavior from established behavior in order to identify problems before they become noticeable to end-users, and/or to perform root cause analysis that would take an experienced sysadmin digging through a traditional ticketing system hours or days.
AIOps use cases
Sustainable IT monitoring in a modern agile world requires disruption of traditional practices and monitoring tools. Here are some of the most powerful AIOps use cases, supported by AIMS Innovation:
- Anomaly detection can help you prevent outages and avoid downtime.
- Topology discovery to help you understand the assets your business relies on.
- AIOps provides real-time performance analytics that automagically surface actionable insights from across your entire IT environment, whether on-prem, hybrid, or cloud.
- AIOps helps automate a wide range of IT operations, for example, defining and adjusting dynamic thresholds based on your business behavior.
The 5 Building Blocks of AIOps
AIOps combines several basic capabilities (or building blocks), from collecting huge amounts of data from data sources to taking action. AIOps allows IT operations to employ a system that makes sense out of the data and makes it easy to act on. These five (+1) capabilities, each one delivering incremental value, should initially be implemented sequentially and then refined in later iterations. Read more about the 5 (+1) Building Blocks in this open guide.
The reality of AIOps adoption and the potential of AIOps
As discussed several times above, each AIOps vendor has a different area of focus. This keeps getting repeated because it is important. The AIOps market today is much like the search market in the late 90s. There are AIOps vendors, like AIMS, that specialize in doing core things and doing them extremely well.
There are also many AIOps platforms that are essentially traditional monitoring and ticketing system vendors with some AI slathered on top. These vendors can track way more data points than a targeted AIOps vendor, but lack AIs with domain-specific knowledge.
None of the AIOps vendors today have solutions that can monitor everything in a data center, let alone all of an organization's IT across multiple public cloud vendors, service providers, on-premises, mobile, IoT, etc. Getting there will take a decade or more of development, mergers, acquisitions, and so forth. This doesn't mean that AIOps is useless. It also doesn't mean that a decade from now the creepy AIOps army is coming for everyone's job. Google didn't wipe out research assistants, paralegals, journalists, or other jobs focused on uncovering information. What Google did was make finding that information easier. It made individual researchers able to more quickly find the information relevant to them, and thus able to handle more, larger, and/or more complex research tasks than was possible in the physical paper era.
In a similar vein, AIOps products are simply tools that help IT Ops teams do their jobs more efficiently. Like automation and orchestration platforms, AIOps products help IT practitioners manage more applications than would be possible without AIOps.
The goal of every AIOps vendor is to build a tool that becomes an extension of the IT practitioner. A tool that becomes so much a part of us that we choose not to remember what life was like before it, just like most of us choose not to dwell on what life was like before Google, or what life was like before we all started carrying around the sum total of human knowledge in our back pocket in the form of a smartphone.
As an extension of the IT practitioner, AIOps can help bring a sense of pride and accomplishment back to beleaguered IT teams drowning under the sheer volume of workloads they have to manage. With AIOps, IT teams can do these things faster, for more workloads, and have to face fewer meetings with angry suits where IT can only repeat "we don't know…yet".
That’s what AIOps is. It provides a means to automate arduous and miserably annoying tasks so that we can focus on more meaningful work. AIOps gives us the opportunity to take pride in what we do because we aren’t bogged down with the mundane. So why not book an AIMS demo today and get started with AIOps?