Anomaly Detection on OpenStack Logs Using Machine Learning

Mar 29, 2019 | AI

(There’s a great new conference in the U.S., DevConf.US, returning in 2019 to Boston University (15 to 17 Aug). This highly technical conference aims to draw a diverse group of speakers and attendees, with a specific emphasis on people who are new to speaking and to tech conferences in general. Only in its second year, DevConf.US builds on the successful decade-spanning run of DevConf.CZ in Brno, CZ.

This is a session from DevConf.US 2018. The call for proposals to present at DevConf.US 2019 is now open.)

In this session from the CentOS Dojo held as part of DevConf.US, OpenStack technical support engineers Madhur Gupta and Shatadru Bandyopadhyay talk about how to use machine learning for anomaly detection on OpenStack logs. Once an anomaly is detected in the logs, it can be used to trigger automated follow-up actions and to help with root cause analysis.

The first challenge with anomaly detection in OpenStack is that OpenStack generates a significant quantity of logs, even in relatively simple production setups. How do you ingest all that data and detect anomalies in it?

Let’s begin with what counts as an anomaly in an OpenStack production log. One example Madhur and Shatadru give is a web server whose usage statistics deviate from the normal baseline. These spikes or dips in CPU, memory, disk I/O, network I/O, etc. may not mean something bad occurred, but they are perhaps worth your attention simply because they deviate. In addition to reflecting internal issues, anomalies may be related to externalities, showing effects from outside your infrastructure that may be important for you to know about.
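To make "deviates from the normal baseline" concrete, here is a minimal sketch in Python. It is not the method shown in the session: it assumes a simple trailing-window z-score, and the function name, window size, threshold, and CPU readings are all made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes of samples that deviate from the trailing baseline.

    The baseline for each sample is the `window` samples just before it.
    A sample is flagged when it lies more than `threshold` standard
    deviations away from the baseline mean (a z-score test).
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU usage readings (percent), one per minute.
cpu_percent = [21, 23, 22, 24, 22, 23, 21, 95, 22, 23]
print(find_anomalies(cpu_percent))
```

With these made-up readings, only the spike to 95 (index 7) is flagged. Note that the sketch reports a deviation, not a verdict: as described above, whether the spike is actually a problem still needs a closer look.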

The role of machine learning is to automatically analyze trends among anomalies and, based on those trends, decide by itself how to respond to an anomaly. The more consistent and rational the data source you feed the machine learning, the more accurate your anomaly detection and response become. That data source starts with the logs.

The approach that Madhur and Shatadru take uses an ELK Stack: Logstash collects and filters the OpenStack logs, Elasticsearch stores them in a more structured form as a time series database, and Kibana is used to preview the content before the machine learning step. Kibana also provides visibility into the machine learning component.
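As a rough illustration of what "giving more structure" to a log line means, the Python sketch below turns one raw line into a timestamped document with named fields. It is not the Logstash configuration used in the session: the regular expression assumes a "timestamp, process ID, level, logger, message" layout, and the sample line, field names, and `parse_line` helper are hypothetical.

```python
import json
import re
from datetime import datetime

# Assumed layout: "<date> <time.millis> <pid> <LEVEL> <logger> <message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Parse one raw log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["pid"] = int(doc["pid"])
    # Store the time as an ISO 8601 string so documents can be ordered as a time series.
    parsed_time = datetime.strptime(doc.pop("timestamp"), "%Y-%m-%d %H:%M:%S.%f")
    doc["@timestamp"] = parsed_time.isoformat()
    return doc


# Made-up log line for illustration only.
sample = (
    "2018-08-17 10:15:32.123 2451 ERROR nova.compute.manager "
    "[req-hypothetical] Instance failed to spawn"
)
doc = parse_line(sample)
print(json.dumps(doc, indent=2))
```

Once every line carries a timestamp, a level, and a source as separate fields, it becomes possible to count, filter, and chart the logs over time, which is the kind of input an anomaly detection step needs.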

This video introduces these concepts in more depth, and concludes with a demonstration of the full stack.