As the number of applications deployed on Red Hat OpenShift grows, so does the need for application monitoring. Many of these applications are monitored with Prometheus metrics, which results in a large number of time-series metrics accumulating in a TSDB (time-series database). Some of these metrics can have anomalous values that may indicate issues in the application, but such values are difficult to identify manually. To address this issue, we developed an AI-based approach: training a machine-learning model on these metrics to detect anomalies.
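The excerpt does not say which model the team trained. As a much simpler stand-in that shows the general idea of flagging anomalous values in a metric series, the sketch below uses a rolling z-score rather than a trained machine-learning model. It assumes the samples for one metric have already been exported from Prometheus as a plain list of numbers; the function name `rolling_zscore_anomalies` and its `window` and `threshold` defaults are hypothetical choices for illustration, not part of any Red Hat tool.

```python
import statistics
from collections import deque


def rolling_zscore_anomalies(values, window=10, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Hypothetical stand-in: a rolling z-score, not a trained ML model.
    """
    history = deque(maxlen=window)  # sliding window of the most recent samples
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            # Skip perfectly flat windows, where a z-score is undefined.
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical metric: a repeating pattern with one injected spike.
samples = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11] * 3
samples[25] = 50
print(rolling_zscore_anomalies(samples))  # [25]
```

A trained model can learn seasonality and trends that a fixed window cannot, but the shape of the task is the same: compare each new sample with what recent history predicts and flag large deviations.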
When developing a new technology, it really helps if you are also a user of that new tech. This has been Red Hat's approach to artificial intelligence and machine learning: develop openly on one hand, and on the other, share knowledge across the organization so that teams can use the same tools to work on interesting business problems, all while keeping a two-way exchange with the open source commons.
This is the sort of left-hand/right-hand move that data scientist Oindrilla Chatterjee began using on a project she started during an internship and continued in a full-time role at Red Hat. Chatterjee and her team are exploring how to use machine learning to perform sentiment analysis on a dataset of customer and partner surveys about a service offering.
Teams from Red Hat and NVIDIA have collaborated on a scalable hybrid cloud application that could revolutionize smart city initiatives, such as traffic-flow monitoring and transportation management, around the world. Together, the two companies are building solutions that make cities smarter and more efficient by processing sensor data in real time to provide insights into traffic congestion, pedestrian flow, and infrastructure maintenance.
The application runs on the NVIDIA EGX platform with the NVIDIA GPU Operator and is built with NVIDIA Metropolis, an application framework for IoT that brings together capabilities for real-time image processing. The NVIDIA DeepStream SDK extracts metadata from live video streams at the edge, and the application then forwards the relevant metadata to the cloud for deeper analytical processing and for display in the information dashboard depicted below.
Operators are useful tools within Kubernetes, designed to extend the container orchestration platform with additional resources. More precisely, an Operator, sometimes referred to as a custom controller, is a method of packaging, deploying, and managing a Kubernetes application.
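The "custom controller" side of an Operator is commonly described as a reconciliation loop: observe the current state, compare it with the desired state declared in a custom resource, and act to close the gap. The sketch below illustrates only that pattern, in plain Python and under loudly stated assumptions: it does not talk to the Kubernetes API or use the Operator SDK, and `AppSpec`, `ClusterState`, and `reconcile` are hypothetical names standing in for a custom resource, the cluster, and the controller logic.

```python
from dataclasses import dataclass, field


@dataclass
class AppSpec:
    """Hypothetical desired state, standing in for a custom resource."""
    name: str
    replicas: int


@dataclass
class ClusterState:
    """Hypothetical observed state: names of the app's running pods."""
    pods: list = field(default_factory=list)


def reconcile(spec: AppSpec, state: ClusterState) -> list:
    """Move the observed state toward the desired state and return the
    actions taken. Running it again once converged is a no-op."""
    actions = []
    # Scale up: create pods until the desired count is reached.
    while len(state.pods) < spec.replicas:
        pod = f"{spec.name}-{len(state.pods)}"
        state.pods.append(pod)
        actions.append(("create", pod))
    # Scale down: delete the most recently created pods first.
    while len(state.pods) > spec.replicas:
        pod = state.pods.pop()
        actions.append(("delete", pod))
    return actions


state = ClusterState(pods=["web-0"])
print(reconcile(AppSpec("web", 3), state))  # [('create', 'web-1'), ('create', 'web-2')]
print(reconcile(AppSpec("web", 3), state))  # []
print(reconcile(AppSpec("web", 1), state))  # [('delete', 'web-2'), ('delete', 'web-1')]
```

A real Operator is triggered by watch events from the Kubernetes API server and manages Kubernetes objects rather than a list of strings, but the idempotent compare-and-act shape is the same.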
As useful as Operators are, they have had one limitation: originally, they all had to be written in the Go programming language. Thanks to the Operator SDK, you no longer need to develop your Operators in Go; the SDK has options for Ansible and Helm that may better suit the way you or your team work. But that can still be limiting for development teams that want to build an Operator but aren't skilled in Helm or Ansible.
A well-known tactic for identifying the root cause of an outage in a production environment is to go back and see what the environment has been doing. By analyzing logs, developers and operators alike can extract usage information that, ideally, reveals what's wrong with a given application or how it could work better.
In the early days of logging, there wasn't a great deal of activity going on, so a human being (or two) could examine the logs and figure out what was up. It didn't hurt that the logs were not only sparse in content but also not terribly complicated in what they reported. An alert such as "Help, my processor is melting" didn't take much effort to act on. But over time, logs grew far more voluminous and more detailed, and today's more distributed applications complicate the situation further.
Quick, name some weird stuff that’s happened to your production machines.
Accidentally dropping a production database table? Rolling out a patch that enabled any user to log in with any password? Disabling a load balancer? Using a dictionary to physically keep keyboard keys depressed so “terminals [could] repeatedly [hit] the enter key in order for the logins and print jobs of about 40,000 people to work”?
It’s happened to Alex Corvin, a senior engineer at Red Hat. Well, not that last one. But Corvin has been around long enough in his career to have met Mr. Murphy and his Law: if it can go wrong, it will.
The prospect of true machine learning is a tangible goal for data scientists and researchers. It has long been known that the platforms on which such ML applications run must be fast and highly efficient so that learning can happen that much faster. That is the motivation for the Red Hat engineers in the Office of the CTO who are working to optimize one such open source platform: Open Data Hub.
Open Data Hub is built on Red Hat OpenShift Container Platform, Ceph Object Storage, and Apache Kafka/Strimzi, integrated with a collection of other open source projects to provide a machine-learning-as-a-service platform. That's a lot of components to integrate, and to ensure that their contributions to Open Data Hub perform well, Red Hat engineers have created an Internal Data Hub within Red Hat as a proving ground and learning environment.
When you run a workload as a VM or a container, or in a serverless environment, that workload is vulnerable to interference by any person or software with hypervisor, root, or kernel access. Enarx, a new open source project, aims to make it simple to deploy workloads to a variety of trusted execution environments (TEEs) in the public cloud, on your premises, or elsewhere, and to keep your application workload as secure as possible.
When you run your workloads in the cloud, there are no technical barriers to prevent cloud providers, or their employees, from looking into your workloads, peeking at the data, or even changing the running process. That's because of how VMs, containers, and serverless platforms are implemented: a person or software entity with sufficient access can interfere with any process running on the machine.
As machine learning becomes more interesting to technology companies, it is hardly surprising that a company like Red Hat would approach the challenges of this branch of artificial intelligence with an open source methodology in mind.
The immediate benefits of open source machine learning tools are plain as day to anyone familiar with how open source works: lower cost, more flexibility, no vendor lock-in… you know, the usual.
But dig a little deeper and it quickly becomes apparent that open source means more for cutting-edge software than just a faster way to get cheaper software.
The concept of artificial intelligence, which seemed so much like science fiction a few decades ago, has made real, practical inroads, producing results that organizations find useful. What's making those results happen, though, isn't esoteric pie-in-the-sky theory: it's statistical models that have been trained to make decisions. And trained a lot.
For now, artificial intelligence as a broad term has received less focus than the more results-oriented field of machine learning, in which a computer system is given input and output data and directed to infer the mathematical rules that govern the transformation from one to the other.
“It’s like pointing a program to look at the solar system and then have it figure out the laws of motion that govern a planetary system,” explained Sanjay Arora.
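To make the analogy concrete, here is a minimal sketch (not taken from Arora's work) of a program inferring a rule from input and output data. Given approximate semi-major axes (in AU) and orbital periods (in years) for six planets, an ordinary least-squares fit of log T against log a recovers an exponent close to 1.5, which is Kepler's third law, T² ∝ a³. The planetary values are rounded textbook figures, and `fit_power_law` is a hypothetical helper written only for this illustration.

```python
import math

# Approximate semi-major axis a (AU) and orbital period T (years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}


def fit_power_law(xs, ys):
    """Fit y = c * x**k by ordinary least squares on
    log y = log c + k * log x. Returns (k, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    k = cov / var                 # slope in log-log space is the exponent
    c = math.exp(my - k * mx)     # intercept gives the constant factor
    return k, c


a_values, t_values = zip(*PLANETS.values())
k, c = fit_power_law(a_values, t_values)
print(f"T = {c:.3f} * a^{k:.3f}")
```

Taking logarithms turns the power law into a straight line, which is why plain linear regression is enough here; the exponent is never given to the program, only inferred from the data.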