Open source software (OSS) communities have many options for how they communicate. Among them, mailing lists have long been a common way to connect with other members of the community. The sentiment and communication style on a mailing list can give good insight into the health of the community, and those interactions can become a deciding factor for new and diverse members considering whether to become active.
As the focus on diversity and inclusion (D&I) grows in OSS communities, I have taken on the task of using ML/AI techniques to detect hate speech and offensive language in community mailing lists. The project starts with the Fedora devel and user mailing lists and will later become a service applicable to any OSS mailing list. In this three-part blog series, we will go through the process step by step: first the data cleaning, second the model creation, and finally the creation of a service that notifies community managers of concerning behavior on their mailing list. It is time to start using data science to support D&I efforts.
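The excerpt names cleaning as the first step but does not describe it. As a rough illustration only, the sketch below shows one plausible cleaning pass for a plain-text message: it keeps the author's own words and drops quoted replies, the "On ..., X wrote:" attribution line, the signature block, and URLs. All rules here (including the hypothetical `clean_email_body` helper) are assumptions about a typical archive format, not the series' actual preprocessing.

```python
import re

# Hypothetical cleaning rules: adjust them to the list archive's real format.
ATTRIBUTION_RE = re.compile(r"^On .* wrote:$")
URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")


def clean_email_body(body: str) -> str:
    """Keep only the author's own text from a plain-text mailing list message."""
    kept = []
    for line in body.splitlines():
        # By convention, a line of exactly "-- " starts the signature block.
        if line == "-- ":
            break
        # Skip quoted text and the "On ..., X wrote:" line that introduces it,
        # since those words belong to earlier authors.
        if line.lstrip().startswith(">") or ATTRIBUTION_RE.match(line):
            continue
        kept.append(line)
    # Remove URLs, then collapse the whitespace left behind.
    text = URL_RE.sub(" ", " ".join(kept))
    return WS_RE.sub(" ", text).strip()
```

Removing quoted text matters for this use case: without it, a single offensive message would be counted again in every reply that quotes it.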
Continue reading “Examining mailing list traffic to evaluate community health”
It was the talk title that caught my eye: “Developer Insights: ML and Analytics on src/”. I was intrigued. I had a few ideas about how machine learning techniques could be applied to source code, but I was curious to see the current state of the art. To hear more, I attended the session at DevConf.cz 2020 given by Christoph Görn and Francesco Murdaca of Red Hat's AI and ML Center of Excellence.
The first question I had was “where did they come up with the project name Thoth?” My initial guess was that “Thoth” was an ice moon from the Star Wars universe, or maybe a demon from Buffy the Vampire Slayer. It turns out that Thoth is the Ancient Egyptian god of writing, magic, wisdom, and the moon. The Egyptian deity theme runs through the project, with components called Thamos, Kebechet, Amun, and Nepthys, among others.
The set of problems that Thoth aims to solve is an important one. Can we help developers identify the best library to use, by looking at what everyone else is using for a similar job? Can we help identify the source of common performance issues, and suggest speed-ups? Can we create a framework that can enforce compliance, and help minimize risk, as applications grow?
Continue reading “Using machine learning and analytics to help developers”
As more applications are deployed on Red Hat OpenShift, the need for application monitoring grows. Many of these applications are monitored through Prometheus metrics, so a large number of time-series metrics accumulate in a TSDB (time series database). Some of these metrics can take anomalous values that may indicate problems in the application, but identifying them manually is difficult. To address this, we developed an AI-based approach: training a machine learning model on these metrics to detect anomalies.
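To make the idea of flagging anomalous metric values concrete, here is a minimal sketch that uses a rolling z-score instead of a trained model. This is a deliberately simple statistical baseline swapped in for illustration, not the model the post describes. It assumes the values of a single Prometheus series have already been fetched into a Python list; the query step is not shown, and the `window` and `threshold` defaults are arbitrary.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


series = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(rolling_zscore_anomalies(series))
```

One weakness of this baseline is that a spike stays in the window and inflates the standard deviation for the next `window` samples, which hides nearby anomalies. That is one reason a learned model of the metric's normal behavior can be preferable.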
Continue reading “Prometheus anomaly detection”
When developing a new technology, it really helps to also be a user of that technology. This has been Red Hat's approach to artificial intelligence and machine learning: develop openly on one hand and, on the other, share knowledge across the organization so that the same tools can be used to work on interesting business problems, all while keeping a two-way exchange with the open source commons.
This is the sort of left-hand/right-hand approach that data scientist Oindrilla Chatterjee adopted on a project she started during an internship and continued in a full-time role at Red Hat. Chatterjee and her team are exploring how to use machine learning for sentiment analysis on a dataset of customer and partner surveys about a service offering.
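The excerpt does not say which model the team uses. As a stand-in, the sketch below shows a classic baseline for text sentiment: a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The class name, tokenizer, and training sentences are hypothetical and made up for illustration. They are not survey data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known token;
            # words never seen in training are ignored.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayesSentiment().fit(
    ["great support and fast response", "very helpful team",
     "slow response and poor support", "unhelpful and confusing"],
    ["positive", "positive", "negative", "negative"],
)
print(model.predict("helpful and fast"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together. A real survey pipeline would also need better tokenization and far more labeled data.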
Continue reading “Sentiment analysis with machine learning”
A well-known tactic for finding the root cause of a production outage is to look back at what the environment has been doing. By analyzing logs, developers and operators alike can extract usage information that ideally reveals what is wrong with a given application or how it could be improved.
In the early days of logging, there wasn't a great deal of activity, so a human being (or two) could examine the logs and figure out what was going on. It didn't hurt that the logs were sparse in content and not terribly complicated in what they reported. An alert such as “Help, my processor is melting” didn't take much effort to act on. Over time, however, logs became far more voluminous and far more detailed, and today's more distributed applications complicate the situation further.
Continue reading “Diagnosing apps with AI”
The prospect of true machine learning is a tangible goal for data scientists and researchers. It has long been known that the platforms on which such ML applications run have to be fast and highly efficient so that learning can happen faster. This is the motivation for the Red Hat engineers in the Office of the CTO who are working to optimize one such open source platform: Open Data Hub.
Open Data Hub is built on Red Hat OpenShift Container Platform, Ceph Object Storage, and Apache Kafka/Strimzi integrated into a collection of open source projects to enable a machine-learning-as-a-service platform. That’s a lot of components to be integrated, and to ensure that their contributions to Open Data Hub perform well, Red Hat engineers have taken the step of creating an Internal Data Hub within Red Hat as a proving ground and learning environment.
Continue reading “How Open Data Hub learners become the teachers”
As machine learning becomes more interesting to technology companies, it is hardly surprising that a company like Red Hat is going to approach the challenges of this aspect of artificial intelligence with an open source methodology in mind.
The immediate benefits of open source machine learning tools are plain as day to anyone familiar with how open source works: lower cost, more flexibility, no vendor lock-in… you know, the usual.
But dig a little deeper and it quickly becomes apparent that open source means more for cutting-edge software than just a faster way to get cheaper software.
Continue reading “Machine Learning with Open Source Infrastructure”
The concept of artificial intelligence, which seemed so much like science fiction a few decades ago, has made real, practical inroads in producing results that organizations can find useful. What’s making those results happen, though, isn’t esoteric pie-in-the-sky theory: it’s creating statistical models that have been trained to make decisions. And trained a lot.
Artificial intelligence itself is a term that, for now, has had less of a focus than the more results-oriented machine learning, where a computer system is given input and output data and then is directed to infer the mathematical rules that govern the transformation of that data.
“It’s like pointing a program to look at the solar system and then have it figure out the laws of motion that govern a planetary system,” explained Sanjay Arora.
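The analogy can be made literal in a few lines. In the sketch below, the program receives only inputs (orbital semi-major axes in AU) and outputs (orbital periods in years) for six planets, using approximate published values. It fits a straight line in log-log space with ordinary least squares and recovers Kepler's third law, T ∝ a^1.5 (that is, T² ∝ a³). This is a tiny supervised regression chosen for illustration, not the unsupervised deep learning the post covers.

```python
import math

# Approximate semi-major axis (AU) and orbital period (years) for six planets.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}


def fit_power_law(pairs):
    """Least-squares fit of log(y) = k * log(x) + c; returns (k, c)."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


k, c = fit_power_law(list(PLANETS.values()))
print(f"period ~ a^{k:.3f}")
```

The "rule" the program infers is the exponent `k`. Nothing about gravity is coded in; the relationship comes entirely from the input and output data.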
Continue reading “Exploring Unsupervised Deep Learning”
Red Hat’s AI Center of Excellence and PerceptiLabs wanted a way to demonstrate a TensorFlow model to the public during the 2019 Red Hat Summit. The plan was for this model to take images as input, and then respond with the likelihood of a Red Hat fedora being in that image. Here’s what we learned during Red Hat Summit.
This application, which we called Fedora Finder Bot, would be featured during Red Hat CTO Chris Wright’s keynote, where PerceptiLabs demoed their AI platform.
Our initial solution was a Twitter bot that receives tweets or direct messages and replies with the output of the TensorFlow model. Because Twitter is a public service, we felt it would make the model available to a large number of users: anyone could tweet a picture to the bot, and the bot would respond with the model's output.
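The receive, classify, and reply flow can be sketched independently of any Twitter client library. In the sketch below, `IncomingMessage` and `fedora_probability` are hypothetical stand-ins: the first represents a tweet or direct message carrying an image, and the second represents the TensorFlow model. Fetching messages and posting replies through the Twitter API are deliberately left out. A thread pool scores several messages concurrently so that one slow inference does not hold up the rest. This is one simple way to add concurrency, not necessarily how the Summit bot was built.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class IncomingMessage:
    # Hypothetical shape of a tweet or direct message that carries an image.
    message_id: str
    author: str
    image_bytes: bytes


def fedora_probability(image_bytes: bytes) -> float:
    """Hypothetical stand-in for the TensorFlow model; returns a fixed score."""
    return 0.87 if image_bytes else 0.0


def build_reply(msg: IncomingMessage) -> str:
    prob = fedora_probability(msg.image_bytes)
    return f"@{msg.author} I'm {prob:.0%} sure there's a Red Hat fedora in that picture."


def handle_batch(messages, max_workers=4):
    """Build replies concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_reply, messages))
```

Keeping the model call behind a single function makes it easy to move inference to a separate model server later, which is usually the first step when scaling a bot like this.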
Continue reading “Building a Scalable TensorFlow Twitter Bot for Red Hat Summit”
(There’s a great new conference in the U.S., DevConf.US, returning in 2019 to Boston University (15 to 17 Aug). This highly-technical conference is interested in drawing a diverse group of speakers and attendees, with a specific emphasis on people who are new to speaking and tech conferences in general. Only in its second year, DevConf.US builds on the successful decade-spanning run of DevConf.CZ in Brno, CZ.
This is a session from DevConf.US 2018. The call for proposals to present at DevConf.US 2019 is now open.)
In this session from the CentOS Dojo held as part of DevConf.US, OpenStack technical support engineers Madhur Gupta and Shatadru Bandyopadhyay discuss how to use machine learning to detect anomalies in OpenStack logs. Once an anomaly is detected, it can trigger automated follow-up actions and help with root cause analysis.
The first challenge with anomaly detection in OpenStack is that OpenStack generates a significant volume of logs, even in relatively simple production setups. How do you ingest all that data and detect anomalies in it?
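A common first step with high-volume logs is to reduce lines to templates by masking variable fields, then look at how often each template occurs. The sketch below flags templates that appear rarely. This frequency heuristic is a simple stand-in chosen for illustration, not the speakers' machine learning method. The regular expressions are assumptions about typical log content: they match lowercase UUIDs and plain numbers.

```python
import re
from collections import Counter

# Mask variable fields so lines that differ only in IDs or numbers share a template.
UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
NUM_RE = re.compile(r"\b\d+(\.\d+)?\b")


def to_template(line: str) -> str:
    # Replace UUIDs first so their digits are not masked piecemeal as numbers.
    line = UUID_RE.sub("<UUID>", line)
    return NUM_RE.sub("<NUM>", line)


def rare_templates(lines, max_count=1):
    """Return templates seen at most `max_count` times, sorted alphabetically."""
    counts = Counter(to_template(line) for line in lines)
    return sorted(t for t, c in counts.items() if c <= max_count)


logs = [
    "Took 1.2 seconds to spawn the instance",
    "Took 0.9 seconds to spawn the instance",
    "Took 3.4 seconds to spawn the instance",
    "Failed to allocate network for instance 7",
]
print(rare_templates(logs))
```

Templating also compresses the data considerably. Thousands of lines collapse into a few hundred templates, and those template counts are a natural input for a trained anomaly detection model.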
Continue reading “Anomaly Detection on OpenStack Logs Using Machine Learning”