It was the title of the talk, “Developer Insights: ML and Analytics on src/”, that caught my eye. I was intrigued. I had a few ideas about how machine learning techniques could be applied to source code, but I was curious to see what the state of the art looks like now. To hear more, I attended the session at DevConf.cz 2020 by Christoph Görn and Francesco Murdaca of the AI and ML Center of Excellence at Red Hat.
The first question I had was: where did the project name Thoth come from? My initial guess was that “Thoth” was an ice moon from the Star Wars universe, or maybe a demon from Buffy the Vampire Slayer. It turns out that Thoth is the Ancient Egyptian god of writing, magic, wisdom, and the moon. The Egyptian theme runs through the whole project, with components called Thamos, Kebechet, Amun, and Nepthys, among others.
The set of problems that Thoth aims to solve is an important one. Can we help developers identify the best library to use, by looking at what everyone else is using for a similar job? Can we help identify the source of common performance issues, and suggest speed-ups? Can we create a framework that can enforce compliance, and help minimize risk, as applications grow?
As the IT world develops, developers have more and more choices available to them. Cloud application developers can choose from a wide range of languages, application frameworks, tools, and test frameworks, and the number of options can become overwhelming. It would be great if you could offload some of those choices to a computer brain that handles the mundane work, identifies patterns and drawbacks learned by examining millions of lines of other people’s code, and gives you actionable advice on your own code.
This is the main idea behind Thoth. The project has three goals: to deliver optimized AI stacks as container images that any developer can use, to prototype AI-backed guidance for developers, and to augment CI/CD pipelines with AI-backed analytics that assist with common maintenance tasks.
At its heart, Thoth builds a “knowledge graph” describing the build-time and run-time environment of an application, including its dependencies, performance metrics, the application binary interfaces (ABIs) it exposes, open security issues (CVEs) against its dependencies, and meta-information about each dependency used by the project. Building on this knowledge graph, Thoth’s recommendation engine (adviser) can recommend specific dependency versions that simplify installation, increase security, or improve performance.
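To make the idea of fact-driven version recommendation a little more concrete, here is a toy Python sketch. It is not Thoth’s adviser and does not use any Thoth API: the PackageVersion record, the score and recommend_stack functions, the package name examplelib, the placeholder CVE ID, and the penalty weight are all hypothetical. It only illustrates the general shape of the idea: attach facts such as open CVEs and benchmark results to each candidate version, then pick the version that best balances them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackageVersion:
    """Hypothetical node in a toy knowledge graph: one release plus facts known about it."""
    name: str
    version: str
    cves: list[str] = field(default_factory=list)  # open security issues (placeholder IDs)
    perf_score: float = 0.0  # made-up benchmark score; higher is better


def score(pv: PackageVersion, cve_penalty: float = 10.0) -> float:
    """Hypothetical scoring rule: reward performance, penalize every open CVE."""
    return pv.perf_score - cve_penalty * len(pv.cves)


def recommend_stack(graph: dict[str, list[PackageVersion]]) -> dict[str, PackageVersion]:
    """For each package, pick the candidate version with the highest score."""
    return {name: max(candidates, key=score) for name, candidates in graph.items()}


# Toy data: all names, versions, CVE IDs, and scores are invented for illustration.
graph = {
    "examplelib": [
        PackageVersion("examplelib", "1.0", perf_score=3.0),
        PackageVersion("examplelib", "1.1", perf_score=4.0),
        PackageVersion("examplelib", "2.0", cves=["CVE-PLACEHOLDER-1"], perf_score=9.0),
    ],
}

if __name__ == "__main__":
    for name, pv in recommend_stack(graph).items():
        print(f"{name}=={pv.version} (score {score(pv)})")
```

With this made-up data, the fastest release (2.0) loses to 1.1 because of its open CVE, which is the kind of security versus performance trade-off a recommendation engine has to weigh. A real adviser would also have to resolve the whole dependency graph consistently rather than scoring each package in isolation.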
Because Thoth comes from the data science and machine learning world, its initial focus is on applications in that domain. It supports the Python ecosystem and integrates with common data science tools such as Jupyter Notebook. The framework is very extensible, however, and the team is open to supporting other language ecosystems and dependency stacks in the future.
Eventually, Thoth is planned to identify common errors as developers write code, and to propose alternative approaches on the fly. The plan is to integrate Thoth deeply with the Continuous Deployment system to identify build-time and run-time issues related to container images, allowing the operations team to flag and fix security issues quickly without impacting production.
Imagine a world where your IDE tells you, as you write code, how you can improve it, or warns you when it looks like you have made a common mistake. Imagine performance issues being identified, and fixes proposed, before you or your end users notice them. Imagine capturing the experience of senior developers who watch community projects and develop a feel for which ones are heading the right way and which are slowly fading away, spotting the early warning signs and looking for an alternative in time. Imagine an OpenShift deployment reconfiguring its BuildConfigs and creating new Builder Images on its own, because an OpenShift Operator knows which ABI the application stack needs and which container image provides it. In the future, the Thoth project aspires to make all of this possible.