As AI agents become more common in the enterprise, the sheer number of available tools can overwhelm them. This article explores a practical approach based on the `Tool2Vec` methodology to create a smarter tool retrieval system, allowing even small language models to...
Tool RAG: The Next Breakthrough in Scalable AI Agents
Imagine this: you're building an AI assistant that can book flights, summarize documents, analyze spreadsheets, and schedule meetings. You give it access to dozens - or even hundreds - of tools and APIs. But instead of becoming smarter, it gets confused. It picks the...
Triton Kernel Profiling with NVIDIA Nsight Tools
Are your custom Triton GPU kernels running as efficiently as they could be? Unlocking peak performance requires the right tools. This blog post dives into profiling a Triton GPU kernel, with a specific focus on compute performance, using the powerful...
Intelligent inference request routing for large language models
Today's AI environment is experiencing a surge in specialized Large Language Models (LLMs), each possessing unique abilities and strengths: some are strong in reasoning and mathematics, while others may excel in creative writing. Yet most applications resort to a...
Enhancing AI inference security with confidential computing: A path to private data inference with proprietary LLMs
Red Hat's Office of the CTO is collaborating with the upstream Tinfoil community to explore pioneering a complete, cloud-native solution for Confidential AI. The community is focused on solving one of the toughest AI security challenges facing the enterprise:...
A developer’s guide to PyTorch, containers, and NVIDIA – Solving the puzzle
Starting about five years ago, I began moving to container-based operating systems (OSes). It started with Bazzite, and most recently I have been using Aurora. What's not to love? These OSes make containers first-class citizens, simplifying how to "install" and run...
Understanding Triton Cache: Optimizing GPU Kernel Compilation
The goal of this blog post is to explore Triton’s caching mechanism: how it works, what affects it, how different frameworks leverage it, and how you can optimize it for your specific workloads.
Model authenticity and transparency with Sigstore
What is the Sigstore model transparency project? Sigstore’s Model Transparency project is a community effort aimed at applying the software supply chain security practice of signing to machine learning (ML) models. Hosted on GitHub at...
A container-first approach to Triton development
The Triton project from OpenAI is at the forefront of a groundbreaking movement to democratize AI accelerators and GPU kernel programming. It provides a powerful and flexible framework for writing high performance GPU kernels. As AI workloads become increasingly...
Getting started with PyTorch and Triton on AMD GPUs using the Red Hat Universal Base Image
In a prior blog post, we provided an overview of the Triton language and its ecosystem. Triton is a Python-based domain-specific language (DSL), compiler, and related tooling designed for writing efficient GPU kernels in a hardware-agnostic manner, offering high-level...
User experience and its importance in adoption of democratized AI
"An intuitive and accessible UI is critical to even the most powerful AI system" As artificial intelligence (AI) continues to evolve, its influence spans across industries, transforming operations and enhancing decision-making processes. At this point in time, I...
Democratizing AI Accelerators and GPU Kernel Programming using Triton
Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless...
Evaluating the performance of Large Language Models
In the field of natural language processing (NLP), large language models (LLMs) have become crucial for various applications. These models are widely adopted by enterprises, signaling a shift in how we use and gain insights from available data. However, putting these...
Optimizing development with the time to merge tool
Organizations are dedicating significant resources to producing and distributing top-quality software at an accelerated pace due to growing competition in the software market (source). To achieve this goal, they use practices such as Continuous Integration (CI),...
Red Hat NEXT! 2022 Session Recap
If you missed the Red Hat NEXT! event back in September, or if you just want to refresh your memory on some of the amazing content that was presented there, here's a complete listing of all of the talks. Follow the links to see the recordings on the Red Hat Community...
The Future of AI, Security, and the Edge
In recent years, “edge devices” have evolved from simple IoT sensors to autonomous drones driven by powerful artificial intelligence (AI) software. Similarly, the processes to develop and deploy AI software to “the edge” have also seen a rapid evolution. Today, data...
Examining mailing list traffic to evaluate community health
Open source software communities have many choices when it comes to modes of communication. Among those choices, mailing lists have long been a common choice for connecting with other members of the community. Within mailing lists, the sentiment and...
Using machine learning and analytics to help developers
It was the talk title that caught my eye - “Developer Insights: ML and Analytics on src/”. I was intrigued. I had a few ideas of how machine learning techniques could be used on source code, but I was curious to see what the state of the art looked like now. I...
Prometheus anomaly detection
With an increase in the number of applications being deployed on Red Hat OpenShift, there is a strong need for application monitoring. A number of these applications are monitored via Prometheus metrics, resulting in an accumulation of a large number of time-series...
Sentiment analysis with machine learning
When developing a new technology, it really helps if you are also a user of that new tech. This has been an approach of Red Hat around artificial intelligence and machine learning -- develop openly on one hand, exchanging knowledge across the organization to use the...
