Blog
Simplifying Edge AI Builds with Verified GitHub Actions Patterns
As the ecosystem and economy around AI continue to grow and the Internet of Things (IoT) grows smarter and more prolific, a new computing paradigm is emerging: edge AI, the application of AI technologies to advanced IoT systems. This has all sorts of...
A Practical Approach to Smart Tool Retrieval for Enterprise AI Agents
As AI agents become more common in the enterprise, the sheer number of available tools can overwhelm them. This article explores a practical approach based on the `Tool2Vec` methodology to create a smarter tool retrieval system, allowing even small language models to...
Tool RAG: The Next Breakthrough in Scalable AI Agents
Imagine this: you're building an AI assistant that can book flights, summarize documents, analyze spreadsheets, and schedule meetings. You give it access to dozens - or even hundreds - of tools and APIs. But instead of becoming smarter, it gets confused. It picks the...
Triton Kernel Profiling with NVIDIA Nsight Tools
Are your custom Triton GPU kernels running as efficiently as they could be? Unlocking peak performance requires the right tools. This blog post dives into profiling a Triton GPU kernel, with a specific focus on compute performance, using the powerful...
Intelligent inference request routing for large language models
Today's AI environment is experiencing a surge in specialized Large Language Models (LLMs), each possessing unique abilities and strengths: some are strong in reasoning and mathematics, while others may excel in creative writing. Yet most applications resort to a...
Enhancing AI inference security with confidential computing: A path to private data inference with proprietary LLMs
Red Hat's Office of the CTO is collaborating with the upstream Tinfoil community to explore a complete, cloud-native solution for Confidential AI. The community is focused on solving one of the toughest AI security challenges facing the enterprise:...
A developer’s guide to PyTorch, containers, and NVIDIA – Solving the puzzle
About five years ago, I began moving to container-based operating systems (OSes). It started with Bazzite, and most recently I have been using Aurora. What's not to love? These OSes make containers first-class citizens, simplifying how to "install" and run...
Understanding Triton Cache: Optimizing GPU Kernel Compilation
The goal of this blog post is to explore Triton’s caching mechanism: how it works, what affects it, how different frameworks leverage it, and how you can optimize it for your specific workloads.
Model authenticity and transparency with Sigstore
What is the Sigstore model transparency project? Model Transparency is a Sigstore community project aimed at applying the software supply chain security practice of signing to machine learning (ML) models. Hosted on GitHub at...
A container-first approach to Triton development
The Triton project from OpenAI is at the forefront of a groundbreaking movement to democratize AI accelerators and GPU kernel programming. It provides a powerful and flexible framework for writing high-performance GPU kernels. As AI workloads become increasingly...
