AI

Eliminating the ‘Rego tax’: How AI orchestrators automate Kubernetes compliance

by Anamika Valappil, Alekhya Koppineni | Mar 20, 2026 | AI, Trust

Manually writing OPA Rego policies is a significant bottleneck for many platform teams, creating a 'Rego tax' that can slow down development and introduce risk. This article introduces a new approach: a Dynamic Kubernetes Policy Generator that uses a large language...

Zero trust AI agents on Kubernetes: What I learned deploying multi-agent systems on Kagenti

by Roy Belio | Mar 5, 2026 | AI, Trust

AI agent content focuses on prompt engineering and framework selection. But very little addresses what happens when those agents run in production: Who they are, what they're allowed to call, and whether anyone can tell what they did. I spent 2 weeks (January 2026)...

Zero Trust for autonomous agentic AI systems: Building more secure foundations

by Parul Singh | Feb 26, 2026 | AI, Trust

AI systems are no longer just single-purpose models. With the rise of agentic AI, they are becoming software systems designed to carry out complex tasks and solve problems with limited human supervision. It's a step beyond generative AI, which creates content, to an AI that does...

From hand-tuned to generated: A reproducible Triton GPU kernel benchmark across different vendors

by Alessandro Sangiorgi, Liron Kesem | Feb 12, 2026 | AI

In the world of Large Language Models (LLMs), speed is critical. Much of this speed comes from highly specialized functions called GPU kernels: small, focused routines that instruct the GPU how to perform calculations with maximum efficiency...

Protecting Triton kernel deployments with cryptographic signatures

by Anton Ivanov, Maryam Tahhan | Feb 5, 2026 | AI

Triton is a domain-specific language and compiler for writing high-performance GPU kernels (snippets of compiled GPU code) using a Python-like syntax. It offers fine-grained control over memory and parallelism, making it ideal for custom, architecture-optimized...

Skip the JITters: Fast, trusted model kernels with OCI caching

by Maryam Tahhan | Jan 29, 2026 | AI

Triton is a domain-specific language and compiler for writing high-performance GPU kernels in Python. It offers fine-grained control over memory and parallelism, making it ideal for custom, architecture-optimized compute in machine learning and high-performance...

Architecting Cloud-Native Ambient Agents: Patterns for Scale and Control

by Yu An, Kevin Cogan, Shrey Anand | Jan 21, 2026 | AI

Moving AI from interactive chatbots to autonomous "ambient" agents requires a fundamental shift in system architecture. This article examines the technical implementation of agents that operate asynchronously within an enterprise environment. We detail a practical...

Simplifying Edge AI Builds with Verified GitHub Actions Patterns

by Vance Raiti, Nick Cao | Dec 12, 2025 | AI

As the ecosystem and economy around AI continue to grow and the Internet of Things (IoT) grows smarter and more prolific, a new paradigm of computing is emerging: edge AI, the application of AI technologies to advanced IoT systems. This has all sorts of...

A Practical Approach to Smart Tool Retrieval for Enterprise AI Agents

by Eoghan O'Connor, Kevin Cogan | Dec 5, 2025 | AI

As AI agents become more common in the enterprise, the sheer number of available tools can overwhelm them. This article explores a practical approach based on the `Tool2Vec` methodology to create a smarter tool retrieval system, allowing even small language models to...

Tool RAG: The Next Breakthrough in Scalable AI Agents

by Ilya Kolchinsky | Nov 26, 2025 | AI

Imagine this: you're building an AI assistant that can book flights, summarize documents, analyze spreadsheets, and schedule meetings. You give it access to dozens (or even hundreds) of tools and APIs. But instead of becoming smarter, it gets confused. It picks the...

Triton Kernel Profiling with NVIDIA Nsight Tools

by Joseph Groenenboom, Craig Magina | Nov 19, 2025 | AI

Are your custom Triton GPU kernels running as efficiently as they could be? Unlocking peak performance requires the right tools. This blog post is all about diving into profiling a Triton GPU kernel, with a specific focus on compute performance, using the powerful...

Intelligent inference request routing for large language models

by Ron Haberman, Huamin Chen, Yossi Ovadia, Ricardo Noriega De Soto, Andre Fredette, David Brewster | Nov 11, 2025 | AI

Today's AI environment is experiencing a surge in specialized Large Language Models (LLMs), each possessing unique abilities and strengths [1][2][3]: Some are strong in reasoning and mathematics, while others may excel in creative writing. Yet most applications resort to a...

Enhancing AI inference security with confidential computing: A path to private data inference with proprietary LLMs

by Ivan Font, Donald Hunter | Oct 23, 2025 | AI, Trust

Red Hat's Office of the CTO is collaborating with the upstream Tinfoil project community to pioneer a complete, cloud-native solution for Confidential AI. The community is focused on solving one of the toughest AI security challenges facing the enterprise:...

A developer’s guide to PyTorch, containers, and NVIDIA – Solving the puzzle

by Steven Pousty | Aug 26, 2025 | AI, Developer Productivity

Starting about five years ago, I began moving to container-based operating systems (OSes). It started with Bazzite, and most recently I have been using Aurora. What's not to love? These operating systems make containers first-class citizens, simplifying how to "install" and run...

Understanding Triton Cache: Optimizing GPU Kernel Compilation

by Alessandro Sangiorgi | May 16, 2025 | AI

The goal of this blog post is to explore Triton’s caching mechanism: how it works, what affects it, how different frameworks leverage it, and how you can optimize it for your specific workloads.

Model authenticity and transparency with Sigstore

by Ivan Font | Apr 10, 2025 | AI, Trust

What is the Sigstore model transparency project? Sigstore's Model Transparency project is a Sigstore community project aimed at applying the software supply chain security practice of signing to machine learning (ML) models. Hosted on GitHub at...

A container-first approach to Triton development

by Maryam Tahhan | Mar 20, 2025 | AI

The Triton project from OpenAI is at the forefront of a groundbreaking movement to democratize AI accelerators and GPU kernel programming. It provides a powerful and flexible framework for writing high-performance GPU kernels. As AI workloads become increasingly...

Getting started with PyTorch and Triton on AMD GPUs using the Red Hat Universal Base Image

by Sanjeev Rampal, Steven Royer | Dec 17, 2024 | AI, Hybrid Cloud

In a prior blog post, we provided an overview of the Triton language and its ecosystem. Triton is a Python-based domain-specific language (DSL), compiler, and related tooling designed for writing efficient GPU kernels in a hardware-agnostic manner, offering high-level...

User experience and its importance in adoption of democratized AI

by Anil Vishnoi, Brent Salisbury, Ryan Cook | Dec 10, 2024 | AI

"An intuitive and accessible UI is critical to even the most powerful AI system" As artificial intelligence (AI) continues to evolve, its influence spans across industries, transforming operations and enhancing decision-making processes. At this point in time, I...

Democratizing AI Accelerators and GPU Kernel Programming using Triton

by Sanjeev Rampal | Nov 7, 2024 | AI, Hybrid Cloud

Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless...
