Blog
Zero trust AI agents on Kubernetes: What I learned deploying multi-agent systems on Kagenti
AI agent content focuses on prompt engineering and framework selection. But very little addresses what happens when those agents run in production: who they are, what they're allowed to call, and whether anyone can tell what they did. I spent 2 weeks (January 2026)...
Zero Trust for autonomous agentic AI systems: Building more secure foundations
AI systems are no longer just single-purpose models. The rise of agentic AI has brought software systems designed to carry out complex tasks and solve problems with limited human supervision. It's a step beyond generative AI, which creates content, to an AI that does...
From hand-tuned to generated: A reproducible Triton GPU kernel benchmark across different vendors
In the world of Large Language Models (LLMs), speed is very important. Much of this speed comes from highly specialized functions called GPU kernels, which are small, focused routines that instruct the GPU how to perform calculations with maximum efficiency....
Protecting Triton kernel deployments with cryptographic signatures
Triton is a domain-specific language and compiler for writing high-performance GPU kernels (snippets of compiled GPU code) using a Python-like syntax. It offers fine-grained control over memory and parallelism, making it ideal for custom, architecture-optimized...
Skip the JITters: Fast, trusted model kernels with OCI caching
Triton is a domain-specific language and compiler for writing high-performance GPU kernels in Python. It offers fine-grained control over memory and parallelism, making it ideal for custom, architecture-optimized compute in machine learning and high-performance...
Architecting Cloud-Native Ambient Agents: Patterns for Scale and Control
Moving AI from interactive chatbots to autonomous "ambient" agents requires a fundamental shift in system architecture. This article examines the technical implementation of agents that operate asynchronously within an enterprise environment. We detail a practical...
Simplifying Edge AI Builds with Verified GitHub Actions Patterns
As the ecosystem and economy around AI continue to grow and the Internet of Things (IoT) grows smarter and more prolific, a new paradigm of computing is emerging: edge AI, the application of AI technologies to advanced IoT systems. This has all sorts of...
A Practical Approach to Smart Tool Retrieval for Enterprise AI Agents
As AI agents become more common in the enterprise, the sheer number of available tools can overwhelm them. This article explores a practical approach based on the `Tool2Vec` methodology to create a smarter tool retrieval system, allowing even small language models to...
Tool RAG: The Next Breakthrough in Scalable AI Agents
Imagine this: you're building an AI assistant that can book flights, summarize documents, analyze spreadsheets, and schedule meetings. You give it access to dozens, or even hundreds, of tools and APIs. But instead of becoming smarter, it gets confused. It picks the...
Triton Kernel Profiling with NVIDIA Nsight Tools
Are your custom Triton GPU kernels running as efficiently as they could be? Unlocking peak performance requires the right tools. This blog post dives into profiling a Triton GPU kernel, with a specific focus on compute performance, using the powerful...
