Blog
From context to dreams: architecting memory for AI agents
Have you ever noticed that every new session with an LLM feels like starting over from scratch? LLMs have a problem: they have the memory of a goldfish (no disrespect to goldfish intended). This article explores the solution: agent memory. Agent...
Benchmarking AI inference on CPUs: A transparent blueprint for the enterprise
As enterprises look to optimize the total cost of ownership (TCO) of large language model (LLM) deployments, using existing enterprise CPU infrastructure alongside GPU resources for specific inference workloads has become a strategic initiative. However, infrastructure...
Zero trust for AI agents: why delegation beats impersonation
When an AI agent acts on your behalf, how much of "you" should it become? In AI systems, agent impersonation creates security risks by granting overly broad permissions. This post introduces a delegation model using a 'permission intersection' pattern, ensuring agents...
Who’s really calling? Securing agent-to-agent communication
The gap between what an agent claims and what the platform can verify is a real attack surface, and it grows with every new agent you onboard. As agents increasingly discover and call each other at runtime, protocols like Agent2Agent (A2A) have introduced a useful...
Code execution with MCP: How sandboxed Python replaces tool schema bloat in AI agents
As the number of tools connected to an AI agent grows, JSON Schema definitions become a massive scaling bottleneck. Every tool carries a full schema that gets loaded into the LLM’s context window on every turn. Our tests show that replacing these schemas with a...
PyTorch Call Stack Deep Dive: Tracing Tensor Operations from Python to C++ Kernels
Eliminating the ‘Rego tax’: How AI orchestrators automate Kubernetes compliance
Manually writing OPA Rego policies is a significant bottleneck for many platform teams, creating a 'Rego tax' that can slow down development and introduce risk. This article introduces a new approach: a Dynamic Kubernetes Policy Generator that uses a large language...
Zero trust AI agents on Kubernetes: What I learned deploying multi-agent systems on Kagenti
Most AI agent content focuses on prompt engineering and framework selection. But very little addresses what happens when those agents run in production: who they are, what they're allowed to call, and whether anyone can tell what they did. I spent two weeks in January 2026...
Zero Trust for autonomous agentic AI systems: Building more secure foundations
AI systems are no longer just single-purpose models. With the rise of agentic AI, they have become software systems designed to carry out complex tasks and solve problems with limited human supervision. It's a step beyond generative AI, which creates content, to an AI that does...
From hand-tuned to generated: A reproducible Triton GPU kernel benchmark across different vendors
In the world of large language models (LLMs), speed matters. Much of this speed comes from highly specialized functions called GPU kernels, which are small, focused routines that instruct the GPU how to perform calculations with maximum efficiency....
