Zero trust AI agents on Kubernetes: What I learned deploying multi-agent systems on Kagenti

by Roy Belio | Mar 5, 2026 | AI, Trust

Most AI agent content focuses on prompt engineering and framework selection. Very little addresses what happens when those agents run in production: who they are, what they're allowed to call, and whether anyone can tell what they did.

I spent 2 weeks (January 2026) deploying multi-agent AI systems on Kagenti v0.2.0-alpha.19, a Kubernetes-based control plane for AI agents built by Red Hat. I tested 2 frameworks, Goose (Block) and BeeAI v0.1.70 (IBM), across 3 agent roles: coordinator, worker, and critic.

This is the second article in a bonus series on Zero Trust and AI. For more on this topic, check out Zero Trust for autonomous agentic AI systems: Building more secure foundations.

Note: Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future.

AI agents are network services, so treat them like one

An AI agent running in Kubernetes is an HTTP service. It receives requests, calls large language model (LLM) APIs, invokes tools, and returns responses. It has the same needs as a microservice: identity, authorization, encrypted transport, and audit trails.

The A2A (Agent-to-Agent) protocol, now backed by more than 150 organizations through the Linux Foundation, standardizes how agents communicate via HTTPS using JSON-RPC. Each agent exposes an Agent Card for capability discovery and handles tasks through a defined lifecycle. The protocol makes agents discoverable, but also makes them targets.
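As a rough sketch of what travels over the wire: a JSON-RPC 2.0 envelope wrapping a task object. The field names below mirror the task-status event shape used later in this article's wrapper code; treat everything beyond the JSON-RPC envelope as illustrative, not the authoritative A2A schema.

```python
import json

# Illustrative only: a JSON-RPC 2.0 task-status event in the shape this
# article's A2A wrapper emits; not the authoritative A2A schema.
def task_status_event(request_id: str, task_id: str, state: str) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"kind": "task", "id": task_id, "status": {"state": state}},
    })
```

Because every message is plain JSON over HTTPS, any HTTP client can speak to an agent that exposes the endpoint, which is exactly why identity and authorization matter.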

Kagenti wraps the protocol in a Kubernetes-native deployment model: a Component custom resource definition (CRD) defines your agent, and the platform injects identity and registration sidecars automatically. No manual certificate management or static API keys required.

Workload identity, not API keys

GitHub’s State of Secrets Sprawl 2025 found that 70% of leaked secrets remain active 2 years after exposure. For AI agents (which operate 24/7 and can spin up resources autonomously), compromised credentials are an amplified risk.

Kagenti addresses the credentials problem with SPIFFE (Secure Production Identity Framework For Everyone). Each agent pod receives a cryptographic workload identity:

spiffe://localtest.me/ns/beeai-team/sa/beeai-coordinator

Identity injection happens automatically. When you create a Component CRD, the pod gets 2 sidecars without configuration:

  • spiffe-helper: Fetches and rotates X.509 SPIFFE Verifiable Identity Documents (SVIDs) from the SPIFFE Runtime Environment (SPIRE) server
  • kagenti-client-registration: Registers the agent as an OAuth2 client in Keycloak, allowing token exchange with short-lived, scoped tokens

No static secrets are stored in ConfigMaps and long-lived API keys are no longer passed as environment variables. Instead, the agent proves its identity with a certificate that’s automatically rotated.
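The SPIFFE ID shown above decomposes into a trust domain, a namespace, and a service account, which is what authorization logic keys on. A minimal, hypothetical parser (not Kagenti code, and assuming the localtest.me trust domain from the example) looks like this:

```python
from urllib.parse import urlparse

TRUST_DOMAIN = "localtest.me"  # trust domain from the SVID shown above

# Hypothetical helper, not Kagenti code: decide whether a peer's SPIFFE ID
# belongs to the expected trust domain and an allow-listed namespace.
# Real enforcement happens against the verified X.509 SVID, not a raw string.
def authorize_spiffe_id(spiffe_id: str, allowed_namespaces: set) -> bool:
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or parsed.netloc != TRUST_DOMAIN:
        return False
    parts = parsed.path.strip("/").split("/")
    # Expected path shape: ns/<namespace>/sa/<service-account>
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        return False
    return parts[1] in allowed_namespaces
```

In practice the mesh does this check for you during the mTLS handshake; the sketch just shows what information the identity carries.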

mTLS without sidecar overhead

Istio Ambient mesh and ztunnel

A traditional service mesh deploys an Envoy sidecar proxy alongside each pod. For AI agents running LLMs, that's wasted memory and CPU competing with the inference workload.

Kagenti uses Istio Ambient mesh, which replaces per-pod sidecars with a shared, per-node ztunnel proxy written in Rust. The ztunnel handles L4 mutual TLS (mTLS) enforcement (encrypting pod-to-pod traffic and verifying workload identities) without injecting anything into the agent pod itself.

The authorization policies are straightforward YAML:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: kagenti-access
  namespace: beeai-team
spec:
  rules:
  - from:
    - source:
        namespaces: [kagenti-system, gateway-system, istio-system]
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: peer-agents
  namespace: beeai-team
spec:
  rules:
  - from:
    - source:
        namespaces: [beeai-team]
---
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mtls-strict
  namespace: beeai-team
spec:
  mtls:
    mode: STRICT

The result: agents in beeai-team can call each other. The Kagenti control plane (kagenti-system) can reach them. Traffic from other namespaces gets blocked.

I validated the enforcement by running a curl pod in an unauthorized namespace:

Source Namespace              Target        Result
beeai-team                    beeai-team    Allowed
kagenti-system                beeai-team    Allowed
rbelio-test (unauthorized)    beeai-team    HTTP 000 (connection reset)

HTTP 000 (not 403) confirms L4 enforcement. The ztunnel resets the TCP connection before an HTTP exchange happens. The unauthorized caller never reaches the agent.
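The distinction is reproducible without curl. A hypothetical probe (not part of Kagenti) makes it concrete: an L4 block surfaces as a TCP-level failure before any HTTP status exists, while an L7 denial would come back as a real HTTP response such as 403.

```python
import socket

# Hypothetical probe, not Kagenti tooling: distinguish an L4 block (TCP-level
# failure, no HTTP status) from an L7 response (a real HTTP status line).
def probe(host: str, port: int, timeout: float = 3.0) -> str:
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            request = (f"GET / HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n").encode()
            s.sendall(request)
            data = s.recv(1024)
    except ConnectionRefusedError:
        return "connection refused"
    except ConnectionResetError:
        return "connection reset (blocked at L4)"
    except socket.timeout:
        return "timeout"
    if data.startswith(b"HTTP/"):
        status = data.split(b" ", 2)[1].decode()
        return f"HTTP {status}"
    return "no HTTP response"
```

Run from an authorized pod you would see an HTTP status; from an unauthorized namespace, the connection dies before HTTP ever starts.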

The L4 vs L7 gotcha

One thing that tripped me up: ztunnel enforces L4 policies, not L7. HTTP method restrictions (like limiting agents to GET and POST) require L7 waypoint proxies:

# This is IGNORED by ztunnel
spec:
  rules:
  - to:
    - operation:
        methods: ["GET", "POST"]  # Requires waypoint proxy

If you need HTTP-level controls (method filtering, path-based routing, header inspection), you need to deploy waypoint proxies for those specific services. For my proof of concept, namespace-level L4 isolation was sufficient.
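For reference, here is a sketch of what the waypoint setup could look like in Istio Ambient. Resource names are illustrative and the API details vary by Istio version, so check the Ambient mesh docs before copying: you deploy a waypoint as a Gateway with the istio-waypoint class, then target the L7 policy at that Gateway instead of relying on ztunnel.

```yaml
# Sketch only: waypoint resources for Istio Ambient; names are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: beeai-team
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
---
# L7 policy enforced by the waypoint rather than ztunnel
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: http-methods
  namespace: beeai-team
spec:
  targetRefs:
  - kind: Gateway
    group: gateway.networking.k8s.io
    name: waypoint
  action: ALLOW
  rules:
  - to:
    - operation:
        methods: ["GET", "POST"]
```

The namespace (or individual services) must also be labeled to route through the waypoint, such as with the istio.io/use-waypoint label, before the L7 rules take effect.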

Framework choice is a security decision

325 lines vs 40 lines

I started with Goose, a command-line AI agent from Block. Goose has no built-in A2A server, so I wrote a custom wrapper, 325 lines of Python, to bridge the gap:

# goose-a2a-wrapper/a2a_wrapper.py (excerpt from 325 lines)

import json
import os
import subprocess
from pathlib import Path

SESSION_DIR = Path(os.getenv("SESSION_DIR", "/tmp/goose-sessions"))  # illustrative default
GOOSE_TIMEOUT = int(os.getenv("GOOSE_TIMEOUT", "120"))               # illustrative default

# Manual session persistence
def load_session(session_id: str) -> list[dict]:
    session_file = SESSION_DIR / f"{session_id}.json"
    if session_file.exists():
        try:
            return json.loads(session_file.read_text())
        except (json.JSONDecodeError, OSError):
            return []
    return []

# Subprocess spawning for each request
def run_goose_agent(prompt, session_id=None, resume=False):
    cmd = ["goose", "run", "-t", prompt]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True,
                            env={**os.environ, "GOOSE_CLI_MODE": "true"})
    try:
        stdout, stderr = proc.communicate(timeout=GOOSE_TIMEOUT)
    except subprocess.TimeoutExpired:
        proc.kill()  # reap the hung process instead of leaking it
        stdout, stderr = proc.communicate()
        return stdout, stderr, False
    return stdout, stderr, proc.returncode == 0

# Manual SSE streaming for A2A protocol
async def stream_goose_response(task_id, context_id, request_id,
                                prompt, session_id, messages):
    yield {"data": json.dumps({"jsonrpc": "2.0", "id": request_id,
           "result": {"kind": "task", "id": task_id,
           "status": {"state": "working"}}})}

# ... 80 more lines of SSE event construction

Then I switched to BeeAI, IBM’s A2A-native framework. The entire agent, including OpenTelemetry (OTEL) instrumentation, fits in 40 lines:

# beeai-agent/agent.py (40 lines, full file; imports condensed here)

import os

from beeai_framework.adapters.a2a import A2AServer, A2AServerConfig
from beeai_framework.agents.requirement import RequirementAgent
from beeai_framework.backend import ChatModel
from beeai_framework.memory import UnconstrainedMemory
from beeai_framework.serve.utils import LRUMemoryManager

PORT = int(os.getenv("PORT", "9999"))                       # illustrative defaults
SESSION_MAXSIZE = int(os.getenv("SESSION_MAXSIZE", "100"))

def main():
    setup_otel()  # 7-line function, omitted for brevity
    model_name = os.getenv("OPENAI_CHAT_MODEL",
                           "RedHatAI/Qwen3-Next-80B-A3B-Instruct-FP8")
    llm = ChatModel.from_name(f"openai:{model_name}")
    agent = RequirementAgent(llm=llm, tools=[], memory=UnconstrainedMemory())
    A2AServer(
        config=A2AServerConfig(port=PORT, protocol="jsonrpc"),
        memory_manager=LRUMemoryManager(maxsize=SESSION_MAXSIZE)
    ).register(agent, send_trajectory=True).serve()

if __name__ == "__main__":
    main()

That's 87% less code: the A2A server, Server-Sent Events (SSE) streaming, session management, and Agent Card generation are all handled by the framework.

Less code, fewer potential vulnerabilities

The Goose wrapper introduced 3 categories of attack surface that the BeeAI agent avoids:

  1. Subprocess spawning (one subprocess.Popen call per request), which creates argument injection risk if prompt content reaches the command line
  2. File-based session storage (SESSION_DIR / f"{session_id}.json"), which creates path traversal risk with no access control on session data
  3. Manual SSE construction (hand-built JSON-RPC events), where malformed events can violate protocol contracts

The BeeAI agent has none of these. Session management uses in-memory caching and the A2A server handles protocol correctness, all with no subprocesses.
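If you do have to keep a file-based session store, the path traversal risk is containable: validate the session ID before it touches the filesystem. A minimal sketch, assuming an allow-list of safe characters (the directory path and length limit here are illustrative):

```python
import re
from pathlib import Path

SESSION_DIR = Path("/var/lib/agent/sessions")   # illustrative path
_SESSION_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

# Hypothetical mitigation for the path traversal risk above: reject any
# session ID that could escape SESSION_DIR before touching the filesystem.
def session_path(session_id: str) -> Path:
    if not _SESSION_ID.fullmatch(session_id):
        raise ValueError(f"invalid session id: {session_id!r}")
    path = (SESSION_DIR / f"{session_id}.json").resolve()
    # Defense in depth: confirm the resolved path stayed inside SESSION_DIR.
    if path.parent != SESSION_DIR.resolve():
        raise ValueError("path escapes session directory")
    return path
```

The allow-list regex does the real work; the resolved-path check is a second line of defense in case the validation ever loosens.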

The lesson extends beyond Goose and BeeAI: command-line tools built for local developer use are not designed for Kubernetes. They lack the server primitives (HTTP handlers, session stores, streaming protocols) that production deployment requires, forcing you to build and defend those primitives yourself.

Observability as a security control

Traces serve as audit trails. When an AI agent makes a decision, you need to know what it received, what it called, and what it returned. Kagenti ships Phoenix (an LLM observability tool) with an OTEL collector for distributed tracing.

Getting traces to appear required debugging 2 silent failures:

Port mismatch: My agents sent traces via HTTP to port 4318, but the OTEL collector was configured for gRPC on port 4317. Traces were silently dropped without an error or warning. The fix was to point agents at the gRPC endpoint:

- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://otel-collector.kagenti-system.svc.cluster.local:4317

Filter mismatch: The OTEL collector’s openinference processor filtered out spans that weren’t tagged with OpenInference semantic conventions, leading to my custom spans being silently discarded. The fix required modifying the collector’s base.yaml configuration.

Both failures were silent: no logs, no error messages, just missing traces. In a security context, silent observability failures are blind spots, because you cannot audit what you cannot observe.

What’s still missing from the observability stack: token usage metrics, cost tracking per agent, and Prometheus-compatible metrics endpoints. You get traces in Phoenix, but no way to set alerts on agent behavior or costs.

What’s still missing

Kagenti is production-ready for single-agent deployments with SPIFFE identity, mTLS enforcement, and namespace-level authorization, but multi-agent orchestration remains unsolved. To be more precise, I hit 3 gaps:

No agent discovery: Agents can’t find peers dynamically. I hardcoded peer URLs as environment variables in the Component CRD:

- name: WORKER_URL
  value: http://beeai-worker.beeai-team.svc.cluster.local:9999
- name: CRITIC_URL
  value: http://beeai-critic.beeai-team.svc.cluster.local:9999

An agent registry API (GET /agents?namespace=X) or DNS-based discovery would make multi-agent systems possible without URL management.
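For completeness, the stopgap amounts to a few lines; a registry API or DNS-based discovery would replace exactly this function (the role names match the env vars above):

```python
import os

# Stopgap discovery used in this deployment: peer URLs are injected as env
# vars (WORKER_URL, CRITIC_URL) via the Component CRD. A registry API or
# DNS-based discovery would replace this function entirely.
def discover_peers(roles=("WORKER", "CRITIC")) -> dict:
    peers = {}
    for role in roles:
        url = os.environ.get(f"{role}_URL")
        if url:
            peers[role.lower()] = url
    return peers
```

The weakness is obvious: adding a fourth agent means editing every peer's CRD and restarting pods, which is exactly what dynamic discovery would avoid.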

CRD updates don’t reconcile: Changing an imageTag in a Component CRD doesn’t update the managed Deployment. The workaround is kubectl set image deployment/... directly. The controller needs reconciliation logic to watch for CRD spec changes.

Multi-agent orchestration is manual: Deploying 3 agents individually works. Making them discover each other, sequence tasks, and handle failures across a workflow is on you. There is no built-in workflow engine (like Argo Workflows) or task graph support yet.

Update: Since this trial ran on v0.2.0-alpha.19, the Kagenti team has shipped v0.2.0-alpha.21 with notable changes: the legacy Component/Agent CRD was deprecated in favor of standard Kubernetes Deployments, a new AgentCard CRD with targetRef now provides Kubernetes-native agent discovery, A2A Agent Card signature verification was added via SPIRE x5c signing, and env var updates now propagate to running pods without restart.

3 takeaways

AI agents need workload identity, not API keys. SPIFFE gives each agent a cryptographic identity that’s automatically rotated and scoped to its namespace and service account. When leaked secrets can persist for years, eliminating static credentials from agent deployments is not optional.

Framework choice is a security decision. An A2A-native framework like BeeAI reduced my code from 325 lines to 40, an 87% reduction. Each line of custom wrapper code is an attack surface you own. Pick frameworks that handle the server primitives so you don’t have to.

Observability is a security control. Silent OTEL failures (wrong ports, wrong filters) create blind spots in your audit trail. Verify your trace pipeline end-to-end before trusting it; if traces aren’t appearing, your audit trail has gaps.

FAQ

What is Kagenti? Kagenti is a Kubernetes-based control plane for AI agents, developed as a Red Hat incubation project. It provides a Component CRD for deploying agents, automatic SPIFFE identity injection, Istio Ambient mesh integration for mTLS, and a Phoenix-based observability stack. It works with agent frameworks that support the A2A protocol.

Do I need Istio Ambient mesh for AI agents? Not specifically, but you need some form of mTLS and workload identity in production. Istio Ambient is a good fit for AI workloads because its ztunnel proxy runs per-node rather than per-pod, avoiding the memory overhead of sidecar proxies alongside resource-heavy LLM inference containers.

How does SPIFFE differ from Kubernetes service account tokens? Kubernetes service account tokens are namespace-scoped and long-lived by default. SPIFFE IDs are cryptographic identities backed by short-lived X.509 certificates that are automatically rotated. SPIFFE also works across clusters and cloud providers. The identity follows the workload, not the Kubernetes cluster.

Can I use frameworks other than BeeAI with Kagenti? Yes. Kagenti’s Component CRD is framework-agnostic. It deploys container images that expose an A2A-compatible HTTP endpoint. I tested both Goose and BeeAI. Frameworks with A2A support (or a custom wrapper) will work. The security primitives (SPIFFE, mTLS, authorization policies) apply regardless of framework.


Roy Belio is a member of the AI Catalyst Team at Red Hat.
