Istio ambient mode with Red Hat OpenShift

Mar 19, 2024 | Hybrid Cloud

Istio ambient refers to a new mode for the Istio service mesh, the upstream project behind Red Hat OpenShift Service Mesh. This article provides an overview and some technical analysis of this emerging technology.

Note that ambient mode is currently in the alpha stage in the upstream Istio community and is not yet ready for production deployments. Consequently, it is not yet a supported feature of Red Hat OpenShift Service Mesh. At Red Hat, we are evaluating and contributing to this emerging technology. As the feature matures in upstream Istio, it will be supported as part of Red Hat OpenShift Service Mesh at a later date. This article takes an Emerging Technologies perspective: it explains the benefits and high-level architecture of ambient mode and guides early experimentation with it on Red Hat OpenShift. It assumes that you are already familiar with the basic concepts of the existing Istio service mesh.

Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future.

For early (unsupported) experimentation with Istio ambient mode on OpenShift, use upstream Istio version 1.21.0 or newer directly and install it on OpenShift version 4.12 or newer. On top of Istio 1.21.0, an additional patch is needed that is not included before Istio 1.22.0; it is described later in this article. Once Istio ambient mode is officially supported on OpenShift, the exact supported versions and details will be documented.

What is Istio ambient mode?

Istio ambient mode uses a new architecture for the data plane proxies compared with the prior sidecar-proxy-based model. It is fully backward compatible and can coexist within (and interoperate with) an Istio mesh that utilizes sidecar proxies. The main difference is that some (or all) endpoints or namespaces in an Istio mesh can now be tagged to operate in ambient mode. For such pods, instead of network proxies running as sidecars within each application pod, external L4 and L7 proxies are used to provide service mesh functionality. Hence this is also referred to as a “sidecarless mesh”.

The reference upstream implementation of the ambient mode’s L4 proxy is called the ztunnel proxy and that for the L7 proxy is called the waypoint proxy. The ztunnel proxy is deployed as a Kubernetes DaemonSet where one instance is deployed on each node of the cluster. It is written in Rust and optimized for highly performant L4 zero trust networking and mutual TLS securely shared by all ambient pods in a mesh running on that node. The waypoint proxy operates as a Kubernetes Deployment per target application service account or for a particular namespace. It is used for L7 traffic management and is implemented using Envoy proxies.

Let us first review a summary of the ambient mode architecture. With that understanding, the subsequent section covers the benefits of ambient mode over sidecar mode.

Architecture

We provide a brief summary of the Istio ambient architecture here, with emphasis on how the data path works alongside OpenShift’s container networking infrastructure, which is based on the Open Virtual Network Kubernetes (OVN-K) CNI plugin. For more details, refer to the upstream Istio documentation and its ambient user guides.

The control plane architecture for ambient mode uses the same building blocks as current Istio, namely an istiod control plane that provisions data plane proxies and provides discovery information via xDS APIs. The main enhancement is that additional xDS API resources are defined to exchange this information with the ztunnel and waypoint proxies. These new resources are optimized to reduce the amount of control plane information exchanged, compared with what is exchanged for sidecar proxies.

Figures 1 and 2 illustrate the data plane architecture of ambient mode. 

Figure 1: Istio ambient mode data path for L4 traffic via ztunnel proxy

Figure 1 shows a pure L4-only application where client pods C1, C2 and C3 initiate connections to service pod S1. The ztunnel proxy on each of the source and destination nodes is responsible for initiating secure encrypted tunnels over which the application traffic is transported. The protocol used for these tunnels is called HBONE (HTTP Based Overlay Network Encapsulation) and uses the HTTP CONNECT method over HTTP/2 on TCP port 15008. Incidentally, this new data plane format is now also supported by Istio sidecar proxies and is used for communication between sidecar proxies and ambient proxies, thereby enabling data plane interoperability between sidecar endpoints and ambient endpoints and proxies. Note that although the figure shows the HBONE tunnel starting at the ztunnel proxy, in the actual implementation it starts inside the source and destination pods themselves, using a mechanism called in-Pod traffic redirection that is briefly described later in this section.

Figure 2 shows the data path for traffic that requires both L4 and L7 traffic management functions. In this case, traffic is routed via intermediate waypoint proxies that may be located on any node of the cluster (and can also be co-located on nodes with the service destination pods).

Figure 2: Istio ambient mode data path for L7 traffic via ztunnel and waypoint proxies

Figure 3 shows how traffic is routed between application pods and ztunnel proxies when using ambient mode with OpenShift. OpenShift uses an Open vSwitch (OVS) based CNI plugin called OVN-Kubernetes (OVN-K).

Figure 3: Ambient in-Pod Traffic Redirection with OpenShift & OVN-K CNI

To transparently redirect application traffic via the ztunnel proxies, the istio-cni node agent and the ztunnel proxy implement a function called in-Pod redirection. Using this function, the ztunnel proxy is able to intercept incoming and outgoing traffic inside the application pod’s network namespace in order to perform ambient mode L4 proxying functions such as L4 authorization, mTLS encryption/decryption and HBONE encapsulation/decapsulation. The istio-cni node agent adds iptables rules inside the pod’s network namespace to loop traffic via a logical ztunnel proxy function dedicated to that application pod. This works even though there is only a single instance of the ztunnel proxy per worker node, running as a dedicated pod with its own Linux network namespace. The mechanism relies on UNIX domain sockets: the istio-cni agent passes information about the pod’s network namespace to the ztunnel proxy, which can then perform its data path functions. Details of this mechanism are provided in the upstream documentation.

Note: Ambient mode has been available since Istio version 1.18.0. However, the in-Pod redirection functionality is only available from Istio version 1.21.0 onwards and is critical when using ambient mode with the OVN-K CNI. This is why Istio version 1.21.0 or later is required when experimenting with ambient mode on OpenShift (1.22.0 or later will be required for official support).
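The version requirement in the note above can be checked with a small comparison before installing. This is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; `version_ge` and the `ISTIO_VERSION` placeholder are hypothetical names for illustration and are not part of istioctl.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value; substitute the client version reported by `istioctl version`.
ISTIO_VERSION="1.21.0"

if version_ge "$ISTIO_VERSION" "1.21.0"; then
  echo "Istio $ISTIO_VERSION includes in-Pod redirection"
else
  echo "Istio $ISTIO_VERSION predates in-Pod redirection"
fi
```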

We can see from Figure 3 that traffic is redirected/looped via the logical ztunnel proxy function even though the ztunnel proxy itself runs as a node-level pod and not a per-pod sidecar. The traffic coming in and out of the pod, as seen by the OVS switch (which implements the Kubernetes CNI and NetworkPolicy data path functions in the case of OVN-K), is mTLS encrypted and HBONE formatted. However, the source and destination IPs of these packets are unchanged from what the application originally intended. The only change is that the outer destination TCP port is now 15008 (and there is an additional HTTP CONNECT/HBONE header within the payload). As a result, Kubernetes NetworkPolicy rules that match only on IP addresses will continue to work correctly, but rules that match on the TCP destination port may not work correctly. Such rules within Kubernetes NetworkPolicy are likely rare in the real world, so this restriction is considered acceptable. For the rare case where a Kubernetes NetworkPolicy based on TCP port is unavoidable, alternate solutions could be considered, including the use of Istio Authorization policy for TCP traffic.
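To make the port restriction concrete, one possible workaround is to keep a port-based NetworkPolicy but also admit TCP port 15008, since that is the destination port OVS sees for ambient traffic. The following is an untested sketch and not an official recommendation: the policy name `allow-productpage-ports`, the `app: productpage` selector, the application port 9080 and the helper `print_hbone_netpol` are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Emit a NetworkPolicy that matches on destination ports but also admits
# TCP 15008, the HBONE port seen outside the pod in ambient mode.
# All names and the 9080 application port are illustrative assumptions.
print_hbone_netpol() {
  cat <<'EOF'
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-productpage-ports
spec:
  podSelector:
    matchLabels:
      app: productpage
  ingress:
  - ports:
    - protocol: TCP
      port: 9080    # plain application traffic from clients outside the mesh
    - protocol: TCP
      port: 15008   # HBONE tunnel traffic from ambient clients
EOF
}

# Print the manifest for review.
print_hbone_netpol
```

After reviewing the manifest, it could be applied with `print_hbone_netpol | oc apply -f -`. Note that admitting port 15008 at this layer does not distinguish between application ports inside the tunnel; that distinction would have to be made by an Istio Authorization policy.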

Benefits of Istio ambient mode over sidecar mode

With that understanding of Istio ambient mode architecture, we can now appreciate some of the benefits of ambient mode over the sidecar mode. These include:

  • Significantly improved resource efficiency due to multiple factors:
    • We no longer need to deploy network proxies (with full L4 and L7 capabilities) as a sidecar along with every application pod.
    • If the use case is primarily for zero trust networking, mutual TLS and L4 traffic management, then only ztunnel proxies need be deployed using one instance per node. There is no need to deploy any waypoint proxies at all, again leading to improved resource efficiency.
    • Even for L7 applications, the vertical and horizontal scale of waypoint L7 proxy deployments is now independent of the number and capacity of application pods, leading to better right sizing and resource efficiency.
  • Improved network performance including:
    • Lower end-to-end latency (L7 proxying happens at most once in a connection path instead of twice).
    • New L4 proxy written in high performance Rust.
    • Improved control plane performance due to the optimized xDS API resources for ambient mode as described earlier in the architecture summary.
  • Improved security posture and features:
    • The ztunnel proxy is implemented in Rust which is a memory safe programming language in line with recent recommendations from the OWASP foundation and the White House. A key design goal of the ztunnel (literally zero trust tunnel) proxy within ambient mode is to align strongly with US Federal guidelines for Zero Trust Architecture such as NIST 800-207. Support for pluggable cryptographic modules is continued with ambient mode.
    • Improved security posture by not requiring L7 proxies when not needed thereby limiting exposure to CVEs and reducing the attack surface. Typically L7 proxies are more likely to encounter vulnerabilities than L4 proxies.
    • Removal of sidecar proxies from application pods results in greater isolation of ambient proxies from any misbehaving applications, again improving the security posture.
  • Improved application and operations experience. These include not requiring application pods to be restarted when enabling/disabling ambient mesh capability or during mesh upgrades.
  • Improved pod and application startup times due to not needing to deploy and set up proxies every time a new pod is created. This also avoids race conditions between application startup and sidecar proxy startup.
  • Modular architecture enables decoupling between L4 and L7 functions allowing for future mix and match, including use of custom L4 or L7 proxies.

In addition, ambient mode continues to share the existing benefits of Istio service mesh that are already documented as part of Istio sidecar mode documentation and are not repeated in detail in this article. These range from the use of TLS for security with high performance (when compared with use of IPsec or WireGuard for encryption), to having a unified platform for cryptographic identity-based application level networking, traffic management, security and observability. Also note that the Istio project will continue to support sidecar proxy mode in addition to ambient mode, both for scenarios where it may still be beneficial to have dedicated sidecar proxies per application pod instead of node-level proxies and for backward compatibility. This will provide maximum flexibility to operators and developers.

Getting started with ambient on OpenShift

Istio comes with predefined configuration profiles that come in handy during the installation process on a Kubernetes cluster. One of these profiles is called “openshift,” but it lacks the ambient mode bits. To address this, a new profile named “openshift-ambient” has been introduced through an upstream pull request. This profile is designed specifically for installing ambient mode on an OpenShift cluster.

The release of Istio 1.21 incorporates the necessary support for the ambient mode’s in-Pod implementation. However, the openshift-ambient profile pull request could not be merged before the branch cut-off. Consequently, it is slated to be included in the 1.22 release.

Along with the new profile, the following changes have been made for supporting OpenShift:

  1. The ztunnel pods require NET_ADMIN capabilities to support ambient mode use-cases. To simplify current experimentation of ambient on OpenShift, the `openshift-ambient` profile installs both the ztunnel as well as istio-cni pods in the kube-system namespace, which has the required `privileged` SCC capabilities. The eventual official support of ambient on OpenShift will include additional flexibility including the ability to install the Istio pods (including ztunnel pods) in arbitrary namespaces as well as support for multiple control plane instances.
  2. As we are now installing ztunnel in the kube-system namespace, it requires the `istio-ca-root-cert` ConfigMap, which is not created by default. An accompanying upstream PR enhances the istiod controller to create the ConfigMap even in the kube-system namespace.
  3. The `ztunnel` pod is set up with an SELinux context of “spc_t” (Super Privileged Container) to provide unrestricted access to the system, avoiding any interference from SELinux.

In order to try out ambient mode on OpenShift, we can use the following steps.

1. Obtain the istioctl that includes the above changes.

$: wget -O - https://storage.googleapis.com/istio-build/dev/1.22-alpha.f1898ff2fc1406b7899dcf2409876f94b0e76dca/istioctl-1.22-alpha.f1898ff2fc1406b7899dcf2409876f94b0e76dca-linux-amd64.tar.gz | tar zxvf -

2. Execute the following command to install Istio.

$: ./istioctl install --set hub=gcr.io/istio-testing --set tag=1.22-alpha.f1898ff2fc1406b7899dcf2409876f94b0e76dca --set profile=openshift-ambient

The above command will install the istio-cni and ztunnel DaemonSets in the kube-system namespace.

$: oc get pods -n kube-system
NAME                  READY    STATUS     RESTARTS        AGE
istio-cni-node-6xqn9   1/1     Running       0            12m
istio-cni-node-98vwv   1/1     Running       0            12m
istio-cni-node-bjjr8   1/1     Running       0            12m
istio-cni-node-njh8p   1/1     Running       0            12m
istio-cni-node-wf76b   1/1     Running       0            12m
ztunnel-kjgdk          1/1     Running       0            12m
ztunnel-l26xm          1/1     Running       0            12m
ztunnel-lk2qc          1/1     Running       0            12m
ztunnel-m875z          1/1     Running       0            12m
ztunnel-tdgfs          1/1     Running       0            12m

The istiod controller on the other hand will be deployed in the usual istio-system namespace.

$: oc get pods -n istio-system
NAME                                                READY    STATUS       RESTARTS       AGE
istiod-66c58f8ccb-pcxs5                             1/1      Running          0          15m
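Instead of reading the pod listings by eye, the rollout of the two DaemonSets can also be checked from their status fields. This is a minimal sketch: `ds_ready` is a hypothetical helper, and the DaemonSet names `ztunnel` and `istio-cni-node` are inferred from the pod names in the listing above rather than confirmed by this article.

```shell
#!/usr/bin/env bash
# ds_ready NAMESPACE NAME: succeeds when every scheduled pod of the
# DaemonSet is Ready. Hypothetical helper built on the standard
# DaemonSet status fields desiredNumberScheduled and numberReady.
ds_ready() {
  local out desired ready
  out="$(oc get daemonset "$2" -n "$1" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')"
  desired="${out%% *}"   # text before the first space
  ready="${out##* }"     # text after the last space
  echo "$2: ${ready:-0}/${desired:-0} ready"
  [ -n "$desired" ] && [ "$desired" != "0" ] && [ "$desired" = "$ready" ]
}

# Usage against a live cluster (names assumed from the pod listing):
#   ds_ready kube-system ztunnel && ds_ready kube-system istio-cni-node
```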

Let’s deploy the sample bookinfo application along with the sleep/notsleep pods.

$: oc apply -f samples/bookinfo/platform/kube/bookinfo.yaml
$: oc apply -f samples/sleep/sleep.yaml
$: oc apply -f samples/sleep/notsleep.yaml

Enable ambient mode for the pods in the default namespace.

$: oc label namespace default istio.io/dataplane-mode=ambient

Verify that the connectivity is working.

$: oc exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
$: oc exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"

As a quick check, you can use the following command to verify that ztunnel has successfully installed the listening sockets inside the pods in the default namespace where ambient mesh is enabled. For more details on the ports used by Istio, refer to the upstream Istio documentation.

$: oc debug $(oc get pod -l app=productpage -o jsonpath='{.items[0].metadata.name}') -it --image nicolaka/netshoot -- ss -ntlp
State       Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
LISTEN     0          0          127.0.0.1:15080      0.0.0.0:*          
LISTEN     0          0                  *:9080             *:*          
LISTEN     0          0                  *:15001            *:*          
LISTEN     0          0                  *:15006            *:*          
LISTEN     0          0                  *:15008            *:*          
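The same check can be scripted so that it fails loudly when a listener is missing. This is a minimal sketch: `has_ztunnel_listeners` is a hypothetical helper, and the list of ports it looks for (15001, 15006 and 15008) is simply taken from the sample output above.

```shell
#!/usr/bin/env bash
# has_ztunnel_listeners: reads `ss -ntl` style output on stdin and
# succeeds only when LISTEN sockets exist on ports 15001, 15006 and 15008.
# Hypothetical helper; the port list mirrors the sample output.
has_ztunnel_listeners() {
  local input port missing=0
  input="$(cat)"
  for port in 15001 15006 15008; do
    if ! printf '%s\n' "$input" | grep -Eq "^LISTEN .*:${port}[[:space:]]"; then
      echo "missing listener on port ${port}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: pipe the output of the `oc debug ... ss -ntlp` command shown
# above into the helper, for example:
#   oc debug <pod> -it --image nicolaka/netshoot -- ss -ntlp | has_ztunnel_listeners
```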

Let’s verify that L4 authorization policies are properly enforced by the Istio ztunnel proxies by creating an AuthorizationPolicy that only allows traffic from the sleep pod to the productpage service.

$: oc apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/default/sa/sleep
EOF

Accessing the productpage from the sleep pod should succeed.

$: oc exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
<title>Simple Bookstore App</title>

Accessing the productpage from the notsleep pod, on the other hand, should fail. curl exits with code 56, which indicates a failure in receiving network data.

$: oc exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
command terminated with exit code 56

The logs from the ztunnel pod running on the node where the productpage pod is scheduled should include RBAC errors.

2024-02-20T09:10:14.760495Z  INFO inbound{id=c1fbe9efda83e81d0672fc856193fbd1 peer_ip=10.131.0.25 peer_id=spiffe://cluster.local/ns/default/sa/notsleep}: ztunnel::proxy::inbound: got CONNECT request to 10.128.2.134:9080
2024-02-20T09:10:14.760589Z  INFO inbound{id=c1fbe9efda83e81d0672fc856193fbd1 peer_ip=10.131.0.25 peer_id=spiffe://cluster.local/ns/default/sa/notsleep}: ztunnel::proxy::inbound: RBAC rejected conn=10.131.0.25(spiffe://cluster.local/ns/default/sa/notsleep)->10.128.2.134:9080
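Locating the right ztunnel pod and isolating the rejection lines can be sketched as follows. This is an illustrative sketch rather than a documented procedure: the helper names are hypothetical, and it assumes that ztunnel pods carry the label `app=ztunnel`, that they run in kube-system as installed above, and that raw logs may contain terminal color escape sequences that are worth stripping.

```shell
#!/usr/bin/env bash
# strip_ansi: remove terminal color escape sequences from stdin.
strip_ansi() {
  sed $'s/\033\\[[0-9;]*m//g'
}

# rbac_rejections: keep only the RBAC rejection lines, without color codes.
rbac_rejections() {
  strip_ansi | grep 'RBAC rejected'
}

# ztunnel_pod_for APP: print the ztunnel pod on the same node as the first
# pod labeled app=APP in the current namespace.
# Assumes ztunnel pods are labeled app=ztunnel and run in kube-system.
ztunnel_pod_for() {
  local node
  node="$(oc get pod -l "app=$1" -o jsonpath='{.items[0].spec.nodeName}')"
  oc get pods -n kube-system -l app=ztunnel \
    --field-selector "spec.nodeName=${node}" -o name
}

# Usage against a live cluster:
#   oc logs -n kube-system "$(ztunnel_pod_for productpage)" | rbac_rejections
```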

Coexistence with OVN-K Network Policies:

The OpenShift OVN-K CNI supports Kubernetes Network Policies. The network policies are implemented in the OVS layer, which is outside of the pod network namespace as shown in Figure 3 above. For outbound traffic from the pod, the Istio L4 authorization policies are applied first and then the OVN-K network policies; for inbound traffic, the OVN-K network policies are applied first and then the Istio L4 authorization policies.

Let’s create a network policy that allows traffic from the sleep pod to the productpage pod. In this use case, both the Kubernetes NetworkPolicy and the Istio L4 AuthorizationPolicy are configured to allow traffic from sleep, and we want to confirm that access works when both policies are in place.

$: oc apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-policy
spec:
  podSelector: 
    matchLabels:
      app: productpage
  ingress:
  - from:
    - podSelector: 
        matchLabels:
          app: sleep
EOF

Verify that we are able to access the productpage from the sleep pod.

$: oc exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
<title>Simple Bookstore App</title>

Now, let’s modify the Kubernetes network policy to allow traffic only from the notsleep pod.

$: oc apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-policy
spec:
  podSelector: 
    matchLabels:
      app: productpage
  ingress:
  - from:
    - podSelector: 
        matchLabels:
          app: notsleep
EOF

Now, access to the productpage should be blocked from both sleep (due to Kubernetes Network Policy) as well as notsleep pods (due to Istio L4 Auth policies).

$: oc exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
command terminated with exit code 56
$: oc exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
command terminated with exit code 56

Let’s delete the Istio L4 Auth policy and verify that we are now able to access the productpage from the notsleep pod.

$: oc delete authorizationpolicy productpage-viewer
authorizationpolicy.security.istio.io "productpage-viewer" deleted
$: # This is expected to fail.
$: oc exec deploy/sleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
command terminated with exit code 56
$: 
$: # This should pass.
$: oc exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
<title>Simple Bookstore App</title>

For more details, refer to the ambient mode getting started guide and ZTunnel user guide.

Conclusion

This article introduced some technical details of the ambient mode of the Istio service mesh and provided guidelines to assist early experimentation with it on OpenShift platform installations. Users and testers are encouraged to use these guidelines for early experimentation and provide feedback to the Istio and OpenShift communities.

As already mentioned, official support of Istio ambient on OpenShift will come in a future release of OpenShift Service Mesh, with a developer preview anticipated in the second half of 2024 with OpenShift Service Mesh 3.0’s preview releases. Learn more about OpenShift Service Mesh here.