Using eBPF in unprivileged Pods

Jul 18, 2023 | Hybrid Cloud

Extended Berkeley Packet Filter (eBPF) is an attractive technology for Kubernetes applications, either to accelerate their packet processing (as an in-kernel fast path) or as part of various monitoring and telemetry projects. However, using eBPF in these applications may require escalating Pod privileges to CAP_SYS_ADMIN or CAP_BPF, which can compromise security. This article demonstrates how to use eBPF object pinning to make eBPF available to unprivileged Pods.

Note: Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future.

How can you use eBPF with an unprivileged Pod?

To answer this question, consider the operations involved in using eBPF in an application:

Operation | Privileged/Unprivileged
Loading the eBPF program | Privileged (CAP_BPF)
Attaching an eBPF program | Privileged (CAP_BPF)
Running the eBPF program | N/A: once the program is attached at the right hook, it executes when triggered
Interacting with eBPF maps (from user space) | Depends on how the map is shared, the kernel version (>5.18), and the kernel.unprivileged_bpf_disabled setting
Unloading the eBPF program | Privileged (CAP_BPF)
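The kernel.unprivileged_bpf_disabled setting mentioned above can be read from procfs before an application attempts any map access. The following is a minimal sketch, assuming the sysctl is exposed at the usual /proc/sys/kernel/unprivileged_bpf_disabled path; the helper names are illustrative and not part of any library.

```c
#include <stdio.h>

/* Illustrative helper: describe a kernel.unprivileged_bpf_disabled value.
 * 0 = unprivileged bpf() calls allowed
 * 1 = disabled, and the setting cannot be changed until reboot
 * 2 = disabled, but an administrator may still change the setting */
const char *describe_unpriv_bpf(int value)
{
    switch (value) {
    case 0:  return "enabled";
    case 1:  return "disabled (locked until reboot)";
    case 2:  return "disabled (admin may change)";
    default: return "unknown";
    }
}

/* Read the sysctl value from the given procfs path.
 * Returns the value, or -1 if the file cannot be read or parsed. */
int read_unpriv_bpf_disabled(const char *path)
{
    int value = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Passing the result of read_unpriv_bpf_disabled("/proc/sys/kernel/unprivileged_bpf_disabled") to describe_unpriv_bpf() gives a quick diagnostic at container start. How the setting affects map access also depends on the kernel version, as the table notes.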

Since some of these operations require privileged access, the privileged operations can be separated into a control (privileged) process, with the remaining operations completed in an unprivileged process. In a Kubernetes scenario, this control process could run in a DaemonSet on a node and share information with unprivileged Pods so that they can use eBPF programs. One option for such a control process is bpfd.

bpfd is a system daemon for managing eBPF programs. Along with its accompanying Kubernetes operator, it seeks to solve the following problems:

  • To allow multiple XDP programs to share the same interface
  • To give administrators control over who can load programs and to allow them to define rules for ordering of networking eBPF programs
  • To allow programs to be loaded automatically at system launch time
  • To simplify the packaging and loading of eBPF-based infrastructure software (for example, Kubernetes CNI plugins)

A control process, like bpfd, can share eBPF objects with unprivileged user space processes by either:

  1. Passing the file descriptor for that object through a UNIX Domain Socket (UDS) [1]. In this case the control process needs to stay alive until the transfer is complete, because a BPF object persists only while some process holds a file descriptor for it. An example of this approach is detailed by CNDP.
  2. Exporting/pinning the object to an eBPF file system (BPFFS) [2]. By default this file system is mounted at /sys/fs/bpf, and mounting that location into an unprivileged Pod requires either relaxing the default Pod security policy for unprivileged Pods or using privileged Pods. Once the Pod has access to the BPFFS, it can retrieve the file descriptor for the object. Alternatively, you can create the file system at a custom path other than /sys/fs/bpf, which allows it to be mounted into an unprivileged Pod, with access controlled through standard file permissions. This approach is the focus of this article.
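For comparison with pinning, the first option (passing a file descriptor over a UDS) relies on SCM_RIGHTS ancillary data. The following is a minimal sketch of that mechanism for an already connected socket; send_fd and recv_fd are illustrative names, and nothing here is specific to bpfd or CNDP. Any file descriptor, including one for an eBPF map, can be passed this way.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a connected UNIX domain socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F'; /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor from a connected UNIX domain socket.
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The sender has to remain alive until recv_fd() returns in the peer, which is the lifetime constraint described in the first option. Pinning removes that coupling because the BPFFS entry keeps the object alive.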

Creating an eBPF file system

An eBPF file system can be created using the following command:

$ mount bpffs /var/run/example/map/ -t bpf

It can then be shared using the following command:

$ mount --make-shared /var/run/example/map/

eBPF programs/objects (such as a map) can then be pinned to this BPFFS, which in turn can be mounted in the unprivileged Pod.
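Before pinning to or reading from such a path, a process may want to confirm that the path really is a BPFFS mount rather than an ordinary directory. The following is a small sketch using statfs(2) and the BPF_FS_MAGIC constant from the kernel headers; the helper name is_bpffs is illustrative.

```c
#include <linux/magic.h>
#include <sys/vfs.h>

#ifndef BPF_FS_MAGIC
#define BPF_FS_MAGIC 0xcafe4a11 /* fallback for older kernel headers */
#endif

/* Returns 1 if path resides on a BPF file system, 0 if it resides on
 * another file system, and -1 if statfs() fails (errno is set). */
int is_bpffs(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return -1;
    return (unsigned long)st.f_type == BPF_FS_MAGIC ? 1 : 0;
}
```

With the mount commands above applied, is_bpffs("/var/run/example/map/") would be expected to return 1 on the host, and also inside a Pod that mounts that path.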

Creating the BPFFS and configuring a Pod to mount that path can be automated through a DaemonSet (such as bpfd), as demonstrated by the AF_XDP device plugin. Work to integrate this with bpfd is in progress.

Accessing a pinned eBPF object from an unprivileged process

The following code snippet shows how to retrieve the pinned object FD using `bpf_obj_get()` in the unprivileged process. In this case the pinned object is an xsk_map:

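#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h> /* libbpf */

int main(void)
{
    const char *file = "/var/run/example/map/xsk_map";
    /* Open the pinned object; on success the fd refers to the xsk_map. */
    int fd = bpf_obj_get(file);
    if (fd < 0)
        printf("Couldn't get fd: %s\n", strerror(errno));
    else
        printf("bpf: get fd: %d\n", fd);
    return fd < 0;
}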
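For readers curious about what happens underneath `bpf_obj_get()`, the following sketch issues the BPF_OBJ_GET command directly through the bpf(2) system call, using only kernel headers. The helper name obj_get_raw is illustrative, and in practice the libbpf wrapper remains the more convenient choice. The file_flags argument is included on the assumption that the pinned object may be readable but not writable by the Pod's user, in which case requesting BPF_F_RDONLY should be needed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative helper: open a pinned eBPF object with the raw BPF_OBJ_GET
 * command. file_flags may be 0 (read/write), BPF_F_RDONLY or BPF_F_WRONLY.
 * Returns a new fd on success, or a negative errno value on failure. */
int obj_get_raw(const char *path, uint32_t file_flags)
{
    union bpf_attr attr;
    long fd;

    memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
    attr.pathname = (uint64_t)(uintptr_t)path;
    attr.file_flags = file_flags;

    fd = syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
    return fd < 0 ? -errno : (int)fd;
}
```

For example, obj_get_raw("/var/run/example/map/xsk_map", 0) mirrors the libbpf call in the snippet above, while passing BPF_F_RDONLY asks only for read access.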

Accessing a pinned eBPF object from an unprivileged Pod

For an unprivileged Pod to run the code shown in the previous section, its Pod spec needs a hostPath volume entry, as shown below:

apiVersion: v1
kind: Pod
metadata:
 name: bpfmap-pod          
spec:
 nodeSelector:
   bpfexample: "true"
 containers:
 - name: bpfmap-get
   image: bpfmap:latest 
   imagePullPolicy: IfNotPresent
   command: ["./tests/get_bpf_map.py"]
   volumeMounts:
     - mountPath: /var/run/example/map/
       name: bpfmap-volume
 volumes:
 - name: bpfmap-volume
   hostPath:
     # directory location on host
     path: /var/run/example/map/
     # this field is optional
     type: Directory
 restartPolicy: Never

Ideally, a volume type other than hostPath could be used to share pinned eBPF objects with unprivileged Pods. There’s an ongoing investigation to understand if and how this would work. For now, the best way to limit access to the pinned objects is to create a separate BPFFS per Pod and manage access to that mount point using a DaemonSet like bpfd.
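Since access to a custom BPFFS path is controlled through standard file permissions, a control process could narrow access to a pinned object after pinning it. The following is a minimal sketch under that assumption; restrict_pin is an illustrative name, and the group ID would be whatever group the unprivileged Pod runs with.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative helper: assign a pinned object to one group and allow
 * read/write access for owner and group only (mode 0660).
 * Returns 0 on success, -1 on error (errno is set). */
int restrict_pin(const char *path, gid_t gid)
{
    /* (uid_t)-1 leaves the current owner unchanged. */
    if (chown(path, (uid_t)-1, gid) != 0)
        return -1;
    return chmod(path, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
}
```

A control process might call restrict_pin("/var/run/example/map/xsk_map", pod_gid) once per Pod-specific BPFFS, where pod_gid is a placeholder for the group configured in the Pod's security context.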

A fully functional Kubernetes example of an unprivileged Pod accessing a pinned eBPF map can be found here.

Summary

It’s possible to leverage the benefits of eBPF in Kubernetes applications without escalating their privileges, and without relying on a model in which a control process explicitly passes eBPF object FDs to the applications. While hostPath is the volume type presented as part of the solution in this blog, more investigation is needed to understand whether another, more suitable volume type can be used.

References

[1] Lifetime of BPF objects, Alexei Starovoitov, 2018.

[2] Persistent BPF objects, LWN.net, Jonathan Corbet, 2015.