Deploying a full-service 5G network on OpenShift

During the last KubeCon North America in San Diego, a cross-vendor team of engineers from Red Hat and several other companies rolled a half-rack of servers and a self-made Faraday cage onto the keynote stage and gave a live demo of a full 5G/4G network. That network was connected to two additional deployments in Canada and France, and all of it was containerized and running on Red Hat OpenShift Container Platform clusters.

This live demo was the culmination of an intense, multi-month community effort supported by Linux Foundation Networking. We had the honor of working on the site located in France at Eurecom, a telecommunications research institute that is the initiator of and main contributor to the OpenAirInterface 5G/4G project. In this post we explore how that 5G network was constructed and deployed on the Kubernetes-based open source OpenShift platform.

Open source has changed the way we understand software development, and that change of mindset arrived in the telecom industry about five years ago with Network Function Virtualization (NFV), the concept of running traditional telco hardware appliances (routers, firewalls, load balancers) on commodity servers in the form of virtual machines. Today, the industry is participating in recently created consortiums such as OpenAirInterface and O-RAN, which aim to evolve radio access networks in an open way, with a lot of that work now focusing on containers.

This is where Red Hat’s engineers bring their expertise in open source software development, particularly around Kubernetes. Kubernetes is described as a “portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.” OpenShift is Red Hat’s distribution of Kubernetes that allows enterprises to run their applications on the hybrid cloud. With enterprises around the globe adopting Kubernetes and OpenShift as a de facto standard platform to deploy applications, can we do the same for the new generation of mobile networks?

How we kubernetized a 5G network

Let’s get into some of the details of how we deployed OpenAirInterface on OpenShift. The OpenAirInterface project fosters a community of industrial as well as research contributors for software and hardware development for the core network (EPC) and access network and user equipment (EUTRAN) of 3GPP cellular networks. The main repository with the container image build recipes, Kubernetes manifests, and helper scripts can be found at the project’s GitHub repo.

Our first step was to containerize all OpenAirInterface components and produce consistent and reproducible container image builds. For this particular demo, we used OpenAirInterface’s Evolved Packet Core (EPC) for the core network. The EPC consists of three main components:

  • Mobility Management Entity (MME): authenticates and authorizes users and manages both their session state and their mobility state, i.e., which base station a user is attached to and how to hand over to a different base station.
  • Home Subscriber Server (HSS): the master database that stores all users’ subscription profiles, authentication keys, etc.
  • Serving & Packet Data Network Gateway (S/P-GW): serves as the entry and exit point for traffic, enforces the operator’s traffic policies, acts as the mobility anchor, routes traffic to a user’s current base station, and so forth.

The S/P-GW was deployed as one component processing the user traffic and another component handling the control signalling. This so-called Control / User Plane Split (CUPS) architecture allows scaling control traffic and user traffic capacity independently.

On the Radio Access Network side, the Evolved Node B (eNodeB or eNB) is the element of a 4G network that communicates directly and wirelessly with mobile handsets. It uses different protocols to connect to the MME and S/P-GW and handles processing of the radio signals. In a 5G network, this same component is called Next Generation Node B (gNB). It features advanced Software Defined Radio (SDR) technology to achieve better performance and flexibility.

The project’s code repository contains the scripts necessary to build all these components from source into small, ready-to-run container images. For the container base layer, we used Red Hat’s Universal Base Image (UBI), a lightweight enterprise-grade base image with curated, hardened, and stabilized package content that allows developers to focus on their applications while having the option to run images in a fully vendor-supported manner.

Next we worked on deploying our own 5G/4G network on the OpenShift Kubernetes distribution. The main challenges we had to overcome were typical for migrating software designed to run on physical hosts to Kubernetes: ensuring a service makes no assumptions about the specific host it runs on or where other services are deployed relative to it. This includes looking up other services through the Kubernetes cluster’s Domain Name System (DNS) and ensuring services restart gracefully, retry failed connections, and so on.
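To make the “look up by DNS and retry” idea concrete, here is a minimal sketch in Python. It is not taken from the OpenAirInterface code base: the function name, the backoff values, and the service name in the usage note are assumptions chosen for illustration. The point is only that a service resolves its peers by name at runtime and tolerates the peer not being up yet.

```python
import socket
import time


def resolve_with_retry(hostname, port, attempts=5, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve a service name via DNS, retrying with exponential backoff.

    `resolver` and `sleep` are injectable so the logic can be exercised
    without a real cluster DNS (hypothetical helper, for illustration only).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            infos = resolver(hostname, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the IP address.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            last_error = err
            # Wait 0.5s, 1s, 2s, ... before the next attempt,
            # but do not sleep after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"could not resolve {hostname}") from last_error
```

Inside a cluster, a call might look like resolve_with_retry("mme.oai.svc.cluster.local", 36412), where the service and namespace names are made up for this example. A service written this way no longer cares which node its peer lands on or in which order the pods start.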

Further, OpenShift and Red Hat Enterprise Linux (RHEL), as enterprise-grade Kubernetes and Linux distributions respectively, default to a more locked-down security model. Like many workloads that make extensive use of kernel or hardware features, however, OpenAirInterface services assumed they had full root-level access. We instead ran them as regular users with the minimum set of privileges, limited to certain system capabilities. The necessary Kubernetes manifests are in the project’s code repository.
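As a rough illustration of what “regular user plus a few capabilities” can look like in a manifest, the sketch below builds a container entry with a restrictive securityContext. This is an assumption-laden example, not the project’s actual manifest: the helper name, image name, and the capabilities listed in the usage note are hypothetical, and the real set depends on what each service does.

```python
import json


def restricted_container(name, image, capabilities):
    """Build a container spec that runs as non-root with a short capability list.

    Hypothetical helper for illustration; the capability names passed in
    are examples, not the set OpenAirInterface actually requires.
    """
    return {
        "name": name,
        "image": image,
        "securityContext": {
            # Refuse to start if the image would run as UID 0.
            # No runAsUser is set, so the platform can assign the UID.
            "runAsNonRoot": True,
            "capabilities": {
                # Start from nothing, then add back only what is needed.
                "drop": ["ALL"],
                "add": sorted(capabilities),
            },
        },
    }


if __name__ == "__main__":
    spec = restricted_container("mme", "example.invalid/oai-mme:latest",
                                {"NET_ADMIN", "NET_RAW"})
    print(json.dumps(spec, indent=2))
```

One caveat worth keeping in mind: for a non-root process, capabilities granted this way typically only become effective if the binary itself also carries matching file capabilities, so the image build and the manifest have to be designed together.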

How we configured OpenShift to run the 5G network

Telco / 5G network functions are among the more demanding Kubernetes workloads, but they are not unique: customers from high performance computing, high frequency trading, industrial control, and other fields are asking for much the same set of capabilities. This is why we at Red Hat work to develop these capabilities upstream alongside the rest of the Kubernetes community, so that they become native capabilities of OpenShift rather than telco-specific extensions.

To support the 5G network in a production-like deployment, we configured OpenShift to segregate real-time and non-real-time compute workloads as well as management, control, and data plane traffic according to the following logical deployment architecture:

The “Distributed Unit” (DU) part of the eNBs / gNBs is highly sensitive to latency and jitter, so it is deployed onto real-time capable Kubernetes workers. These require a number of special configurations:

  • BIOS configuration: When the hardware, firmware, or firmware settings of the host machine running the real-time workload introduce non-deterministic latency spikes, there’s nothing the host OS or Kubernetes can do to mitigate this. Therefore, the first step is to eliminate hardware- and firmware-level sources of non-determinism, for example by disabling C-states (CPU power saving), P-states (CPU frequency scaling), EDAC (ECC memory scans), etc.
  • Host OS configuration: Next, the host OS needs to run a low latency kernel with the real-time preempt patches and certain OS level tuning, such as enabling huge pages, isolating CPU cores, disabling timer ticks, disabling IRQ load balancing, etc. In OpenShift, which runs on immutable RHEL CoreOS hosts, this can be configured declaratively using Kubernetes MachineConfig resources to enable the real-time RHEL kernel and auto-tune the host using the tuned real-time profile.
  • CPU Resource Management: Finally, to ensure Kubernetes places a real-time workload onto isolated cores on the real-time capable Kubernetes worker, we need to configure the static cpuManagerPolicy on the Kubelet, and set resource requests and limits for both CPU and memory.
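The CPU resource management step can be sketched as follows. With the static CPU manager policy, the Kubelet hands out exclusive cores only to containers in pods of the Guaranteed QoS class that request a whole number of CPUs, which is why requests and limits must be set and must match. The Python below builds a KubeletConfig resource and a matching resources stanza; the resource name, pool label, and helper names are assumptions for illustration, and the check looks at a single container only.

```python
import json


def kubelet_config(pool_label_key, pool_label_value):
    """KubeletConfig enabling the static CPU manager policy on a node pool.

    The metadata name and pool label are hypothetical placeholders.
    """
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {"name": "cpumanager-enabled"},
        "spec": {
            "machineConfigPoolSelector": {
                "matchLabels": {pool_label_key: pool_label_value}
            },
            "kubeletConfig": {
                "cpuManagerPolicy": "static",
                "cpuManagerReconcilePeriod": "5s",
            },
        },
    }


def guaranteed_resources(cpus, memory):
    """Requests == limits with an integer CPU count (Guaranteed QoS)."""
    if isinstance(cpus, bool) or not isinstance(cpus, int) or cpus < 1:
        raise ValueError("cpus must be a positive integer")
    res = {"cpu": str(cpus), "memory": memory}
    return {"requests": dict(res), "limits": dict(res)}


def gets_exclusive_cpus(resources):
    """Simplified single-container check for exclusive core eligibility."""
    req = resources.get("requests", {})
    lim = resources.get("limits", {})
    cpu = req.get("cpu", "")
    # "500m" or "1.5" are fractional requests and stay in the shared pool.
    return req == lim and "memory" in req and cpu.isdigit() and int(cpu) >= 1


if __name__ == "__main__":
    print(json.dumps(kubelet_config("custom-kubelet", "cpumanager-enabled"), indent=2))
    print(gets_exclusive_cpus(guaranteed_resources(4, "8Gi")))
```

A container that asks for "500m" of CPU, or whose limits differ from its requests, would still run on the node, just not on dedicated cores.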

On the networking side, the following changes are required:

  • Multiple Interfaces: Most telco deployments require a clean segregation of networks for control, user data, and management traffic. OpenShift Container Platform 4 supports this out of the box using Multus CNI. In our deployment, we use the Kubernetes cluster network for management traffic to connect the OpenAirInterface services, but create secondary networks to segregate the 3GPP control and data plane networks. The eNodeB (4G) and the gNodeB (5G) pods are connected to the USRP software-defined radios via dedicated, bonded interfaces.
  • SCTP: Some 3GPP protocols rely on SCTP for the network transport layer. Related OpenAirInterface services therefore open SCTP sockets to be able to communicate. That meant we had to enable the Kubernetes SCTP feature gate and whitelist and load the Linux SCTP kernel module on worker nodes. Thanks to OpenShift and RHEL CoreOS, this is again a matter of creating a MachineConfig and using labels and selectors to apply this configuration to all worker nodes.
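For the SCTP kernel module, the MachineConfig boils down to writing two small files on each worker: one that overrides the default module blacklist entry with an empty file, and one that asks the host to load the module at boot. The sketch below generates such a resource in Python. Treat it as an approximation under stated assumptions: the resource name is made up, the Ignition spec version depends on the OpenShift release (older releases use a different version and slightly different file fields), and it does not cover the feature gate.

```python
import json


def sctp_machine_config(role="worker", ignition_version="3.2.0"):
    """MachineConfig that un-blacklists and loads the sctp kernel module.

    `ignition_version` is an assumption; check what your release expects.
    """
    def file_entry(path, source):
        return {
            "path": path,
            "mode": 0o644,  # serialized as decimal 420 in JSON
            "overwrite": True,
            "contents": {"source": source},
        }

    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"load-sctp-module-{role}",  # hypothetical name
            # The role label selects which machine pool gets the config.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": ignition_version},
                "storage": {
                    "files": [
                        # Empty file: replaces the default blacklist entry.
                        file_entry("/etc/modprobe.d/sctp-blacklist.conf", "data:,"),
                        # One line, "sctp": load the module at boot.
                        file_entry("/etc/modules-load.d/sctp-load.conf", "data:,sctp"),
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(sctp_machine_config(), indent=2))
```

The JSON printed by this script can be applied like any other manifest; the role label is what ties the configuration to all worker nodes, as described above.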

What can radio hackers do with this?

Now that we have reviewed most of the software-side requirements, what else would you need to have a fully functional 5G/4G network?

In the end, a smartphone has to connect to the network via a radio unit and antenna built for a certain frequency band. Professional hardware for production networks comes with a steep price tag, though. Fortunately, the open hardware movement has led to a democratization of software-defined radio hardware and there are more and more radio hackers doing research on these technologies.

That is why there is low-end hardware available for prototyping for less than a thousand US dollars.

Possible shopping list:

  • USRP B200-mini ($500)
    • Up to 50 MHz BW
  • Custom 20 dBm PA/LNA/Switch ($300)
    • Band 38, 42/43, n38/n77-78
  • Upboard/Upboard2
    • (low-end $90 PC)
  • GbE frontHaul POE+
  • Antenna

Conclusion

The telco industry is truly working with community development models and open source technologies to make the new generation of mobile networks a reality. At Red Hat, we work every day to make our platforms, such as OpenShift, suitable for new telco use cases, and 5G is clearly a very powerful one. 5G is designed to bring high-throughput, low-latency connectivity to enterprises and regular consumers alike, enabling the use cases of the future like IoT, autonomous cars, and many other applications deployed at the edge of the network. Red Hat plans to keep working with service providers to make sure 5G stays open.

 

Using machine learning and analytics to help developers

It was the talk title that caught my eye – “Developer Insights: ML and Analytics on src/”. I was intrigued. I had a few ideas of how machine learning techniques could be used on source code, but I was curious to see what the state of the art looked like now. I attended the session at DevConf.cz 2020 by Christoph Görn and Francesco Murdaca of the AI and ML Center of Excellence in Red Hat to hear more.

The first question I had was “where did they come up with the project name Thoth?” My initial guess was that “Thoth” was an ice moon from the Star Wars universe, or maybe a demon from Buffy the Vampire Slayer. It turns out that Thoth is the Ancient Egyptian god of writing, magic, wisdom, and the moon. The Egyptian deity theme runs through the project, with components called Thamos, Kebechet, Amun, and Nepthys, among others.

The set of problems that Thoth aims to solve is an important one. Can we help developers identify the best library to use, by looking at what everyone else is using for a similar job? Can we help identify the source of common performance issues, and suggest speed-ups? Can we create a framework that can enforce compliance, and help minimize risk, as applications grow?

Size matters: how Fedora approaches minimization

As part of a modern IT environment, Linux distributions can look to optimizing their size to be better suited for container use. One of the ways this improvement can happen is through reducing the size of a distribution, a process known as minimization. A new tool is being put together that will enable developers and operators to create minimal images of the appropriate size for the container use cases they need.

Figure: Graphical representation of the relationships between all of the software repositories in Fedora Linux, many thousands of green dots cross-connected to appear like a cloud nebula. Image by: Adam Šamalík

Managing application and data portability at scale with Rook-Ceph

One of the key requirements for Kubernetes in multi-cluster environments is the ability to migrate an application with all of its dependencies and resources from one cluster to another cluster. Application portability gives application owners and administrators the ability to better manage applications for common needs such as scaling out applications, high availability for applications, or just simply backing up applications for disaster recovery. This post is going to present one solution for enabling storage and data mobility in multicluster/hybrid cloud environments using Ceph and Rook.

Containerization and Container Native Storage have made it easier for developers to run applications and get the storage they need, but as this space evolves and matures it is becoming increasingly important to be able to move your application and data around, from cluster to cluster and cloud to cloud.

Kiali: An observability platform for Istio

Istio exists to make life easier for application developers working with Kubernetes. But what about making Istio easier? Well, that’s Kiali’s job. Read on to learn more about making Istio even more pleasant to use.

Deploying and managing microservice applications is hard. When you break down an application into components, you add complexity in how those components communicate with each other. Getting an alert when something goes wrong, and figuring out how to fix it, is a challenge involving networking, storage, and potentially dozens of different compute nodes.

Current Trusted Execution Environment landscape

If you run software on someone’s servers, you have a problem. You can’t be sure your data and code aren’t being observed or, worse, tampered with; trust is your only assurance. But there is hope, in the form of Trusted Execution Environments (TEEs) and a new open source project, Enarx, that will make use of TEEs to minimize the trust you need to confidently run on other people’s hardware. This article delves into this problem, explains how TEEs work and what their limitations are, providing a TEE primer of sorts, and describes how Enarx aims to work around these limitations. It is the next in a series that started with Trust No One, Run Everywhere–Introducing Enarx.

Scaling workload storage requirements across clusters

A number of multi-cloud orchestrators have promised to simplify deploying hundreds or thousands of high-availability services.  But this comes with massive infrastructure requirements. How could we possibly manage the storage needs of a thousand stateful processes?  In this blog, we’ll examine how we can leverage these orchestrators to address our dynamic storage requirements.

Currently in Kubernetes, there are two approaches to how a control plane can scale resources across multiple clusters. These are commonly referred to as the Push and Pull models, referring to the way in which configurations are ingested by a managed cluster. Despite being antonyms in name, these models are not mutually exclusive and may be deployed together to target separate problem spaces in a managed multi-cluster environment.

Prometheus anomaly detection

With an increase in the number of applications being deployed on Red Hat OpenShift, there is a strong need for application monitoring. A number of these applications are monitored via Prometheus metrics, resulting in an accumulation of a large number of time-series metrics stored in a TSDB (time series database). Some of these metrics can have anomalous values, which may indicate issues in the application, but it is difficult to identify them manually. To address this issue, we came up with an AI-based approach of training a machine-learning model on these metrics for detecting anomalies.

Sentiment analysis with machine learning

When developing a new technology, it really helps if you are also a user of that new tech. This has been Red Hat’s approach to artificial intelligence and machine learning: develop openly on one hand, and on the other hand exchange knowledge across the organization so that the same tools are used to work on interesting business problems, all while keeping a two-way exchange to and from the open source commons.

This is the sort of left-hand/right-hand move that data scientist Oindrilla Chatterjee began using as part of a project she originally started during an internship, then later in a full-time role at Red Hat. Chatterjee and her team are looking at how to do sentiment analysis using machine learning on a dataset consisting of customer and partner surveys regarding a service offering.

Red Hat and NVIDIA bring scalable, efficient edge computing to smart cities

Teams from Red Hat and NVIDIA have collaborated on creating a scalable hybrid cloud application that could revolutionize smart city initiatives such as traffic-flow monitoring and transportation management around the world. By working together, the two companies are creating solutions that make cities smarter and more efficient by taking sensor data and processing it in real-time to provide insights for traffic congestion, pedestrian flow, and infrastructure maintenance.

Running on top of the NVIDIA EGX platform with the NVIDIA GPU Operator, the application is built with NVIDIA’s Metropolis application framework for IoT, which brings together innovative capabilities for real-time image processing. The NVIDIA DeepStream SDK is used to extract metadata from live video streams at the edge. The application then forwards the right metadata to the cloud for deeper analytical processing and further representation in an information dashboard depicted below.
