The Promise of Open Source Network Functions Virtualization

by | Jul 27, 2017 | Hybrid Cloud

Network Functions Virtualization (NFV) is revolutionizing the telecommunications industry. That word, “revolution”, is often misused, but it is appropriate for the transformation of core network services from physical to virtual infrastructure.

The transformation is two-fold. First, operators must profoundly change the way they deploy and manage services, bringing their IT and network operations closer together. Second, disaggregating the acquisition of applications, platform, and hardware lowers the barriers to entry for new services and for new approaches to core services. A market which has traditionally been dominated by a small number of network equipment providers is experiencing a Cambrian explosion of vendors in all sectors, including the cloud platform, management and orchestration, and virtual network functions (VNFs).

This transformation is enabled and accelerated by open source software and by development methodologies pioneered in the open source development community. The Open Platform for NFV (OPNFV) project is leveraging open source communities to build a 100% open source reference implementation of an NFV stack. Through the efforts of initiatives like OPNFV and the ETSI NFV industry specification group, open source communities are increasingly embracing telco needs like dataplane acceleration, service availability, and better fault management.

The promise of NFV is reduced time to market for new services, and agile management of existing services. By tightening the feedback loop between development, deployment, and operation of services, changes can be deployed to production with less risk. Automation, monitoring, and policy-driven application management give network operations teams the ability to manage infrastructure at scale. The economics of network management will likely change, allowing more effort to be spent deploying new revenue-generating services, instead of maintaining existing infrastructure.

The Open Source NFV Platform

The OPNFV Foundation, founded in 2014 by an alliance of operators, network equipment providers, hardware platform vendors, virtualization platform vendors, and ISVs, represents a cross-section of the entire industry. Three years later, the project has helped align vendors behind a number of features which require implementation, and has deployed a number of hardware labs to enable the integration, deployment and testing of open source NFV stacks.

The project has created a vehicle for experimentation and innovation, with multiple deployment tools, SDN controllers, and data plane acceleration projects, as well as a wide range of development projects. Active feature development is underway across all components of the Virtual Infrastructure Manager and NFV Infrastructure, covering hypervisor and virtual switch performance, fault management, high availability, policy definition and enforcement, and service function chaining.

The Importance of Working Upstream

Red Hat is an open source company. We build our products with open source software, and core to our philosophy is the principle of “upstream first”. That means, when we are building new features into our products, we work with the upstream projects to design, implement, and test those features in the communities who build them. We believe that this is the best way to build open source products, for a number of reasons.

  • Ability to influence: Working in open source communities means building relationships with other developers, and understanding the culture and processes of the community. This makes it easier to get features into the upstream project, and influence the future roadmap. Red Hat engages in all of the open source projects in our NFV platform.
  • Maintenance costs: When a feature has been implemented without first consulting the community, the path to integration can be long. The community may like the idea, but not the implementation, and may re-implement the feature in an incompatible way. Maintaining a significant patch out-of-tree means porting it to each new release; re-qualifying the feature with your product or solution every time; and potentially diverging from the upstream community permanently and missing out on new development. As a customer, if you are purchasing a solution from a company who does not share the “upstream first” philosophy, these additional costs of maintenance may be passed on to you, or may result in upgrades and security patches taking longer to arrive.
  • Security concerns: When an upstream project identifies a security issue, there is a clear process for handling it. The issue is fixed, typically very quickly, and the community backports the patch to older stable branches. Vendors who are members of the community are informed of the issue and given an opportunity to backport the patch to their products. Security issues with consequences throughout the stack can also require coordination across multiple projects and communities. By being involved in all of the projects which make up the stack, we are better able to address security issues.

For companies who choose not to engage actively in the projects on which they build their products, the costs of getting their changes upstream can be high. This may lead some to suggest that it is not worth trying, that it is better to get code into your products, and to your users, immediately.

Of course, when a feature requires changes across multiple projects, this requires co-ordination, and may take some time. This can, however, be mitigated by having active developers across all of the relevant projects. By communicating simultaneously about the use-case being addressed, with the ability to influence multiple projects, cascading changes can be integrated in a coordinated fashion in short order.

Once patches are integrated into a development branch and are slated for a future stable release, vendors have the confidence to backport these features for inclusion in a supported product offering.

“Carrier Grade” Evolves with NFV

A service is considered “carrier grade” if it meets carrier requirements across four areas:

  • Service availability: Core telecommunications services are considered essential infrastructure, and carry regulatory commitments. Essential services cannot fail, and must be resilient to failures in the underlying infrastructure.
  • Security concerns: Because operators provide a public service, the information they store is protected by privacy laws, and must be held to the highest level of information security.
  • Performance: Telecommunications services must handle large volumes of requests, and data streams, without degradation of service. Performance issues can affect the quality of phone calls, cellphone service, and video streams, and that degradation is unacceptable to subscribers.
  • Manageability: Network operations require a high level of situational awareness. Hardware has defects, routers fail, cables can get cut. The system must provide operators the ability to identify issues as they arise, predict failures before they happen, and give them the means to keep the system running at all times.

As network functions move from physical infrastructure to private cloud, the hardware, virtual infrastructure, and application architecture all contribute to service availability. In that context, it no longer makes sense to talk about “five 9s” as a measure of availability for infrastructure components. Service availability is a combination of reliable hardware, a reliable virtualization platform, and, most importantly, applications that are resilient to failures in both.
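The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough illustration only: the "three 9s" figures are made up, the function names are hypothetical, and the redundancy formula assumes instances fail independently, which is optimistic when they share racks, networks, or a control plane.

```python
from math import prod


def series_availability(*layers: float) -> float:
    """Availability of a stack in which every layer must be up at once."""
    return prod(layers)


def redundant_availability(instance: float, n: int) -> float:
    """Availability of n independent instances when any one of them suffices."""
    return 1.0 - (1.0 - instance) ** n


# Illustrative, made-up figures: each layer on its own offers "three 9s".
hardware, platform, application = 0.999, 0.999, 0.999

# A single application instance is less available than any one layer beneath it.
single = series_availability(hardware, platform, application)

# Three such instances on separate hosts, with the application written to
# tolerate the loss of any two of them.
resilient = redundant_availability(single, 3)

print(f"single instance: {single:.6f}")
print(f"three instances: {resilient:.8f}")
```

Under these assumptions, no layer comes close to five 9s on its own, yet the scaled-out service exceeds it. The availability comes from the application's tolerance of failure, not from any single component.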

The open source community has made great progress in improving the security features, manageability, and performance of the NFV platform. Recent releases of OpenStack have brought improvements in platform reliability, and in keeping OpenStack’s own services highly available. More work needs to be done: to detect and report faults more rapidly so that management applications can take corrective action; to increase performance and throughput for packet processing; and to enable dynamic routing and load balancing of traffic across active instances.

However, there is only so far we can go by improving the cloud platform.

If a host or virtual machine goes down, the VNF must continue to function. The VNF Manager should be able to repair the failure, and bring up new instances to replace the failed ones. Unfortunately, migrating an application from a physical server to a cloud platform is not always straightforward. Most network functions will need to be re-designed, or re-written, for the cloud, rather than simply migrated or ported to virtual machines. Changes are needed to take advantage of the fine-grained control and programmability afforded by private infrastructure as a service and software defined networking, and applications will need to be adapted to enable them to take advantage of scale-out infrastructure.
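To make the repair behavior concrete, here is a minimal sketch of the kind of reconciliation loop a VNF Manager might run. Everything in it is hypothetical: `VnfPool`, `Instance`, and `reconcile` are illustrative names, not the API of any real VNF Manager or of OpenStack, and a boolean flag stands in for real health monitoring.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    name: str
    healthy: bool = True


@dataclass
class VnfPool:
    """Hypothetical stand-in for the instances of one scale-out VNF."""
    desired: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def spawn(self) -> Instance:
        inst = Instance(name=f"vnf-{next(self._ids)}")
        self.instances.append(inst)
        return inst


def reconcile(pool: VnfPool) -> list:
    """Drop failed instances and spawn replacements up to the desired count.

    Returns the names of the instances that were replaced.
    """
    failed = [i for i in pool.instances if not i.healthy]
    pool.instances = [i for i in pool.instances if i.healthy]
    while len(pool.instances) < pool.desired:
        pool.spawn()
    return [i.name for i in failed]


pool = VnfPool(desired=3)
reconcile(pool)                      # initial scale-out to three instances
pool.instances[1].healthy = False    # simulate a host or VM failure
replaced = reconcile(pool)
print("replaced:", replaced)
print("running:", [i.name for i in pool.instances])
```

The loop only works if a replacement instance can take over from the failed one, which is a property of the application and not of the platform.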

Fulfilling the Promise of NFV

Up until now, most of the industry effort has been focused on performance and reliability of the virtualization platform. We have made a lot of progress in improving the platform to meet performance requirements, and to enable the management of VNFs. But as a recent Light Reading article from Caroline Chappell has stated, applications are now on the critical path for NFV adoption.

Unlike applications moved to a traditional virtualization platform, applications typically require significant modifications to take advantage of a cloud platform. The management of state such as file storage and databases, session-aware load balancing, monitoring, and log management across a clustered application all require application modifications and potentially architecture changes. Provisioning of applications needs to be automated to enable continuous deployment and agility in managing services.
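One of those modifications, moving session state out of the individual instance, can be illustrated with a small sketch. The names are hypothetical, and a plain dictionary stands in for whatever shared database or cache a real deployment would use.

```python
class SessionStore:
    """Hypothetical external store; a dict stands in for a database or cache."""

    def __init__(self):
        self._data = {}

    def get(self, session_id, default=0):
        return self._data.get(session_id, default)

    def put(self, session_id, value):
        self._data[session_id] = value


class Worker:
    """A VNF instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        # Read, update, and write back the per-session counter in the shared store.
        packets = self.store.get(session_id) + 1
        self.store.put(session_id, packets)
        return packets


store = SessionStore()
a, b = Worker("a", store), Worker("b", store)
a.handle("call-42")
a.handle("call-42")
# Instance "a" fails; "b" picks up the same session without losing its count.
resumed = b.handle("call-42")
print(resumed)
```

Because neither worker holds the session itself, a load balancer can send the next request to any healthy instance, which is what scaling out depends on.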

NFV is a journey. The first step is the development of a viable cloud platform to enable the initial deployment of NFV workloads. The next step is to modify VNFs to take advantage of the benefits of a cloud platform, migrating applications from scale-up to scale-out. Learning the lessons of cloud and DevOps will take time, and effort. Fulfilling the promise of NFV – agility, reduced time to market, and the ability to manage applications at scale and reduce operating expenditures – requires the industry to make that leap.