Unikernels are customized, single address space bootable images composed of an application and the required bare-minimum kernel functionality. Today’s unikernels have demonstrated substantial performance and security advantages over monolithic and microkernels, but none have yet achieved widespread adoption.
The fundamental problem is that today’s unikernels, which have been developed by forking existing operating systems or as clean-slate designs, have abandoned the evolutionary community process that has made Linux such a success. In this post we describe an alternative approach we are pursuing with the goal of making unikernels a community-supported, evolving capability of Linux and the GNU C Library (glibc).
Unikernels are single address space library operating systems. An application compiled into a unikernel carries only the kernel functionality it requires and nothing else. Such a stripped-down kernel makes unikernels extremely lightweight, both in image size and in memory footprint, and can also bring security benefits through a reduced attack surface. There are many such lightweight unikernel implementations, e.g., LING, IncludeOS, and MirageOS. LING’s own website, which runs on top of the LING unikernel, takes only 25 MB of memory. IncludeOS’s base VM starts at 1 MB, and a DNS server running on MirageOS compiles into a 449 KB image.
Since a unikernel does not have to initialize devices or services that the application does not need, boot times can be very fast. Unikernels also avoid ring transition overhead (e.g., the cost of going from the least privileged ring 3, where applications operate, to the most privileged ring 0, where the kernel lives). Ring transitions are required to support multiple processes, and by design, basic unikernels do not run multiple processes.
Additional improvements come because the kernel and application code can be co-optimized to meet the specific application’s needs. For example, researchers at Boston University’s Department of Computer Science carried out experiments to compare the performance of Linux with that of their specialized unikernel, the Elastic Building Block Runtime (EbbRT). The application they chose was Memcached, an in-memory key-value store that Facebook also uses in its data centers. The researchers generated access patterns representative of Facebook’s own workloads and found that EbbRT delivers almost twice the Memcached throughput at a target 99% tail latency compared with Memcached running on the Linux kernel. The gain came from tailoring the kernel’s memory management, networking, and scheduling to the needs of the application.
Why aren’t unikernels more popular?
Given all the advantages, why are unikernels still not widely used even after being around for decades? Possible answers can be found in the way these unikernels were developed. Unikernel development efforts over the years have followed one of two approaches: a clean-slate design or a fork of an existing code base.
With the clean slate approach, developers have far more control over the code and the API. They don’t necessarily have to follow the POSIX API and can use newer languages, as is the case for HalVM, which uses Haskell. With such freedom, these unikernels can provide enhanced performance/security benefits.
The problem with this approach is that these unikernels abandon the battle-tested Linux code and, with it, the entire community that maintains Linux and keeps fixing its bugs. The newer codebases might be thoroughly structured and well written, but they cannot match the thousands of person-years invested in the development of Linux over the decades.
Furthermore, Linux has become the go-to kernel for deployment in cloud and other settings. Deviating from that and developing something entirely new means persuading everyone to let go of Linux. If these unikernels deviate from the POSIX API, legacy software can’t run on them, and applications have to be ported or rewritten for different, non-standard APIs.
The other approach to unikernel development is to fork an existing kernel codebase, e.g., Linux or NetBSD, and strip it down to create a unikernel. In the process of stripping down the codebase, some projects change the original code so much that the changes never get integrated back into the original code base. This essentially means maintaining the code as a new project. Such a fork has no existing developer community to maintain the non-target-specific code, whereas an upstream community might even ensure that well-integrated targets don’t break, without further work by the original contributor of the port.
A gradual, community based approach
Can we take a different approach to developing a unikernel? Can it follow the approach that Linux, glibc, and other such projects took: making a working version and improving it gradually? Can such a unikernel actually be part of the Linux and glibc code base so the huge community maintains the unikernel as well?
This way we would be able to use the entire code base of Linux and glibc without having to reinvent the wheel. This can provide an unchanged Linux interface to developers, with support for the existing device drivers, file systems, etc. Such a unikernel can be deployed in a virtual environment or on bare metal machines. If we can achieve this, we might have a unikernel that has, for instance, GPU support!
Over the summer of 2018, I worked at Red Hat as an intern on this project. Ulrich Drepper and Richard W.M. Jones from Red Hat, along with my PhD advisor Prof. Orran Krieger from Boston University, were my supervisors. James Cadden and Tommy Unger (EbbRT team from Boston University) also consulted on the project.
We started off with a few goals in mind. We wanted to create a unikernel out of Linux and glibc. The idea was to get it accepted upstream eventually so we had to make as few changes to the code bases as possible. We wanted to keep the Linux interface unchanged; any changes (e.g., zero copy networking) should come after the project is upstreamed and go through the regular Linux/GCC community processes.
The architecture that we came up with had an application and the relevant user space libraries sitting on top of glibc, which made function calls instead of making system calls into the kernel. For now, we wrote a wrapper library around the system calls to translate them into direct function calls in the kernel.
To keep the application and kernel in the same address space, we stopped the kernel from creating the init process. Instead, we called our application code there. Finally, everything was linked together in a single bootable binary. We tested our setup by running a simple echo server inside the unikernel in QEMU.
This initial version of our unikernel is extremely promising because we changed just one line in the Linux code base. The changes in glibc made by Ulrich Drepper are also minimal. They are in a separate subtree that translates system calls into function calls. Such modest changes might increase our chances of eventual acceptance upstream.
The unchanged Linux and glibc API means applications can run on this unikernel with few or no changes. Additionally, it means that libraries that do not make system calls themselves can be used unmodified. We might need some more code to emulate uses of pseudo files in /proc and /sys, but those can be hidden implementation details in glibc and/or the kernel.
Coming iterations of UKL
We are currently working on the next iterations of our unikernel. We will get rid of the wrapper library and move some of its functionality into the Linux kernel and the rest into glibc. We plan to introduce config options that let users turn functionality on or off at compile time, as required by the application. Further along, we plan to do this automatically through link-time optimization.
We are also currently working on cleaning up the unikernel build process. Right now we are borrowing functionality from the Linux kernel build process. Eventually, we hope this will be a different GCC target and all one will have to do to create an application specific unikernel is to run “make CC=ukl-gcc …”, assuming that the libraries used fulfill the requirements. Such ease of creating the unikernel will be a huge step towards making unikernels pervasive.
Once we have all the plumbing done, we plan to incorporate more support into it. We will start with features such as thread local storage, pthreads support, calling C++ constructors on boot up, etc. We will employ the research carried out by different unikernels such as EbbRT and add similar solutions here, e.g., zero copy networking.
By employing whole program optimizations and link time optimizations on the resulting codebase, we can automatically create application specific unikernels. The idea is to have acceptance by the community and gradually add features that the community can accept and maintain.
Long term, we believe that a Linux-based unikernel can provide a viable alternative to packages that, fully or in part, bypass or avoid the kernel and do most of their processing in user space, such as DPDK and SPDK. Such packages wouldn’t need to re-implement functionality that already resides in the kernel and would be able to build on top of it. A unikernel approach could eliminate the overhead that these kernel-bypass approaches are designed to address.
It will be interesting to see the performance benefits for applications we can get as compared to vanilla Linux. Interesting research directions can be explored while improving the performance of our unikernel by comparing it against specialized efforts, e.g., EbbRT.
This unikernel based on Linux may not be the answer to all performance and security questions out there, but it has the potential to begin addressing those questions once it becomes easier to deploy applications with it. And that is where we intend to go.