The rise of container technology has created a new challenge for the storage industry. With containers, applications and computation resources are now highly mobile, while storage still has to remain persistent and accessible. Here’s how Red Hat is working to address the storage needs of container workloads.
In modern microservice-based architectures, each container is a transient object. It might live on one server for a while and then be moved to another at the direction of an orchestrator. While a container keeps its bundle of application software and dependencies throughout its lifecycle, it usually does not keep application data inside the container. Nor should it. After all, in this model a container is designed to run only what is needed, only when it is needed. When it’s done, the container is allowed (in fact, encouraged) to disappear. If an application’s data were held inside that same container, then pfft! The data would vanish along with it.
That’s a challenge.
The technical description is that microservices should be stateless: they can start up, do their work, and shut down as part of a workflow. But the applications they support are almost inevitably stateful, meaning that once data is created, it needs to stick around at least long enough for another application (or person) to look at it and act on it. Currently, there are two ways of tackling this problem of persistent storage for containers.
The first way is providing storage for containers, where container-ready storage is exposed to a container or a group of containers from an external mount point over the network. By telling all of the containers in a cluster, “hey, your storage needs are all over on this server,” the basic persistent storage hurdle is overcome. Apps in containers simply write their data to the designated location on the network.
Sending data across a network to a separate storage area, however, means multiple layers of management. Plus, while orchestrators like Kubernetes can use APIs to connect to container-ready storage and take advantage of features like dynamic provisioning, not many storage platforms support this. So some of the advantages of container-based computing are lost.
The second way has a bit more appeal: put the storage inside containers that run alongside the application containers. The main benefit is easy to see: with all the compute and storage in containers, you only have to manage everything with a single orchestration tool like Kubernetes. Kubernetes and distributed applications are complicated enough without adding more layers of management to the equation.
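From the application’s point of view, this first approach can be as simple as writing to a path that the orchestrator has mounted from network storage. The minimal Python sketch below assumes a hypothetical `DATA_DIR` environment variable that names that mount point (it falls back to a local temp directory so it can run outside a cluster); the function names are illustrative, not part of any Red Hat or Kubernetes API.

```python
import os
import tempfile
from pathlib import Path


def data_dir() -> Path:
    # Hypothetical convention: the orchestrator mounts network storage at the
    # path in DATA_DIR. Fall back to the system temp dir for local testing.
    return Path(os.environ.get("DATA_DIR", tempfile.gettempdir()))


def record_result(job_id: str, result: str) -> Path:
    """Persist a job's result on the external mount, not inside the container."""
    target = data_dir() / f"{job_id}.txt"
    target.write_text(result)
    return target


def load_result(job_id: str) -> str:
    """Read a result back; any container that sees the same mount can do this."""
    return (data_dir() / f"{job_id}.txt").read_text()


if __name__ == "__main__":
    record_result("job-42", "done")
    print(load_result("job-42"))
```

Because the container itself holds no data, it can be stopped and rescheduled on another node; a replacement container that sees the same mount can pick up where the old one left off.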
This, then, is where container-oriented storage technology stands today. But innovations are on their way. A recent presentation at Red Hat Summit from Sage Weil, Ceph Project Lead, gave a far-reaching overview of the challenges and innovations that lie ahead for storage.
I recently sat down with Sage to talk about his presentation and learn which areas look most interesting to him and the rest of the Red Hat Storage teams. Sage broke his talk out into four key areas, essentially moving from the bottom of the stack up:
- Hardware trends (outlined here)
- Software platforms (outlined in Part 2)
- Multi-cloud and hybrid cloud (outlined in Part 3)
- Emerging workloads (to be outlined in Part 4)
It should be noted that hardware trends affect all of the IT landscape, not just container-oriented storage solutions. But the changes coming down the road are certainly expected to have an impact on container storage moving forward.
To set the scene for the current state of hardware: when it comes to traditional hard-disk drives (HDDs) and solid-state drives (SSDs), the market shows declining revenue for HDDs, while revenue from SSDs continues to climb and has in fact passed HDD revenue in recent years.
“But,” Sage added, “there are still 10X [the] number of bytes shipped, because [HDDs] are so much cheaper.”
This situation isn’t really going to change, Sage explained, until SSD vendors build more fabrication facilities. Each of those can cost billions of dollars to build, and current annual revenue for all SSD vendors combined is only around $40 billion, according to the first graph above.
Because of this dramatic difference in price, “hard drives are increasingly relegated to archival workloads. So most of those hard drive sales aren’t going onto people’s desktops or laptops anymore. They’re going into the big cloud data centers. So it’s cold data and video data, streaming data.”
Everything else, Sage explained, is (or soon will be) flash (SSD). Although SSDs are still more expensive today, they keep getting faster and cheaper, so Red Hat plans to focus on that hardware technology moving forward.
More accurately, Red Hat plans to continue to work on SSD technology. Over the last 10 years, Sage explained, developers from Red Hat have been working with others on rewriting the Linux I/O stack to optimize for SSD devices, and certainly both Ceph and Gluster play well with SSD and HDD hardware.
“As we go forward, we need to be smarter about using the flash more efficiently, by using less CPU,” Sage told me. “So there’s a big project underway in Ceph, to basically rewrite the OSD [object storage daemon] using new programming frameworks [Seastar] and user-space drivers [DPDK and SPDK] to capture more of that performance.”
Software vendors aren’t the only organizations working on the persistent memory problem. Around the time of Red Hat Summit in May, 3D XPoint (pronounced “cross-point”) technology was just beginning to hit the market in DIMM form (until now it’s only been used to accelerate traditional NVMe SSDs). According to Sage, this development has the potential to completely upend the way the software stack approaches persistence. Red Hat has already built in enablement for this new phase-change technology, such as the direct access (DAX) features in the Linux kernel for the ext4 and XFS file systems. Even so, it will be a long time before system architectures are reworked to capture all of the benefits of XPoint technology.
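To make the DAX idea concrete: on an ext4 or XFS file system mounted with the `dax` option on persistent memory, a memory-mapped file is backed directly by that memory rather than by the page cache, so ordinary loads and stores reach the media without `read()`/`write()` system calls. The Python sketch below uses the standard `mmap` module to show that access pattern. It assumes nothing about 3D XPoint hardware and runs unchanged (through the page cache) on an ordinary file system; the file path is hypothetical.

```python
import mmap
import os
import tempfile


def write_via_mapping(path: str, payload: bytes, size: int = 4096) -> None:
    """Store payload at offset 0 of a file using memory-mapped access."""
    if len(payload) > size:
        raise ValueError("payload larger than mapping")
    with open(path, "wb+") as f:
        f.truncate(size)  # the mapping needs a file of at least `size` bytes
        with mmap.mmap(f.fileno(), size) as mm:
            mm[: len(payload)] = payload  # plain memory store, no write() syscall
            mm.flush()  # msync() on Linux: ask the kernel to make the update durable


def read_via_mapping(path: str, length: int) -> bytes:
    """Read the first `length` bytes back through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:length]


if __name__ == "__main__":
    # Hypothetical location; on a real system this would be a file on a DAX mount
    path = os.path.join(tempfile.gettempdir(), "pmem-demo.bin")
    write_via_mapping(path, b"hello, pmem")
    print(read_via_mapping(path, 11))
```

Production persistent-memory code usually goes further, for example using the Persistent Memory Development Kit (PMDK) to flush CPU caches directly instead of relying on `msync()`.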
Another key trend is the recent rise of Non-Volatile Memory Express (NVMe) over Fabrics, a system architecture that lets SSDs normally attached via PCIe be accessed over a network fabric, usually within the same rack. The idea is similar in concept to a traditional storage area network, but with updated protocols and more flexibility in the network transport, which may be InfiniBand, Ethernet, or a proprietary protocol.
For his part, Sage views fabric-oriented storage as addressing only part of the overall problem. While fabric-attached storage helps solve the issues of elasticity and scalability in a system (once you sprinkle in some management), it still does not solve the issues of reliability, durability, and the system’s ability to tolerate device failures. Since the Ceph and Gluster aspects of Red Hat Storage are very much about reliable storage that can tolerate failures, Sage does not see fabrics as being particularly helpful or even directly competitive with the current software-defined storage platforms.
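The reliability point comes down to redundancy: a fabric makes a remote device reachable, but that device is still a single point of failure. Software-defined storage such as Ceph and Gluster instead keeps multiple copies across devices. The toy Python sketch below illustrates N-way replication under simplifying assumptions; its placement rule is a simple byte-sum hash, not Ceph’s CRUSH algorithm or Gluster’s distribution scheme, and all class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical storage device that can fail."""
    name: str
    failed: bool = False
    objects: dict = field(default_factory=dict)


class ReplicatedStore:
    """Toy N-way replication: each object is written to `replicas` distinct devices."""

    def __init__(self, devices: list, replicas: int = 3):
        if replicas > len(devices):
            raise ValueError("not enough devices for the requested replica count")
        self.devices = devices
        self.replicas = replicas

    def placement(self, key: str) -> list:
        # Deterministic placement: a byte-sum hash picks a starting device, and the
        # replicas go on the next devices in order (illustrative only, not CRUSH).
        start = sum(key.encode()) % len(self.devices)
        return [self.devices[(start + i) % len(self.devices)] for i in range(self.replicas)]

    def put(self, key: str, value: bytes) -> None:
        # Write every replica; failed devices are simply skipped in this sketch.
        for dev in self.placement(key):
            if not dev.failed:
                dev.objects[key] = value

    def get(self, key: str) -> bytes:
        # Read from the first healthy replica that holds the object.
        for dev in self.placement(key):
            if not dev.failed and key in dev.objects:
                return dev.objects[key]
        raise KeyError(key)


if __name__ == "__main__":
    devices = [Device(f"nvme{i}") for i in range(4)]
    store = ReplicatedStore(devices, replicas=3)
    store.put("orders/1001", b"payload")
    store.placement("orders/1001")[0].failed = True  # lose one replica's device
    print(store.get("orders/1001"))  # still served from a surviving replica
```

A real system would also detect the failure and re-replicate the affected objects onto healthy devices to restore the replica count; this sketch only shows why losing one device does not lose the data.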
Even though NVMe fabrics may not be the best approach for containers’ persistent storage needs, Sage says they can be complementary. NVMe fabrics may also be the best approach for other workloads, and work is underway to support and enable fabric storage in Red Hat Enterprise Linux. Most environments have a range of data sets with varying needs, and not all data needs to be replicated: temporary data, for instance, or data associated with stateless microservices, can be safely lost in the event of a hardware failure. Provisioning and management applications, for example, may still need ephemeral storage systems as part of their underlying hardware set. The good news is that platforms like Kubernetes provide the tools to map the diverse storage requirements of applications to appropriate storage backends.
The advances being made in hardware are certainly no surprise to the Red Hat Storage team, which is focused on keeping pace with innovations of its own. In the next installment of this series, we’ll learn more about the software platforms that exist today and are on the way in container technology, and what storage solutions are being devised for them.
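In Kubernetes, that mapping is typically expressed through StorageClasses: an application’s PersistentVolumeClaim names a `storageClassName`, and the cluster provisions a volume from the matching backend. The Python sketch below builds such a claim as a plain dictionary (which can be serialized to JSON and applied with `kubectl apply -f`), choosing between two hypothetical StorageClass names, `replicated-ssd` and `local-nvme`, depending on whether the data must survive a hardware failure. Those class names are assumptions; a real cluster defines its own.

```python
import json

# Hypothetical StorageClass names; a real cluster defines its own.
DURABLE_CLASS = "replicated-ssd"  # e.g. backed by replicated software-defined storage
SCRATCH_CLASS = "local-nvme"      # e.g. fast local or fabric-attached disks, no replication


def make_pvc(name: str, size_gi: int, needs_durability: bool) -> dict:
    """Build a PersistentVolumeClaim that picks a backend by durability requirement."""
    storage_class = DURABLE_CLASS if needs_durability else SCRATCH_CLASS
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Durable application state vs. scratch data that can be lost on hardware failure
    for claim in (make_pvc("orders-db", 50, True), make_pvc("render-scratch", 200, False)):
        print(json.dumps(claim, indent=2))
```

Keeping the durability decision in one place means the application only states what it needs, and the cluster operator decides which backend (replicated software-defined storage, local flash, or fabric-attached devices) satisfies each class.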