The Future of Storage in Container Space: Part 4

Jul 23, 2018

The challenges of maintaining persistent storage in environments that are anything but persistent should not be taken lightly. My recent conversation with Ceph founder Sage Weil certainly made that clear. Thus far, the conversation with Sage has highlighted key areas of focus for the Red Hat Storage team as they look to the horizon, including how storage plans are affected by:

  • Hardware trends (examined in Part 1)
  • Software platforms (reviewed in Part 2)
  • Multi-cloud and hybrid cloud (discussed in Part 3)

In the last segment of our interview, Sage focused on technology that's still very much on the horizon: emerging workloads. Specifically, how will storage work in a world where artificial intelligence and machine learning begin to shape software, hardware, and networking architecture?

Artificial intelligence (AI) and machine learning (ML) are a key emerging use case for storage, and many players in IT are working on the unique problems surrounding AI/ML. This is an area where Sage believes Red Hat should have a distinct advantage. For one thing, the key to AI/ML storage is not a long list of features, but scale.

“Basically people are building big scalable platforms and integrating it with one toolset,” Sage explained. There's no need for a lot of integrations with the legacy IT infrastructure components that accumulate over the years; the application is greenfield, and all that matters is that it scales. “And that’s perfect for our stuff,” he added.

Another advantage for Red Hat is that the applications using the storage typically evolve quickly; Sage estimated that roughly one new AI platform comes out every year. “And because it’s moving so quickly, having an open source storage platform to integrate with means we can very quickly make things work well together and attach ourselves to those communities,” he explained.

Red Hat's skill set is also a match for provisioning applications and storage dynamically.

“A lot of these workloads are very dynamic and elastic, so they’ll be spinning up a job that’s doing a bunch of machine learning stuff in containers and they might store a bunch of data and then retrieve it and use it and then throw it all away,” Sage said. This kind of workflow is a good match for Red Hat’s technology.
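
As a concrete illustration of that store/retrieve/discard pattern, a containerized job could talk to Ceph RGW through its S3-compatible API. The sketch below uses Python and boto3; the endpoint, credentials, and bucket name are hypothetical stand-ins, not details from the interview.

```python
# A minimal sketch of the elastic store/retrieve/throw-away workflow described
# above, using Ceph RGW's S3-compatible API via boto3. Endpoint, credentials,
# and bucket name are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",  # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",              # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

bucket = "ml-job-scratch"  # hypothetical per-job scratch bucket
s3.create_bucket(Bucket=bucket)

# Store a bunch of data produced by the job...
s3.put_object(Bucket=bucket, Key="features/part-0000", Body=b"<intermediate data>")

# ...retrieve it and use it...
data = s3.get_object(Bucket=bucket, Key="features/part-0000")["Body"].read()

# ...then throw it all away once the job is done.
s3.delete_object(Bucket=bucket, Key="features/part-0000")
s3.delete_bucket(Bucket=bucket)
```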

Currently, most users combine tools such as Apache Spark, the S3/Apache Hadoop connector known as s3a, and Ceph RGW to create a stack similar to a data lake environment. A recent tutorial at Red Hat Summit highlighted the power of such a stack by running Jupyter, TensorFlow, and Ceph together.
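
For a sense of how that stack fits together, the sketch below points Spark's Hadoop S3A connector at a Ceph RGW endpoint and reads a dataset straight out of the object store. The endpoint, credentials, bucket, and column name are hypothetical; the fs.s3a.* configuration keys are the standard Hadoop ones.

```python
# A minimal sketch of the Spark + s3a + Ceph RGW stack described above.
# The fs.s3a.* keys are standard Hadoop S3A settings; the endpoint,
# credentials, bucket, and column name are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ceph-data-lake-sketch")
    # hadoop-aws provides the s3a filesystem; the version must match your Hadoop build
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    # RGW deployments commonly use path-style rather than virtual-hosted URLs
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read a dataset directly out of the object store, data-lake style.
df = spark.read.parquet("s3a://datalake/events/")  # hypothetical bucket/prefix
df.groupBy("event_type").count().show()            # hypothetical column
```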

Moving forward, Red Hat is collaborating with other vendors on reference architectures that make storage a good fit for AI workloads, as well as on shared orchestration platforms.

On the Road Again

Autonomous driving, as Sage describes it, is a large-scale example of the AI/ML challenge. There are usually two parts to the problem, Sage explained. “There’s stuff that happens in the car and there’s stuff that happens in the cloud. Looking at the cloud part, these autonomous driving companies are storing petabytes of data a day, because they have all this video, all the sensor data from the cars that they are archiving and using to learn.”

“Whenever they change their data driving model, they have to go back and revalidate the model against all their existing data, so having these huge datasets is where all the value comes from,” he added.

Red Hat is working with multiple vendors whose planned deployments would ingest a petabyte of data per day, which means the Red Hat Storage team is working to make Ceph scale very well.

A great proving ground for this kind of scalability is CERN, where a single Ceph cluster was tested at a scale of 10,000 object storage devices (OSDs), about 40 PB of data. With these successes under their belt, the Storage team is still working to scale Ceph further, as well as to improve the ability to federate multiple Ceph RGW clusters within a single object namespace. Even if a single Ceph cluster can't yet handle an exabyte of data, multiple federated clusters can: at the roughly 40 PB demonstrated at CERN, an exabyte works out to on the order of 25 clusters.
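
To make the federation idea concrete, here is a minimal sketch of what a single object namespace spanning two RGW zones looks like from a client's perspective, assuming a multisite realm/zonegroup/zone configuration is already in place. All endpoints and credentials are hypothetical.

```python
# Sketch: two federated RGW zones presenting one object namespace. Assumes a
# multisite realm/zonegroup/zone setup already exists; endpoints and
# credentials are hypothetical.
import boto3

def rgw_client(endpoint):
    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="ACCESS_KEY",      # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )

zone_east = rgw_client("http://rgw-east.example.com:8080")  # hypothetical endpoints
zone_west = rgw_client("http://rgw-west.example.com:8080")

# A bucket and object created through one zone...
zone_east.create_bucket(Bucket="telemetry")
zone_east.put_object(Bucket="telemetry", Key="car-42/frame-0001", Body=b"<sensor data>")

# ...become visible through the other zone's endpoint, since both zones share
# the same namespace. (Replication is asynchronous, so a read may briefly lag
# the write.)
obj = zone_west.get_object(Bucket="telemetry", Key="car-42/frame-0001")
print(obj["ContentLength"])
```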

Right on the Edge

While data in the cloud is about scale, data management out at the edges of the network, within the devices or vehicles themselves, presents completely different challenges. At this edge, composed of mini-datacenters with hundreds of CPUs pulling in data from multiple sensors, there's no time to send the data all the way up to the cloud, get a decision, and send that decision back down for the device to act on. This is particularly true when connectivity is intermittent.

“Vehicles are not going to [be] waiting for the cloud to tell them what to do,” Sage explained. “They have to act autonomously, which means they need to store data, they need to process data, and maybe do their own local learning. And they need to be reliable.”

Here, then, the solution is about scaling storage systems down rather than up. Fortunately, a lot of Red Hat Storage work has already gone into automating storage management so that it runs with minimal human involvement, which is exactly what unattended edge deployments require.

To that end, Ceph features could eventually include improved problem mitigation, so the storage system knows how to handle issues such as slow devices, or how to optimally manage data retention. Support for multiple architectures, such as ARM and Power, may also play a big part in the edge storage roadmap, since devices in this part of the network tend to be much more power constrained.

There are, clearly, a lot of challenges ahead for Red Hat in the storage and container ecosystems. But these challenges have been identified, and open source projects are well on their way to solving them.