It was not that long ago that organizations had in-house servers humming along, running applications and storing data. Today, the opportunity afforded by containers means that applications can live on a public or private cloud platform, or span several of them.
But while applications and microservices housed in stateless containers are easy to move from place to place (indeed, that’s a big part of the appeal of containers), the data those applications access is stateful and very, very difficult to relocate while still maintaining consistency, latency, and throughput. This is one of the challenges faced by the Red Hat Storage team, and one addressed by Sage Weil in his recent presentation at Red Hat Summit: maintaining data availability with acceptable latency when working with applications in multi-cloud and hybrid cloud environments.
According to Sage, there is no silver bullet for this issue. In fact, the approaches being tried depend very much on the type of storage in question: object storage, block storage, or file system storage. In his Summit talk, Sage discussed past investments in open source technologies and Red Hat products, as well as future directions.
On the object storage side, Sage explained, “Today Ceph does multisite federation with Ceph RGW object storage, [which] gives you a collection of clusters across different sites, each offering an [Amazon Web Services] S3-compatible storage service clusters with a shared global user and bucket namespace. You can create buckets in different locations and replicate asynchronously across them.”
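As a rough illustration, the multisite federation Sage describes is configured with the radosgw-admin tool, along these lines. (The realm, zonegroup, zone, and endpoint names here are hypothetical placeholders, not drawn from the talk.)

```shell
# On the first cluster: create a realm, a master zonegroup, and a master zone
radosgw-admin realm create --rgw-realm=gold --default
radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --master --default
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east \
  --endpoints=http://rgw1:80 --master --default
radosgw-admin user create --uid=sync-user --display-name="Sync User" --system
radosgw-admin period update --commit

# On a second cluster: pull the realm and add a secondary zone; RGW then
# replicates users, buckets, and objects asynchronously between the zones
radosgw-admin realm pull --url=http://rgw1:80 --access-key=<key> --secret=<secret>
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west \
  --endpoints=http://rgw2:80 --access-key=<key> --secret=<secret>
radosgw-admin period update --commit
```

Once both zones are committed to the same realm, the shared user and bucket namespace Sage mentions comes for free: a bucket created in either zone is visible in both.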
This means that a lot of pieces for a solution framework are already in place as far as object storage is concerned. Building on those pieces, the latest Ceph release added functionality to enable Ceph RGW to sync data to a public cloud object storage service such as S3. Longer term, the storage team is looking at having a more robust offering of global data services.
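That sync capability is exposed, in recent Ceph releases, through RGW's cloud sync module: a zone whose tier type is "cloud" forwards objects to an external S3-compatible endpoint. A minimal sketch, assuming that module and using placeholder zone names and credentials:

```shell
# Create a zone that tiers out to an external S3-compatible service
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-cloud --tier-type=cloud
radosgw-admin zone modify --rgw-zone=us-cloud \
  --tier-config=connection.endpoint=https://s3.amazonaws.com,connection.access_key=<key>,connection.secret=<secret>
radosgw-admin period update --commit
```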
“That will be a solution where you’re thinking of all of these different object storage footprints as a topology across multiple clouds, and you have a framework to move things around dynamically to make sure applications can find the data wherever it is,” he said. “Whether it’s bursting onto a public cloud, or tiering or being encrypted and put on a public cloud, or applying policy around placement, retention, or something else.”
Block storage is trickier because it has to be consistent. Simply copying the contents of a block device while it is in use won’t work, as consistency won’t be maintained. Today, Ceph RBD takes a mirroring approach that enables asynchronous replication across clusters, with replicas that are point-in-time consistent.
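As a sketch of what that looks like in practice (pool and image names here are placeholders), journal-based RBD mirroring is enabled per pool or per image, with an rbd-mirror daemon on each cluster replaying the journal asynchronously:

```shell
# On both clusters: enable mirroring on the pool in per-image mode
rbd mirror pool enable data image
# Register the other cluster as a peer
rbd mirror pool peer add data client.rbd-mirror-peer@site-b
# Opt an image in; journaling (which requires exclusive-lock) records writes
rbd feature enable data/db-volume exclusive-lock
rbd feature enable data/db-volume journaling
rbd mirror image enable data/db-volume
```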
But, Sage added, the real challenge is orchestrating that. To take a database consuming block storage in one cloud and migrate it to another, automation is needed to set up the replica, populate it, and cut over. This is where much of the block storage work for multi-cloud environments is focused: putting the orchestration and automation tools in place.
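The mirroring primitives for a planned cutover already exist; what is missing is automation to drive them. Done by hand, the sequence looks roughly like this (again with placeholder names):

```shell
# 1. Quiesce the application, then demote the primary image (source cluster)
rbd mirror image demote data/db-volume
# 2. Verify the peer has caught up
rbd mirror image status data/db-volume
# 3. Promote the image on the destination cluster and restart the app there
rbd mirror image promote data/db-volume
```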
File storage faces similar challenges. “People always want a global file system that will work everywhere,” Sage indicated. “But it’s sort of impossible to do it in a strongly consistent and performant way.”
Gluster today implements a disaster-recovery geo-replication feature, which Sage described as asynchronous and loosely consistent. “The interesting thing there is that even though it’s not perfectly consistent, it works for a lot of use cases.” On the Ceph side, the Storage team is looking at several possible solutions.
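Gluster’s geo-replication is driven from the cluster holding the master volume; a minimal sketch, with hypothetical host and volume names:

```shell
# Set up one-way, asynchronous replication to a remote (slave) volume
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start
gluster volume geo-replication mastervol slavehost::slavevol status
```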
Some of the potential solutions for Ceph target just disaster-recovery features, while others are more loosely consistent and can therefore enable multi-site storage capabilities with bidirectional replication.
Layers and Edges
Not all of the solutions for storage management in multi-cloud environments have to rely on the storage tools themselves.
Some databases, Sage highlighted, can be replicated as applications into another environment, letting the database use its own internal replication tools to copy data as needed. Databases such as MySQL with Galera, Cassandra, and MongoDB fall into this category. At the application layer, then, databases and their data can be orchestrated, though today this is largely a tedious, manual process.
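As an illustration of the database-layer approach, a Galera node’s configuration simply points the cluster at peers, which can live in different clouds; replication then happens inside the database rather than in the storage layer. The node addresses below are hypothetical:

```ini
# /etc/my.cnf.d/galera.cnf (placeholder hostnames)
[mysqld]
binlog_format=ROW
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name=example-cluster
wsrep_cluster_address=gcomm://node1.cloud-a.example,node2.cloud-b.example,node3.cloud-b.example
```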
As with block and file storage, automation is clearly needed.
“Big picture,” Sage said, “what we really need to do is build a whole toolbox of tools that let you orchestrate and automate this stuff. You can just say, I have this application, it’s built with this stack, and we’ll figure out how to move it.”
Another unique challenge for storage will be the environments that restrict or even prohibit an application’s access to data. This is the “edge” — where Internet of Things devices or automobiles have applications that are producing and consuming data, under circumstances where they may not have persistent access to cloud-based data stores.
Currently, both Ceph and Gluster can scale down to run on one to three hosts, on a quarter-rack mini-cluster (or something even smaller) that sits much closer to the edge of a given network. Looking ahead, Sage described how the solutions planned for object storage (global data services) should also apply well to edge computing.
“Being able to ingest data and write it locally, then asynchronously replicate it up into the cloud,” will be a familiar and useful solution, Sage stated.
The storage solutions for the ever-changing container ecosystem seem to be well in hand. But how will storage solutions adapt to tomorrow’s most innovative technologies? In Part 4, we’ll learn how Red Hat will approach merging storage, containers, and emerging workloads.