In this post we introduce the basics of reading and writing Apache Spark DataFrames to an SQL database, using Apache Spark’s JDBC API.
Apache Spark’s Structured Streaming data model is a framework for federating data from heterogeneous sources. Structured Streaming unifies columnar data from differing underlying formats and even completely different modalities – for example streaming data and data at rest – under Spark’s DataFrame API.
Continue reading “Data integration in the hybrid cloud with Apache Spark and Open Data Hub”
In our previous blog, Managing application and data portability at scale with Rook-Ceph, we talked about some key features of Rook-Ceph mirroring and laid groundwork for future use case solutions and automation that could be enabled from this technology. This post describes recovering from a complete physical site failure using Ceph RBD mirroring for data consistency coupled with a GitOps model for managing our cluster and application configurations along with an external load balancer all working together to greatly minimize application downtime.
This is done by enabling a Disaster Recovery (DR) scenario where the primary site can failover to the secondary site with minimal impact on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
Continue reading “Managing disaster recovery with GitOps and Ceph RBD mirroring”
One of the key requirements for Kubernetes in multi-cluster environments is the ability to migrate an application with all of its dependencies and resources from one cluster to another cluster. Application portability gives application owners and administrators the ability to better manage applications for common needs such as scaling out applications, high availability for applications, or just simply backing up applications for disaster recovery. This post is going to present one solution for enabling storage and data mobility in multicluster/hybrid cloud environments using Ceph and Rook.
Containerization and Container Native Storage has made it easier for developers to run applications and get the storage they need, but as this space evolves and matures it is becoming increasingly important to move your application and data around, from cluster to cluster and cloud to cloud.
Continue reading “Managing application and data portability at scale with Rook-Ceph”
A number of multi-cloud orchestrators have promised to simplify deploying hundreds or thousands of high-availability services. But this comes with massive infrastructure requirements. How could we possibly manage the storage needs of a thousand stateful processes? In this blog, we’ll examine how we can leverage these orchestrators to address our dynamic storage requirements.
Currently in Kubernetes, there are two approaches in how a control plane can scale resources across multiple clusters. These are commonly referred to as the Push and Pull models, referring to the way in which configurations are ingested by a managed cluster. Despite being antonyms in name, these models are not mutually exclusive and may be deployed together to target separate problem spaces in a managed multi-cluster environment.
Continue reading “Scaling workload storage requirements across clusters”
As a cloud user, how do you avoid the pull of data gravity of one provider or another? How can you get the flexibility and tooling to migrate your infrastructure and applications as your needs change? How do you get to the future of storage federation as data agility?
In this blog we cover the primary motivations and considerations that drive the enablement of flexible, scalable, and agile data management. Our subsequent blogs cover practical use cases and concrete solutions for our 6 federated storage patterns.
All of this is grounded in Red Hat’s work to take a lead in multi-cluster enablement and hybrid cloud capabilities. Our work is focused on leading and moving forward projects that lay the groundwork for this vision in OpenShift, driven via the Kubernetes project KubeFed.
Continue reading “Understanding and Applying Storage Federation Patterns Using KubeFed”
It’s no secret that if you want to run containerized applications in a distributed way, then Kubernetes is the platform for you. Kubernetes’ role as an orchestration platform for containers has taken center stage to become a main player for automating deployment, scaling, and management of applications within containers. Red Hat’s own OpenShift Container Platform is a Kubernetes distribution that uses Kubernetes optimized for enterprises.
Storage has been one of the areas of potential optimization. Many containers, by their very nature, are usually small enough to be easily distributed and managed. Containers hold applications, but the data those applications use needs to be held somewhere else, for a number of reasons. Of particular interest in this post, we want to avoid the containers themselves becoming too large and unwieldy to be effectively managed.
Continue reading “Rook Changes the Kubernetes Storage Landscape”
The challenges of maintaining persistent storage in environments that are anything but persistent should not be taken lightly. My recent conversation with Ceph founder Sage Weil certainly made that clear. Thus far, the conversation with Sage has highlighted key areas of focus for the Red Hat Storage team as they look to the horizon, including how storage plans are affected by:
- Hardware trends (examined in Part 1)
- Software platforms (reviewed in Part 2)
- Multi-cloud and hybrid cloud (discussed in Part 3)
In the last segment of our interview, Sage focused on technology that’s very much on the horizon: the emerging workloads. Specifically, how will storage work in a world where artificial intelligence and machine learning begins to shape software, hardware, and networking architecture?
Continue reading “The Future of Storage in Container Space: Part 4”
It was not that long ago when organizations had in-house servers humming along running applications and storing data. Today, the opportunity afforded by containers means that applications can now live on a cloud platform (either public or private), or one of several available cloud platforms.
But while applications and microservices housed in stateless containers are easy to move from place to place (indeed, that’s a big part of the appeal of containers), the data the applications are accessing are stateful and very, very difficult to relocate while still maintaining consistency, latency, and throughput. This is one of the challenges faced by the Red Hat Storage team, and addressed by Sage Weil in his recent presentation at Red Hat Summit: maintaining data availability with acceptable latency when working with applications in multi-cloud and hybrid cloud environments.
Continue reading “The Future of Storage in Container Space: Part 3”
In Part 1 of Now + Next’s closer look at the future of container storage, we examined the beginnings of the storage solution with a look at how hardware trends will affect the way storage and containers will evolve together.
In this installment, Ceph Project Lead Sage Weil continues our conversation, moving “up” the stack to software platforms. Specifically, Sage discusses where container technology is now and where it is going.
Continue reading “The Future of Storage in Container Space: Part 2”
The rise of container technology has created a new challenge for the storage industry. Within containers, applications, and computation resources are now incredibly mobile, while storage still has to remain persistent and accessible. Here’s how Red Hat is working to address the storage needs of container workloads.
In modern microservice-based architectures, each container is a transient object. It might live on one server for a while and then get moved over to another if directed by an orchestrator tool. While a container keeps its bundle of application software and dependencies during its lifecycle, it usually does not keep application data within the container. Nor should it. After all, in this model a container is designed to run only what is needed and when it is needed. When done, the container is allowed (in fact encouraged) to disappear. If an application’s data were held inside that same application container, too, then pfft!
That’s a challenge.
Continue reading “The Future of Storage in Container Space: Part 1”