In our previous blog, Managing application and data portability at scale with Rook-Ceph, we talked about some key features of Rook-Ceph mirroring and laid groundwork for future use case solutions and automation that could be enabled from this technology. This post describes recovering from a complete physical site failure using Ceph RBD mirroring for data consistency coupled with a GitOps model for managing our cluster and application configurations along with an external load balancer all working together to greatly minimize application downtime.
This is done by enabling a Disaster Recovery (DR) scenario where the primary site can failover to the secondary site with minimal impact on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).