The Failover That Failed Successfully - Lessons from a Successfully Failed Disaster Recovery and Failover Test
APR 6, 2026 · 33 MIN
Description
Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself, but in coordination and communication. While production ultimately stayed unaffected, the situation quickly escalated as subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most. We unpack how a well-intentioned test turned into a coordination challenge, where timing, dependencies, and unclear responsibilities created confusion across teams. It's a story about how resilience isn't just about systems and infrastructure, but also about people, processes, and making sure everyone is on the same page, especially when things are supposed to "just be a test."

00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion