Date: 2025-08-21

How to avoid outages in financial services

Three models for maximum uptime in financial services

In the financial services industry, downtime isn’t an inconvenience; it’s a catastrophe. A single outage can lead to staggering financial losses, punitive regulatory fines, and long-term reputational damage.



That’s why financial institutions are on a quest for “unbreakable” infrastructure — one that ensures resilience. This journey has led many to explore multiple architectural models, from relatively simple single-cloud models to complex multicloud models. At the heart of this quest lies a fundamental tension: the pursuit of resilience versus the realities of cost, complexity, and operational risk.

Today, accelerating the resilience journey is urgent. The financial services industry remains among the most frequently targeted for cyber attacks. Cybercriminals, including those backed by nation-states, are attempting not only to steal valuable data but also to cause significant disruptions to financial systems. And the availability of new technologies, including AI tools and (soon) quantum computers, is enabling attackers to launch larger, more sophisticated attacks that are more successful at causing disruption.

At the same time, the CrowdStrike outage in 2024 was a massive wake-up call to financial services companies — and regulators. Companies are now determined to explore new IT architectures that avoid single points of failure.

In working with leading financial services companies, I’ve found that there is no single path to resilience and no single architectural model that is perfect for everyone. However, nearly all choose one of three approaches. Whether you intend to strengthen resilience by moving from on-premises infrastructure to the cloud or by transitioning from a single cloud provider to multiple clouds, exploring the pros and cons of each model can help ensure you are making the right choice for your organization.

Model 1: Improving availability with a single cloud provider

The cloud has long been recognized for enhancing resilience. By using cloud services, organizations can avoid the cost and complexity of building, managing, and maintaining their own infrastructure — including backup data centers for protecting data and high-availability (HA) clusters for maintaining application availability.



For the vast majority of financial services organizations, cloud-based resiliency begins with a single, trusted cloud provider. They leverage that provider’s built-in HA features, for example, by distributing workloads across multiple availability zones (AZs). If one AZ goes down, a properly designed application has its traffic rerouted to the remaining AZs, preserving business continuity.
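The rerouting behavior described above can be illustrated with a minimal sketch. It is not any cloud provider's API: the `Zone` class, the zone names, and the `route_request` function are all hypothetical, and a real deployment would rely on the provider's load balancer and health checks rather than application code like this.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str      # hypothetical AZ label, e.g. "az-a"
    healthy: bool  # result of the most recent (simulated) health check


def route_request(zones: list[Zone], request_id: int) -> str:
    """Pick a healthy zone for a request, spreading load round-robin."""
    healthy = [z for z in zones if z.healthy]
    if not healthy:
        # A single-provider design has nowhere left to go at this point.
        raise RuntimeError("no healthy availability zone")
    return healthy[request_id % len(healthy)].name


zones = [Zone("az-a", True), Zone("az-b", True), Zone("az-c", True)]
before = [route_request(zones, i) for i in range(3)]

zones[1].healthy = False  # simulate az-b going down
after = [route_request(zones, i) for i in range(4)]

print(before)  # traffic spread across all three zones
print(after)   # traffic spread across the two surviving zones only
```

Note the final branch: if every zone is unavailable, for example during a provider-wide incident, this model has no fallback, which is the gap the next two models try to address.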

Model 2: Reducing risk with a “polycloud” strategy

I first heard the term “polycloud” from Goldman Sachs technology executives around 2022, though the term may have been coined earlier. While a “multicloud” approach simply means using services from multiple cloud providers, a polycloud strategy involves strategically dividing workloads among two or more providers. The providers don’t necessarily run the same workloads at the same time. Instead, an organization assigns each workload to a particular cloud, often based on how well that cloud’s services suit the workload.

For example, a bank might run its retail banking applications on one cloud platform and its investment banking operations on another. I’ve also encountered a few large institutions that choose to host their website with one cloud provider and their mobile application with another.
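One way to picture a polycloud assignment is as a simple placement map. The sketch below uses made-up provider labels (`cloud-a`, `cloud-b`) and a hypothetical `affected_workloads` helper, and it assumes each workload lives on exactly one provider. It shows why the model limits the impact of a provider outage without removing it.

```python
# Hypothetical placement map: each workload is assigned to exactly one provider.
PLACEMENT = {
    "retail-banking": "cloud-a",
    "investment-banking": "cloud-b",
    "website": "cloud-a",
    "mobile-app": "cloud-b",
}


def affected_workloads(placement: dict[str, str], failed_provider: str) -> list[str]:
    """Return the workloads that go down if the given provider fails."""
    return sorted(w for w, p in placement.items() if p == failed_provider)


down = affected_workloads(PLACEMENT, "cloud-a")
still_up = sorted(set(PLACEMENT) - set(down))

print(down)      # workloads lost with cloud-a
print(still_up)  # workloads that keep running on cloud-b
```

In this toy example, an outage at `cloud-a` takes down retail banking and the website but leaves investment banking and the mobile app running. The affected workloads still have no second home, which is the difference from the active-active model.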



Model 3: Eliminating service interruptions with an active-active multicloud approach

For the largest systemically important financial institutions (SIFIs), not even the polycloud model provides sufficient resilience. This handful of organizations instead implements an “active-active” multicloud architecture. With this architecture, the same critical workload — like a core banking application — runs simultaneously across two or more cloud providers. Traffic is load-balanced between them, so if one provider fails, all traffic is automatically rerouted to the other with no interruption in service.
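The traffic behavior described above can be sketched as a weighted choice among healthy providers. This is a simplified illustration under stated assumptions, not a production design: the provider labels, the 50/50 weights, and the `pick_provider` function are hypothetical, and real active-active deployments also have to keep data synchronized across clouds, which this sketch ignores.

```python
import random


def pick_provider(weights: dict[str, float], healthy: set[str],
                  rng: random.Random) -> str:
    """Choose a provider for one request, skipping any that are unhealthy."""
    live = {p: w for p, w in weights.items() if p in healthy and w > 0}
    if not live:
        raise RuntimeError("no healthy provider")
    names = list(live)
    # Weighted random choice among the providers that are currently healthy.
    return rng.choices(names, weights=[live[n] for n in names], k=1)[0]


rng = random.Random(0)  # seeded so the simulation is repeatable
weights = {"cloud-a": 0.5, "cloud-b": 0.5}

# Normal operation: both providers serve the same workload.
normal = [pick_provider(weights, {"cloud-a", "cloud-b"}, rng) for _ in range(1000)]

# cloud-a fails: every request goes to cloud-b without a configuration change.
failover = [pick_provider(weights, {"cloud-b"}, rng) for _ in range(1000)]

print(sorted(set(normal)))    # both providers receive traffic
print(sorted(set(failover)))  # only the surviving provider receives traffic
```

Because both providers already carry live traffic, failover here is only a change in the set of healthy targets rather than a cold start on a standby environment.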

When I ask leaders at these institutions why they have adopted this model, the answer almost always involves regulatory requirements. These organizations must meet operational resilience requirements, such as those outlined by the Federal Reserve or in the EU’s Digital Operational Resilience Act (DORA).

Despite its cost and complexity, this model is often a requirement for those few institutions that are considered “too big to fail.” It helps them demonstrate to regulators that they have taken every possible step to ensure financial stability. Consequently, for these institutions, the high costs of this model are a necessary cost of doing business.



Envisioning a future of intelligent resilience

There is no one-size-fits-all solution. The right architecture for any given financial institution depends on its size, ability to handle complexity, risk tolerance, and regulatory obligations. As your organization works to enhance resilience, the key will be to make informed decisions, carefully weighing the trade-offs.

Keep in mind that technology options will evolve, and that could alter your decisions. For example, AI and machine learning will likely play an increasing role in predicting and preventing outages, while new tools will help simplify the management of complex polycloud and multicloud environments.

