In server failover, a backup server is set up to take over if and when the primary server fails. Learn how server failover works and why it is crucial for disaster recovery.
Server failover is the practice of having a backup server (or servers) prepared to automatically take over if the primary server goes offline. Server failover works like a backup generator. When the power goes out in a building or home, a backup generator temporarily restores electricity. Similarly, in server failover, a secondary server takes over when the primary server fails. The goal of server failover is to improve a network or website's fault tolerance, or its ability to continue operating when one of its parts fails.
A server's primary job is to store content and data to share with other computers. While there are different types of servers, web servers are perhaps the most well-known because they keep websites and applications operational. When web servers fail, they cannot process requests, which means they cannot serve data to clients. Without server failover, a failed server can cause a loading error or a site outage.
Servers can fail for many reasons.
While no one can fully predict when or how a server might fail, IT leaders know that server failure is inevitable. Failover is a backup plan that helps prevent a complete outage.
Failover often goes hand in hand with a process called load balancing. Load balancers increase application availability and performance by distributing traffic across more than one server. To ensure requests are assigned to servers that can handle the traffic, many load balancers monitor server health and implement failover.
Server redundancy is a measure of how many backup servers are in place to support a primary server. For example, a site hosted on one server with no backups is not redundant. Configuring failover creates server redundancy that improves availability and prevents outages. "Availability" describes the amount of time a site or application is online.
The terms "failover" and "switchover" are sometimes confused with one another. In failover, the shift to a redundant server happens automatically. Switchover is a similar process, except that the shift to the secondary server happens manually, which creates a short period of downtime. Because failover happens automatically, there is usually no downtime associated with a switch to a secondary server.
For server failover to work, servers must be connected so that they can sense issues and take over when necessary. Physical “heartbeat” cables can connect servers and allow for monitoring, just like a heartbeat monitor tracks a person’s heartbeat. Server monitoring can also take place over the Internet.
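As a rough illustration of heartbeat monitoring, the sketch below records the last time each server was heard from and treats a server as failed once it has been silent for longer than a timeout. The class name, method names, and timeout value are hypothetical and are not taken from any particular product.

```python
import time


class HeartbeatMonitor:
    """Tracks the most recent heartbeat received from each peer server."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_seen = {}

    def record_heartbeat(self, server):
        """Call this whenever a heartbeat message arrives from `server`."""
        self.last_seen[server] = self.clock()

    def is_alive(self, server):
        """A server counts as alive if it sent a heartbeat within the timeout."""
        seen = self.last_seen.get(server)
        if seen is None:
            return False  # no heartbeat has ever been received
        return self.clock() - seen <= self.timeout_seconds
```

In practice, the timeout is a trade-off: a short timeout detects failures sooner, while a longer one avoids treating a brief network delay as a failure.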
For example, Cloudflare Load Balancing periodically sends HTTP/HTTPS requests to server pools to monitor their status. If the HTTP/HTTPS check reveals that a server is unhealthy or offline, Cloudflare will reroute traffic to an available server.
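The general idea of an HTTP health check can be sketched in a few lines of Python. This is a simplified illustration, not Cloudflare's implementation: the function names are hypothetical, and it assumes a server is healthy only if it answers with a 2xx status before the timeout expires.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if the server answers the health-check URL with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures,
        # HTTP error statuses (4xx/5xx), and malformed URLs.
        return False


def pick_server(servers, check=is_healthy):
    """Return the first server that passes its health check, or None if all fail."""
    for server in servers:
        if check(server):
            return server
    return None
```

A caller might run `pick_server` on a list of health-check URLs at a regular interval and send traffic to whichever server it returns.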
Depending on the configuration, failover works slightly differently. Server failover configurations are either active-active or active-standby.
In active-standby, there is a primary server and one or more secondary servers. In a two-server setup, the secondary server monitors the primary one but otherwise remains inactive. If the secondary server detects that the primary server has failed or stopped responding, it takes over and alerts the data center that the primary server needs restoration. Once the primary server is restored, it takes over once again, and the secondary server returns to standby. The act of a primary server resuming operations is called failback.
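The failover and failback cycle for a two-server active-standby pair can be modeled as a small state machine. The sketch below is a minimal illustration with hypothetical names; it assumes a separate monitor reports whether the primary is healthy, and that failback happens as soon as the primary recovers.

```python
class ActiveStandbyPair:
    """Two-server active-standby pair with automatic failover and failback."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.active = primary  # the primary serves traffic by default
        self.events = []       # history of transitions, for inspection

    def update(self, primary_healthy):
        """Apply one monitoring result and return the server that should serve traffic."""
        if not primary_healthy and self.active == self.primary:
            self.active = self.secondary  # failover: the standby takes over
            self.events.append("failover")
        elif primary_healthy and self.active == self.secondary:
            self.active = self.primary    # failback: the restored primary resumes
            self.events.append("failback")
        return self.active
```

Some real deployments delay failback until the primary has stayed healthy for a while, which this sketch does not attempt to model.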
By contrast, in a two-server active-active configuration, both servers must remain active. An active-active configuration is typically associated with load balancing because the servers are configured in the same way and share the workload. When a server fails in an active-active configuration, the traffic routes to the operational server or servers.
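One simple way to picture an active-active setup is a round-robin pool that skips any server marked as down. The sketch below uses hypothetical names and assumes round-robin distribution, which is only one of several possible load balancing strategies.

```python
import itertools


class ActiveActivePool:
    """Round-robin pool in which every healthy server shares the workload."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """Stop routing traffic to a server that failed its health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Resume routing traffic to a known server that has recovered."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server in rotation, or None if none remain."""
        if not self.healthy:
            return None
        for server in self._cycle:
            if server in self.healthy:
                return server
```

When one server is marked down, the remaining servers absorb its share of requests without any change on the caller's side.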
Server failover is important because, without it, a single server's failure could knock a site offline.
Server availability can impact industries differently. For example, ecommerce and gaming companies are completely dependent on their sites working properly. Other businesses, like B2B SaaS companies, risk upsetting their end users if those users cannot access the information they need to do their jobs. At the same time, availability is nonnegotiable for industries that meet urgent needs, like medical or emergency services.
Apart from availability, failover is an important component of most disaster recovery plans. Disaster recovery plans encompass scenarios like failed backups, a network going down, or even power outages. Disaster recovery helps companies maintain business continuity and avoid the lost revenue associated with downtime.
A failover cluster refers to a group of two or more servers that work together to make failover possible. Failover clusters create the server redundancy that enables high availability (HA) or continuous availability (CA).
Systems that aim for as little downtime as possible are considered HA. A common target is 99.999% uptime, which allows only about five minutes of downtime per year. If an HA system experiences downtime, it should last only a few seconds or minutes at a time. Highly regulated industries, like government services, may need to meet high availability standards for compliance purposes.
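To put an uptime percentage in concrete terms, it helps to convert it into a downtime budget. The short sketch below does that arithmetic; the function name is hypothetical, and it assumes a 365-day year unless told otherwise.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Return how many minutes of downtime an uptime target permits over a period."""
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1 - uptime_percent / 100)


# 99.999% uptime leaves a budget of roughly 5.26 minutes per year,
# while 99.9% uptime leaves roughly 525.6 minutes (under 9 hours).
five_nines = allowed_downtime_minutes(99.999)
three_nines = allowed_downtime_minutes(99.9)
```

Each additional "nine" shrinks the yearly downtime budget by a factor of ten, which is why very high targets are hard to meet without automatic failover.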
CA systems, on the other hand, are created to avoid any downtime at all. No downtime means that users can stay connected to a site or application at all times, even during maintenance. One area where CA might be necessary, for example, is in online stock trading, where transactions are highly time-sensitive. CA systems are more complex to build and maintain because they must account for every single point of failure, from servers to physical location to power access.
Because failover configurations operate slightly differently, the speed at which failover happens can vary. Some load balancers offer fast failover, which means the system monitors server health and can quickly fail over when needed. Fast failover is essential to achieve HA or CA.
Cloudflare Load Balancing achieves fast failover by actively monitoring servers and instantly rerouting traffic when an issue is detected, resulting in zero downtime.