What 100% Uptime Actually Means — And How to Design for It

Every connectivity vendor promises uptime. The number in the SLA varies (99.9%, 99.95%, 99.99%), but the promise is universal. What differs is what the promise covers, what happens when it is not met, and whether the underlying architecture can deliver the claimed availability.
Why SLA Percentages Are Not the Right Metric
Consider what 99.9% uptime means in practice: up to about 8.76 hours of downtime per year (0.1% of the 8,760 hours in a year). For a restaurant, that could be an entire dinner service. For a contact centre, it could be a day of lost productivity. For a healthcare facility, it could be a period of impaired clinical operations.
More importantly, SLA percentages tell you what a carrier will credit you for after an outage — they do not tell you how the architecture behaves when a component fails. A 99.9% SLA on a single circuit does not prevent an outage. It determines what compensation you receive afterward.
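To make the percentages above concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `allowed_downtime_hours` is made up for this example, and it assumes a 365-day year and an SLA measured over the whole year (many contracts measure monthly, which changes the numbers).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float) -> float:
    """Hours per year a service can be down and still meet the SLA figure."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)


# Compare the three SLA levels mentioned in the introduction.
for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_hours(sla):.2f} hours/year")
```

Run as written, this works out to roughly 8.76, 4.38, and 0.88 hours per year. None of those figures says anything about when the downtime falls or how the network behaves while it is happening.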
The Architecture Problem
Most business connectivity setups are architected for normal operating conditions, not failure conditions. A single primary internet connection with no tested failover is the most common architecture — and it is a single point of failure.
The failure modes are more varied than most organisations appreciate:
- Last-mile failures — The physical connection between your building and the carrier’s network can fail independently of the carrier’s backbone network.
- Carrier backbone failures — Regional outages can affect all customers on that carrier’s infrastructure simultaneously.
- Equipment failures — The router or modem at your location can fail independently of everything else.
- Power failures — A UPS protects against brief outages; extended power disruption takes down connectivity hardware regardless of carrier status.
What Genuine Network Redundancy Looks Like
Independent Carriers, Independent Infrastructure
The primary and secondary connections must use different carrier infrastructure — not just different services from the same carrier. Two Bell circuits running through the same last-mile infrastructure can fail together. Primary Bell fibre with secondary LTE on Telus provides genuine carrier independence.
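Why independence matters can be shown with a little probability. The sketch below assumes the two connections fail completely independently of each other, which is exactly the assumption that shared last-mile infrastructure breaks. The availability figures are illustrative placeholders, not measurements of any carrier.

```python
def combined_availability(primary: float, secondary: float) -> float:
    """Availability of a two-link setup where service survives if either link is up.

    Assumes the links fail independently and that failover is instantaneous,
    so this is an upper bound rather than a prediction.
    """
    both_down = (1 - primary) * (1 - secondary)
    return 1 - both_down


# Hypothetical figures: a 99.9% primary circuit and a 99% LTE backup.
print(f"{combined_availability(0.999, 0.99):.5f}")
```

If both circuits share the same conduit or the same carrier equipment, a single event takes out both, and the real figure falls back toward the availability of one link.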
Automatic Failover
Manual failover depends on someone noticing the outage and then switching connections by hand, which introduces a delay that is operationally unacceptable. Automatic failover at the network level (SD-WAN, or a dual-WAN router with failover configured) means the switch happens in seconds, before most users notice.
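The decision logic behind automatic failover can be sketched roughly as follows. This is not how any particular SD-WAN product or router works. The class name `FailoverController` and both thresholds are assumptions chosen for illustration. The idea is to switch only after several consecutive failed health checks, and to switch back only after the primary has been healthy for a while, so that a flapping link does not bounce traffic back and forth.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    fail_threshold: int = 3     # consecutive failed probes before failing over
    recover_threshold: int = 5  # consecutive good probes before failing back
    active: str = "primary"
    _fails: int = 0
    _oks: int = 0

    def observe(self, primary_ok: bool) -> str:
        """Record one health-check result for the primary and return the active link."""
        if primary_ok:
            self._oks += 1
            self._fails = 0
        else:
            self._fails += 1
            self._oks = 0

        if self.active == "primary" and self._fails >= self.fail_threshold:
            self.active = "secondary"
        elif self.active == "secondary" and self._oks >= self.recover_threshold:
            self.active = "primary"
        return self.active


controller = FailoverController()
for result in (False, False, False):  # three failed probes in a row
    link = controller.observe(result)
print(link)  # the controller has moved traffic to the secondary link
```

With a probe every second or two, thresholds like these keep the switchover within the "seconds" range while ignoring a single dropped probe.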
Tested Failover
This is where most “redundant” architectures actually fail. Many organisations have redundant connections configured on paper but have never tested what happens when the primary fails. Tested failover means deliberately taking down the primary connection and confirming that traffic moves automatically, switchover time is acceptable, secondary bandwidth is sufficient, and critical systems handle the switchover correctly.
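One way to keep a failover test honest is to record the four checks above as data and evaluate them against thresholds agreed in advance. The sketch below is a hypothetical structure, not a standard tool. The field names, the `failed_checks` helper, and the example thresholds are all assumptions to be replaced with your own requirements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailoverTestResult:
    traffic_moved_automatically: bool
    switchover_seconds: float
    secondary_mbps: float
    critical_systems_ok: bool


def failed_checks(result: FailoverTestResult,
                  max_switchover_seconds: float,
                  required_mbps: float) -> list[str]:
    """Return the checks that did not pass; an empty list means the test passed."""
    problems = []
    if not result.traffic_moved_automatically:
        problems.append("traffic did not move automatically")
    if result.switchover_seconds > max_switchover_seconds:
        problems.append("switchover took too long")
    if result.secondary_mbps < required_mbps:
        problems.append("secondary bandwidth is insufficient")
    if not result.critical_systems_ok:
        problems.append("critical systems did not handle the switchover")
    return problems


# Hypothetical drill: failover worked, but the backup link was too slow.
drill = FailoverTestResult(True, 8.0, 12.0, True)
print(failed_checks(drill, max_switchover_seconds=30.0, required_mbps=25.0))
```

Writing the thresholds down before the test avoids the temptation to declare whatever happened "acceptable" afterward.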
Monitoring and Alerting
A connectivity failure that takes 30 minutes to be noticed costs significantly more than one that is detected automatically in seconds. Network monitoring that alerts on primary connection failure is a basic operational requirement.
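A minimal version of such a check might look like the sketch below. It uses a plain TCP connection attempt as the probe, via Python's standard `socket.create_connection`. The helper names `link_is_up` and `check_and_alert` are invented for this example, and the alert is whatever callable you pass in (a pager, an SMS gateway, a log line). A real monitor would probe more than one target through the primary interface specifically, since a single unreachable host does not prove the link is down.

```python
import socket
from typing import Callable


def link_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_and_alert(probe: Callable[[], bool],
                    alert: Callable[[str], None]) -> bool:
    """Run one probe and send an alert if it fails. Returns the probe result."""
    up = probe()
    if not up:
        alert("primary connection probe failed")
    return up
```

A scheduler could then call something like `check_and_alert(lambda: link_is_up("example.com", 443), print)` every few seconds, where the host is a placeholder for a target you trust to be reachable. The alert itself must leave the building over a path that does not depend on the primary connection.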
The Economics of Redundancy
The most common objection to redundant connectivity is cost. The question is not “what does redundancy cost?” The question is “what does an outage cost?” For most businesses, a two-hour internet outage costs more in lost productivity, lost transactions, and operational disruption than six months of a secondary LTE connection.
If you want to understand where your current connectivity architecture is exposed, identify every service that stops working when your primary internet connection fails, and estimate what that costs per hour. The answer usually makes the cost of redundancy look very different.
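The exercise described above can be reduced to a short calculation. Every number in this sketch is an invented placeholder, including the service names, the hourly costs, and the monthly price of the secondary connection. Substitute your own estimates.

```python
from __future__ import annotations


def hourly_outage_cost(service_costs: dict[str, float]) -> float:
    """Total cost per hour when every listed service stops working."""
    return sum(service_costs.values())


def breakeven_outage_hours(monthly_redundancy_cost: float,
                           months: int,
                           cost_per_hour: float) -> float:
    """Hours of outage whose cost equals the redundancy spend over the period."""
    return monthly_redundancy_cost * months / cost_per_hour


# Hypothetical estimates of what each service costs per hour when it is down.
services = {"point_of_sale": 400.0, "voip_phones": 150.0, "cloud_apps": 250.0}
per_hour = hourly_outage_cost(services)

# Hypothetical secondary LTE connection at 100 per month, over six months.
hours = breakeven_outage_hours(100.0, 6, per_hour)
print(f"Outage cost: {per_hour:.0f}/hour, break-even after {hours:.2f} hours")
```

With these placeholder figures, the secondary connection pays for itself after 45 minutes of avoided downtime in a six-month period.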