99.9% uptime is an industry-standard SLA. It's also, when you do the arithmetic, about 8 hours and 45 minutes of downtime per year, basically a working day. We hit 99.98% actual across our managed fleet in 2025, which works out to roughly 1 hour and 45 minutes of downtime over the year. Here's the breakdown of where the lost minutes went.
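For anyone who wants to check the arithmetic, here is a minimal sketch that converts an uptime percentage into annual downtime. It assumes a 365-day year; the function names are made up for illustration and are not part of any tooling we run.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Annual downtime in minutes implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


def format_hm(minutes: float) -> str:
    """Format minutes as whole hours and minutes, truncating the fraction."""
    whole = int(minutes)
    return f"{whole // 60}h {whole % 60:02d}m"


sla_minutes = downtime_minutes_per_year(99.9)      # the standard SLA
actual_minutes = downtime_minutes_per_year(99.98)  # our 2025 figure

print("99.9%  ->", format_hm(sla_minutes))
print("99.98% ->", format_hm(actual_minutes))
```

The gap between the two figures is the headroom between what the SLA permits and what the fleet actually used.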
The biggest single category was scheduled maintenance windows that ran over their allotted time (about 40% of total downtime). The second was upstream provider incidents we couldn't route around (about 30%). The third was our own configuration mistakes (about 15%). The fourth was client-initiated downtime, usually a credentials problem or a botched deploy on a site where the client handles their own deployments (about 10%). The remaining 5% was a long tail.
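To put those shares in minutes, the sketch below applies them to the annual downtime implied by 99.98% uptime. Two assumptions are baked in: a 365-day year, and that the rounded "about" percentages apply to that implied total. The category labels are shorthand for this example only, and the results are approximations, not measured figures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# Downtime implied by 99.98% uptime (assumption: 365-day year).
total_downtime_min = (1 - 99.98 / 100) * MINUTES_PER_YEAR

# Approximate shares quoted in the text, in percent.
shares_percent = {
    "maintenance windows overrunning": 40,
    "upstream provider incidents": 30,
    "our configuration mistakes": 15,
    "client-initiated downtime": 10,
    "long tail": 5,
}
assert sum(shares_percent.values()) == 100

minutes_by_category = {
    name: total_downtime_min * pct / 100
    for name, pct in shares_percent.items()
}

for name, minutes in minutes_by_category.items():
    print(f"{name:<32} {minutes:5.1f} min")
```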
The one structural change that's moved the needle most: putting every site behind Cloudflare with smart routing on. The second: enforcing that every patching window can be aborted within 90 seconds.
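One way to enforce a 90-second abort rule is to time the rollback during a drill and fail the drill if it runs over. The sketch below is illustrative only: `timed_abort` and its parameters are hypothetical names, and the `abort` callable stands in for whatever rollback procedure a given patching window uses.

```python
import time
from typing import Callable, Tuple

ABORT_BUDGET_SECONDS = 90.0  # the limit stated in the text


def timed_abort(
    abort: Callable[[], None],
    budget_s: float = ABORT_BUDGET_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, bool]:
    """Run a rollback callable and report (elapsed seconds, within budget).

    `abort` is a placeholder for the real rollback step. `clock` is
    injectable so the check can be exercised without actually waiting.
    """
    start = clock()
    abort()
    elapsed = clock() - start
    return elapsed, elapsed <= budget_s


# Drill with a no-op rollback, just to show the call shape.
elapsed, ok = timed_abort(lambda: None)
print(f"abort took {elapsed:.3f}s, within budget: {ok}")
```

Using a monotonic clock keeps the measurement immune to wall-clock adjustments during the window.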