We maintain various SLAs with our customers across our suite of services: some at 99.9%, and a few at a higher 99.99%. Currently we measure downtime very precisely, logging an outage from the moment we detect and confirm that a system is down.
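For scale, a 99.9% target over a 30-day month allows about 43 minutes of downtime, while 99.99% allows only about 4.3 minutes. A minimal Python sketch of that arithmetic, assuming a 30-day measurement period (the helper name `downtime_budget` is hypothetical, not part of any existing tool):

```python
from datetime import timedelta

def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits over `period`."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # Allowed downtime is the unavailable fraction of the period.
    return period * (1 - availability_pct / 100)

# Assumption: the SLA is evaluated over a 30-day month.
month = timedelta(days=30)
for target in (99.9, 99.99):
    print(f"{target}%: {downtime_budget(target, month)}")
```

At the 99.99% tier, a single outage that takes a few minutes to detect and confirm can consume most of the monthly budget.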
In some cases, breaching these SLAs can result in financial penalties for the company (e.g., payouts to customers).
However, most of our systems, while available 24/7/365, are primarily used during U.S. business hours.
Given that real money can be involved, is it appropriate (or even required) to measure downtime this way? Or should it instead be measured from the customer's perspective, so that an outage end users never experienced doesn't count as downtime?
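To make the difference between the two approaches concrete, here is a minimal sketch that computes both totals from the same outage log. Everything in it is an assumption for illustration: the business window (weekdays 09:00 to 17:00), the use of naive datetimes already in a single U.S. time zone (DST is ignored), and the log format of `(start, end)` pairs. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumed business window (hypothetical): weekdays 09:00-17:00, with all
# timestamps expressed as naive datetimes in one local time zone.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)

def wall_clock_downtime(outages):
    """Total downtime, counting every minute a system was confirmed down."""
    return sum((end - start for start, end in outages), timedelta())

def business_hours_downtime(outages):
    """Downtime that overlaps the assumed weekday business window only."""
    total = timedelta()
    for start, end in outages:
        day = start.date()
        # Walk each calendar day the outage touches.
        while day <= end.date():
            if day.weekday() < 5:  # Monday=0 .. Friday=4
                window_start = datetime.combine(day, BUSINESS_START)
                window_end = datetime.combine(day, BUSINESS_END)
                overlap = min(end, window_end) - max(start, window_start)
                if overlap > timedelta():
                    total += overlap
            day += timedelta(days=1)
    return total

# Hypothetical outage log: one overnight outage, one spanning close of business.
outages = [
    (datetime(2024, 3, 5, 2, 0), datetime(2024, 3, 5, 4, 0)),      # Tue 02:00-04:00
    (datetime(2024, 3, 6, 16, 30), datetime(2024, 3, 6, 17, 30)),  # Wed 16:30-17:30
]
print("wall clock:", wall_clock_downtime(outages))
print("business hours:", business_hours_downtime(outages))
```

Under these assumptions the same log yields three hours of wall-clock downtime but only thirty minutes during business hours, which is the gap the question is about.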
I appreciate the honesty of our current approach, but I wonder whether it's too simplistic.