Who commits to five nines (99.999%) availability?

Can anyone point me to a cloud hosting provider who will really commit to 99.999% availability? From what I’ve seen, some offer this but only back it up with meaningless compensation if they fail, such as refunding a month’s hosting fees. For any business that really needs 99.999% availability, that refund will be out of proportion with the damage unscheduled downtime brings. The other catch you often see is 99.999% planned availability, which means they can schedule as much downtime as they like as long as they plan it and let you know in advance.

I ask this question because, although I’m very much in favour of cloud hosting, I increasingly feel that where clients really need 99.999% availability they need to look at a hybrid solution. That could either combine your own hosting with scaling out to the cloud, which has been done successfully, or maybe ultimately mix hosting across two or more cloud providers, which as far as I know no one has really done successfully on a large scale yet.

So while contracts often call for five nines, can the client afford that extra nine? Are they prepared for the complications it brings? And even if they are, is it more of a target than something anyone will actually commit to in application hosting today?

Google Service Status Page – A great example of best practice.

Following the brief outage of Gmail on September 1st, I was reminded that Google publish a status page, or dashboard, showing the status of all their services. You can find it at www.google.com/appsstatus. I mention this because it’s an excellent example of providing visibility, and therefore accountability, for the services you provide, which is essential if you’re being paid to provide a service. If you’re responsible for providing IT services to your business or customers, then you really need to consider how you can create this type of service dashboard or status page.

If you’re involved in providing online services then you need to have formally agreed service up-time levels and planned maintenance times. When agreeing up-time SLAs you need to get people to understand the cost of moving from 98% to 99%, to 99.99%, to 99.999% (five nines) up-time. Have a think about it: the level of engineering needed to deliver 99% is quite different from what 99.999% demands.

Availability   Downtime per day   Downtime per month   Downtime per year
99.999%        00:00:00.9         00:00:26             00:05:16
99.99%         00:00:09           00:04:23             00:52:36
99.9%          00:01:26           00:43:50             08:45:57
99%            00:14:24           07:18:17             87:39:30
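
If you want to check figures like the ones in the table above, or work them out for a different availability level, the arithmetic is simple enough to script. This is only a rough sketch: the function name allowed_downtime is my own invention, and I’m assuming an average year of 365.2425 days with a month taken as one twelfth of that, which seems to match the table but isn’t stated anywhere.

```python
from datetime import timedelta

# Assumption: average Gregorian year length; a "month" is one twelfth of it.
DAYS_PER_YEAR = 365.2425
PERIODS = {
    "day": 1.0,
    "month": DAYS_PER_YEAR / 12,
    "year": DAYS_PER_YEAR,
}


def allowed_downtime(availability_pct: float, period_days: float) -> timedelta:
    """Downtime budget for a period of period_days at availability_pct up-time."""
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return timedelta(days=period_days * unavailable_fraction)


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.0):
        # One budget per period, printed as [days,] h:mm:ss[.ffffff]
        budgets = ", ".join(
            f"per {name}: {allowed_downtime(pct, days)}"
            for name, days in PERIODS.items()
        )
        print(f"{pct}% -> {budgets}")
```

Swap in whatever availability figure is in your contract, or a different period length, to see how little room each extra nine leaves you.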

If you commit to 99.999% up-time, you’re allowed just over 5 minutes of downtime a year. That’s not enough time to do anything, so you need your application to be running on a distributed system over two or more sites, with instant failover and probably a load-balanced workload. In contrast, 99% up-time allows you more than 87 hours of downtime a year, which means that you can stick with simpler technologies like RAID and mirrored servers.

Let me know what you think and how you approach up-time SLAs.