In case of problems, your admin’s honesty is priceless
What uptime does your web hosting provider guarantee? Is it 99.99%? Or even 100%?
Uptime guarantees are easily bent by marketing and are often misleading. For example, a provider that excludes local outages affecting only a small number of services can easily claim 99.99% uptime. (For context: 0.01% downtime amounts to roughly 52 minutes per year.)
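To put such percentages in perspective, here is a minimal sketch of the arithmetic, assuming a 365-day year and counting every minute of unavailability (no excluded maintenance windows or "local" outages). The function name and the list of sample guarantees are illustrative, not taken from any provider's SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime, in minutes, that an uptime guarantee still allows."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative guarantee levels only.
    for guarantee in (99.0, 99.9, 99.99, 100.0):
        minutes = downtime_minutes_per_year(guarantee)
        print(f"{guarantee}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

By this calculation, a 99.99% guarantee still leaves room for a little under 53 minutes of downtime a year, and that is before any exclusions written into the fine print.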
Even the best, most modern, and most secure infrastructure cannot avoid outages.
You can have several levels of safety mechanisms, and still, from time to time, all of them might fail at once.
The important thing is how your provider handles the outage and how well-prepared their crisis scenarios are for unexpected situations.
“We always strive to be fair with our clients. We openly admit our mistakes and don’t make promises we wouldn’t believe ourselves.”
Tomáš Kostka, Director of Custom Business Solutions at Webglobe
The worst-case scenario: How we handled an outage that should never have happened
At Webglobe, we dedicate several dozen extremely powerful physical servers solely to server solutions, keeping them separate from those used for regular hosting services.
These run our virtualisation clusters, which ensure high availability for our custom infrastructure.
They’re built to ensure that a single virtual server’s failure does not affect other services running in the same cluster.
Even if two or three go down at the same time, nothing should happen.
However, one day, through an incredible coincidence and without any prior warning signs, the entire virtualisation cluster went down.
This cluster hosted hundreds of business websites whose revenue was suddenly at risk. One of the most heavily impacted was Bonami, where every minute of downtime costs several thousand euros.
“For me, it was important to get more details than the typical ‘we’re having an outage, our admins are working on it.’ In crisis situations like this, accurate information is essential.”
Martin Patočka, CTO at Bonami
Crisis plan activated: Customers come first
We immediately contacted all clients on the affected cluster and shared every bit of information we had at the time.
At the same time, we launched our crisis response plan and assigned a team of technicians with the sole task of resolving the issue as quickly as possible.
Thanks to transparent communication, Bonami minimised losses
Bonami initiated their own crisis plan: they instantly paused PPC campaigns, slowed down logistics in the warehouse, called customers whose deliveries could not be dispatched at the time, prepared official communication for social media, and waited until we gave them the green light.
Thanks to these actions, they minimised damage and saved thousands of euros.
6 gruelling hours, but a full recovery
Once we discovered the root cause of the outage (a specific storage technology was to blame), all that remained was to recover the data. That meant 360 TB in total.
Recovering everything without loss was a demanding task. The last services were brought back online after nearly 6 hours.
During this entire time, we kept in close contact with customers, continuously passing along live updates from our technicians.
“Kudos from me. The whole situation was complex for everyone, but this is what well-managed crisis communication looks like!”
Martin Patočka, CTO at Bonami
Want the same proactive, transparent approach and 24/7 care? Let us design a fully managed custom infrastructure for you.