Catastrophic Failures
by Andrew Macpherson on Sep.05, 2009, under Operations
Some companies live on top of their hardware. When there is a failure it's just a walk down the corridor. For OA5 it takes a bit longer, which is why we rent space from a hosting centre that does have such well-placed technical guys, but we manage most things ourselves with remote screen switches and remote power switches.
Even with all that, a particular piece of hardware may well still decide to go permanently out to lunch, and then there is no alternative but to go to the backups to build a replacement system. This is where a near-current image of the user area really helps. Given the near-current image, rsync is very efficient at bringing it forward to current as of the last backup, because it only transfers what has changed.
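As a rough illustration of that catch-up step, here is a minimal sketch in Python that builds and runs an rsync command. The host name and paths in the usage note are invented for the example, and the flag set (-a, -z, --delete) is an assumption about what such a sync would want, not a record of our actual setup.

```python
import subprocess


def build_rsync_command(source, dest, dry_run=False):
    """Build an rsync command that brings dest up to date with source.

    -a        archive mode (keeps permissions, times, symlinks, recurses)
    -z        compress data in transit
    --delete  remove files from dest that no longer exist in source
    """
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        # Report what would change without touching anything.
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd


def bring_forward(source, dest, dry_run=False):
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, dest, dry_run), check=True)
```

A call such as bring_forward("backuphost:/backups/home/", "/home/", dry_run=True) (hypothetical host and paths) would list the differences first; dropping dry_run does the real transfer. Note that --delete is destructive on the destination, so a dry run first is worth the extra minute.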
This does not mean we have spare hardware images of our major servers, only Xen virtual machines ticking over, which can be brought online with the addition of a few IP addresses and worked on until a new piece of hardware can be prepared.
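For the "addition of a few IP addresses" step, a small sketch of how the commands might be generated. It assumes a Linux guest with the iproute2 `ip` tool and an interface called eth0; the addresses are documentation-range placeholders, not ours. It only builds the commands, so they can be reviewed before anyone runs them as root.

```python
import ipaddress


def ip_add_commands(addresses, device="eth0"):
    """Return one 'ip addr add' command per address in CIDR form.

    ipaddress.ip_interface() raises ValueError on a malformed address,
    so typos are caught before anything is run on the standby machine.
    """
    commands = []
    for addr in addresses:
        iface = ipaddress.ip_interface(addr)
        commands.append(["ip", "addr", "add", str(iface), "dev", device])
    return commands


if __name__ == "__main__":
    # Placeholder addresses from the 192.0.2.0/24 documentation range.
    for cmd in ip_add_commands(["192.0.2.10/24", "192.0.2.11/24"]):
        print(" ".join(cmd))
```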