January 15, 2005
LiveJournal Down But Good
A power failure at one of Internap's data centers has sent LiveJournal sprawling on the floor. LJ's admins are finding out the hard way that the name of Usenet newsgroup alt.sysadmin.recovery isn't about twelve-step programs, but instead is a job description:
Update #3: 2:42 am: We're starting to get tired, but all the hard stuff is done at least. Unfortunately a couple machines had lying hardware that didn't commit to disk when asked, so InnoDB's durability wasn't so durable (though no fault of InnoDB). We restored those machines from a recent backup and are replaying the binlogs (database changes) from the point of backup to present. That will take a couple hours to run. We'll also be replacing that hardware very shortly, or at least seeing if we can find/fix the reason it misbehaved. The four of us have been at this almost 12 hours, so we're going to take a bit of a break while the binlogs replay... Again, our apologies for the downtime. This has definitely been an experience.
It is through system failures and crashes that sysadmins show their true mettle. They have comparatively little to do between crashes, which explains a lot about the nature of Usenet, come to think of it.
Posted by abostick at January 15, 2005 08:38 AM
