Problems this morning
At approximately 4am PST, two separate database servers (db1 and db16) had RAID failures that caused file system corruption. They kept trying to process traffic but Linux had switched part of the file system to "read only", so no traffic data was actually being written to the hard drives. This problem lasted from approximately 4am to 7am PST. Unfortunately, this traffic data is gone and unrecoverable.
We have alert systems set up so that when a significant event occurs, such as a server going offline or a RAID failure, we are alerted immediately. Unfortunately, the RAID notifications on a few servers were recently disabled while we were performing some maintenance, and wouldn't you know it, db1 and db16 were among those servers. Because of this, we weren't notified of the problem, and didn't discover it until we woke up to a flood of emails in our inbox this morning.
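For anyone who wants a second line of defense against this failure mode, here is a minimal sketch of a check that is independent of RAID notifications: it looks for file systems that Linux has remounted read-only. It is an illustration, not a description of our actual monitoring. It assumes a Linux host where `/proc/mounts` is readable, and the function names and the `watch` parameter are made up for this example.

```python
def read_only_mounts(mounts_text, watch=None):
    """Return mount points whose mount options include 'ro'.

    mounts_text: contents in /proc/mounts format
                 (device, mount point, fs type, options, dump, pass).
    watch: optional set of mount points to restrict the check to
           (hypothetical parameter; some mounts are read-only by design).
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if watch is not None and mount_point not in watch:
            continue
        # Match the exact option 'ro', not substrings like 'errors=remount-ro'.
        if "ro" in options:
            found.append(mount_point)
    return found


def current_read_only_mounts(path="/proc/mounts", watch=None):
    """Run the check against the live mount table (Linux only)."""
    with open(path) as f:
        return read_only_mounts(f.read(), watch=watch)
```

Run from cron with `watch` set to the mount points that must stay writable (for example, the database data directory), a non-empty result could trigger an alert even when the RAID notifications themselves are switched off.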
There were no problems on other servers that we could find, but if you have a site on a server other than db1 or db16 and it's experiencing issues, please leave a comment here explaining what's happening. Be sure to include the site ID.
We apologize for this issue, which we take very seriously. The RAID notifications are all back online, and we will be sure to always re-enable them immediately after this kind of maintenance in the future. Leaving them disabled was just an honest mistake.
One final note: these RAID failures occurred at the exact same time on two different servers. This has happened once before, although that time it was three servers instead of two, and it didn't cause any corruption. This seems like very strange behavior to us, and we're not sure what could cause such a thing to happen to separate servers (that don't talk to each other) at the exact same time. If any sysadmins out there have any ideas, please share.
19 comments | Sep 02 2009 8:44am
