So today at work I faced one of those things you hope you never have to, but that you are sure you will someday, when you least expect it. For example, when your supervisor (the IS Director) is supposed to be somewhere in north-central Iowa climbing a hill on a 21-speed road bike.
This morning on my way to work, I got within one block of the office and there was a street light out. I didn’t think much of it; there were stop signs one way, so I wasn’t worried about getting hit, and I went on with my business. I pulled into my parking lot and noticed that nobody was parked in our company’s private lot. It would be very surprising if I had beaten everyone in. Then I noticed that most of the cars were parked just outside the lot in the metered parking. Again, not a huge surprise; we seem to have a lot of problems with our electronic gate. Usually the problem is that someone drives through it to get prime parking on the weekends. So this was a little odd, but oh well.
As I was walking in, one of the maintenance guys was removing the arm from the gate and said, “Gonna be an exciting morning.” Again, I figured he was just referring to a broken arm on the gate and having to work outside when it was already humid and warming up. Then I walked into the building and the first-floor light for the elevator wasn’t on. Weird, but I remembered one of the lights had burned out before, though I thought it was the 2nd floor. Maybe it just burned out overnight. Then I pressed the up button and the elevator didn’t open. It hit me: “THE POWER WAS OUT!”
I mentioned my supervisor was out; so was my office mate, the third guy in the IS Department. So that just left me. I raced up the stairs. The server room (more like a closet) had been getting hot, so we had a portable air conditioner in there. The door was open to keep the room cool, but there was not a sound. Everything was off: the lights were obviously all out, network and phones down, everything.
I heard there had been a fire in a manhole in the downtown/Old Market area and that power had been out since 2 AM. I figured there wasn’t much I could do but hope that the power came back up soon. I texted my boss to see if he had heard. Turns out he was still in town and was on his way in. He had been trying to put together some paperwork for me for some work today and was unable to get into the system from home. So he ran in to check on the cooling unit and discovered the problem a little before I did. So that was a relief. I hated to bring him in from vacation, but it took a little pressure off my shoulders in trying to explain to everyone what all was down, for how long, and what could be done in the meantime. I was prepared for the questions, but it’s still not my favorite situation.
I tethered my laptop to my phone and got a little done. I grabbed breakfast, and then it was just a waiting game. Word was that power could be out anywhere from 2 hours (12 noon) to 24 more. The most solid news we heard was that the area was still too hot to work in and repairs would probably start around noon. Still just a guessing game. The CFO and CEO decided to ask me how hard it would be to bring in generators and get at least the server stack up and going. Which brings me to the main point of this post: what’s your disaster recovery plan?
Every company has one in place. When you think about it, it’s pretty obvious: what would you do in case of any possible disaster that could happen to your workplace, or to the information you need to work? Like most companies, we keep most of our server stack in the building. We back up off site and have a pretty solid disaster recovery plan in place, provided we know that the disaster will last at least 1-2 days or more. It’s tough in a case like this, when we didn’t know if it was going to last 2 or 8 working hours. So when I was asked how hard it would be to bring in a generator, I mentioned we would need to have the networking gear and at least a few of the servers plugged in to accomplish what they wanted. My supervisor quickly came to the rescue again. He had run power meters on the servers fairly recently, and with the help of the building maintenance guy and his electrician, we were quickly able to decide what was needed and how long it would take to get the equipment there and working. The guesstimate was a little less than 3 hours, and they were able to beat that by 30 minutes. Pretty quick turnaround, actually. The networking, phones and servers were up and running by 3 PM, and we didn’t have to go into our full disaster recovery.
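To give a rough idea of the kind of math those power meter readings make possible, here is a minimal sketch in Python. The device names, the wattages and the 25% headroom are all made up for illustration; they are not our real numbers, and an electrician should always have the final say on sizing.

```python
# Rough generator sizing from measured power draw.
# ASSUMPTION: every device name and wattage below is hypothetical,
# chosen only to illustrate the arithmetic.
measured_watts = {
    "core switch": 150,
    "phone system": 200,
    "file server": 450,
    "database server": 500,
}


def required_generator_watts(loads, headroom=0.25):
    """Sum the measured loads and add a safety margin.

    ASSUMPTION: headroom=0.25 (25% extra) is an illustrative margin,
    not a rule taken from any standard.
    """
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    return sum(loads.values()) * (1 + headroom)


total = sum(measured_watts.values())
needed = required_generator_watts(measured_watts)
print(f"Measured load: {total} W")
print(f"Generator capacity to ask for: at least {needed:.0f} W")
```

With these made-up numbers the measured load is 1,300 W, so you would ask for at least 1,625 W of generator capacity. The real value of having the meter readings ahead of time is that nobody has to guess at the inputs while the power is out.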
As for our full disaster recovery, we have a hot site set up that we can restore our servers to from backup. They then set up 10 workstations imaged like our computers, so that 10 people can be officed and fully working in 4-8 hours. The tough part of that equation comes when we have to change our DNS settings so that people on the outside can reach us. The only guarantee we are given for how long that change takes to reach everyone is anywhere from 20 minutes to 48 hours. Since we have properties all over the US that access our systems through a web interface, it could put more of a damper on things for them. The main goal is that the company has an answer for any possible thing that could happen. Today we learned how long it takes us to adapt, and it gave us real experience to know whether we can handle it.
The other question facing many companies today is whether servers, or the services they provide, can be moved off site or hosted. We have been toying with this, and many companies already have hosted email solutions and are starting to move other services into the cloud. I wonder if this will come up in the near future.
So what’s this mean to you? Do you have a “disaster recovery plan” in place for your home computer or information? What would you do if your hard drive crashed and you lost your family pictures? What if there was a fire in the house and you lost your photo albums? How far are you willing to go to make sure your information AND stuff is safe? I have a server that backs up the information from my home computers so that any one of them, or all of them, can crash and I don’t lose anything important. That server then duplicates the data so that any one hard drive can crash. I hate to admit this, but if someone steals my server I’m currently out of luck. I’ve backed up most of the data to an external hard drive that I store in a fire-safe box in my house, but really, people should be saving this stuff to the cloud or somewhere offsite. I could talk forever about backup solutions, but the point is, make sure you have a plan.
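If you want to sanity-check your own plan, here is a minimal sketch in Python that lists where each copy of your data lives and flags the obvious gaps. The checks are loosely in the spirit of the common "3-2-1" rule of thumb (multiple copies, on different devices, at least one offsite). The class, the function and the entries are all hypothetical names I made up for the example; the two entries roughly mirror my own setup described above.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str      # what the copy is
    device: str    # which physical device holds it
    offsite: bool  # True if it lives outside the house


def plan_gaps(copies):
    """Return a list of plain-English problems with a backup plan.

    ASSUMPTION: these three checks are a simplified illustration,
    not a complete audit of a backup strategy.
    """
    gaps = []
    if len(copies) < 2:
        gaps.append("fewer than two backup copies")
    if len({c.device for c in copies}) < 2:
        gaps.append("all copies live on the same device")
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy: theft or a house fire takes everything")
    return gaps


# Hypothetical entries resembling the home setup described in this post.
my_plan = [
    BackupCopy("home server with duplicated drives", "server", offsite=False),
    BackupCopy("external drive in fire-safe box", "external drive", offsite=False),
]

for problem in plan_gaps(my_plan):
    print("GAP:", problem)
```

Run against my own setup, the only gap it reports is the one I already admitted to: nothing is stored offsite. Adding a cloud or offsite entry with `offsite=True` makes the list come back empty.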