Last year I was leading a security incident investigation where a client had their ERP system and core operating, accounting, and support systems affected by ransomware. The standard operating procedure is to ignore the bad actors, restore from the previous day’s backup, and go about normal business.
However, when the IT manager went to restore, the previous backup was incomplete. The previous month’s backup was also incomplete, and so was the one from the month before that. The IT manager had to go back eight months to find a complete set of data and restore from that point. The company was ruined.
In another case, a bank contracted with a disaster recovery company to deliver a trailer with computers and a satellite internet connection should something catastrophic happen to their building. One Monday morning, the first employee to arrive discovered two inches of water on the floor from a second-floor bathroom toilet that overflowed sometime earlier in the weekend.
To make matters worse, the incident occurred above the server closet, and the dripping water shorted out all of the networking equipment. The trailer was delivered and the backup restore went into motion. As in the previous story, there had not been a good backup for more than three weeks. Data for all account transactions after that date were lost, and it took them several days and a near round-the-clock effort to manually restore all the critical information.
Do you see the common thread in these client stories? While everyone understood the importance of backing up the infrastructure and data and making sound business continuity plans, no one thought to test their systems. No effort was made to ensure the company could recover to an acceptable business state following a disruption or disaster, and no one knew the Recovery Time Objective (RTO) for returning everything to operational status. Imagine being their customer and not knowing when you could access your money again, while the bank does not know either!
When I was managing the IT department for a New York hedge fund, our goal was to test the backups on business-critical applications every month and perform a full system restore offsite annually.
How did that help? I won’t lie; the first several restoration exercises were ugly beyond the point of embarrassment. However, those steps made it possible to create a functional book of procedures to restore each piece of the network, a valuable resource for any business (or MSP) should a disaster occur.
We solidified the proper order for bringing back servers and services and learned exactly how long it takes to complete each step. We also discovered which data was truly important for the business to survive and to run various processes.
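For readers who want to capture the same information during their own restore exercises, here is a minimal sketch of how the ordering and timing could be recorded. It is not the tooling we used; the names `RestoreStep` and `run_drill`, the step names, and the RTO value are all hypothetical, and each `action` stands in for whatever actually restores that server or service in your environment.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RestoreStep:
    name: str
    # Hypothetical callable that performs (or triggers) this restore step.
    action: Callable[[], None]


def run_drill(
    steps: List[RestoreStep],
    rto_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[str, float]], float, bool]:
    """Run the steps in the given order and time each one.

    Returns (per-step timings, total seconds, whether total is within the RTO).
    """
    timings = []
    for step in steps:
        start = clock()
        step.action()  # an exception here stops the drill at the failing step
        timings.append((step.name, clock() - start))
    total = sum(seconds for _, seconds in timings)
    return timings, total, total <= rto_seconds


if __name__ == "__main__":
    # Placeholder steps; replace the lambdas with real restore procedures.
    drill = [
        RestoreStep("domain controller", lambda: time.sleep(0.1)),
        RestoreStep("file server", lambda: time.sleep(0.2)),
    ]
    timings, total, within_rto = run_drill(drill, rto_seconds=4 * 60 * 60)
    for name, seconds in timings:
        print(f"{name}: {seconds:.1f}s")
    print(f"total: {total:.1f}s, within RTO: {within_rto}")
```

Keeping the output of each drill alongside the written procedures gives you a record of how long each step really takes, which is the basis for an RTO you can state with confidence.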
As IT professionals, we knew which third-party services we could rely on, and when their services and support let us down, we changed providers. Testing the systems and updating our documentation happened so frequently that all our anxiety went away, and the process was so simple that we could depend on our junior admins and interns to perform these steps flawlessly.
One word of advice: please, please do not stop after creating a plan. Test it completely. Update it regularly. Then test it again. A well-designed plan and system can truly save your client and their critical data after a natural disaster or malicious attack.
The good news is there is now an easier and more profitable way to help your clients avoid these DR-related problems. Check out the Adept Managed Continuity solution. Strengthen your recurring revenue opportunities and give your clients more peace of mind today!
Chris Jones, CEO