Some Good Q&A on Backup and Disaster Recovery

I am sometimes asked to provide reporters with information for stories they are writing.  Here is a series of questions from a reporter writing for a financial publication, along with my answers to him.  After writing them up, I thought they could serve, in raw form, as a blog post.

1. Where should small businesses start with disaster recovery, whether or not they already have a DR plan in place? What is the first question the small business owner needs to ask?

I recommend starting with determining RTO and RPO.  If the small business owner starts here he or she will be off to a good start with the DR plan.  What are RTO and RPO?

• RTO – Recovery Time Objective, the time between the disaster and when the system has been made operational again.  Why is this important?  Different businesses have different costs associated with downtime.  Some have commitments to, or expectations from, clients with regard to availability and/or performance.  Others have commitments to regulatory bodies.  These are the common factors that determine what length of time is appropriate for the RTO.  For some companies the requirement or expectation will be near real-time, i.e. an RTO of, say, 15 minutes.  For others it can perhaps be two days.  If we are looking at RTO for a consumer financial services firm or a hospital emergency/operating room system, 15 minutes might be the number.  If we are considering a small farming operation, perhaps two business days is fine.  The type of backup and disaster recovery solution for these two scenarios is very different.

• RPO – Recovery Point Objective, the time between the latest backup and a system disaster, representing the nearest point from which the system can be recovered.  Put another way, if your system fails, how much data is the business owner willing to re-enter?  For some systems it is easy to determine what information needs to be re-entered, and time is not very critical.  If a farmer has workers coming in from the field with clipboards of reporting data, a one-day RPO is probably fine.  The data can be re-entered from the paper sheets, and timeliness is probably not important.  Nightly tape backups, giving a one-business-day RPO, are probably adequate.  If, on the other hand, the small business does 90% of its business online and from phone orders, all entered directly into the computer, any failure will mean a loss of all data since the last backup.  I would suggest that a maximum RPO of 15 minutes is appropriate, especially if we are talking about thousands of dollars per hour being at stake.  In this sort of environment a virtualized disk-to-disk backup with offsite replication is probably the solution.  This will have a much higher initial and ongoing cost than a nightly backup to tape.
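One way to use the two definitions above as a measuring stick is a small comparison like the following sketch.  The `BackupOption` class, the `meets_objectives` function, and every number in it are assumptions made up for illustration; real restore times and backup intervals have to come from the vendor and from testing.

```python
from dataclasses import dataclass


@dataclass
class BackupOption:
    """A candidate backup/DR solution (hypothetical figures only)."""
    name: str
    backup_interval_minutes: float  # worst-case window of lost data
    restore_minutes: float          # estimated time until systems are usable again


def meets_objectives(option: BackupOption, rto_minutes: float, rpo_minutes: float) -> bool:
    """True if the option can restore within the RTO and loses no more data than the RPO."""
    return (option.restore_minutes <= rto_minutes
            and option.backup_interval_minutes <= rpo_minutes)


# Illustrative assumptions, not measurements.
nightly_tape = BackupOption("nightly tape", backup_interval_minutes=24 * 60,
                            restore_minutes=2 * 24 * 60)
replicated_disk = BackupOption("disk-to-disk with offsite replication",
                               backup_interval_minutes=15, restore_minutes=15)

# Farm-style targets: RTO of two days, RPO of one day.
farm_ok = meets_objectives(nightly_tape, rto_minutes=2 * 24 * 60, rpo_minutes=24 * 60)

# Online-order-style targets: RTO and RPO of 15 minutes each.
online_tape_ok = meets_objectives(nightly_tape, rto_minutes=15, rpo_minutes=15)
online_disk_ok = meets_objectives(replicated_disk, rto_minutes=15, rpo_minutes=15)
```

With these made-up numbers, nightly tape is enough for the farm-style targets but not for the 15-minute targets, which is the point of the two scenarios above.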

2. Is there a basic template for DR planning that small businesses should use? What are the basic questions or categories of information that need to be covered?

Yes.  Working through the two items above keeps it simple and manageable for the SMB owner.  Just figure out the RTO and RPO.  They only need to ask one other question to answer the first two: what is their SLA, or Service Level Agreement?  Has the SMB owner made a promise, written or unwritten, to customers, vendors, regulatory bodies and other stakeholders that must be met?  How long will those stakeholders tolerate an outage at the SMB?  Once that is determined, the SMB owner can figure out what RTO and RPO they need, and then use those as the measuring stick to see which backup and disaster recovery solutions meet their needs.

3. Many small businesses use online backup services to back up their data in the event of a disaster - how can small businesses decide which is the best data backup solution for them? What kind of support should they expect to get from a DR services provider?

Here are a few other things to consider.  These should get answered with some due diligence when examining a candidate solution against the RTO and RPO.  With an online backup solution, if it takes two days to fully seed the backup, in other words to copy the data up to the cloud, it stands to reason it will take about two days to download it again after a disaster.  If the RTO is only four hours, two days to restore over the internet won’t work.  Does the DR vendor have some other way to restore more quickly?  Additionally, what resources for initial setup and ongoing support are available from the vendor?  My sister, for example, signed up for Mozy on her home system.  They provided very little support, and she used the default setup settings.  Those settings only backed up drive C:.  She had 15+ years of family photos, her kids from babies to adulthood, on drive D:.  When the hard drive failed she lost it all.  It was a fast, seemingly easy and cheap service, but it cost her dearly.
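The seeding-versus-restore point can be checked with rough arithmetic before signing up.  The sketch below is only an estimate under stated assumptions: the data size, link speed, and the 80% `efficiency` factor are hypothetical values, and the function names are invented for this example.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Rough hours to move data_gb gigabytes over a link of link_mbps megabits per second.

    efficiency is an assumed fraction of the nominal link speed actually achieved.
    """
    bits = data_gb * 8 * 1000**3                      # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


def meets_rto(restore_hours: float, rto_hours: float) -> bool:
    """True if the estimated restore time fits inside the RTO."""
    return restore_hours <= rto_hours


# Hypothetical example: 500 GB of data over a 25 Mbps connection.
restore = transfer_hours(data_gb=500, link_mbps=25)   # roughly 55.6 hours, over two days
fits_four_hour_rto = meets_rto(restore, rto_hours=4)  # False
```

If the estimate is far outside the RTO, that is the cue to ask the vendor about faster restore options.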

4. At what size of business (number of employees, number of locations) does disaster recovery planning start to get more complicated? What are some insights/pitfalls to keep in mind?

Don’t worry about the size of the business.  What matters is the type of work the business does, the economic impact of an outage (RTO), and the value of the data that might be lost (RPO).  When owners figure out what it costs per hour to be down, they will better understand what RTO they need, and that becomes the benchmark for computing the investment they are willing to make in a backup and disaster recovery system.  The same can be said for the value of the data that would be lost over the length of time in the RPO.

Two easy calculations can be used to figure out the cost per hour.  I like to use opportunity cost and operations cost.  To compute opportunity cost, calculate the revenue per business day for the company, then divide that by the number of hours of operation per day.  If a company is open an average of 21 days per month and does $210,000 in monthly revenue, then the math is $210,000/21 = $10,000 per day.  If they are open 10 hours per day, then they have a $1,000 per hour opportunity cost.  If opportunity is compromised approximately 80% in an outage, then a one-hour outage costs $800.  If we look at their operating history and they have averaged 20 hours of downtime per year, then the outages are costing about $16,000 per year.  Now they have a budget to work with.

Some might prefer to use operating cost rather than opportunity cost.  Here the firm needs to use similar math to figure out how much it costs per hour to keep the doors open when the system has failed.  It is important to note that SMBs with a short time window in their SLA often do not just lose revenue in an outage; they can also lose customers or incur penalties from regulatory agencies and/or under client agreements, so that must be considered in the costs.
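The opportunity-cost arithmetic above is simple enough to put in a few lines of code so an owner can plug in their own figures.  This is a minimal sketch; the function names are invented for illustration, and the inputs are the example numbers from this answer.

```python
def hourly_opportunity_cost(monthly_revenue: float,
                            business_days_per_month: float,
                            hours_open_per_day: float) -> float:
    """Revenue per hour of operation: monthly revenue / business days / hours per day."""
    daily_revenue = monthly_revenue / business_days_per_month
    return daily_revenue / hours_open_per_day


def annual_outage_cost(hourly_cost: float,
                       impact_fraction: float,
                       downtime_hours_per_year: float) -> float:
    """Yearly cost of downtime, given the fraction of hourly revenue lost during an outage."""
    return hourly_cost * impact_fraction * downtime_hours_per_year


# Figures from the example: $210,000 per month, 21 business days, 10 hours per day.
hourly = hourly_opportunity_cost(210_000, 21, 10)   # about $1,000 per hour
cost_per_outage_hour = hourly * 0.80                # about $800 at 80% impact
yearly = annual_outage_cost(hourly, 0.80, 20)       # about $16,000 for 20 hours of downtime
```

The same functions work for the operating-cost approach if the hourly figure is replaced with the cost of keeping the doors open while the system is down.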

5. What about the financial planning/business insurance aspect of disaster recovery? What are some things that business owners can do now to prepare for disasters in the future, like paying for insurance, setting cash aside, etc. - are there specific business insurance policies that cover this?

There are usually some provisions in basic casualty insurance, but they are basic coverages and they may not fit the cyber needs of the business.  Cyber insurance policies are usually priced with terms and conditions that require the SMB to have decent backup and DR in place.  I have seen statistics that show that 50% of the businesses that incur a severe data loss are out of business within two years.  I like to be cautious about drawing the wrong conclusions from statistics such as this.  Those businesses that had the severe data loss are not necessarily out of business because of the data loss.  It might have just hastened what was inevitable due to other inadequate business practices, such as ones that would make a disastrous data loss more likely and costly.