Relying on Backups in a Disaster can be a CLM (Career Limiting Move)
A Case In Point
Not long ago I was talking with a long-term CIO of a large organization about Disaster Recovery. He told me they were all set because their tapes were stored offsite. As far as he was concerned, that was all DR required.
When a fire broke out in the office next to their data center, I am certain that offsite tapes were the last thing on his mind. He learned a hard lesson about relying on backups though. Turns out that after the fire, they were able to physically relocate their entire office before IT was able to restore all their applications. Even more disturbing than that was the discovery that they had more than ten days’ worth of data loss due to old/bad tapes, skipped files, and incomplete backups. I would not have wanted to be him when he met with the COO in the aftermath and had to explain the situation and his lack of preparedness.
Why Relying on Backups Is a Problem
The term “Disaster Recovery” has long stood for the trust placed in archaic backup processes and tools, namely weekly and/or monthly full tape backups. The cry of offsite storage was the blanket answer to IT Disaster Recovery. No mention was ever made of recovery times, testing, or even data loss.
I wonder how many IT executives have a real level of confidence today that they can ensure their organization will have zero data loss or little to no recovery time in the event of a disruption or disaster. The odds are stacked against them, as most disaster recovery software and systems lack the ability to test frequently. Compounding the situation, IT departments now carry more of the load when it comes to disaster recovery. Virtualization, constant and long backup sessions, and the time and storage space needed for a restore all but eliminate the opportunity for accurate Recovery Point Objective (RPO) testing.
The belief that storing tapes offsite or using a cloud backup solution will ensure zero, or close to zero, data loss in the (likely?) event of an unplanned incident can quickly become a Career Limiting Move. The unfortunate reality is that while backup is generally inexpensive (comparatively speaking), it does not ensure rapid and accurate recovery when a disaster occurs. The baseline understanding is that backups, regardless of how and where they are stored, will have available and accessible data... but how quickly?
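To make the data loss point concrete, here is a minimal sketch of how the real exposure could be calculated from a backup history. The `BackupRun` record, its `restorable` flag, and the dates are all made up for illustration; the only idea taken from the discussion above is that a backup counts only if it can actually be restored.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class BackupRun(NamedTuple):
    finished_at: datetime
    restorable: bool  # hypothetical flag: True only if a test restore succeeded


def data_loss_window(runs: Sequence[BackupRun],
                     incident_at: datetime) -> Optional[timedelta]:
    """Time between the last restorable backup and the incident.

    Returns None when no restorable backup exists before the incident.
    """
    usable = [r.finished_at for r in runs
              if r.restorable and r.finished_at <= incident_at]
    if not usable:
        return None
    return incident_at - max(usable)


# Illustrative history: the two most recent backups ran but cannot be restored.
runs = [
    BackupRun(datetime(2024, 3, 1, 2, 0), True),
    BackupRun(datetime(2024, 3, 8, 2, 0), False),   # bad tape
    BackupRun(datetime(2024, 3, 11, 2, 0), False),  # incomplete backup
]
incident = datetime(2024, 3, 12, 14, 0)

loss = data_loss_window(runs, incident)
print(f"Effective data loss: {loss}")  # 11 days, 12:00:00
```

The last backup job ran the night before the incident, yet the effective loss is more than eleven days, because the clock starts at the last backup that can actually be restored.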
Not fast enough. This means that after an “incident”, an organization can (and usually will) face a considerable delay while it rebuilds and restores applications and data so that operations can resume. The time within which the business needs to be operational again (Recovery Time Objective, or RTO) and the time IT is actually capable of restoring the data to 100% (Recovery Time Capability, or RTC) can differ greatly.
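The gap between RTO and RTC is easy to put in front of the business as a simple report. The sketch below assumes you already have both numbers per application; the `Application` class, the application names, and the figures are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List


@dataclass
class Application:
    name: str
    rto: timedelta  # what the business requires
    rtc: timedelta  # what IT has shown it can deliver

    @property
    def gap(self) -> timedelta:
        """How far recovery overshoots the RTO (zero if RTO is met)."""
        return max(self.rtc - self.rto, timedelta(0))


def at_risk(apps: Iterable[Application]) -> List[Application]:
    """Applications whose RTC exceeds their RTO, worst gap first."""
    late = [a for a in apps if a.gap > timedelta(0)]
    return sorted(late, key=lambda a: a.gap, reverse=True)


# Illustrative figures only.
apps = [
    Application("order-entry", rto=timedelta(hours=4), rtc=timedelta(days=3)),
    Application("email", rto=timedelta(hours=24), rtc=timedelta(hours=8)),
    Application("payroll", rto=timedelta(hours=48), rtc=timedelta(days=5)),
]

for app in at_risk(apps):
    print(f"{app.name}: recovery exceeds RTO by {app.gap}")
```

Any application that shows up in this list is one where the business expects to be running again before IT can deliver.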
This challenge to a successful enterprise-level disaster recovery solution may seem insurmountable, but it can be met by updating the backup technologies and integrating them into your virtualization strategy.
The goal for a virtualized enterprise-level DR strategy is to have verified RPOs and RTOs measured in minutes, with scheduled table-top exercises for mission-critical applications. Protect each application, even when it spans multiple servers, VMs, and storage systems, with a single policy and a single automated failover / failback process. Retaining the DR information for hours ensures multiple recovery points.
By virtualizing your DR solution, you can be assured that the replicated applications remain protected after a failover, because they are replicated back to the source.
Replication needs to be followed by execution. Error-free automation is essential to the DR process to ensure accurate RTOs. Table-top, or non-disruptive, exercises are an excellent way to reduce errors and refine RTCs and RPOs. Non-disruptive testing can be done without impacting the applications and data, so business departments are not affected in any way. Involving them in the testing, or informing them of it, also gives IT an opportunity to make the business units aware of the technology and of its ability to provide world-class recovery.
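One way to turn exercise results into a verified RTC is to time each failover test and keep the numbers. The sketch below is an assumption about how that could be done, not a feature of any particular DR product: it treats the slowest observed failover as the RTC, which is a deliberately conservative choice, and the durations are invented.

```python
from datetime import timedelta
from typing import Sequence


def measured_rtc(failover_durations: Sequence[timedelta]) -> timedelta:
    """Conservative RTC: the slowest failover seen across test exercises."""
    if not failover_durations:
        raise ValueError("no exercises recorded, so RTC is unverified")
    return max(failover_durations)


def meets_rto(failover_durations: Sequence[timedelta], rto: timedelta) -> bool:
    """True when even the slowest recorded exercise finished within the RTO."""
    return measured_rtc(failover_durations) <= rto


# Illustrative timings from three non-disruptive exercises.
exercises = [timedelta(minutes=12), timedelta(minutes=9), timedelta(minutes=17)]

print(measured_rtc(exercises))                      # 0:17:00
print(meets_rto(exercises, timedelta(minutes=15)))  # False
```

With no exercises on record the function refuses to return a number, which matches the point of this article: an untested recovery time is a guess.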
Upon completion of testing, organizations can confirm that failover / failback meets the RTO of the business area. One of the clients I was working with used the test environment, tightly orchestrated with their DR / BCP software, to manage and test the replacement of their mission-critical middleware. Not only was IT able to use its run books and manage to the RTO, but the “impacted” business units were also automatically notified and updated on progress.
The Bottom Line
The bottom line is that relying on a backup solution alone to meet business recovery objectives will not work. The datacenter environment has matured and changed, and a backup solution alone cannot possibly meet the needs of a modern-day virtualized infrastructure. Backups are just not a complete information and application recovery solution. Creating an all-encompassing virtualized DR program, supported by comprehensive documentation, business-validated RPOs, and verified RTCs, keeps you from making this career-limiting move and helps ensure the longevity of your career.