Tuesday, September 4, 2012

Traditional DR…and its Imminent Demise?

 

[Image: Backhoe damaging underground lines]

My primary focus at VMworld 2012 was Disaster Recovery, which got me thinking a fair amount about the future of DR in general: its necessity, utility, and longevity. Have we really escaped “traditional” DR? Will the methods employed today exist as we know them in 10 years, or just become another integral part of the infrastructure?

Each session invariably started off comparing the “traditional” disaster recovery of yesterday against the virtualization-enabled DR of today, where the old machinations are replaced with flipping a software switch.

With the exception of American National Bank’s and Varrow’s Active/Active datacenter (INF-BCO1883), I can’t help but see this as still being traditional DR—only with today’s tools.

Let’s take a look at some of the main points in “traditional” DR versus today’s:

  • Before virtualization, restoring to the same hardware used in production was a challenge; if matching hardware could not be found, time was lost. Virtualization gives us a common hardware set, eliminating those hardware compatibility woes.

    While a valid point, what if you could always purchase bland hardware: generic x86 servers, picked up at Wal-Mart as a commodity, much like the uniform hardware virtualization presents to the OS?
  • Tapes could not always be restored and took too much time. Virtualization gets us to replication technologies that avoid tape.

    Excluding array-based replication, disk-to-disk-to-tape solutions with replication were pitched as disaster recovery aids 10 years ago, specifically to get around the problems of tape.
  • New systems/applications required new servers.

    You got me there. Virtualization wins, and I’m happy with that.

Same process, new tools—albeit faster, better, stronger tools. It’s still traditional DR to me.

Now enter cloud-based DR. With DR in the cloud, there is no need to lease or maintain capacity that sits idle. In essence, keep an off-site copy of your data (a Good Thing anyway) and pay for what you need, when you need it. Disaster Recovery has now moved fully from capex to opex. It’s cloud being used for what cloud is intended.

But not at the application level.

In an application-centric world, everything behaves like the modern applications to which we’ve become accustomed. We are blissfully unaware of, yet fully appreciative of, Facebook, Google, Twitter, and the like spanning multiple datacenters. We aren’t exposed to the datacenter failures they may encounter, and we shouldn’t be. Nor should your customers.

Your line-of-business systems need to be heading this way today, for it is the key to availability across datacenters and devices (EUC was a big push at VMworld this year). They should care no more about which datacenter they occupy than about how many instances are deployed.

The pieces are there. We’re seeing the increased popularity of orchestration with the likes of Chef and Puppet (in no particular order); infrastructure manipulation via APIs, such as those Amazon provides for its Elastic Load Balancer; and replication and sharding of data, big or otherwise, becoming commonplace.
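To make the API point concrete, here is a minimal sketch using boto, the Python AWS library of the day. The load balancer name and instance IDs are hypothetical, and error handling is omitted:

    import boto.ec2.elb

    # Connect to the ELB API in a region; credentials are read from
    # the environment or ~/.boto.
    elb = boto.ec2.elb.connect_to_region("us-east-1")

    LB_NAME = "lob-app-lb"                    # hypothetical load balancer
    FAILED = ["i-0cc33333"]                   # instances in the lost datacenter
    SURVIVING = ["i-0aa11111", "i-0bb22222"]  # instances in the healthy one

    # Pull traffic away from the failed site...
    elb.deregister_instances(LB_NAME, FAILED)

    # ...and make sure the surviving capacity is in rotation.
    elb.register_instances(LB_NAME, SURVIVING)

The same handful of calls serves a routine deployment or a datacenter failure; the API doesn’t care which.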

The hold-outs are back-office systems, which won’t get where we need them to be soon enough; yet even they show significant movement in this direction when you consider Office 365 and the like.

Once achieved, is expanding from private to public cloud based on increased load any different than contracting from one to the other based on availability?

Private, public, or hybrid, the cloud is an extension of your datacenter. It’s the elasticity of your workloads at web scale, which need not sit within one datacenter. If well orchestrated, you get “simple” contractions of your cloud based not only on load, but on availability.
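A minimal sketch of that idea in Python, with hypothetical site names, capacities, and a stubbed-out scaling action, just to show that load and availability feed the same loop:

    # Hypothetical control loop: a lost site and a load spike trigger
    # the same rebalancing logic.
    SITES = {"private-dc":   {"healthy": True, "capacity": 40},
             "public-cloud": {"healthy": True, "capacity": 10}}

    def rebalance(sites, demand):
        """Spread demand evenly across whichever sites are still up."""
        healthy = [name for name, s in sites.items() if s["healthy"]]
        per_site = demand / len(healthy)
        for name in healthy:
            delta = per_site - sites[name]["capacity"]
            if delta > 0:
                print("scale %s up by %d instances" % (name, delta))
            elif delta < 0:
                print("scale %s down by %d instances" % (name, -delta))

    # Growth: increased load expands into the public cloud.
    rebalance(SITES, demand=80)

    # Disaster: losing a site is just another input to the same loop.
    SITES["private-dc"]["healthy"] = False
    rebalance(SITES, demand=80)

Expansion and contraction are one mechanism; a site going dark simply looks like extra demand on the sites that remain.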

I see the agile system encompassing multiple datacenters at any point in time, expanding and contracting as load and availability change. This will be the new DR: no DR. Just a well-designed modern system.

What are your thoughts?

Takeaway: Developers need to be aware of infrastructure; this could be interesting.

1 comment:

Platypus said...

I think DR is an operative concept at many levels. What you seem to be talking about is DR at the compute level, moving compute resources elsewhere in the event of a disaster. Yeah, that should die, or at least be subsumed by load balancing. "Hey look, load increased at our Chicago site [because the Dulles site went down but the reason doesn't matter]. Let's spin up some more resources there."

At the storage level, DR is very much alive and well. One way or another, data has to be kept available - and consistent, and as current as possible - at multiple sites. As multi-site deployments get bigger and more complicated, this becomes a more difficult problem than ever. As those same deployments get more common, the need for solutions also increases. Maybe we can get some of those people who are idled by the death of compute DR to help out on storage DR. ;)

Disclaimer: I work on GlusterFS, which plays somewhat in this space (and will a lot more if I have anything to say about it).
