As an IT professional, I spend my time advising others on (and implementing) robust IT architecture solutions. It’s one thing to discuss and implement something like backups, but it’s another thing to demonstrate what it can actually do. So here’s the review that finally came to fruition, and it only took two years! This post is about thinking through a use case and implementing a solution that actually saves your bacon (in this case, mine).
Like everyone who has ever owned any type of computing device, I knew it was only a matter of time before a catastrophic failure happened. In my case, it was my laptop that finally decided to fail. Was it a hardware failure? I should be so lucky, but no. It was a reboot gone bad.
Six years ago, I decided to forgo the traditional workstation / desktop model and move to a laptop as my primary computer. After dealing with multiple computers in my environment and the management headaches that came with them, I decided to streamline and move to a single-machine infrastructure. I bought a 17″ MacBook Pro right before Apple discontinued it. (In my personal opinion, that was a mistake, but who am I to judge? I don’t exactly use a computer like most people do.)
I bought the laptop for a few specific reasons: it was the fastest laptop on the market, it had a 17″ screen, and it would make an ideal desktop replacement. Was I wrong about my choice? Absolutely not. It was one of the best investments I’ve ever made. The MacBook Pro had the one feature I desperately wanted: Thunderbolt, which let me attach vast amounts of high-performance storage (the Promise R6 Disk Array). I realized early on that a laptop alone couldn’t handle my data transfer and storage needs, and Thunderbolt was the only relatively compact option that delivered the performance I needed.
My laptop, while decent, wasn’t good enough for my needs out of the box (which I realized even when I bought it), but upgrading the RAM to 16 GB and replacing the OEM hard drive with a 512 GB SSD would give me the performance I needed.
My only concern was being able to back up my entire laptop in 1 hour or less, assuming 80% storage utilization on the internal drive. Not exactly a small order. I value my time, and in the event of a catastrophic failure I need to recover my environment in 1 hour or less, with all of my files available and the most recent copy of my data ready immediately after recovery. I wanted near-time backups. How do I define near time? Capturing every change made to my drives up to 5 minutes before the failure (which was inevitable). Did I accomplish it? Yes. Here’s how I did it.
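Before building anything, it helps to turn the 1-hour target into a transfer rate. Here is a minimal Python sketch of that arithmetic, assuming a full copy of the used space (80% of the 512 GB SSD) and decimal units (1 GB = 1,000 MB); the function name is just for illustration.

```python
def required_throughput_mb_s(capacity_gb: float, utilization: float, window_hours: float) -> float:
    """Sustained MB/s needed to copy the used part of a drive within a time window.

    Uses decimal units (1 GB = 1,000 MB), as drive vendors do.
    """
    used_mb = capacity_gb * utilization * 1000  # data actually on the drive
    window_s = window_hours * 3600              # backup window in seconds
    return used_mb / window_s


# 80% of a 512 GB SSD, copied in one hour
print(f"{required_throughput_mb_s(512, 0.80, 1.0):.1f} MB/s")  # prints "113.8 MB/s"
```

Roughly 114 MB/s sustained is well above the 480 Mbit/s (about 60 MB/s) signaling rate of USB 2.0, which is consistent with the choice of Thunderbolt. Nightly incrementals only move what changed, so day-to-day backups are far smaller than this worst case.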
I made two assumptions when developing the backup architecture:
- I am in my home office (the best-case scenario), which enables a full restore of my environment within 1 hour.
- I am elsewhere in the world, and my laptop needs to be replaced and my data (including applications) recovered to a useful (limp-mode) state within 4 hours.
When I’m in my home office, access to the Promise Disk Array allows a full environment restore within 1 hour.
When I’m off-site, I need access to critical data and application infrastructure (the OS environment), but only the minimal toolset I rely on: application suites, SSH keys and prompts, and documentation libraries (Word, Excel, project plans, Visio).
Off-site, my laptop could also be stolen or damaged beyond repair (say, by falling off a boat into the warm waters of the Gulf Stream).
No single solution would accomplish my goal of complete environmental redundancy, so I had to develop a strategy. Here’s what I came up with, and I’d say it has performed well.
I needed a solution that would let me image and continuously back up my entire environment for replacement, while giving me access to up-to-the-minute backups of my data. That meant at least two prongs of attack:
- Acronis True Image 2012 – backup of the entire SSD environment to the Promise R6 Disk Array, with incrementals every night.
- Dropbox – backup of all soft data (documents I’m working on) as well as assets (SSH keys).
- VirtualBox – a VM on a separate laptop, synced asynchronously with my existing environment, imaged automatically every night and uploaded to both Google Drive and Dropbox.
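The nightly imaging of the VM could be scripted roughly as follows. This is a minimal Python sketch, not the exact setup described here: it assumes `VBoxManage` is on the PATH, that the VM is shut down before the export runs, and that “uploading” simply means copying the image into locally synced Dropbox and Google Drive folders. The VM name and folder paths are hypothetical.

```python
from __future__ import annotations

import datetime
import pathlib
import shutil
import subprocess

# Hypothetical values: substitute your own VM name and sync-folder locations.
VM_NAME = "limp-mode-vm"
SYNC_DIRS = [
    pathlib.Path.home() / "Dropbox" / "vm-images",
    pathlib.Path.home() / "Google Drive" / "vm-images",
]


def export_command(vm_name: str, target: pathlib.Path) -> list[str]:
    """Build the VBoxManage command that exports a VM to a single .ova file."""
    return ["VBoxManage", "export", vm_name, "-o", str(target)]


def nightly_export(vm_name: str, work_dir: pathlib.Path, sync_dirs: list[pathlib.Path]) -> pathlib.Path:
    """Export the VM to a dated .ova, then copy it into each synced folder.

    Assumes the VM is already shut down; the Dropbox / Google Drive clients
    take care of the actual upload once the file lands in their folders.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    image = work_dir / f"{vm_name}-{stamp}.ova"
    subprocess.run(export_command(vm_name, image), check=True)  # raises if the export fails
    for folder in sync_dirs:
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image, folder / image.name)
    return image
```

Scheduling it nightly is left to whatever the host OS provides (cron, launchd, or Task Scheduler).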
So how does it work? Quite well, actually.
In the event of any type of failure, I can restore my existing environment to a Thunderbolt-equipped laptop (which I can buy at any Apple Store) on a whim, OR I can buy a cheap laptop or desktop from any big-box store and download the VM and VirtualBox for “limp mode” operations until I get home and repair my infrastructure.
Data I’ve been working on is updated by the Dropbox client as soon as the new environment comes up. This works efficiently because the image / VM is updated every night and uploaded to Dropbox, so the only “restore” lift left is the set of files I’ve worked on since the last backup. Even on a tethered connection or crappy hotel Wi-Fi, I can restore and be operational within an hour.
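The “restore lift” is roughly the set of files modified since the last nightly image. Dropbox tracks this itself, so the Python sketch below only illustrates the idea, assuming you know (or can estimate) when the last image finished. The folder and cutoff in the usage note are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def changed_since(root: Path, cutoff: float) -> list[Path]:
    """List files under root modified after cutoff (seconds since the epoch)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime > cutoff:
                    changed.append(path)
            except FileNotFoundError:
                # The file disappeared between listing and stat; ignore it.
                continue
    return sorted(changed)
```

For example, `changed_since(Path.home() / "Dropbox", time.time() - 24 * 3600)` approximates what a day-old image would be missing; the shorter that list, the faster the sync after a restore.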
The Test (my disaster)
I rebooted my laptop into OS X for what should have been a quick video conferencing session, but it didn’t go well. OS X booted up just fine, but when I tried to switch back to my Boot Camp (Windows) environment, Windows had become corrupted. Was I worried? Not really. Was I annoyed? Yes. I had a flight the next day, and not having my trusty laptop with a working Windows environment was... just annoying.
So it’s 2:30 AM, my daughter is asleep upstairs, and my Windows partition is completely corrupted. What to do?
Since I was at home (lucky me), I pulled out the USB drive with my Acronis recovery disk and restored a baseline Windows installation with all of the drivers installed. That gave me access to the Thunderbolt array holding the most recent backup, so I could restore the partition. About 10 minutes later, all of my applications and data were restored, and 15 minutes after that, Dropbox had synced the most current versions of my files.
Total downtime? 52 min.
Total data lost? Nothing.
Every vendor will claim its backup solution covers every contingency and is all you will ever need. Is this true? Absolutely not. Had I used Acronis alone, I would have recovered data only up to my last backup, losing a day’s work. Had I used Dropbox alone, recovering all of my files would have taken a very long time, and I would still have had to reinstall all of my applications and preferences.
The combination of the two technologies let me restore my environment and data within 1 hour with no data loss. It would have taken about 10 more minutes if I had needed to replace the hard drive.
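The argument above boils down to taking, for each kind of asset, the freshest copy any layer can restore. Here is a minimal Python model of that, using the post’s own figures (nightly Acronis images, so up to a day stale; Dropbox within about 5 minutes). The layer names and the three asset categories are a simplification for illustration, not something either product exposes.

```python
from __future__ import annotations

from dataclasses import dataclass

ASSETS = ("os", "apps", "files")


@dataclass(frozen=True)
class Layer:
    name: str
    covers: frozenset[str]  # which assets this layer can bring back
    rpo_minutes: int        # worst-case age of its newest copy


# Figures from the post: nightly image (up to a day old), file sync within ~5 minutes.
ACRONIS = Layer("Acronis nightly image", frozenset({"os", "apps", "files"}), 24 * 60)
DROPBOX = Layer("Dropbox sync", frozenset({"files"}), 5)


def worst_case_loss(layers: list[Layer]) -> dict[str, int | None]:
    """Worst-case minutes of lost work per asset; None means no layer covers it."""
    result: dict[str, int | None] = {}
    for asset in ASSETS:
        rpos = [layer.rpo_minutes for layer in layers if asset in layer.covers]
        result[asset] = min(rpos) if rpos else None
    return result


for combo in ([ACRONIS], [DROPBOX], [ACRONIS, DROPBOX]):
    print(" + ".join(layer.name for layer in combo), worst_case_loss(combo))
```

Acronis alone leaves files up to 1,440 minutes stale; Dropbox alone leaves the OS and applications as `None` (rebuild by hand); the combination keeps the OS and applications at the nightly image while files stay within 5 minutes.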
What if I had a critical deadline and didn’t have access to the necessary hardware? I would have used Microsoft RDP to access the VM running on the other laptop, met my deadline, and then moved on to the repair. Oh wait! My laptop is down. How would I RDP?
Covered. I keep a Chromebook in the trunk of my car for these types of emergencies (and for everything else). Bottom line: I’m covered.
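Whichever machine you connect from, the RDP fallback only works if the VM host actually answers on the Remote Desktop port. A minimal Python sketch of a quick check, assuming the default RDP port 3389; it only proves that something accepts TCP connections there, not that you can log in or that the port is reachable from outside your home network.

```python
import socket

RDP_PORT = 3389  # default TCP port for Microsoft Remote Desktop


def rdp_reachable(host: str, port: int = RDP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `rdp_reachable("vm-host.local")` (a hypothetical host name) is a cheap test to run before you actually need the fallback.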
The point of this post? To think through IT strategy and use cases instead of relying on any one out-of-the-box solution to cover your bases. IT (information technology) isn’t about picking a tool, double-clicking Install, and walking away because your job is done. It’s about thinking through scenarios and coming up with solutions to your problems BEFORE they happen, so that when disaster strikes you can recover efficiently and reliably.
So there you have it. Strategy, planning, and ingenuity win again. Be smart, people. Spend a couple of hours implementing a robust solution that suits your needs, and you’ll be glad you did. When it comes to data loss and catastrophic failure, it’s not a question of IF it will happen, but WHEN.