Bill Roth, Ulitzer Editor-at-Large


I have recently been reading about Amazon’s Storage Gateway:

http://aws.typepad.com/aws/2012/01/the-aws-storage-gateway-integrate-your-existing-on-premises-applications-with-aws-cloud-storage.html

The product is effectively an iSCSI target with an Amazon S3 client on its back end, which backs up iSCSI LUNs to Amazon S3 storage. This arrangement has a number of immediate problems.

In general, disks are used either for file storage or for databases. In either case, there are implicit or explicit transactional semantics that must be preserved when backing up storage. For raw files, this is often open-to-close semantics, whereby a file that is open for writing is not considered “consistent” or even “visible” to other users until it is actually closed. For databases, update transactions have, of course, an exact definition and scope.

The problem is that at the (low) block-target level there is no context and no information about those higher-level transactions. It is, therefore, easy to imagine that Cloud Storage on the back end is being busily populated with LUN images that are internally inconsistent and cannot be used to restore the data.
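To make the consistency problem concrete, here is a minimal toy sketch. It is not how the AWS Storage Gateway works internally; `ToyLun` and `transfer` are hypothetical names invented for illustration. It assumes a "LUN" of two blocks and a logical transaction that must update both, and shows what a block-level snapshot captures if it fires between the two writes.

```python
# Toy model (assumption): a "LUN" is a list of blocks. The block layer
# sees individual writes, never the transaction that groups them.

class ToyLun:
    def __init__(self, num_blocks):
        self.blocks = [0] * num_blocks

    def write(self, index, value):
        self.blocks[index] = value

    def snapshot(self):
        # Block-level copy: captures whatever is on "disk" right now.
        return list(self.blocks)


def transfer(lun, src, dst, amount, snapshot_midway=False):
    """Move `amount` from block src to block dst as one logical transaction.

    Returns a snapshot taken between the two writes, if requested.
    """
    snap = None
    lun.write(src, lun.blocks[src] - amount)   # first half of the transaction
    if snapshot_midway:
        snap = lun.snapshot()                  # the backup fires here, unaware
    lun.write(dst, lun.blocks[dst] + amount)   # second half of the transaction
    return snap


lun = ToyLun(2)
lun.write(0, 100)
lun.write(1, 0)

mid = transfer(lun, src=0, dst=1, amount=30, snapshot_midway=True)

print("live total:    ", sum(lun.blocks))  # 100, the invariant holds
print("snapshot total:", sum(mid))         # 70, the invariant is broken
```

The live volume ends up consistent, but the snapshot has lost 30 units: restoring from it would bring back a state that no application ever intended to exist.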

When I said “being busily populated” above – I meant it. The other problem with AWS Storage Gateway can be described this way: disks are almost never idle, while applications quite often are.

Operating systems and other applications that use block storage generate a lot of temporary and transitory content. File content and databases get re-indexed, software gets reinstalled or upgraded, YouTube videos and other Internet junk get downloaded into temporary folders, and so on.

However, at the block storage level, the information on the relative value of all this activity is totally missing. The only thing that a block target “sees” is plenty of new blocks that need to be snapshotted and backed up. Which it then (yes, busily) does. Garbage in, garbage out.
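A rough way to picture this is the toy model below. It is a sketch under stated assumptions, not a description of Amazon's implementation: the 4 KiB block size, the offsets, and the `DirtyBlockTracker` class are all made up for illustration. It counts the blocks a block target would consider "changed" after one useful write and one burst of throwaway activity.

```python
BLOCK_SIZE = 4096  # assumed block size, in bytes


class DirtyBlockTracker:
    """Records which blocks were written, the way a block target sees them."""

    def __init__(self):
        self.dirty = set()

    def write(self, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))


KIB = 1024
MIB = 1024 * KIB

tracker = DirtyBlockTracker()

# 1. The user saves a 10 KiB document: the only write anyone cares about.
tracker.write(offset=0, length=10 * KIB)
useful_blocks = len(tracker.dirty)

# 2. An installer unpacks 1 MiB of temporary data...
tracker.write(offset=1 * MIB, length=1 * MIB)

# 3. ...and deletes it. For the block layer a delete is just one more
#    (metadata) write; the temporary data blocks stay dirty.
tracker.write(offset=8 * MIB, length=512)

print("blocks worth keeping:    ", useful_blocks)       # 3
print("blocks queued for upload:", len(tracker.dirty))  # 260
```

In this contrived run, 3 of the 260 dirty blocks carry data the user would ever want back; the block target has no way to tell which ones.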

Finally, there is an iSCSI connection. I love iSCSI with all my heart but – the latency! The protocol is not exactly famous for low latency, let’s put it that way. And so, as a user, to take advantage of local Cloud backup in the Amazon implementation, I’d now have to go iSCSI. One word of advice then – stress test it first, really well.
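As a starting point for such a stress test, here is a minimal sketch that times synchronous writes in a directory. The function names and defaults are my own assumptions, and it only approximates what a real workload does; point `target_dir` at a mount point backed by the iSCSI volume (the system temp directory is used here only so the script runs anywhere), and compare the numbers with those from a local disk.

```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency(directory, block_size=4096, count=200):
    """Return per-write latencies (in seconds) for fsync'ed writes in `directory`."""
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the write down to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Reduce raw latencies to median, approximate 99th percentile, and max (ms)."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Assumption: replace with the mount point of the iSCSI-backed volume.
    target_dir = tempfile.gettempdir()
    print(summarize(measure_sync_write_latency(target_dir)))
```

Watch the tail (p99 and max) rather than the median: occasional long stalls are what applications notice first.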

Bill Roth is a Silicon Valley veteran with over 20 years in the industry. He has held numerous product marketing, product management and engineering roles at companies like BEA, Sun, Morgan Stanley, and eBay Enterprise. He was recently named one of the World's 30 Most Influential Cloud Bloggers.