Bill Roth, Ulitzer Editor-at-Large


What exactly is "Cloud Storage"? Everyone seems to agree that it is the greatest thing since sliced bread, but the messaging about exactly what "Cloud Storage" is and how it differs from conventional storage is slightly fuzzy. Once the hype around "Cloud Storage" is examined more closely, it appears less a revolution than a natural evolution of NAS storage, and an evolution that can co-exist and collaborate with NAS storage.

The term "Cloud Storage" is used in many different ways, generally describing the same set of solutions, but with crucial differences in emphasis.

Consider the definition from Wikipedia (retrieved from http://en.wikipedia.org/wiki/Cloud_storage on Feb 9, 2012):

Cloud storage is a model of networked online storage where data is stored on virtualized pools of storage which are generally hosted by third parties. Hosting companies operate large data centers; and people who require their data to be hosted buy or lease storage capacity from them and use it for their storage needs. The data center operators, in the background, virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span across multiple servers.

Cloud storage services may be accessed through a web service application programming interface (API), or through a Web-based user interface.

That definition encompasses three distinctions that are commonly cited as what makes "Cloud Storage" different:

* Who controls the machines where the data is stored:
** Cloud: some service provider.
** Non-Cloud: the data owner.
* Where the data resides:
** Cloud: at a central data center.
** Non-Cloud: on the data owner's premises.
* How the data is accessed:
** Cloud: web-oriented APIs.
** Non-Cloud: POSIX-derived APIs.

But what is the real essence of Cloud Storage?

The first way to cut through the hype is to look at who is promoting these various definitions.

For example, Apple promotes iCloud, which defines Cloud services as something that applications access. Think about that: for some reason Apple doesn't think that end users need access to their music files; only the Apple-supplied music player needs to access them. What, you have an application or device that plays music that wasn't supplied by Apple? What's wrong with you?

Similarly, many of the first proponents of Cloud Storage have data centers and are willing to lease you storage space there. Google, Yahoo, Amazon, et al. have a bias toward perceiving "cloud storage" as being about lots of people paying them to use their storage.

The last criterion, how data is accessed, looks essential. But even there the picture is slightly fuzzy.

Some classify any web-oriented access as being "Cloud". Suppose that you encode NFS RPC calls using XML and then send them over HTTP. Would that qualify as a "Cloud" storage protocol? It strikes us as still being a NAS protocol, just a less efficient one.
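To make the thought experiment concrete, here is a minimal sketch of what such an encoding might look like. The XML element names are invented for illustration (no such standard exists), and nothing is actually sent over the network; the body is simply what would be placed in an HTTP POST. Note that the decoded request still carries a file handle, an offset and a count, exactly as a NAS read would.

```python
import xml.etree.ElementTree as ET


def encode_nfs_read(file_handle: str, offset: int, count: int) -> bytes:
    """Encode an NFS-style READ call as an XML body for an HTTP POST.

    The element and attribute names are hypothetical, chosen for this sketch.
    """
    call = ET.Element("nfs-call", {"procedure": "READ"})
    ET.SubElement(call, "file-handle").text = file_handle
    ET.SubElement(call, "offset").text = str(offset)
    ET.SubElement(call, "count").text = str(count)
    return ET.tostring(call, encoding="utf-8")


def decode_nfs_read(body: bytes) -> dict:
    """Recover the call on the server side: the same handle/offset/count."""
    call = ET.fromstring(body)
    return {
        "procedure": call.get("procedure"),
        "file_handle": call.findtext("file-handle"),
        "offset": int(call.findtext("offset")),
        "count": int(call.findtext("count")),
    }


# The wire format changed; the operation being conveyed did not.
body = encode_nfs_read("0xdeadbeef", offset=4096, count=8192)
request = decode_nfs_read(body)
```

Changing the envelope from XDR to XML adds bytes and parsing work, but the server still has to resolve a file handle and serve a byte range from it.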

Indeed, while RESTful HTTP interfaces are common, HTTP is not actually an optimal interface for storage purposes. It was simply one familiar to the teams that designed the first versions of Cloud Storage. There are also some cloud storage projects that do not use HTTP, such as Sheepdog (http://www.osrg.net/sheepdog/). The real distinction that makes something "Cloud Storage" is not how the messages are encoded, but what operations the messages convey.

Cloud Storage systems all use some form of get/put paradigm: each operation gets or puts some version of an object. Under the classic POSIX paradigm you create or open a file, then read or write, and finally close the file handle you received from the create or open. When your application does not need true concurrent sharing of files under the POSIX model, a get/put paradigm is better, and it will scale to far larger sizes than the traditional POSIX file handle paradigm.
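The contrast between the two paradigms can be sketched in a few lines. The POSIX half uses the real `os` calls; the `ObjectStore` class is a hypothetical in-memory stand-in for a cloud service, written only to show the shape of the interface, not any particular vendor's API.

```python
import os
import tempfile
from typing import Optional


def posix_write_then_read(path: str, data: bytes) -> bytes:
    """POSIX style: open a handle, operate on it, then close it."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)             # write through the handle
        os.lseek(fd, 0, os.SEEK_SET)   # the handle carries a file position
        return os.read(fd, len(data))  # read back through the same handle
    finally:
        os.close(fd)                   # the handle must be released


class ObjectStore:
    """Hypothetical get/put store; every put creates a new version."""

    def __init__(self) -> None:
        self._versions = {}  # object name -> list of stored versions

    def put(self, name: str, data: bytes) -> int:
        """Store a complete object and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(bytes(data))
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> bytes:
        """Return a whole object: the latest version unless one is named."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]


with tempfile.TemporaryDirectory() as tmp:
    echoed = posix_write_then_read(os.path.join(tmp, "song.bin"), b"take one")

store = ObjectStore()
first = store.put("song", b"take one")
second = store.put("song", b"take two")
latest = store.get("song")
original = store.get("song", version=first)
```

The get/put side has no open handle and no file position for the service to track between calls; each request names an object and is complete in itself.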

But it really takes more than a get/put orientation to make an object/file access protocol a "cloud storage" protocol. FTP operations are get and put. FTP is a valid protocol for accessing cloud storage, and Cloud Storage providers frequently allow this option. But FTP itself does not create any expectation of a virtualized location. Under the protocol an FTP client connects to a specific server and then gets from or puts to that specific server.

The Wikipedia definition includes a requirement that the data center virtualizes the location of the data. Nobody hails FTP as "the first cloud storage protocol", and the presumption that FTP gets from and puts to a specific location is the major reason why. So the abstraction of location and replication is clearly a crucial aspect of cloud storage.
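What "the service takes responsibility for location" means can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `VirtualizedStore` class, the hash-based placement rule, and the server names are hypothetical, and real services use far more elaborate placement and repair schemes. The point is only that the caller names an object, never a server.

```python
import hashlib


class VirtualizedStore:
    """Hypothetical store that hides where objects are kept and replicated."""

    def __init__(self, server_names, replicas=2):
        # Each "server" is just a dict here; at least one name is required.
        self._servers = {name: {} for name in server_names}
        self._names = sorted(self._servers)
        self._replicas = min(replicas, len(self._names))

    def _placement(self, key):
        """Pick the servers for a key by hashing its name (illustrative rule)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self._names)
        return [
            self._names[(start + i) % len(self._names)]
            for i in range(self._replicas)
        ]

    def put(self, key, data):
        """Write the object to every replica the service chose."""
        for server in self._placement(key):
            self._servers[server][key] = bytes(data)

    def get(self, key):
        """Read from the first chosen replica that holds the object."""
        for server in self._placement(key):
            if key in self._servers[server]:
                return self._servers[server][key]
        raise KeyError(key)


store = VirtualizedStore(["rack-a", "rack-b", "rack-c"], replicas=2)
store.put("photos/cat.jpg", b"...jpeg bytes...")
fetched = store.get("photos/cat.jpg")
```

An FTP client, by contrast, would have had to choose "rack-a" or "rack-b" itself before issuing the put.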

There is a clear technical reason why "Cloud Storage" requires get/put operations. True virtualization of location is not compatible with the POSIX paradigm. Having a file handle for a shared open file simply does not scale when the round trip times get long.
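A back-of-the-envelope model shows how round trip time compounds under a handle-based sequence. This is a deliberately simplified sketch: it assumes every operation waits for its reply, with no pipelining, caching or read-ahead, and the round trip times used are illustrative figures rather than measurements. It also leaves out the cost of keeping shared handle state consistent, which only makes matters worse.

```python
def posix_round_trips(num_reads: int) -> int:
    """One open, one round trip per read, one close (simplified model)."""
    return 1 + num_reads + 1


def get_round_trips() -> int:
    """A get names the object and returns it in a single exchange."""
    return 1


def elapsed_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting on the network, ignoring transfer time."""
    return round_trips * rtt_ms


LAN_RTT_MS = 0.5   # assumed round trip inside one building
WAN_RTT_MS = 80.0  # assumed round trip to a distant data center

posix_lan = elapsed_ms(posix_round_trips(8), LAN_RTT_MS)
posix_wan = elapsed_ms(posix_round_trips(8), WAN_RTT_MS)
get_wan = elapsed_ms(get_round_trips(), WAN_RTT_MS)
```

Under these assumptions, the eight-read sequence that costs a few milliseconds on a LAN spends ten round trips of waiting across a WAN, while the equivalent get spends one.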

So the two essential elements of a Cloud Storage service are:

1. a get/put mode of operations and
2. the service taking responsibility for the location of the objects.

Interestingly, later NAS protocols such as NFSv4 and DFS/CIFS already take responsibility for the location of objects. The NFS names for these features are Referrals and Delegations. Cloud Storage presentations depict Cloud Storage as revolutionary, totally replacing ancient POSIX-bound storage systems. But if you examine the statements made about the prior art, you'll realize that they are mostly talking about NFSv2 and NFSv3. The capabilities of NFSv4 are ignored. The ability of NFSv4 and DFS/CIFS to virtualize locations is a perfect example of how Cloud Storage is really an evolution from NAS storage.

Nexenta's philosophy is naturally aligned with Cloud Storage. Nexenta's solution from day one has been and will always remain: commodity hardware, enterprise grade features, open storage.

Coming installments will explore how NAS and Cloud storage are not as different as usually painted, how Nexenta's Namespace Cluster scales very well compared to the "pre-cloud" portrayals, why NAS and Cloud each solve problems for the user, why ZFS is as solid a solution for Cloud Storage as it is for NAS, and a few ideas on how NexentaStor can incorporate Cloud Storage with NAS and SAN services.



Bill Roth is a Silicon Valley veteran with over 20 years in the industry. He has played numerous product marketing, product management and engineering roles at companies like BEA, Sun, Morgan Stanley, and EBay Enterprise. He was recently named one of the World's 30 Most Influential Cloud Bloggers.