Bill Roth, Ulitzer Editor-at-Large

Under the Covers: The ZFS Transaction Group Rollback "Hail Mary"

By John McLaughlin, Nexenta

"Uh-oh!" or "Oh no!" or "Oh sh*t!" Sometimes customers do things they really should not have done, and sometimes they realize very quickly that they just "shot themselves in the foot." There is a small window of opportunity in which Nexenta Professional Services can save such customers. To have any hope of recovery, the pool must be exported as soon as possible.

For example, one customer resized an iSCSI zvol that was being used by his VMware server. The customer's intent was to reclaim some unused space. 2 TB had been allocated to the zvol, and the customer resized it to 1.4G. Oh no! He meant 1.4T. The zvol was resized back to 2 TB, but his data was gone, showing up as read errors on the VMware server.

ZFS writes changes to a pool in groups called transaction groups (txgs). Because ZFS is a copy-on-write file system, data, including metadata, is never overwritten in place. Instead, new blocks are allocated for the new data, and new metadata blocks point to the new data all the way up to the top-level "uberblock".
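The copy-on-write behavior can be illustrated with a toy model. This is only a sketch of the idea, not ZFS's actual on-disk structures: the `Block` class and `cow_update` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """An immutable block: once written, it is never modified in place."""
    data: Optional[str] = None   # payload for data (leaf) blocks
    children: tuple = ()         # pointers held by metadata blocks


def cow_update(node: Block, path: tuple, new_data: str) -> Block:
    """Return a NEW tree in which the leaf at `path` holds `new_data`.

    Every block on the path from the root down to the leaf is reallocated.
    Blocks off that path are shared with the old tree, not copied.
    """
    if not path:
        return Block(data=new_data)
    index = path[0]
    children = list(node.children)
    children[index] = cow_update(children[index], path[1:], new_data)
    return Block(children=tuple(children))


# Toy pool: root (standing in for the uberblock) -> metadata -> two data blocks.
old_root = Block(children=(Block(children=(Block("A"), Block("B"))),))
new_root = cow_update(old_root, (0, 1), "B2")

print(old_root.children[0].children[1].data)  # B  (old tree is untouched)
print(new_root.children[0].children[1].data)  # B2
```

Because the old root still points at the old blocks, anything reachable from it remains readable until those blocks are reused. That is the property a rollback relies on.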

ZFS maintains a list of the last 127 "uberblocks" and the current one. Each time a transaction group (txg) is committed to the pool, the oldest entry is replaced. These historical uberblocks provide a kind of temporary or transient snapshot, giving you a very short window in which you can roll back to a specific state of a pool.
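A rough way to picture this list is as a fixed-size ring. The sketch below is a simplified model, not ZFS code: the `UberblockRing` class is hypothetical, and choosing the slot as the txg number modulo the slot count is an assumption made so that each commit replaces the oldest entry.

```python
UBERBLOCK_SLOTS = 128  # 127 historical entries plus the current one


class UberblockRing:
    """Toy model of the uberblock list: a fixed number of slots, reused in turn."""

    def __init__(self, slots: int = UBERBLOCK_SLOTS):
        self.slots = [None] * slots

    def commit(self, txg: int) -> None:
        # Assumed slot choice: txg modulo slot count, so consecutive txgs
        # always overwrite the oldest surviving entry.
        self.slots[txg % len(self.slots)] = txg

    def recoverable_txgs(self) -> list:
        """Txgs whose uberblock has not been overwritten yet, oldest first."""
        return sorted(t for t in self.slots if t is not None)


ring = UberblockRing()
for txg in range(1000, 1200):  # 200 commits: txg 1000 through 1199
    ring.commit(txg)

window = ring.recoverable_txgs()
print(len(window), window[0], window[-1])  # 128 1072 1199
```

After 200 commits only the newest 128 txgs are still in the ring. In this model, the state as of txg 1071 or earlier can no longer be reached through an uberblock.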

Many kinds of activity will update a pool and create new txgs in the pool, limiting how far back in time you can recover. That's why it is important to export the pool as soon as possible.
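To get a feel for how short the window can be, here is a back-of-the-envelope estimate. The 5-second commit interval is an assumed figure used purely for illustration. Actual txg frequency depends on the platform's tuning and on how busy the pool is, and `rollback_window_seconds` is a hypothetical helper.

```python
def rollback_window_seconds(slots: int = 128, txg_interval_s: float = 5.0) -> float:
    """Worst-case time before the oldest uberblock is overwritten,
    assuming one txg is committed every `txg_interval_s` seconds."""
    return slots * txg_interval_s


print(rollback_window_seconds())  # 640.0, i.e. under 11 minutes
```

On a pool that keeps taking writes at that assumed rate, the uberblock from just before the mistake would be gone in roughly ten minutes.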

A Nexenta PS engineer can re-import your damaged pool and review the recent history, perhaps finding a transaction group from before "the event" that caused the damage. The engineer can also pull information about the transaction groups from the disks in the pool and maybe, just maybe, find a point in time to recover to.
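The selection step can be sketched as follows, assuming the engineer has already extracted a list of surviving uberblocks with their txg numbers and timestamps. The `Uberblock` tuple, the `pick_rollback_target` function, and the sample values are all hypothetical. This is not a Nexenta tool, only an illustration of the reasoning.

```python
from typing import Iterable, NamedTuple, Optional


class Uberblock(NamedTuple):
    txg: int        # transaction group number
    timestamp: int  # commit time, seconds since the epoch


def pick_rollback_target(uberblocks: Iterable[Uberblock],
                         event_time: int) -> Optional[Uberblock]:
    """Return the newest uberblock committed strictly before the event,
    or None if every surviving uberblock is from after it."""
    candidates = [ub for ub in uberblocks if ub.timestamp < event_time]
    return max(candidates, key=lambda ub: ub.txg, default=None)


# Made-up history: three surviving uberblocks, with the event at t=1007.
history = [Uberblock(500, 1000), Uberblock(501, 1005), Uberblock(502, 1010)]
target = pick_rollback_target(history, event_time=1007)
print(target.txg)                                    # 501
print(pick_rollback_target(history, event_time=900))  # None
```

A result of `None` corresponds to the case where the window has already closed: every uberblock from before the event has been overwritten.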

If a rollback is performed, all changes made to the pool after the point of recovery will be lost. If the pool that was impacted by "the event" was taken off-line quickly, that may not be an issue, but can you be sure?

You've seen those basketball plays where there are 3 seconds on the clock, the team is down by one, they have the ball at the other team's basket, and a desperate throw is made across most of the length of the court? That's called a "Hail Mary," and so is attempting a transaction group rollback!

More Stories By Bill Roth

Bill Roth is a Silicon Valley veteran with over 20 years in the industry. He has held numerous product marketing, product management, and engineering roles at companies like BEA, Sun, Morgan Stanley, and eBay Enterprise. He was recently named one of the World's 30 Most Influential Cloud Bloggers.