The Petabyte Challenge…

June 16, 2009 at 8:53 pm 1 comment

Managing large amounts of data is a very challenging technical endeavor, but for many cloud storage providers the volume of data they are tasked with protecting and maintaining is so large that it presents unique challenges.

One of those particularly daunting issues is “silent data corruption.” This topic is discussed by Zetta CTO Jeff Whitehead in a recent blog entry. Whitehead’s excellent description of the problem and how to analyze it includes a calculator to help estimate the probability of random disk failures – this should be required reading for any system administrator or architect of a cloud storage solution. If you’ve built one and this is news to you – you (and your customers) are in trouble…

We included a large excerpt below (Jeff: let us know if you would prefer we take it down):

IT professionals are well aware of many challenges related to scaling storage: capital required to house data, manage backups, data center space, power and cooling. One area many IT professionals haven’t had time to look at, however, is how increasing data footprints translate into increased risk of data loss or data corruption. To put this in context, IDC recently reported that data volumes will increase by a “factor of almost five,” while “total IT budgets worldwide will only grow by a factor of 1.2 and IT staff by a factor of 1.1.” Under these constraints, with IT being asked to do more with less, risk inevitably increases unless special attention is paid to data risk management.

I believe that many IT professionals and CIOs will be very surprised to see that while Data Loss (i.e., simultaneous drive failures) may not be very probable, Data Corruption (the data on disk is no longer what was originally written out by the application) is shockingly likely, and has caused outages for even some of the most technologically advanced high-end environments.

The objective of this blog is to introduce or reintroduce the concept of “Mean Time To Data Loss (MTTDL),” whereby IT professionals, CIOs, and risk managers can create a probabilistic model for evaluating the reliability and probability of data loss for their current environment, and also compare and contrast it with how Zetta is advancing the state of the art for cost-effective data protection.

MTTDL is a tool, and to be effective one must understand its limitations. The inputs to the model are as follows:

The number of hard drives (data set size/system performance)
The reliability of each hard drive
The probability of reading a given hard drive correctly without error (see prior blog about silent data corruption)
The redundancy encoding of the system
The rebuild rate.

Mean Time to Data Loss is in many respects a best case scenario, because it ignores risks to data integrity such as fire, natural disaster, human error, and other common causes of storage failures. It also ignores autocorrelation, or drives failing at the same time due to similar workload, similar manufacturing batches, firmware issues, or the like. Despite these limitations, MTTDL is still one of the better tools for evaluating the data protection features of a storage system.
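To make the inputs listed above concrete, here is a minimal sketch of an MTTDL estimate for a single-parity drive group. It is not the calculator from Whitehead's post. It uses the common textbook approximation (independent, exponentially distributed drive failures, rebuild time much shorter than drive lifetime), and every function name and number in it is a hypothetical chosen for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mttdl_single_parity(n_drives, mttf_hours, rebuild_hours):
    """Textbook MTTDL (in hours) for one single-parity group.

    Data is lost when a second drive fails while the first failed
    drive is still being rebuilt. Assumes independent failures and
    rebuild_hours much smaller than mttf_hours.
    """
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * rebuild_hours)


def p_read_error(bytes_read, bit_error_rate):
    """Probability of at least one bad bit when reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-bit_error_rate))


def mttdl_with_read_errors(n_drives, mttf_hours, rebuild_hours,
                           drive_bytes, bit_error_rate):
    """MTTDL (in hours) when a read error during rebuild also counts as loss."""
    # Rate at which some drive in the group fails first
    first_failure_rate = n_drives / mttf_hours
    # Chance that one of the remaining drives fails during the rebuild
    p_second = min(1.0, (n_drives - 1) * rebuild_hours / mttf_hours)
    # Chance of a read error while reading every surviving drive in full
    p_bad_read = p_read_error((n_drives - 1) * drive_bytes, bit_error_rate)
    # Either event spoils the rebuild
    p_loss = 1.0 - (1.0 - p_second) * (1.0 - p_bad_read)
    return 1.0 / (first_failure_rate * p_loss)


if __name__ == "__main__":
    # Hypothetical inputs: 8 drives of 1 TB each, 1,000,000 hour MTTF,
    # 24 hour rebuild, one bad bit per 1e14 bits read.
    drives, mttf, rebuild = 8, 1_000_000, 24
    size, ber = 10 ** 12, 1e-14

    failures_only = mttdl_single_parity(drives, mttf, rebuild)
    with_reads = mttdl_with_read_errors(drives, mttf, rebuild, size, ber)

    print(f"Drive failures only: {failures_only / HOURS_PER_YEAR:,.0f} years")
    print(f"With read errors:    {with_reads / HOURS_PER_YEAR:,.0f} years")
```

With these assumed inputs, the estimate drops from roughly 85,000 years when only whole-drive failures are counted to roughly 33 years once read errors during rebuild are included, which illustrates the excerpt's point that corruption can dominate the risk. Like MTTDL itself, the sketch ignores correlated failures, so real-world figures would likely be worse.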


1 Comment

  • 1. Jeff Whitehead  |  June 17, 2009 at 1:13 am

    Thanks for the write up! I’ve been following your blog for a while, was happy to see the trackback.

    Thanks,

    JW
