RAID1 increases chances of disk failure


fact

When I was first told this, my initial response was “rubbish”. There was no way that using RAID1 (mirroring) could increase the chance of disk failure, was there?

Then I actually thought about the statement again, and I was wrong!

Consider 2 hard disks A and B. The make/model/manufacturer is irrelevant for this proof.

Given that P(e) is the probability of event e occurring, we can state that:

P(disk A fails) = a

P(disk B fails) = b

Trivially, in a single-disk system using either disk A or disk B, the probability of disk failure will be a or b respectively. For ease of demonstration we shall use the average of the two.

Therefore, the average chance of a single disk machine suffering disk failure is (a+b)/2.

Now, consider the RAID1 system containing disks A and B.

The chance of a disk failing is the probability of disk A failing or the probability of disk B failing; so:

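P(disk A fails or disk B fails) = P(disk A fails) + P(disk B fails) - P(both fail) = a + b - a * b, assuming the two disks fail independently. For the small failure probabilities typical of real disks, a * b is negligible and this is approximately a + b.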

In other words, the chance of a disk failure in a RAID1 system roughly doubles compared with the single-disk average of (a+b)/2. Hardly surprising if you stop and think about it.
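To put numbers on this, here is a minimal Python sketch. The function names and the failure probabilities 0.03 and 0.05 are made up for illustration, and the disks are assumed to fail independently of each other.

```python
def p_any_disk_fails(a: float, b: float) -> float:
    """P(A fails or B fails), assuming independent failures."""
    return a + b - a * b


def p_single_disk_average(a: float, b: float) -> float:
    """Average failure probability of a one-disk machine using A or B."""
    return (a + b) / 2


if __name__ == "__main__":
    a, b = 0.03, 0.05  # hypothetical failure probabilities
    single = p_single_disk_average(a, b)
    raid1 = p_any_disk_fails(a, b)
    print(f"single-disk average: {single:.4f}")
    print(f"RAID1, any disk:     {raid1:.4f}")
    print(f"ratio:               {raid1 / single:.3f}")
```

The ratio comes out just under 2, because the a * b term removes the double-counted case where both disks fail.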

Now, the statement I had initially heard was actually: RAID1 increases chances of data loss. That is obviously rubbish, as can be easily shown.

We know the chance of disk failure (and in this case data loss) with a single disk, (a+b)/2.

Now, using RAID1, the chance of data loss is defined as:

The probability of both disk A and disk B failing.

This is defined as:

P(disk A failing and disk B failing) = P(disk A failing) * P(disk B failing) = a * b, assuming the two disks fail independently.

It can be stated that any probability must be in the range 0 <= P(e) <= 1.

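For any two numbers i, j with 0 <= i <= 1 and 0 <= j <= 1:

i * j <= i; i * j <= j

with strict inequality whenever both i and j lie strictly between 0 and 1. So a * b can never exceed the smaller of a and b, and therefore never exceeds the average (a+b)/2.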

Therefore, the probability of data loss is lower when using RAID1; however, the chance of a disk failure roughly doubles!
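Both conclusions can be sanity-checked with a quick simulation. This is only a sketch under the same simplifying assumptions as the proof: independent failures, no controller faults, and a hypothetical `simulate` helper with illustrative probabilities.

```python
import random


def simulate(a: float, b: float, trials: int = 200_000, seed: int = 1):
    """Estimate P(at least one disk fails) and P(both disks fail).

    Each trial draws an independent failure for disk A (probability a)
    and disk B (probability b).
    """
    rng = random.Random(seed)
    any_failed = 0
    both_failed = 0
    for _ in range(trials):
        a_fails = rng.random() < a
        b_fails = rng.random() < b
        any_failed += a_fails or b_fails    # a disk needs replacing
        both_failed += a_fails and b_fails  # RAID1 data loss
    return any_failed / trials, both_failed / trials


if __name__ == "__main__":
    a, b = 0.3, 0.5  # deliberately large so the effect is visible
    p_any, p_both = simulate(a, b)
    print(f"single-disk average failure: {(a + b) / 2:.3f}")
    print(f"RAID1 some disk fails:       {p_any:.3f}")
    print(f"RAID1 data loss:             {p_both:.3f}")
```

The "some disk fails" estimate should sit near a + b - a * b and the data loss estimate near a * b, which is below both a and b.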

2 Responses to “RAID1 increases chances of disk failure”

  1. Sean Says:

    Actually, Andy, I think there are several misunderstandings here!
    My comparison was between a single disk and a RAID 1 system.
    The data loss probability for a single disk is the same as the rate of failure – call it P(disk).
    The data loss probability for a RAID 1 system is the sum of the failure probabilities of both disks, and the probabilities of one disk failing and the RAID controller failing to reconcile, plus the probability of the RAID controller failing catastrophically. That’s P(disk) ^ 2 + 2 * P(disk) + 2 * P(controller, reconcile) + P(controller, catastrophic). And that’s a very primitive model. You’d also have to factor in the false sense of security RAID creates.
    It’s also worth pointing out that I was talking specifically about consumer level systems, not professional, though I did not make that explicit.
    So, we’ve established that the chance of disk failure is greater with RAID 1, and that the chance of data loss is higher if P(disk) ^ 2 + 2 * P(disk) + 2 * P(controller, reconcile) + P(controller, catastrophic) > P(disk), which I can’t be bothered calculating since I am drunk.

  2. Sean Says:

    Math error. The long equation should be:
    P(disk) ^ 2 + 2 * P(disk) * P(controller, reconcile) + P(controller, catastrophic)
