[CLUE-Tech] RAID-5
Roger Frank
rfrank at rfrank.net
Sun Oct 26 05:50:40 MST 2003
On Sat, 25 Oct 2003 19:36:14 -0400 (EDT)
Adam Bultman <adamb at glaven.org> wrote:
> With a RAID 5, when you get a disk failure, your RAID runs fine - although
> it is degraded in both speed and reliability. The SCSI drives themselves
> know they are having problems, and can report failures.
How do they do that? Does something go in /var/log/messages?
> Since you have a
> RAID-5, each drive contains enough parity information to lose one drive
> and still 'know' what the data is, using its brothers - it uses the rest
> of the data to reconstruct the missing information. The card will do it
> on the fly, which is why you'll be operating slower than normal.
I'm hung up on the math of this. I have three 9G drives, one on each
physical channel. If they were all data there would be 27G. `dmesg` shows:
Oct 25 17:13:40 icicle kernel: scsi0: scanning virtual channel 0 for logical drives.
Oct 25 17:13:40 icicle kernel: Vendor: MegaRAID Model: LD0 RAID5 17364R Rev: A
Oct 25 17:13:40 icicle kernel: Type: Direct-Access ANSI SCSI revision: 02
Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 0 for devices.
Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 1 for devices.
Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 2 for devices.
Oct 25 17:13:40 icicle kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Oct 25 17:13:40 icicle kernel: SCSI device sda: 35561472 512-byte hdwr sectors (18207 MB)
And with the filesystem:
/dev/sda1 17646824 3098096 14548728 18% /
So two-thirds of the total disk capacity (18G) holds data and the other
third is being used for redundancy. It seems that the one-third being
used for redundancy can correct only an equal amount of bad data on the
other two-thirds of the capacity, or about half of the 18G.
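
One way to look at the arithmetic, as a minimal sketch and not a
description of what the MegaRAID firmware actually does: single-parity
RAID is usually described as storing, for each stripe, the XOR of the
data blocks. With three drives, a stripe holds two data blocks and one
parity block, and any one of the three can be recomputed from the other
two. The function names and the tiny 4-byte blocks below are made up
for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_missing(stripe):
    """Recompute the single missing block (marked None) in one stripe.

    The stripe is a list with one entry per drive. XOR-ing all the
    surviving blocks yields the missing one, whether it held data or parity.
    """
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [blk for blk in stripe if blk is not None]
    result = survivors[0]
    for blk in survivors[1:]:
        result = xor_blocks(result, blk)
    return result


# Illustrative stripe: two data blocks plus their parity (hypothetical values).
d0 = b"\x12\x34\x56\x78"
d1 = b"\xab\xcd\xef\x01"
parity = xor_blocks(d0, d1)

# Whichever single drive is lost, the other two are enough.
assert rebuild_missing([None, d1, parity]) == d0
assert rebuild_missing([d0, None, parity]) == d1
assert rebuild_missing([d0, d1, None]) == parity
```

The catch is that the scheme relies on knowing which drive is missing.
Parity alone says that a stripe is inconsistent, not which block is
wrong, and two missing blocks in the same stripe cannot be recovered.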
> I recently had a hard drive fail in a VA Linux machine (last weekend,
> actually.) The SCSI card reported it as bad, and it ran for most of the
> week with 3 of 4 drives - and ran well enough so that no one noticed any
> performance problems. I ordered a drive (NOT at my leisure - a RAID
> missing two drives cannot reconstruct information), popped it in, and the
> RAID card noticed first that the dead drive was removed, and then that a
> new drive was added. It then proceeded to use the information from the 3
> good drives to reconstruct the data for the new drive. 10 minutes later,
> I'm back in business.
I tried something similar. I shut down the system, took out one of the
three drives and put an identical one in and rebooted. It noticed that
the drive was changed when I booted, but it did not recover. It gave me
two choices: proceed or go into diagnostic mode. First I did diagnostic
mode and found nothing useful. I rebooted and selected 'proceed' and it
could not boot. It just sat quietly, with no apparent disk activity
for five minutes or so.
> > So maybe I should use RAID-1? That is with mirroring and
> > duplexing. But even there if I get a disk starting to fail,
> > the controller won't know who to believe, just that the two
> > disks disagree.
> >
>
> Mirroring is good, too. It requires fewer drives, but at a penalty of
> capacity. Instead of (n-1)/n of the raw space, you get 1/2 of it. And if a
> drive DOES die, the controller will know which one failed - usually
> controllers can. And again, the SCSI drives are smart enough to know that
> they are having problems, too. So the card will notice a problem,
> identify it, and in all likelihood, tell you the problem (e.g. ID 0
> failure). If you have a good enough RAID card, if you have a failed
> drive, you can put in a new drive, and it will recognize the new drive and
> rebuild mirror information automagically. Some of the servers I manage
> have mirrors in them. Takes a bit longer to boot after a crash, but it's
> nice to know that the data won't be lost unless my RAID card goes haywire
> and fries the drive.
Maybe buried in there is the key: "the SCSI drives are smart enough to
know they are having problems." If I have a mirror (RAID 1) I can see that
there is another copy of every data byte and the 50% capacity makes sense.
The controller I have allows RAID 0, 1 and 5 and it seems if you have three
drives you would always use RAID 5; two drives would use RAID 1.
Still, I can see how two drives keep 1/2 of the total data backed up,
but three drives keeping 2/3 of the data backed up seems like something
for nothing.
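
The capacity side can be checked with a few lines, offered as a sketch
under stated assumptions. With n equal drives, single parity costs one
drive's worth of space however many drives there are, so the usable
fraction is (n-1)/n; a two-drive mirror gives 1/2. The sector count is
the one from the kernel log above. The per-drive size is inferred by
halving the reported logical drive size, not read from the controller,
and the function names are invented for this example.

```python
from fractions import Fraction

SECTOR_BYTES = 512


def raid5_usable_fraction(n_drives: int) -> Fraction:
    """Fraction of raw capacity left for data with single parity."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return Fraction(n_drives - 1, n_drives)


def raid1_usable_fraction() -> Fraction:
    """A two-drive mirror stores every block twice."""
    return Fraction(1, 2)


# Sector count reported for sda in the kernel log.
sectors = 35561472
total_bytes = sectors * SECTOR_BYTES

size_mb = total_bytes // 10**6    # decimal megabytes, matches "(18207 MB)"
size_mib = total_bytes // 2**20   # binary megabytes, matches "17364R"
assert size_mb == 18207
assert size_mib == 17364

# Assumption: three equal drives, so the logical drive is two drives' worth.
per_drive_mib = size_mib // 2     # 8682 MiB, roughly a "9G" drive
raw_mib = 3 * per_drive_mib
assert raid5_usable_fraction(3) * raw_mib == size_mib
assert raid5_usable_fraction(3) == Fraction(2, 3)
assert raid1_usable_fraction() == Fraction(1, 2)
```

The two figures in the log turn out to be the same size in different
units, which is why 17364 and 18207 both appear for one logical drive.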
> > Also, if the controller is fixing things on one disk, how can
> > I find out and perhaps replace the degrading disk? Is there
> > a raidtools that knows about hardware controllers?
>
> Again, if your controller is smart enough, it'll do it itself. The money
> is on having a smart enough RAID card to do that for you without your
> intervention.
But if a drive is getting flaky, I'd sure like to know. I don't want it
to be completely silent about it. I'd rather replace a drive that's giving
a few errors a day than wait until it's completely gone.
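
If the driver does log anything, a crude watcher could be sketched like
this. Everything here is an assumption: whether this controller writes
drive errors to /var/log/messages at all is still the open question,
and the keywords are guesses rather than actual MegaRAID message text.
The function name is made up.

```python
import re

# Guessed keywords; not taken from any real driver's message catalogue.
SUSPECT = re.compile(r"error|fail|timeout|offline", re.IGNORECASE)


def suspicious_scsi_lines(lines):
    """Return log lines that mention SCSI or MegaRAID plus a suspect keyword."""
    hits = []
    for line in lines:
        lower = line.lower()
        if ("scsi" in lower or "megaraid" in lower) and SUSPECT.search(line):
            hits.append(line.rstrip("\n"))
    return hits


# Possible use, assuming the log lives at the usual path and is readable:
#     with open("/var/log/messages", errors="replace") as log:
#         for hit in suspicious_scsi_lines(log):
#             print(hit)
```

Run from cron, something like this would at least make a few errors a
day visible, provided the controller passes them up to the kernel
rather than handling them silently.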
> When it comes to RAID cards, you really DO get what you pay
> for. I'd strongly recommend against the cheaper RAID cards (Adaptec
> 2100s, DPT Decade, etc) because you PAY for it when you lose a drive (I
> spent 64 hours rebuilding a RAID with a 2100s a year or so ago, and that
> was in firmware - it could not do it in Linux OR Windows).
I'd sure like to watch (demo at school) this automagic repair in action.
Should I be able to power down cleanly, remove a drive, put in a physically
identical drive and have it recreate the drive I removed?
Adam, thanks for your feedback on this.
--
Roger Frank rfrank at rfrank.net
http://www.rfrank.net Ponderosa High School, Parker, Colorado