[CLUE-Tech] RAID-5

Adam Bultman adamb at glaven.org
Sun Oct 26 08:42:38 MST 2003


> Adam Bultman <adamb at glaven.org> wrote:
> 
> > With a RAID 5, when you get a disk failure, your RAID runs fine - although
> > it is degraded in both speed and reliability.  The SCSI drives themselves
> > know they are having problems, and can report failures.  
> 
> How do they do that?  Does something go in /var/log/messages?
> 

That's where my Mylex DAC960 puts it. The card and the kernel driver
chatter with each other, and syslog writes the messages there.
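To pull the controller's messages back out of the log, filtering lines by
keyword is usually enough. This is a minimal sketch, not something the card
or driver ships with. It assumes syslog writes to /var/log/messages (the path
varies by distribution) and that the driver tags its lines with a recognizable
string. The keyword "DAC960" is an assumption, so check what your own driver
actually prints.

```python
"""Print syslog lines that mention a RAID driver (illustrative sketch).

Assumptions: syslog writes kernel messages to /var/log/messages, and the
controller driver's lines contain KEYWORD. Both vary by system and driver.
"""
import sys

LOG_PATH = "/var/log/messages"  # assumed location; differs between distributions
KEYWORD = "DAC960"              # hypothetical tag; use whatever your driver prints


def matching_lines(lines, keyword):
    """Yield lines containing keyword, with trailing newlines stripped."""
    for line in lines:
        if keyword in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else LOG_PATH
    keyword = sys.argv[2] if len(sys.argv) > 2 else KEYWORD
    with open(path, errors="replace") as log:
        for line in matching_lines(log, keyword):
            print(line)
```

Run it as, say, `python raid_log.py /var/log/messages DAC960`. The script name
is made up, and reading the log usually needs root.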

> 
> I'm hung up on the math of this.  I have three 9G drives, one on each
> physical channel.  If they were all data there would be 27G.  `dmesg` shows:
> 
> Oct 25 17:13:40 icicle kernel: scsi0: scanning virtual channel 0 for logical drives.
> Oct 25 17:13:40 icicle kernel:   Vendor: MegaRAID  Model: LD0 RAID5 17364R  Rev:   A
> Oct 25 17:13:40 icicle kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
> Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 0 for devices.
> Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 1 for devices.
> Oct 25 17:13:40 icicle kernel: scsi0: scanning physical channel 2 for devices.
> Oct 25 17:13:40 icicle kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> Oct 25 17:13:40 icicle kernel: SCSI device sda: 35561472 512-byte hdwr sectors (18207 MB)
> 
> And with the filesystem:
> /dev/sda1             17646824   3098096  14548728  18% /
> 
> So two-thirds of the total disk capacity (18G) holds data and the other
> third is being used for redundancy.  It seems that the one-third being
> used for redundancy can correct only an equal amount of bad data on the
> other two-thirds of the capacity, or about half of the 18G.

On a three-drive RAID-5, the stripes for a given chunk of data (say, a
document) are laid out like this:

Disk1		Disk2		Disk3

data		data		parity
data		parity		data
parity		data		data
.		.		.
.		.		.
.		.		.

Of course, this is an extreme simplification. The striping is much more
fine-grained, but it's the same idea: you spread the data and parity over
all the drives in the RAID so that if one dies, the remaining drives have
enough information to rebuild what was on the lost drive. The parity isn't
a copy of the data. It's the XOR of the data blocks in the same stripe,
computed bit by bit. Because XOR undoes itself, the RAID can XOR the
surviving data and parity to reconstruct the missing data, or XOR the data
to regenerate the parity. So the redundant third doesn't have to match the
data byte for byte: it can rebuild any one dead drive, just not two at
once. You 'lose' one drive's capacity because the parity takes up the
equivalent of one drive, so you get N-1 drives' worth of capacity:
9 GB * (3-1) = 18 GB. Spreading the parity across all the drives means no
single drive is a dedicated parity drive.
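To make the XOR idea concrete, here is a minimal Python sketch of the
three-drive layout in the diagram. It is an illustration under assumptions,
not how the Mylex or MegaRAID firmware actually works. Each "disk" is modeled
as a Python list of small byte blocks, and parity rotates from Disk3 toward
Disk1 as drawn. The helper names (`xor_blocks`, `parity_disk`, `build_array`,
`rebuild`, `usable_capacity`) are made up for the example.

```python
"""Toy RAID-5: three disks, rotating XOR parity, single-disk rebuild.

Assumptions for illustration only: each "disk" is a Python list of
equal-length byte blocks, and parity rotates from the last disk toward
the first. Real controllers use their own block sizes and layouts.
"""
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, n_disks):
    """Index of the disk holding parity for a stripe (last disk first, then leftward)."""
    return (n_disks - 1 - stripe) % n_disks


def build_array(data_blocks, n_disks):
    """Stripe data across disks: n_disks - 1 data blocks plus one parity block per stripe."""
    per_stripe = n_disks - 1
    if len(data_blocks) % per_stripe:
        raise ValueError("number of data blocks must fill whole stripes")
    disks = [[] for _ in range(n_disks)]
    for stripe, start in enumerate(range(0, len(data_blocks), per_stripe)):
        chunk = data_blocks[start:start + per_stripe]
        p = parity_disk(stripe, n_disks)
        data = iter(chunk)
        for d in range(n_disks):
            # The parity slot gets the XOR of the stripe; other slots get data in order.
            disks[d].append(xor_blocks(chunk) if d == p else next(data))
    return disks


def rebuild(disks, dead):
    """Recreate a dead disk: in every stripe, XOR the surviving blocks."""
    survivors = [disk for i, disk in enumerate(disks) if i != dead]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


def usable_capacity(disk_size, n_disks):
    """One disk's worth of space goes to parity, so N - 1 disks hold data."""
    return disk_size * (n_disks - 1)


if __name__ == "__main__":
    blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
    disks = build_array(blocks, 3)
    for dead in range(3):
        # Pretend each disk dies in turn and check the rebuild matches it.
        print(dead, rebuild(disks, dead) == disks[dead])  # True each time
    print(usable_capacity(9, 3))    # 18 (GB from three 9 GB drives)
    print(35561472 * 512 // 10**6)  # 18207, the MB figure dmesg reports
```

The rebuild works for any single dead disk, whether the lost block in a given
stripe was data or parity, because XORing the two survivors in each stripe
gives back the missing one. Kill two disks and each stripe has only one block
left, which is why RAID-5 tolerates one failure at a time.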


> 
> I tried something similar. I shut down the system, took out one of the
> three drives and put an identical one in and rebooted.  It noticed that
> the drive was changed when I booted, but it did not recover.  It gave me
> two choices: proceed or go into diagnostic mode.  First I did diagnostic
> mode and found nothing useful.  I rebooted and selected 'proceed' and it
> could not boot.  It just sat quietly, with no apparent disk activity
> for five minutes or so.

That's going to depend more on your RAID card than on RAID-5 itself.
A BMW's likely to have more bells and whistles than a Toyota Tercel. In
my case last weekend, the card was smart enough to know that something
was wrong (and told me), and when it noticed a new drive, it rebuilt
accordingly. At a previous job, I had a drive die in an Adaptec
2100S-based RAID. The card knew that a drive was dead, but the software
couldn't tell me which drive had died. I had to boot into the firmware
to find out which one it was. Then I had to shut down, replace the
drive, and go back into the firmware to have it rebuild the RAID.
Unfortunately, the Adaptec sucks, and it took several 'tries' to rebuild
the RAID, with the final successful rebuild taking about 64 hours. (And
that card has a LOUD, LOUD, LOUD buzzer. The Adaptec support guy
recommended putting tape over the speaker.)


If your RAID card doesn't mesh well with Linux, or if the card is kind of
cheap (I don't know much about MegaRAID), it might have fewer capabilities
than another card.

I can't remember what the original purpose of the RAID is here, but if
it's a personal system, it might pay off to just use the card as a SCSI
card instead of working with a RAID (be it 1 or 5) that might be VERY
difficult to rebuild in the case of a failure.  I've learned my lesson 
with it, that's for sure.  

> 
> Still, I can see how two drives keeps 1/2 of the total data backed-up
> but three drives keeping 2/3 of the data backed up seems like something
> for nothing.
>  

Never consider any data on a running hard drive as 'backed up'. Think of
it as a copy. I feel safe when my data is on tape. The reason: I once had
a SCSI card **fry** three hard drives, two 9 GB and a 2 GB. Poof! Right up
the chain. Everything was connected and terminated correctly; SIIG (don't
buy anything from them) said it was my problem, but in the end, they said
it was their card's fault, then promptly hung up the phone. The two 9 GB
drives were brand new. The 2 GB was old, but in perfect condition.

> I'd sure like to watch (demo at school) this automagic repair in action.
> Should I be able to power down cleanly, remove a drive, put in a physically
> identical drive and have it recreate the drive I removed?
> 

I'm not in Denver; if I were, and I had a box where I could kill the RAID,
I'd certainly show you. Alas, I'm in Grand Rapids, MI, and am unable to.
The money's on finding someone with a RAID server that has a backplane
(which allows you to hot-swap drives) and isn't in production, so they can
remove a drive, swap it (or take it and wipe the data on it in another
machine) and put it back in. If it's a Windows box, you can usually find
software that lets you watch the rebuild. If it's Linux, the drivers might
be smart enough to output to the console what they're doing (as is the
case with the Mylex). Then there are some boxes that just do it without
creating any output. (I had a Dell whose drive died a few years back; I
put in a new drive and it silently rebuilt without any chatter. That was
an NT4 box.)

Again, hope this helps.

Adam


