[clue-talk] BAARF - Battle Against Any Raid Five (er, Four, er Free.)

Dan Poler dpoler at redhat.com
Sun Oct 21 18:51:03 MDT 2007


I really, really apologize but I'm too (lazy, tired) to reply inline
point-by-point -- but some thoughts:

The reason multiple standards for disk redundancy exist? Choice. Any
argument that RAID5 is inherently bad in all (or even most) situations
is flawed. There are a LOT of situations in which RAID5 is
inappropriate, and a LOT of situations in which it's misused. But when
you need capacity more than speed, with some level of redundancy, it's
the way to go.
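
If you haven't looked at how RAID5 manages that, the trick is just XOR
parity across the data blocks in each stripe, so any single lost block
can be rebuilt from the surviving blocks plus parity. A toy sketch of
the idea in Python -- a made-up four-disk stripe, nothing like what a
real controller actually does:

    # Toy RAID5-style single parity: one stripe of four data blocks
    # plus one parity block computed by XOR.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # blocks on four data disks
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*data))

    # Pretend disk 2 died; rebuild its block from the others plus parity.
    survivors = [blk for i, blk in enumerate(data) if i != 2]
    rebuilt = bytes(w ^ x ^ y ^ z for w, x, y, z in zip(*survivors, parity))
    assert rebuilt == data[2]   # the lost block comes back intact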

Same for RAID6 (I remember when that used to be called RAID5DP -- RAID5,
Double Parity).

On the other side, RAID1, RAID0+1, and RAID1+0 -are- fast relative to
RAID5, but they all share the same problem -- they're expensive. You
have to buy at least two disks for every one disk's worth of usable
capacity, and that's before we worry about hot spares, etc.
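
To put rough numbers on that cost difference (back-of-the-envelope only;
the disk counts below are arbitrary), usable capacity per level works
out like this:

    # Usable capacity out of N equal-sized disks for a few common levels.
    def usable_fraction(level, n_disks):
        if level == "raid5":                 # one disk's worth of parity
            return (n_disks - 1) / n_disks
        if level == "raid6":                 # two disks' worth of parity
            return (n_disks - 2) / n_disks
        if level in ("raid10", "raid0+1"):   # everything is mirrored
            return 0.5
        raise ValueError(level)

    for level in ("raid5", "raid6", "raid10"):
        for n in (4, 8, 12):
            frac = usable_fraction(level, n)
            print(f"{level:7s} {n:2d} disks: {frac:.0%} usable")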
 
Generally, if a RAID5 solution is properly engineered, the write
performance hit is at least somewhat negated by a large amount of fast
cache in front of the disks. These days that's usually in the range of
gigabytes when dealing with expensive Fibre Channel storage (EMC, NetApp,
Hitachi, 3Par, et al.); I've seen as much as 128 GB of battery-backed
DRAM in front of the spindles in some solutions -- the host sees the
write as committed as soon as it reaches the DRAM, and the array worries
about getting it to the spindles; the battery comes into play in case
the power goes out before the DRAM is flushed. There are expensive RAID5
solutions on the market with write performance within 5-10% of the speed
of writing natively to the spindle.
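
The reason the cache matters so much is the classic RAID5 small-write
penalty: a random write that doesn't cover a full stripe costs four disk
I/Os (read old data, read old parity, write new data, write new parity),
where a mirrored write costs two. The write-back cache lets the array
acknowledge immediately and, ideally, coalesce writes into full stripes
so the extra reads go away. Rough math -- the per-disk IOPS and disk
count below are made-up numbers, not any particular product:

    # Back-of-the-envelope random-write IOPS, ignoring cache effects.
    DISK_IOPS = 150      # assumed small-random-I/O rate for one spindle
    N_DISKS = 8

    raid10_penalty = 2   # each logical write hits both mirrors
    raid5_penalty = 4    # read data + read parity + write data + write parity

    raid10_iops = N_DISKS * DISK_IOPS / raid10_penalty
    raid5_iops = N_DISKS * DISK_IOPS / raid5_penalty

    print(f"RAID10 random-write IOPS ({N_DISKS} disks): {raid10_iops:.0f}")
    print(f"RAID5  random-write IOPS ({N_DISKS} disks): {raid5_iops:.0f}")
    # A full-stripe write staged in cache skips the two reads entirely,
    # which is how a big battery-backed cache narrows the gap.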

And then there are the RAID5 hardware controllers built into PCs within
the means of folks like us. IMO those are generally god-awful, and
probably a big part of why RAID5 has earned such a bad name. In many
cases Linux's md software RAID facility is actually -faster- than the
hardware RAID.
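
If you do go the md route, at least keep an eye on /proc/mdstat -- a
degraded array shows an underscore in the member-status brackets (e.g.
[UU_]). Something like this quick-and-dirty Python check, which just
assumes there are md arrays on the box, is better than nothing:

    # Crude md health check: flag any array whose status brackets
    # (e.g. [UU_]) show a missing member.
    import re
    import sys

    current, degraded = None, []
    with open("/proc/mdstat") as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)
            m = re.search(r"\[([U_]+)\]\s*$", line)
            if m and current and "_" in m.group(1):
                degraded.append((current, m.group(1)))

    if degraded:
        for name, status in degraded:
            print(f"{name} is degraded: [{status}]")
        sys.exit(1)
    print("all md arrays look healthy")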

Bottom line is, don't assume RAID5 is evil; it depends on the need and
the hardware and software driving it. If you need to deploy a storage
solution, spend some time playing around with tools like iozone to see
what makes the most sense for your needs -- configure the array a couple
of different ways and do performance testing that mimics the block sizes
your application (or filesystem) will be using.
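
For example, something along these lines -- the iozone location, target
path, sizes, and record sizes are all placeholders, and sanity-check the
flags against your iozone man page before trusting any numbers:

    # Rough sketch: run iozone at a few record sizes that match what the
    # application will actually do, on a file that lives on the array
    # under test.
    import subprocess

    TEST_FILE = "/mnt/array-under-test/iozone.tmp"
    FILE_SIZE = "1g"
    RECORD_SIZES = ["4k", "64k", "1m"]   # e.g. DB page, FS block, streaming

    for rec in RECORD_SIZES:
        cmd = [
            "iozone",
            "-e",          # include fsync/fflush in timing
            "-i", "0",     # test 0: write / rewrite
            "-i", "2",     # test 2: random read / random write
            "-r", rec,     # record (block) size
            "-s", FILE_SIZE,
            "-f", TEST_FILE,
        ]
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)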

dap

-- 
Dan Poler, RHCE
Senior Consultant
Red Hat, Inc.
E-Mail: dpoler at redhat.com


On Sun, 2007-10-21 at 12:10 -0600, Nate Duehr wrote:
> On Oct 21, 2007, at 7:55 AM, Angelo Bertolli wrote:
> 
> > On Fri, October 19, 2007 10:50 pm, Jed S. Baer wrote:
> >> On Fri, 19 Oct 2007 19:44:28 -0600
> >> Nate Duehr wrote:
> >>
> >>> Good technical detail about why RAID-5 isn't always (in fact rarely)
> >>> the correct technical solution for disk redundancy...
> >>>
> >>> http://www.miracleas.com/BAARF/BAARF2.html
> >>
> >> I remember reading Cary Millsap's articles way back when. But I have
> >> to wonder, if RAID5 is so bad, why is it still so popular?
> >
> >
> > Because... it works.  I've started reading the site, and maybe it's
> > supposed to be satirical.
> 
> I don't think it is.  The "executive summary" of the articles as a  
> whole seems to be (to me anyway), "Instead of RAID 5, use a  
> combination of RAID 0 (striping) to get the size you need and then  
> lay RAID 1 over the top of it and mirror it to another drive or set  
> of drives."  (They also talk about going the opposite direction, RAID  
> 1's with RAID 0 laid over the top.)
> 
> The one article has some interesting failure analysis in it ("RAID 5
> is three times more likely to fail than RAID 01/10") but he doesn't
> cite his source.  I've seen other work done on it, and the math  
> actually works out that way... he's "right" but the site does a poor  
> job of documenting/proving it.
> 
> > Actually we use RAID6 where I work when we can,
> > and RAID5 when we cannot.  Although RAID6 isn't one of the original
> > standards, it's become a de facto standard, meaning that you
> > essentially have double parity using Reed-Solomon codes.
> 
> Yeah, that "standard" is becoming more popular and avoids the "loss  
> of more than one disk kills the entire RAID 5 array" argument.  It  
> doesn't address the huge performance hit that's taken on writes,  
> however.
> 
> > Now, keep in mind when you're reading these arguments that you're not
> > SUPPOSED to take all of your drives and make one huge RAID5.  What's
> > the point of only being able to survive the failure of ONE drive?  But
> > like I said:  it works because people determine how many drives can
> > fail at one time and build their RAID5's accordingly.  It's really
> > unnecessary to make everything a RAID1:  one out of every two drives
> > does NOT fail before it can be replaced.
> 
> I've seen RAID 5's fail because more than one disk died, but they  
> were systems that were not monitored correctly or at all.
> 
> I have no worries that people using RAID 5 for production systems
> ($$$) *are* going to monitor their physical disk states correctly, or
> they WILL after they screw up.  (Pain is a great motivator.)
> 
> The more interesting topic embedded in those articles is that many of  
> them were written many years ago, when RAID 5 first got "popular" and  
> the articles kinda hint at a more interesting dynamic that occurred.   
> Instead of engineers THINKING about whether or not RAID 5 was the  
> "correct" solution, when it comes to write speed, people just plowed  
> ahead and forced the hardware manufacturers of all modern disk
> sub-systems to add fast cache (memory) to their controllers to make up
> for the hit that was being taken on write performance for RAID 5.
> 
> One could argue that now that hardware-based systems all typically  
> have that feature/functionality/"solution" that the argument the site  
> makes is somewhat moot.  But...
> 
> Many Linux admins are doing all of this via md (software) and not in  
> hardware.  So it probably behooves us to seriously look at whether  
> the performance hit on writes is worth the RAID 5 "goodness" or if a  
> RAID 10 setup would both perform better and have a slightly lower  
> real-world failure rate.
> 
> (Honestly I have no opinion -- just stating some ideas for  
> discussion.  Most of the systems I work on are in a RAID 0+1 or 1+0  
> type of configuration these days, though -- just as an anecdote.  Or  
> they're using a SAN where all the disk management is "offloaded" to  
> something more "intelligent" than just a RAID setup... hot-spare  
> disks automatically used, etc.  I'm interested in what others are  
> doing, just out of curiosity.)
> 
> > Also, when you get into RAID hardware, you can set up a drive to be a
> > global spare.  Therefore, even with RAID5 you can be sure that if no
> > one is there to replace a drive, you'll have some time before a
> > replacement has to go in.
> 
> You can set up global spares on MOST RAID systems.  Older stuff  
> didn't have it, sadly.  It still doesn't alleviate the need to  
> properly monitor the hardware, though.
> 
> (And you still have to monitor the hardware even in RAID 0+1/1+0  
> setups too, no doubt about that.)
> 
> > We also do use RAID1, but that's only in the case of individual
> > machines
> > where we don't need huge amounts of storage.  The articles on the site
> > seem to make their case against RAID5 in the context of Oracle
> > databases.  There may be a case
> > for that, but the idea that everyone should just be using RAID1  
> > instead of
> > RAID5 is pretty silly.
> 
> I'm not sure that technically it's "silly" -- they just don't state  
> their case very clearly or concisely with citations for the math  
> involved in the failure risk analysis.  I know I've seen it  
> somewhere, but I can't find it right at the moment... the bookmarks  
> file is unorganized and out of control again (heh) and GoogleFu is  
> lacking today.  (GRIN)
> 
> Basically they're saying that a well-designed multiple-disk RAID
> 0+1/1+0 system will smoke a RAID 5 on write speed in most cases, and
> will exhibit less risk of failure taking down the system.  I'm not
> sure I disagree, but with the MTBF of disks going up quite a bit
> (well, it seems that they have anyway, but I can't cite that either --
> just personal "evidence"), if either is monitored correctly, a fix can
> almost always be deployed before anyone who uses the system cares or
> notices.
> 
> The really interesting analysis lies in the performance hit... I  
> think, anyway.  Having used both systems in commercial environments,  
> even the performance hit is somewhat moot -- companies never ask the  
> sysadmin/engineer to redesign the system layout to "fix" performance  
> problems anymore; they just immediately start "throwing money at the
> problem" and buying bigger/faster disk sub-systems before anyone has  
> time to do the analysis, if the system is making $.
> 
> Only us "margin" users of RAID on our home/small-business Linux  
> systems (especially those of us who don't have big company budgets)  
> might gain some "performance-Fu" from thinking a bit about this.  And  
> we're all pretty likely to ignore it until the next time we build/ 
> rebuild a system, anyway... since reality means that other things are  
> usually more important to do... (GRIN).
> 
> I'm sure someone will feel strongly about this, now that I've thrown  
> out all this drek on the topic.
> 
> Anyone?  Bueller?
> 
> :-)
> 
> --
> Nate Duehr
> nate at natetech.com
> 
> 
> 
> _______________________________________________
> clue-talk mailing list
> clue-talk at cluedenver.org
> http://www.cluedenver.org/mailman/listinfo/clue-talk
