[clue-tech] RAID no. of disks

Nate Duehr nate at natetech.com
Thu Apr 9 12:43:15 MDT 2009


In the systems I work on daily, the decision was made years ago that
redundancy was king, and storage space was NOT the primary goal... (some of
these poor old systems are still running 18 GB drives!)...

They decided to use dual mirrored internal disks in the Sun servers for the
OS.  On top of that there are two external 4-disk JBODs on separate SCSI
controllers, carrying two RAID 1/0 sets: one for the application and another
for "upgrades".  Each of those is then partitioned into application and
database space.

Those dual external JBODs are shared between a "live" system and a "standby"
Sun in a "failover cluster" environment (Veritas VCS).

That layout gave the best coverage for the "what's most likely to fail
outside a maintenance window" scenarios, and we see virtually zero
unscheduled downtime.  The things that can fail while costing us nothing
worse than a failover are:

Whole Sun server, power, whatever (they have dual power supplies)
Either SCSI controller in either server
Cabling to a single JBOD from either system (yes, cables do fail - seen it)
Single disk in either mirror in either JBOD (if the "right" disks fail, you
could have four disks dead and still be operating -- policy is that a single
disk failure means a replacement that weekend, no matter what)
Either internal disk in the Sun servers themselves

Etc.. etc.. etc..
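
A rough way to see the "four disks dead and still operating" point is to
enumerate failure combinations.  This is only a sketch under assumptions the
description above doesn't spell out: it treats the eight external disks as
four mirrored pairs, each pair with one disk in each JBOD, and it ignores the
controllers, cabling and VCS failover entirely.  The names PAIRS, survives
and survivable_counts are made up for illustration.

```python
from itertools import combinations

# Assumed layout (not stated in the post): 4 mirrored pairs, each pair
# split across the two JBODs. Disk names are hypothetical.
PAIRS = [(f"jbodA-d{i}", f"jbodB-d{i}") for i in range(4)]
DISKS = [d for pair in PAIRS for d in pair]


def survives(failed):
    """Data stays available as long as no pair has lost both disks."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in PAIRS)


def survivable_counts(max_failures=5):
    """Map number of failed disks -> (survivable combos, total combos)."""
    counts = {}
    for k in range(1, max_failures + 1):
        combos = list(combinations(DISKS, k))
        counts[k] = (sum(survives(c) for c in combos), len(combos))
    return counts


if __name__ == "__main__":
    for k, (ok, total) in survivable_counts().items():
        print(f"{k} failed disks: {ok} of {total} combinations survive")
```

Under that assumed layout any single failure is survivable, and so is losing
one whole JBOD (four disks), but a fifth failure always kills a pair.  That
fits the "replace it that weekend" policy: every dead disk you leave in place
shrinks the set of next failures you can ride out.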

If you're not going for loads of disk space, creative uses of RAID 1 can
offer better performance and "more" redundancy than, say, slapping one big
RAID5'd JBOD external to both servers, which is the common "newbie" mistake
when looking for uptime.  These systems are typically on 5-nines SLAs...
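
For scale, "5-nines" leaves very little room.  A quick back-of-the-envelope
sketch follows; the function name is mine, and the actual SLA terms (for
example, whether scheduled maintenance counts against it) aren't given here,
so treat it as plain arithmetic only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget, in minutes, for an availability of N nines."""
    unavailability = 10 ** -nines  # 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability / 60


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines works out to a bit over five minutes a year, which is why "nothing
worse than a failover" matters so much.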

But as others have pointed out in past threads on this topic... mirroring
isn't cheap and uses a lot of disks.  2 Petabytes of online data MIRRORED
would get hideously expensive.
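
To put a rough number on that, here is a small sketch of raw versus usable
capacity.  The function name and the 8-disk RAID 5 group size are my
assumptions, nothing from the thread, and it deliberately says nothing about
price, hot spares or filesystem overhead.

```python
def raw_needed(usable_pb, layout, group_size=None):
    """Raw capacity (PB) needed to end up with usable_pb of usable space."""
    if layout == "mirror":
        # RAID 1 or 1/0: every block is stored twice.
        return usable_pb * 2
    if layout == "raid5":
        # One disk's worth of parity per group of group_size disks.
        if group_size is None or group_size < 3:
            raise ValueError("RAID 5 needs a group_size of at least 3")
        return usable_pb * group_size / (group_size - 1)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    print("mirrored:", raw_needed(2, "mirror"), "PB raw")
    print("RAID 5, 8-disk groups:", round(raw_needed(2, "raid5", 8), 2), "PB raw")
```

For 2 PB usable, mirroring needs 4 PB of raw disk, while RAID 5 in 8-disk
groups needs roughly 2.3 PB, at the cost of only tolerating one dead disk per
group.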

So the question for the RAID decisions in life always comes down to... do
you want uptime, or do you want oceans of space with the ability to lose ONE
disk?  It's pretty easy to do the needs analysis from there, from a
technical standpoint.  From the business standpoint... the "MAKE A DECISION"
standpoint, you present the options and prices to whoever's making the
purchasing decision, and INCLUDE how much overtime it'll take on a weekend
to get it all back online in each scenario (the part most analyses leave
out)... and let the bosses choose...

Nate 

-----Original Message-----
From: clue-tech-bounces at cluedenver.org
[mailto:clue-tech-bounces at cluedenver.org] On Behalf Of Angelo Bertolli
Sent: Wednesday, April 08, 2009 8:58 PM
To: CLUE technical discussion
Subject: Re: [clue-tech] RAID no. of disks

Yeah, I've read it all before.  With upwards of probably 2 Petabytes of 
data at this point, we're not going to spend the extra money to have 
completely redundant data.  Everything is a tradeoff.


