[CLUE-Tech] RAID 1 on Linux
Nate Duehr
nate at natetech.com
Wed Oct 20 03:16:45 MDT 2004
On Oct 19, 2004, at 7:41 PM, Carl Schelin wrote:
> Ok, question on RAIDing an existing linux system.
>
> A little background first. I'm mainly a Solaris guy.
> I've installed RAIDs on Sun boxes many times. The
> procedure is simple. Create the raid using the
> existing slices and add the new slices to the new
> raid. They synchronize and it's working.
Yeah, I was mainly a non-Solaris guy until the last couple of years,
and I must say, software RAID on Solaris is pretty simple to deal with.
> In a misguided attempt, and because I didn't see the
> RAID option the previous times I've installed Mandrake
> (under the expert menu), I installed Mandrake 9.1 on
> an 80 gig seagate.
Done that.
> After futzing around with mdadm and raidtools, the
> second disk was so fscked up, I had to use dd to fix
> it (I dd'd the good hda over hdc). Finally I was able
> to do a fresh install and found the RAID options under
> the expert menu.
Done that too. ;-)
> All of the documentation I've seen appears to show that
> the only way to make an existing system in to RAID 1,
> is to back it off, install to RAID and restore the
> data.
Nahh, you can make a RAID1 out of an existing partition. See below.
> Does anyone have a pointer to a document that debunks
> this? Can I in fact, add a second disk and make the
> system RAID 1 or do I have to back it off and
> reinstall?
I finally figured out most of this from an article in Sysadmin magazine
about it. Unfortunately I don't think this particular article is
available online anywhere.
> Just so you know, I've read the Managing RAID on LINUX
> book (three years out of date), the Software Raid
> HOWTO over at unthought, the Quick Software RAID over
> at linuxhomenetworking, the kernel raid list (just
> poking around in the archives) and even the various
> man pages for mdadm and mkraid.
I'd have to agree here -- I had some questions early on, and no one
seems to have been able to find time to update much in the way of docs.
It'd be a good project for a Hacking Society meeting if I weren't
working until 9PM every night on my new schedule. (And getting me up
early in the morning to write docs just isn't ever going to happen.
Heh.)
> Of course if it's in one of these documents, please
> point me at the right section.
>
> Thanks for any pointers.
Carl, one of the other folks is right in their hunch. You can create a
RAID1 with a failed member directly.
So you create a new "RAID'ed" filesystem on the new disk that is
configured with your original partition on the good disk as a failed
member, mount the "RAID" (in quotes because it's really only the new
disk at this point), copy the files over, and then edit fstab to use
the RAID for that filesystem and either remount or reboot (depending on
what filesystem you're talking about here), then "repair" the RAID by
hot-adding the original partition back in.
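With mdadm, the sequence above can be sketched roughly like this. This is
a sketch, not a tested recipe: the device names (/dev/hda1 for the existing
partition, /dev/hdc1 for the new disk, /mnt/newroot, and /home as the
migrated filesystem) are all assumptions, and `missing` is mdadm's way of
creating the array with the "failed" slot empty:

```
# Create a degraded RAID1: the new partition plus a missing member
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hdc1 missing

# Make a filesystem on the array, mount it, copy the data over
mke2fs -j /dev/md0
mount /dev/md0 /mnt/newroot
cp -ax /home/. /mnt/newroot/.    # example: migrating /home

# After editing fstab to use /dev/md0 and verifying the data,
# hot-add the original partition so the mirror synchronizes
mdadm /dev/md0 --add /dev/hda1
```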
When you first boot/remount, the system uses the "RAID" and it comes up
in degraded mode. You stop and check everything carefully at this
point and after you're darn sure all your data is there (you of COURSE
made backups before starting all this right? GRIN...) you can then
hot-add the "failed" partition (your original data partition) to the
RAID1 and it'll synchronize up and be happy. Once the sync is under
way, you then go back into the raidtab configuration and change that
disk's entry so it's no longer marked as a failed member. The key to
this is that when setting up the RAID initially, you use the
"failed-disk" nomenclature in your raidtab instead of the "raid-disk"
tag: "raid-disk" for the new drive, "failed-disk" for the old. Kinda
scary the first time you do it
because you're not sure if it's going to fiddle with that good disk
you're running from. Best to practice with an unused but mounted and
formatted partition with some data in it on the "good" disk first, if
you have one.
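For the raidtools route, the corresponding /etc/raidtab stanza might look
like this (a sketch; /dev/hdc1 as the new partition and /dev/hda1 as the
original are assumptions):

```
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            4
    device                /dev/hdc1
    raid-disk             0
    device                /dev/hda1
    failed-disk           1
```

After mkraid /dev/md0 and the data copy, you change that last
"failed-disk 1" back to "raid-disk 1" once the sync is under way.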
One word of caution here: if the two physical disks are not the exact
same geometry, make VERY sure your new partitions are ever so slightly
smaller than the partitions you're starting with. If the original
partition turns out to be even a few blocks smaller than the degraded
RAID you built on the new disk, the hot-add will fail immediately with
a message that the partition is too small.
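A quick sanity check before the hot-add is to compare the two partition
sizes (e.g. as reported by `sfdisk -s`). The numbers below are made-up
values for illustration, not from a real system:

```shell
# Hypothetical sizes in 1K blocks, as sfdisk -s would report them
old_part_kb=40146688   # original partition on the good disk
new_part_kb=40131072   # new partition, deliberately a touch smaller

# The array was created on the NEW partition, so the original must be
# at least that large to be hot-added successfully
if [ "$old_part_kb" -ge "$new_part_kb" ]; then
    echo "ok: original partition is large enough to hot-add"
else
    echo "too small: the hot-add will fail immediately"
fi
```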
The article in Sysadmin also showed how to layer LVM on top of the
RAIDs -- I didn't really feel the need to go that far, but it was a
nifty idea: you could resize everything on the fly, at the cost of huge
overhead.
Once your sync is going, you can cat /proc/mdstat to see how it's doing
and do reboot tests or whatever when it's all done.
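The resync progress line in /proc/mdstat is easy to watch with grep. The
snapshot below is an illustrative sample of the 2.4-era format, not
captured output:

```shell
# Illustrative /proc/mdstat snapshot (sample text, not real output)
cat <<'EOF' > /tmp/mdstat.sample
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      40131072 blocks [2/2] [UU]
      [=>...................]  resync =  5.0% (2006528/40131072) finish=12.3min speed=51450K/sec
EOF

# On a live system you'd grep /proc/mdstat itself
grep -o 'resync = *[0-9.]*%' /tmp/mdstat.sample
```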
This works beautifully for non-"/" filesystems. "/" is a bit harder --
you have to reconfigure your bootloader to use the md device, make sure
your kernel supports it, etc. And ultimately you're really only booting
off of one disk, so you need to add options to your boot menu for
booting from the other disk, for times when you have a real disk
failure. And if you're using an initrd with your kernel, you gotta make
sure it's remade too so everything uses the md device at boot. You also
MUST edit the partition table with fdisk or your other favorite
partition tool and change the partition type to Linux RAID autodetect,
if you want the kernel to use it at boot-time.
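In fdisk that type change is the "t" command with type "fd" (Linux raid
autodetect); it can also be scripted with sfdisk. A sketch only --
/dev/hdc and partition number 1 are assumptions:

```
# Mark the new partition as "Linux raid autodetect" (type fd)
sfdisk --change-id /dev/hdc 1 fd

# Verify the change took
sfdisk --print-id /dev/hdc 1
```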
Here's the rub though -- from testing I read about on some of the
Debian mailing lists, by folks like Russell Coker (who wrote bonnie++),
software RAID1 on 2.4 kernels has NO intelligence about read
performance. It *always* reads from a single disk, and writes to both.
That means zero performance gain for reads, which a lot of Solaris
admins would expect to see from their much more mature software RAID
tools. You just get the redundancy and a speed penalty on writes. At
one point he
did some really wacky tests like RAID1 across an internal IDE disk and
an external USB v1 disk -- the kernel would sometimes pick the external
(slow) disk as the one it was mainly working from (even though it was
hideously slower than the internal) and would do all reads from the USB
disk, even though a much faster DMA-enabled disk was sitting there
doing virtually nothing in the RAID1 array. That's how I read his test
data, anyway.
So you make your system slower and gain some data redundancy. As
someone put it recently -- anyone who wants to be a Linux kernel
superstar and earn themselves much fame could fix Linux RAID-1 in the
kernel right now. That's paraphrased from a quote I saw in a magazine
from one of the kernel developers about RAID-1 support.
Personally I found the performance hit on one of my busier machines not
to be worth it, and I switched back from software RAID-1 to rsync'ing
to the second drive periodically and to another machine across the
network.
My experience with disk failures and Linux software-RAID was not good
either -- I sat and watched a drive fail in my RAID-1 server one
night, kernel messages clearly showed it throwing hardware errors, yet
software RAID never tagged it as "bad" in any way, and during the next
system reboot (bad idea on my part), software RAID somehow decided the
disk with the ERRORS was the good disk and started syncing the bad data
to the good disk. (Definitely my fault, I forgot to tag the drive bad
myself.) Thank goodness for backups. I'm definitely NOT impressed
with the Linux kernel RAID-1.
Supposedly the kernel RAID-5 code is much more mature and gets more
developer effort -- that's what I found out when I was researching that
lovely "let's sync the bad data to the good disk" episode, to see if it
was common.
I don't keep up on the "latest and greatest" linux kernels, so my
experience was on a late 2.4 series kernel. Perhaps someone
kindhearted has been working on the later 2.6 kernels and the
performance issues are better. Best bet would be to do some
performance tests in your environment with your kernel, if possible.
Hope this helps.
--
Nate Duehr, nate at natetech.com