[clue-tech] GRUB woes (this is another long message you'll probably want to skip)

David Anselmi anselmi at anselmi.us
Thu Feb 17 20:33:17 MST 2005


If you don't appreciate my sense of humor, forgive me on account that I 
actually read all the useless stuff you wrote (well, I read the useful 
stuff too). ;-)

William wrote:
[...]
> Important note:  The drives are physically mapped to the IDE channels as follows (and I have
> checked the DIPs on these drives several times over to confirm the Master/Slave relationships):
> Primary IDE:  Master = HDD1, Slave = HDD2
> Secondary IDE:  Master = HDD2, CD-ROM1
> (makes sense, right?)

It would make sense without the typo.  You mean HDD3 is the secondary 
master of course.

> The machine has a "Made for Windows ME" decal on it, but it was running Windows XP Professional
> _slowly_ when it was delivered to me.  The first thing I did was fdisk the drives to wipe out the
> partitions (I did not create new partitions).  I then installed Fedora Core 3.  That worked
> blissfully, and it created a single LVM for all three drives.  Nice.

Hooray!  Too bad you let CentOS change the partitions at all.  Oh well, 
maybe it's not as progressive as FC3.

> Turn the page, and I have found that I want to use CentOS instead of Fedora on all my servers. 
[...]
> Important note:  It's after midnight on Tuesday, and I'd been up since 6:30am.  I'm not rational.

No, that's not important.  Just entertaining.

> I created ext3 partitions to consume all the drive space on HDD2 and HDD3.  Then I had an
> ill-conceived stroke of bad genius.  I decided to install the OS onto HDD3 and use the other two
> (relatively massive) drives for /var and /mailroot (as this machine will ultimately become a mail
> storage box).  Here's where I screwed up:  I moved the /, /boot, and swap partitions to HDD3 and
> marked /boot as a Primary Partition.

Linux cares not a whit about primary (or bootable) partitions.  I wonder 
whether marking it broke your BIOS somehow?  I'd guess not but I don't 
recall a "primary partition" marking--thought it said bootable.  Oh 
well, you didn't say which partitioner you used.

> After selecting the packages I wanted, the installer formatted the partitions and copied the
> packages.  When finished, it invited me to remove the installation media and reboot, which I did. 
> Surprise, surprise, the machine would not boot.

So you've broken your boot loader.  No big deal.  Do you know where it's 
installed?  MBR of hda?  You don't seem to be current on the boot process 
so you might do some research there (no offense intended; if you're 
current, ignore me).

> Instead, I got dumped into a Grub> prompt.  I rebooted again.  Same
> Grub> prompt.  I rebooted yet again with the same result.  I played
> with Grub and was met with little more than "Error 15:  File not
> found" at every turn.

Rebooted again?  Isn't doing the same thing and expecting different 
results the definition of insanity?  Maybe I can't help you.

[...]
> Troubleshooting mode begins.  I drop in my old Windows 98 boot disk again and check the partitions
> with fdisk.  WIERDNESS APPEARS!

Wow.  You run Win98 fdisk on a (relatively) complex Linux setup and call 
that troubleshooting?  At best Win98 will give you plain old DOS 
partitions on each disk.

> For reasons beyond my understanding, HDD3 is now listed as HDD1, the
> primary drive!!  The other two drives follow in the correct order.
> Impossible, I think.  I reboot and watch the BIOS report as it lists
> the drives.  Everything is in the correct order (as listed above).

See.  Win98 is confused, gives you bogus data (it is a really well 
designed program after all), and you let it confuse you.  I don't call 
that troubleshooting.

Did you consider using a Linux live CD or rescue disk?  At least that 
would speak the same language as your installer.  No, NT and XP don't 
count either.

[...]
> I give up.  The system will not boot at all and I can't get rid of that damn GRUB.  The drives are
> still suffering an identity crisis as to who is the Primary Master (in fdisk) and I've tried
> everything I can think of to dump GRUB and get HDD1 to take over the Master Primary role.
> 
> How do I fix this?

You fix your boot loader.  I'll guess that by this time you've hosed 
your partitions enough that you can't use any files on them from your 
install attempts.  Get a Linux booted and a shell prompt (probably 
CentOS or FC3 install CDs will get you this--try alt-f2 or ctl-alt-f2).

Let's slick the partition tables so you're starting fresh:

for d in a b c; do dd if=/dev/zero of=/dev/hd$d bs=1024 count=4; done

I think only the first 512 bytes matter, but what the heck.
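
If you want to see what that dd does before pointing it at real disks, 
here is a minimal sketch that runs the same invocation against a scratch 
file.  The temp file and the conv=notrunc flag are my additions for the 
practice run (without notrunc, dd would truncate a regular file to 4 KiB, 
which doesn't happen on a block device).

```shell
# Practice run on a scratch file standing in for a disk (an assumption
# for safety; nothing here touches /dev/hd*).
img=$(mktemp)

# Fill 8 KiB with 'x' bytes to play the part of old on-disk data.
head -c 8192 /dev/zero | tr '\0' 'x' > "$img"

# Same dd as above, aimed at the file.  conv=notrunc keeps the rest of
# the file in place, the way the rest of a real disk would stay put.
dd if=/dev/zero of="$img" bs=1024 count=4 conv=notrunc 2>/dev/null

# Count the non-zero bytes left in the first 4 KiB and in the last 4 KiB.
nonzero=$(head -c 4096 "$img" | tr -d '\0' | wc -c | tr -d ' ')
rest=$(tail -c 4096 "$img" | tr -d '\0' | wc -c | tr -d ' ')
size=$(wc -c < "$img" | tr -d ' ')

echo "non-zero bytes in first 4 KiB: $nonzero"
echo "non-zero bytes in last 4 KiB:  $rest"
echo "file size: $size"

rm -f "$img"
```

The first count should come back zero and the second untouched, which is 
all the real command does to each disk: it blanks the front and leaves 
everything after it alone.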

Now reboot (no, that isn't necessary, really, but baby steps).  Check 
the BIOS and make sure it reports the drives correctly.  Also make sure 
that the floppy is higher priority than the hard drives in boot order 
(looks like the CD already is, keep it that way).

Reinstall.  You shouldn't have any problems.  When it's time to reboot, 
don't.  Instead, put in the floppy, and put your boot loader there.  Use 
whichever grub or lilo you're more comfortable with.  Read their docs to 
figure out how to do this.  In lilo it's boot=/dev/fd0 or something in 
lilo.conf.  In grub probably run grub-install /dev/fd0.

You have to make sure your config is right, especially if you try wacky 
stuff like you're doing.  That means the boot loader and kernel know 
what your root partition is (may be one config line, may be two--two 
doesn't hurt even if unnecessary, and grub uses different syntax for its 
root line).  You may have to make sure they know about an initrd if you 
have one.  And of course the boot loader needs the path to the kernel.
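
To make the root/initrd/kernel lines concrete, here is a rough sketch of 
what each config might look like.  Everything specific is an assumption: 
I'm guessing /boot is the first partition on the secondary master 
(/dev/hdc1, which GRUB would usually call (hd2,0) if the BIOS lists three 
disks in order), root is /dev/hdc2, and VERSION stands in for whatever 
kernel version the installer actually dropped in /boot.  Check all of it 
against your real layout.

```text
# --- /etc/lilo.conf (sketch; device names and VERSION are assumptions) ---
boot=/dev/fd0                       # where lilo writes the boot block
image=/boot/vmlinuz-VERSION         # path to the kernel in the running system
        label=linux
        root=/dev/hdc2              # root filesystem handed to the kernel
        initrd=/boot/initrd-VERSION.img
        read-only

# --- /boot/grub/grub.conf (GRUB legacy sketch; same assumptions) ---
default=0
timeout=5
title Linux (hypothetical entry)
        root (hd2,0)                # partition where GRUB looks for files
        kernel /vmlinuz-VERSION ro root=/dev/hdc2
        initrd /initrd-VERSION.img
```

Note the two different roots in the GRUB entry: "root (hd2,0)" tells GRUB 
where to find the kernel file, and "root=/dev/hdc2" on the kernel line 
tells the kernel what to mount as /.  With a separate /boot partition the 
GRUB paths are relative to that partition, so there is no /boot prefix.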

Here's a short primer on boot loaders (check it, I'm running off 
memory).  Basically they occupy the first block (well, the old 512B 
blocks I guess) of a disk or partition.  When the BIOS is ready to start 
the OS it loads that block and runs it.  That block loads the kernel and 
runs it.

In Lilo, the boot block is hard coded with the location of the kernel 
blocks.  Lilo knows nothing about file systems so all it does is load 
blocks from its list.  Hence you have to run lilo every time you change 
your kernel, even if you overwrite the old one with a new one with the 
same name.  The lilo command is what writes the boot loader and kernel 
location into the boot block.

In Grub, the boot block knows enough to find its config in the 
filesystem and to find the kernel file listed therein.  So changing the 
config or the kernel in Grub doesn't require any boot loader changes. 
Running grub-install installs the boot loader and only has to be rerun 
when you upgrade Grub itself.

If you boot and wind up at the grub prompt, you're home free.  You can 
set all the stuff it normally gets out of its config and once set it 
will boot for you.  I think you got this far with CentOS, you just 
didn't understand Grub well enough to walk it through by hand.
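
Walking it through by hand would go roughly like this at the prompt (GRUB 
legacy commands).  The (hd2,0) and /dev/hdc2 values are assumptions based 
on /boot and / living on the third disk, and VERSION is a placeholder for 
the real kernel file name.  The find command should report which 
partition actually holds GRUB's files, so use its answer rather than my 
guess; if /boot is not a separate partition, look for /boot/grub/stage1 
and put /boot in front of the kernel and initrd paths.

```text
# Sketch only: (hd2,0), /dev/hdc2 and VERSION are assumed, not known.
grub> find /grub/stage1
grub> root (hd2,0)
grub> kernel /vmlinuz-VERSION ro root=/dev/hdc2
grub> initrd /initrd-VERSION.img
grub> boot
```

Tab completion works at the prompt, so typing "kernel /vm" and hitting 
Tab should list the kernels GRUB can see.  An "Error 15: File not found" 
at this stage usually means the root line or the path doesn't match 
where the file really is.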

Good luck!  It's too bad you just missed installfest--this is a good 
problem to work there.

Dave
