Todd,

> Try "ipcs" to see what's up with the ipc structures.
> It will probably show you the real problem.

Wow!  Thanks for the clues.  Next time I run into this problem, I will do
just that.  I bet it will show the problem, just as you say.

Thanks, Ed, for your suggestion.  It turns out that on that filesystem
(ext3), and on the shm filesystem, there are plenty of inodes (the columns
are total, used, free, and percent used):

/dev/sda3            2146304  401511 1744793   19% /
/dev/sda1              26104      41   26063    1% /boot
none                   64234       1   64233    1% /dev/shm

> It sounds like you are out of shared memory or out of
> semaphore table entries.  The most likely culprit is
> a database - are you running one?

No, there is no database running on any of the systems I saw this on.

How would I know if I was running out of semaphore table entries?  Here
is the output of ipcs from this system with apache running normally:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 950272     root      600        1052672    12         dest
0x00000000 983041     root      600        33554432   12         dest
0x00000000 1277954    root      600        516096     12         dest
0x00000000 1310723    root      600        33554432   12         dest
0x00000000 1343492    apache    600        132        12         dest
0x00000000 1146885    apache    600        46084      12         dest
0x0001ffb8 229382     root      666        64         4
0x00025990 262151     root      666        8244       4
0x00027cb9 294920     root      666        131232     1
0x00027cba 327689     root      666        131232     1
0x00027cbb 360458     root      666        131232     1

------ Semaphore Arrays --------
key        semid      owner      perms      nsems      status
0x00000000 1703936    apache    600        1
0x00000000 1736705    apache    600        1
0x00000000 65538      root      600        1
0x00000000 98307      root      600        1
0x00000000 1048580    root      600        1
0x00000000 1081349    root      600        1
0x00000000 1409030    root      600        1
0x00000000 1441799    root      600        1
0x00000000 1769480    root      600        1
0x00000000 1802249    root      600        1
0x00000000 360458     root      666        1
0x00000000 393227     root      666        1
0x0001ffb8 425996     root      666        1
0x000251c0 458765     root      666        1
0x000255a8 491534     root      666        1
0x00025990 524303     root      666        1
0x000278d1 557072     root      666        1
0x00027cb9 589841     root      666        1
0x00000000 622610     root      666        1
0x000278d2 655379     root      666        1
0x00027cba 688148     root      666        1
0x000278d3 720917     root      666        1
0x00027cbb 753686     root      666        1
0x00000000 2261015    apache    600        1
0x00000000 2293784    apache    600        1
0x00000000 2326553    root      600        1
0x00000000 2359322    root      600        1
0x00000000 2392091    apache    600        1
0x00000000 2424860    apache    600        1

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages


Do you know what I would see differently if the system was running out of
some IPC resource?  As I say, the actual error condition is fairly
short-lived: it means that the server is not functioning, and I need it
working properly again as soon as possible, so I generally reboot right
away.  I will see if I can duplicate this error on a test system so
I can mess with it.
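
Going by the semget(2) and semctl(2) man pages, "No space left on device" is
ENOSPC, which semget() returns when creating a new array would exceed the
limit on semaphore arrays (SEMMNI) or on total semaphores (SEMMNS).  So
during the failure, ipcs would presumably list about as many semaphore
arrays as the limit allows.  A minimal, Linux-specific sketch that reports
the same two numbers from C; the function name sem_usage is made up for
illustration:

```c
#define _GNU_SOURCE            /* exposes SEM_INFO and struct seminfo */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller has to define this union itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
    struct seminfo *__buf;
};

/*
 * Hypothetical helper: stores the number of semaphore arrays that
 * currently exist in *used_sets and the system limit (SEMMNI) in
 * *max_sets.  Returns 0 on success, -1 if semctl() fails.
 */
int sem_usage(int *used_sets, int *max_sets)
{
    struct seminfo info;
    union semun arg;

    arg.__buf = &info;

    /* SEM_INFO: semusz holds the number of arrays currently in use. */
    if (semctl(0, 0, SEM_INFO, arg) == -1)
        return -1;
    *used_sets = info.semusz;

    /* IPC_INFO: semmni holds the maximum number of arrays. */
    if (semctl(0, 0, IPC_INFO, arg) == -1)
        return -1;
    *max_sets = info.semmni;

    return 0;
}
```

Calling sem_usage(&used, &max) from a small main() and printing both values
before each httpd restart would show whether the count creeps toward the
limit as the web server is restarted.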

> You may need to do a kernel reconfigure and rebuild to
> allocate more shared memory and/or semaphore slots and/or
> message queues.

> Todd Williams

Thanks for the help, I really appreciate it.  I will look into the above
suggestion as well.

Here are some files from /proc/sys/kernel:

[root kernel]# /bin/pwd
/proc/sys/kernel
[root kernel]# cat sem
250     32000   32      128
[root kernel]# cat shmall
2097152
[root kernel]# cat shmmax
33554432
[root kernel]# cat shmmni
4096

According to /usr/src/linux-2.4/include/linux/sem.h, those numbers in
/proc/sys/kernel/sem mean:

250 => 250 semaphores per array (id), maximum (SEMMSL)
32000 => 32,000 semaphores in system, maximum (SEMMNS)
32 => 32 ops per semop call, maximum (SEMOPM)
128 => 128 arrays (ids) in system, maximum (SEMMNI)
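
To keep the field order straight, here is a small sketch that parses the
line from /proc/sys/kernel/sem and labels each value.  The field order
(SEMMSL, SEMMNS, SEMOPM, SEMMNI) is the documented one for this file; the
struct and function names are invented for the example:

```c
#include <stdio.h>

/* The four fields of /proc/sys/kernel/sem, in file order. */
struct sem_limits {
    int semmsl;  /* max semaphores per array */
    int semmns;  /* max semaphores system-wide */
    int semopm;  /* max operations per semop() call */
    int semmni;  /* max number of semaphore arrays (ids) */
};

/* Parse a line such as "250\t32000\t32\t128".  Returns 0 on success. */
int parse_sem_limits(const char *line, struct sem_limits *out)
{
    int n = sscanf(line, "%d %d %d %d",
                   &out->semmsl, &out->semmns, &out->semopm, &out->semmni);
    return n == 4 ? 0 : -1;
}

/* Read the live values and print them with labels.  Returns 0 on success. */
int print_sem_limits(void)
{
    char line[128];
    struct sem_limits lim;
    FILE *fp = fopen("/proc/sys/kernel/sem", "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    if (parse_sem_limits(line, &lim) != 0)
        return -1;

    printf("SEMMSL (semaphores per array): %d\n", lim.semmsl);
    printf("SEMMNS (semaphores in system): %d\n", lim.semmns);
    printf("SEMOPM (ops per semop call):   %d\n", lim.semopm);
    printf("SEMMNI (arrays in system):     %d\n", lim.semmni);
    return 0;
}
```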

The shmmni means there can be 4,096 shared memory identifiers in the
system, maximum.

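The shmall is the system-wide limit on total shared memory, counted in
pages rather than bytes; it is a limit, not the amount currently in use.
The segments in the ipcs output above add up to far less than that, so
shared memory doesn't seem to be particularly close to the limit.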
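
Since shmall is counted in pages, comparing it with the byte sizes that
ipcs prints needs a conversion.  A rough sketch, assuming the usual 4 KB
page size on these x86 boxes (the helper names are made up):

```c
#include <unistd.h>

/* Convert a page count to bytes for a given page size. */
unsigned long long shm_pages_to_bytes(unsigned long long pages, long page_size)
{
    return pages * (unsigned long long)page_size;
}

/* Same conversion using the running system's page size; 0 on error. */
unsigned long long shmall_bytes(unsigned long long shmall_pages)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return 0;
    return shm_pages_to_bytes(shmall_pages, page_size);
}
```

With 4096-byte pages, 2097152 pages works out to 8589934592 bytes (8 GB),
which is well above the few tens of megabytes listed by ipcs.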

The shmmax is the maximum size of a single shared memory segment, in
bytes.  It matches /usr/src/linux-2.4/include/linux/shm.h, where there is
a #define SHMMAX 0x2000000, which when converted to decimal is 33554432
(32 MB).

As you can tell I don't really know what I'm looking at or doing, but I'm
hoping that maybe there will be something obvious in the IPC/shm the next
time I run into this.

Do you know how to clear a kernel semaphore array or shared memory 
segment?  Now that I know what the array ID or the shared memory ID is,
I might be able to manually clear/free it the next time I have this
problem, if I knew how to do it.
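
For what it's worth, the system calls for this are semctl() and shmctl()
with the IPC_RMID command, and the ipcrm utility does the same job from the
shell.  A minimal sketch, with wrapper names invented for the example; the
caller needs to be root or the owner of the object:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>

/*
 * Remove the semaphore array with the given semid (the "semid" column
 * of ipcs).  Returns 0 on success, -1 on error with errno set.
 */
int remove_sem_array(int semid)
{
    /* The semaphore-number argument is ignored for IPC_RMID. */
    return semctl(semid, 0, IPC_RMID);
}

/*
 * Mark the shared memory segment with the given shmid for destruction.
 * The kernel frees it only after the last process detaches, which is
 * what the "dest" status in ipcs indicates.
 */
int remove_shm_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Removing an array that a running process still uses will make that
process's semaphore calls fail, so this is probably only safe for entries
left behind by an httpd that is no longer running.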

Thanks again for your help.

--Jim

> Jim Ockers wrote:
> > Hi all,
> > 
> > Help!
> > 
> > I'm running apache 1.3.22 from Red Hat 7.2 (fully up2date), with SSL.  I have
> > gotten this error on several Dell servers now, all running 2.4.18 kernels from
> > kernel.org (not Red Hat kernels):
> > 
> > Starting httpd: Ouch! ap_mm_create(1048576, "/var/run/httpd.mm.13626") failed
> > Error: MM: mm:core: failed to acquire semaphore (No space left on device): OS: Invalid argument
> > 
> > The web server refuses to start once it decides it can't acquire the semaphore.
> > The only way to get the web server to start is to reboot the entire system,
> > once the above error is displayed.  (Every subsequent attempt to start the
> > web server using "service httpd {restart,start}" generates the same message.)
> > 
> > The SSLMutex is /var/log/httpd/ssl_mutex in the /etc/httpd/conf/httpd.conf.
> > It creates files called /var/log/httpd/ssl_mutex.12345 where the numbers are
> > supposedly the parent PID, but I've never seen any processes running with the
> > PID shown in the ssl_mutex file.
> > 
> > There are never any /var/run/htt* files that I've seen, even though the error
> > message above acts like there's a problem with such a file.
> > 
> > Here's a code fragment from apache 2.0, I think, which shows what is
> > failing:
> > 
> > #if defined(MM_SEMT_IPCSEM)
> >     fdsem = semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem == -1 && errno == EEXIST)
> >         fdsem = semget(IPC_PRIVATE, 1, IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem == -1)
> >         FAIL(MM_ERR_CORE|MM_ERR_SYSTEM, "failed to acquire semaphore");
> >     mm_core_semctlarg.val = 0;
> >     semctl(fdsem, 0, SETVAL, mm_core_semctlarg);
> >     fdsem_rd = semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem_rd == -1 && errno == EEXIST)
> >         fdsem_rd = semget(IPC_PRIVATE, 1, IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem_rd == -1)
> >         FAIL(MM_ERR_CORE|MM_ERR_SYSTEM, "failed to acquire semaphore");
> >     mm_core_semctlarg.val = 0;
> >     semctl(fdsem_rd, 0, SETVAL, mm_core_semctlarg);
> > #endif /* MM_SEMT_IPCSEM */
> > 
> > I don't know anything about semget() but I'm hoping someone here on the
> > list can tell me what I can do about this.  (I also know very little
> > about SYSV-IPC other than it seems to work most of the time.)
> > 
> > These systems have a mount for /dev/shm - here's a typical mount output:
> > 
> > /dev/sda3 on / type ext3 (rw)
> > none on /proc type proc (rw)
> > usbdevfs on /proc/bus/usb type usbdevfs (rw)
> > /dev/sda1 on /boot type ext3 (rw)
> > none on /dev/pts type devpts (rw,gid=5,mode=620)
> > none on /dev/shm type tmpfs (rw)
> > 
> > The httpd semaphore failure seems to happen at random but after the system
> > has been running for 1+ days and the web server has been restarted a bunch 
> > of times.  We typically see it on servers that are undergoing heavy 
> > configuration changes because they are being newly configured, since the 
> > web server is restarted a lot while the configuration changes are being 
> > made.
> > 
> > Should I upgrade to apache 2.0, and lose the Red Hat support?  Do I have
> > to use a Red Hat kernel to make this problem go away?
> > 
> > I'd appreciate any clues or help about this!  Thanks..
> > 
> > --
> > Jim Ockers (ockers@ockers.net)
> > Contact info: please see http://www.ockers.net/
> > 
> > Fight Spam! Join CAUCE (Coalition Against Unsolicited Commercial Email)
> > at http://www.cauce.org/ .