Todd,

> Try "ipcs" to see what's up with the ipc structures.
> It will probably show you the real problem.

Wow! Thanks for the clues. Next time I run into this problem, I will do
just that. I bet it will show the problem, just as you say.

Thanks, Ed, for your suggestion. It turns out that on that filesystem
(ext3), and on the shm filesystem, there are plenty of inodes:

/dev/sda3   2146304  401511  1744793  19% /
/dev/sda1     26104      41    26063   1% /boot
none          64234       1    64233   1% /dev/shm

> It sounds like you are out of shared memory or out of
> semaphore table entries. The most likely culprit is
> a database - are you running one?

No, there is no database running on any of the systems I saw this on.
How would I know if I was running out of semaphore table entries?

Here is the output of ipcs from this system with apache running normally:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 950272     root       600        1052672    12         dest
0x00000000 983041     root       600        33554432   12         dest
0x00000000 1277954    root       600        516096     12         dest
0x00000000 1310723    root       600        33554432   12         dest
0x00000000 1343492    apache     600        132        12         dest
0x00000000 1146885    apache     600        46084      12         dest
0x0001ffb8 229382     root       666        64         4
0x00025990 262151     root       666        8244       4
0x00027cb9 294920     root       666        131232     1
0x00027cba 327689     root       666        131232     1
0x00027cbb 360458     root       666        131232     1

------ Semaphore Arrays --------
key        semid      owner      perms      nsems      status
0x00000000 1703936    apache     600        1
0x00000000 1736705    apache     600        1
0x00000000 65538      root       600        1
0x00000000 98307      root       600        1
0x00000000 1048580    root       600        1
0x00000000 1081349    root       600        1
0x00000000 1409030    root       600        1
0x00000000 1441799    root       600        1
0x00000000 1769480    root       600        1
0x00000000 1802249    root       600        1
0x00000000 360458     root       666        1
0x00000000 393227     root       666        1
0x0001ffb8 425996     root       666        1
0x000251c0 458765     root       666        1
0x000255a8 491534     root       666        1
0x00025990 524303     root       666        1
0x000278d1 557072     root       666        1
0x00027cb9 589841     root       666        1
0x00000000 622610     root       666        1
0x000278d2 655379     root       666        1
0x00027cba 688148     root       666        1
0x000278d3 720917     root       666        1
0x00027cbb 753686     root       666        1
0x00000000 2261015    apache     600        1
0x00000000 2293784    apache     600        1
0x00000000 2326553    root       600        1
0x00000000 2359322    root       600        1
0x00000000 2392091    apache     600        1
0x00000000 2424860    apache     600        1

------ Message Queues --------
key        msqid      owner      perms      used-bytes messages

Do you know what I would see differently if the system was running out
of some IPC resource? As I say, the actual error condition is fairly
short-lived because it means that the server is not functioning, and I
need it to start functioning properly as soon as possible, so I generally
reboot it right away. I will see if I can duplicate this error on a test
system so I can mess with it.

> You may need to do a kernel reconfigure and rebuild to
> allocate more shared memory and/or semaphore slots and/or
> message queues.
> Todd Williams

Thanks for the help, I really appreciate it. I will look into the above
suggestion as well. Here are some files from /proc/sys/kernel:

[root kernel]# /bin/pwd
/proc/sys/kernel
[root kernel]# cat sem
250 32000 32 128
[root kernel]# cat shmall
2097152
[root kernel]# cat shmmax
33554432
[root kernel]# cat shmmni
4096

According to /usr/src/linux-2.4/include/linux/sem.h, those numbers in
/proc/sys/kernel/sem mean:

250   => 250 semaphores per id (128 ids), maximum
32000 => 32,000 semaphores in system, maximum
32    => 32 ops per semop call, maximum
128   => I presume this means 128 ids, maximum

The shmmni means there can be 4,096 shared memory identifiers in the
system, maximum. The shmall must be the number of bytes allocated for
shared memory IPC in the system as currently running. It doesn't seem to
be particularly close to the limit. The shmmax must be related to
/usr/src/linux-2.4/include/linux/shm.h, where there is a
#define SHMMAX 0x2000000, which, when converted to decimal, is 33554432.

As you can tell I don't really know what I'm looking at or doing, but
I'm hoping that maybe there will be something obvious in the IPC/shm the
next time I run into this.

Do you know how to clear a kernel semaphore array or shared memory
segment? Now that I know what the array ID or the shared memory ID is, I
might be able to manually clear/free it the next time I have this
problem, if I knew how to do it.

Thanks again for your help.

--Jim

> Jim Ockers wrote:
> > Hi all,
> >
> > Help!
> >
> > I'm running apache 1.3.22 from Red Hat 7.2 (fully up2date), with SSL. I have
> > gotten this error on several Dell servers now, all running 2.4.18 kernels from
> > kernel.org (not Red Hat kernels):
> >
> > Starting httpd: Ouch! ap_mm_create(1048576, "/var/run/httpd.mm.13626") failed
> > Error: MM: mm:core: failed to acquire semaphore (No space left on device): OS: Invalid argument
> >
> > The web server refuses to start once it decides it can't acquire the semaphore.
> > The only way to get the web server to start is to reboot the entire system,
> > once the above error is displayed. (Every subsequent attempt to start the
> > web server using "service httpd {restart,start}" generates the same message.)
> >
> > The SSLMutex is /var/log/httpd/ssl_mutex in the /etc/httpd/conf/httpd.conf.
> > It creates files called /var/log/httpd/ssl_mutex.12345 where the numbers are
> > supposedly the parent PID, but I've never seen any processes running with the
> > PID shown in the ssl_mutex file.
> >
> > There are never any /var/run/htt* files that I've seen, even though the error
> > message above acts like there's a problem with such a file.
> >
> > Here's a code fragment from apache 2.0, I think, which shows what is
> > failing:
> >
> > #if defined(MM_SEMT_IPCSEM)
> >     fdsem = semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem == -1 && errno == EEXIST)
> >         fdsem = semget(IPC_PRIVATE, 1, IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem == -1)
> >         FAIL(MM_ERR_CORE|MM_ERR_SYSTEM, "failed to acquire semaphore");
> >     mm_core_semctlarg.val = 0;
> >     semctl(fdsem, 0, SETVAL, mm_core_semctlarg);
> >     fdsem_rd = semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem_rd == -1 && errno == EEXIST)
> >         fdsem_rd = semget(IPC_PRIVATE, 1, IPC_EXCL|S_IRUSR|S_IWUSR);
> >     if (fdsem_rd == -1)
> >         FAIL(MM_ERR_CORE|MM_ERR_SYSTEM, "failed to acquire semaphore");
> >     mm_core_semctlarg.val = 0;
> >     semctl(fdsem_rd, 0, SETVAL, mm_core_semctlarg);
> > #endif /* MM_SEMT_IPCSEM */
> >
> > I don't know anything about semget() but I'm hoping someone here on the
> > list can tell me what I can do about this. (I also know very little
> > about SYSV-IPC other than it seems to work most of the time.)
> >
> > These systems have a mount for /dev/shm - here's a typical mount output:
> >
> > /dev/sda3 on / type ext3 (rw)
> > none on /proc type proc (rw)
> > usbdevfs on /proc/bus/usb type usbdevfs (rw)
> > /dev/sda1 on /boot type ext3 (rw)
> > none on /dev/pts type devpts (rw,gid=5,mode=620)
> > none on /dev/shm type tmpfs (rw)
> >
> > The httpd semaphore failure seems to happen at random but after the system
> > has been running for 1+ days and the web server has been restarted a bunch
> > of times. We typically see it on servers that are undergoing heavy
> > configuration changes because they are being newly configured, since the
> > web server is restarted a lot while the configuration changes are being
> > made.
> >
> > Should I upgrade to apache 2.0, and lose the Red Hat support? Do I have
> > to use a Red Hat kernel to make this problem go away?
> >
> > I'd appreciate any clues or help about this! Thanks..
> >
> > --
> > Jim Ockers (ockers@ockers.net)
> > Contact info: please see http://www.ockers.net/
> >
> > Fight Spam! Join CAUCE (Coalition Against Unsolicited Commercial Email)
> > at http://www.cauce.org/ .