[clue-tech] Network Woes

Jim Ockers ockers at ockers.net
Mon Jun 19 16:52:30 MDT 2006


Todd,

> Setup: Everything in the house is connected to a switch.  A DSL modem
> and a server have fixed LAN IP addresses.  The other computers receive
> DHCP services from the server (DHCP server services are turned off on
> the DSL modem).
> 
> Symptoms:
> I lose the ability to make new connections through the DSL modem
> (either into or out of my server).  New connections include Web
> queries, ssh, telnet, ftp, etc.  Also during this time some LAN
> traffic is hamstrung.
> 
> Examples:
> local_one$ ping local_two #works OK
> local_one$ ssh local_two # 15-30 second delay before prompted for password
> local_one$ ssh 192.168.1.246 # prompted for password immediately

Please run ssh under strace (for example, "strace -tt ssh local_two")
and see which system call it is stuck in during the delay.

If it is a gethostbyname() call then you can also see the IP address
of the DNS server your system is trying to contact.  You won't see the
actual text "gethostbyname(...)" because that is not a system call -
strace shows only system calls.

For example, here is part of the strace output from an ssh connect
attempt from one of my local machines to another.  The excerpt starts
at the point where ssh loads libresolv.so.2 to do a DNS lookup.

 ...
open("/lib/libresolv.so.2", O_RDONLY)   = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\'\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=68925, ...}) = 0
old_mmap(NULL, 69408, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40139000
mprotect(0x40147000, 12064, PROT_NONE)  = 0
old_mmap(0x40147000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe000) = 0x40147000
old_mmap(0x40148000, 7968, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40148000
close(3)                                = 0
munmap(0x40014000, 117947)              = 0
brk(0x809d000)                          = 0x809d000
brk(0x80ae000)                          = 0x80ae000
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}}, 28) = 0
send(3, "\307\'\1\0\0\1\0\0\0\0\0\0\6niamey\6ockers\3net\0\0"..., 35, 0) = 35
gettimeofday({1150756816, 682489}, NULL) = 0
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\307\'\205\200\0\1\0\0\0\1\0\0\6niamey\6ockers\3net\0\0"..., 65536, 0, {sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 85
close(3)                                = 0
 ...

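If I were having a DNS problem, the trace would stall right after the
send(3, ...), in the poll() call that waits for the reply (the connect()
on a UDP socket returns immediately, so it is not where the time goes).
It would sit there for as long as the DNS resolver library takes to time
out.  The poll() above waits up to 5000 ms per attempt, and with retries
the total can reach 30 seconds or so.  (I think.  In any case it's
always been obvious to me that the problem was DNS when using strace as
the debugger.)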
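
The server address and the queried name can be read off the connect()
and send() lines by eye, but the name is in DNS wire format
(length-prefixed labels after a 12-byte header), which is easy to
misread.  Here is a minimal Python sketch that pulls both out of a saved
trace.  The function names are made up for illustration, and it assumes
strace's default octal string escapes and a query sent with send() or
sendto() on a socket connected to port 53, as in the trace above.

```python
import re

# Single-character escapes strace may emit inside quoted strings.
_NAMED = {"n": 10, "t": 9, "r": 13, "v": 11, "f": 12,
          "\\": 92, '"': 34, "'": 39}


def unescape_strace(s: str) -> bytes:
    """Turn a strace-quoted string body (octal escapes) back into bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(ord(s[i]))      # strace output is plain ASCII
            i += 1
            continue
        i += 1
        if i >= len(s):                # dangling backslash: stop
            break
        m = re.match(r"[0-7]{1,3}", s[i:])
        if m:                          # octal escape such as \307 or \6
            out.append(int(m.group(), 8) & 0xFF)
            i += m.end()
        else:                          # named escape such as \' or \n
            out.append(_NAMED.get(s[i], ord(s[i])))
            i += 1
    return bytes(out)


def dns_query_name(payload: bytes) -> str:
    """Decode the question name that follows the 12-byte DNS header."""
    pos, labels = 12, []
    while pos < len(payload) and payload[pos] != 0:
        n = payload[pos]
        labels.append(payload[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return ".".join(labels) + "."


def find_dns_lookup(trace: str):
    """Return (server_ip, query_name) for the first lookup in the trace."""
    conn = re.search(
        r'connect\((\d+), \{[^\n]*sin_port=htons\(53\), '
        r'sin_addr=inet_addr\("([^"]+)"\)', trace)
    if conn is None:
        return None
    fd, server = conn.group(1), conn.group(2)
    sent = re.search(r'send(?:to)?\(' + fd + r', "((?:[^"\\]|\\.)*)"',
                     trace[conn.end():])
    name = dns_query_name(unescape_strace(sent.group(1))) if sent else None
    return server, name


SAMPLE = r"""socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}}, 28) = 0
send(3, "\307\'\1\0\0\1\0\0\0\0\0\0\6niamey\6ockers\3net\0\0"..., 35, 0) = 35
"""

print(find_dns_lookup(SAMPLE))  # ('127.0.0.1', 'niamey.ockers.net.')
```

On a real trace you would read the file written by "strace -o" and
pass its contents to find_dns_lookup().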

Once you know the IP address of the DNS server ssh is querying, and the
name it's trying to query, you can try the same query yourself manually
in another terminal to see if it works.

Example, using the above information:

# dig +short @127.0.0.1 niamey.ockers.net.
142.179.181.230

You can get a lot more debugging from dig as well - try the +trace option
to dig and get rid of the +short option of course.
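
If dig isn't installed on the box, roughly the same check can be done
from Python by sending the query yourself and timing the reply.  This
is only a sketch under a few assumptions: the function names are
invented, it asks for an A record over UDP, and it does not parse the
answer or check the reply ID.  It just tells you whether the server
responds and how long it takes.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0xC727) -> bytes:
    """Build a minimal DNS query for an A record (type 1, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer, 0 authority, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ending in a 0 byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def timed_query(server: str, name: str, timeout: float = 5.0) -> float:
    """Send the query to server:53 over UDP; return seconds until a reply.

    Raises socket.timeout if nothing comes back within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name), (server, 53))
        sock.recvfrom(65536)           # any reply at all counts here
        return time.monotonic() - start


# Example call (needs a DNS server listening at that address):
#   timed_query("127.0.0.1", "niamey.ockers.net.")
```

With the default query_id, build_query("niamey.ockers.net.") produces
the same 35 bytes that show up in the send() line of the trace, so a
timeout here points at the same thing the stalled trace does.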

Hope this helps,
Jim

-- 
Jim Ockers, P.Eng. (ockers at ockers.net)
Contact info: please see http://www.ockers.net/


