[CLUE-Tech] Linux TCP sockets SYN -> long delay waiting for tcp_syn_retries

Jim Ockers ockers at ockers.net
Fri Oct 1 22:45:20 MDT 2004


Hi everyone,

I'm working on a Linux problem where the equivalent operation on
Windows behaves better, and I need to understand why.  On our VSAT
network, a Windows telnet to a closed port takes 8-9 seconds to fail,
while Linux takes 3 minutes and 9 seconds.

This is way too long and is causing a serious application problem.

For testing I can simulate the production environment with 
a Linux server with iptables ACCEPT for telnet (23/tcp).  However 
no telnet daemon is running, so the connection is refused.  (This
is on purpose and is part of a failover design for the application.)

When I try to telnet to this host from a LAN, from either Windows
or Linux, everything works fine and the connection is refused in
a second or less.

# time telnet host
Trying 6.7.8.9...
telnet: connect to address 6.7.8.9: Connection refused 
0.000u 0.000s 0:00.02 0.0%      0+0k 0+0io 248pf+0w

When I use a VSAT satellite terminal IP link and a Linux 2.4.22 kernel,
the same attempt takes 3 minutes and 9 seconds to fail.  On the exact
same VSAT system, going through the same Linksys router and satmodem,
a Windows 2000 telnet takes 8-9 seconds to give the "connect failed"
message.

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

c:\>telnet 6.7.8.9
Connecting To 6.7.8.9...Could not open a connection to host on port 23 : Connect failed

According to ethereal here's what happens for the Windows attempt:

SYN is sent to 6.7.8.9:23
RST is received from 6.7.8.9:23
SYN sent
RST received
SYN sent
RST received
...then the telnet command returns and the error message above is 
printed.  Elapsed time is 8-9 seconds.

According to ethereal the same sequence of events happens for the
Linux telnet, but the SYN/RST exchange happens 5 times.  This matches
the default number of retries in /proc/sys/net/ipv4/tcp_syn_retries
and is fine.  Plus we can change it to a smaller number if we want.
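
Besides the system-wide /proc setting, Linux documents a per-socket
TCP_SYNCNT option in tcp(7) that limits SYN retransmissions for a single
socket.  The following is only a minimal sketch, assuming a Python build
that exposes socket.TCP_SYNCNT; the helper name set_syn_retries is made
up for illustration and is not part of any library.

```python
import socket

def set_syn_retries(sock, retries):
    """Limit SYN retransmissions for this one socket.

    Returns False if the platform has no TCP_SYNCNT option or rejects it.
    (Hypothetical helper, for illustration only.)
    """
    opt = getattr(socket, "TCP_SYNCNT", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, retries)
    except OSError:
        return False
    return True

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
if set_syn_retries(sock, 2):
    # Read the value back to confirm what the kernel accepted.
    print("SYN retries for this socket:",
          sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT))
sock.close()
```

This leaves tcp_syn_retries alone for every other program on the box,
which may matter if only one application needs the shorter timeout.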

The real problem is the timing: each RST is received within 2-3
seconds of the SYN being sent, but it takes 3 minutes and 9 seconds
for the telnet application, TCP/IP stack, or kernel to close the
socket so the telnet process exits.

Indeed, this web page http://ipsysctl-tutorial.frozentux.net/ipsysctl-tutorial.html
says that a 180 second delay is to be expected:

   3.3.24. tcp_syn_retries

   The tcp_syn_retries variable tells the kernel how many times to 
   try to retransmit the initial SYN packet for an active TCP 
   connection attempt.

   This variable takes an integer value, but should not be set 
   higher than 255 since each retransmission will consume huge 
   amounts of time as well as some amounts of bandwidth. Each 
   connection retransmission takes aproximately 30-40 seconds. 
   The default setting is 5, which would lead to an aproximate 
   of 180 seconds delay before the connection times out. 
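
The measured 3 minutes and 9 seconds is 189 seconds, and that figure
falls out exactly if one assumes an initial retransmission timeout of
3 seconds that doubles after every unanswered SYN.  The sketch below
does only that arithmetic.  The 3-second starting value and the doubling
rule are assumptions about the kernel's behaviour, not something read
from the capture, and syn_timeout_seconds is a made-up name.

```python
def syn_timeout_seconds(retries, initial_rto=3.0):
    """Total wait before connect() gives up, ASSUMING the timeout
    starts at initial_rto seconds and doubles after each unanswered SYN."""
    total = 0.0
    rto = initial_rto
    # One wait after the initial SYN, plus one after each retransmission.
    for _ in range(retries + 1):
        total += rto
        rto *= 2
    return total

# Show the assumed total for each possible tcp_syn_retries value up to 5.
for retries in range(0, 6):
    print(retries, syn_timeout_seconds(retries))
```

Under the same assumption, 5 retries gives 189 seconds and 1 retry
gives 9 seconds, so the total depends heavily on the retry count.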

Can anyone with SOCK_STREAM socket programming experience suggest 
what we can do to make the TCP timeout be 10 seconds or less for 
connection attempts to closed services (RST received for every SYN)?
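
One application-level approach is to put an explicit deadline on the
connect itself, so the caller gives up after a fixed time regardless of
how many SYNs the kernel still intends to send.  A minimal sketch in
Python, assuming the application can be changed; try_connect and its
return strings are hypothetical, chosen only for this example.

```python
import socket

def try_connect(host, port, timeout=10.0):
    """Attempt a TCP connect and report the outcome as a short string."""
    try:
        # create_connection applies the timeout to the connect step.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"    # an RST came back and the stack acted on it
    except socket.timeout:
        return "timeout"    # no usable answer within the deadline
    except OSError as exc:
        return "error: %s" % exc
    sock.close()
    return "connected"
```

A caller in the failover design could treat "refused" and "timeout" the
same way and move on to the next server once either one is returned.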

Why does it take so long for telnet to exit even though it gets a
RST right away for every SYN it sends?

Thanks,
Jim

PS Of course the IP addresses have been changed to protect the
guilty.

PPS I'm wondering about the /proc/sys/net/ipv4/tcp_rfc1337 setting,
but our socket is in the SYN_SENT state and never reaches TIME_WAIT,
so I think tcp_rfc1337 is irrelevant.  If you disagree please let me
know.

-- 
Jim Ockers, P.Eng. (ockers at ockers.net)
Contact info: please see http://www.ockers.net/


