[CLUE-Tech] Linux TCP sockets SYN -> long delay waiting for tcp_syn_retries

David Anselmi anselmi at anselmi.us
Mon Oct 4 17:57:13 MDT 2004


Jim Ockers wrote:
[...]
> Actually the problem is not with telnet, it's with a Linux JMS client,

Not sure I have any ideas about that.

[...]
>>What's the time between SYNs (all the same or using a back-off algorithm)?
> 
> 
> I'm glad you asked - I think this is important.  Here is what ethereal
> says about the timing for Linux telnet to a closed port:
> 
> time=0s	SYN sent (initial)
> 1.4	RST received
> 3.0	SYN sent, retry 1
> 4.4	RST
> 9.0	SYN sent, retry 2
> 9.9	RST
> 21.0	SYN sent, retry 3
> 22.4	RST
> 45.0	SYN sent, retry 4
> 45.9	RST
> 93.0	SYN sent, retry 5
> 94.8	RST received
> 194.5	[telnet process exits (this is not shown in ethereal of course)]

Interesting.  I get this behavior when connecting to a blocked port (i.e.,
a DROP rule in iptables, so no RSTs are sent).  And this is typical TCP
behavior.  The initial RTO (retransmission timeout) is 3 sec.  Each retry
waits twice as long as the previous one (3s, 6s, 12s, 24s, 48s, 96s), which
puts the SYNs at 0, 3, 9, 21, 45 and 93 sec, the same as in your trace.
The ~100s delay at the end is the wait for a response to the final retry.
As far as I can tell from tcp.h, the RFC wants the whole attempt to last
at least ~180s.
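
To make the arithmetic concrete, here is a minimal sketch of that back-off schedule.  It assumes plain doubling from a 3 sec initial RTO and a SYN that never gets a usable reply.  The function name and parameters are made up for illustration, this is not kernel code.

```python
def syn_schedule(retries=5, initial_rto=3.0):
    """Return (send_times, give_up_time) for SYNs that get no usable reply.

    Assumes plain exponential backoff: the wait doubles after every SYN.
    """
    send_times = [0.0]          # the initial SYN
    wait = initial_rto
    for _ in range(retries):
        send_times.append(send_times[-1] + wait)
        wait *= 2
    # After the last retry there is one more (doubled) wait before giving up.
    give_up = send_times[-1] + wait
    return send_times, give_up


times, give_up = syn_schedule()
print(times)    # [0.0, 3.0, 9.0, 21.0, 45.0, 93.0]
print(give_up)  # 189.0
```

The 189 sec is in the same neighborhood as the 194.5 sec at which your telnet exited.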

If I telnet to a closed port and get a RST, only one SYN is sent, with no
retries.  So it looks like your RSTs aren't matching the SYNs you send.
Or maybe they are blocked from getting to the sending machine (an INPUT
rule maybe?).
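
One way to take telnet out of the picture is to time a bare connect() yourself.  The following is only a sketch: probe() is a hypothetical helper, and the 5 sec timeout is an arbitrary choice.  If the RST is accepted by the local stack, I'd expect "refused" to come back almost immediately; if it is lost or ignored, you'd see a timeout instead.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect and return (outcome, seconds_elapsed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.connect((host, port))
        outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"       # a RST came back and was accepted
    except socket.timeout:
        outcome = "timed out"     # nothing usable came back in time
    except OSError as exc:
        outcome = "error: %s" % exc
    finally:
        sock.close()
    return outcome, time.monotonic() - start


# Example call (substitute the server and port you are testing):
# print(probe("127.0.0.1", 9))
```

Running it once on the LAN and once over the VSAT link should show whether the two paths really differ in how the refusal is delivered.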

> I'm not sure why it takes 100 seconds for the telnet process to exit
> after it gets the 6th RST in a row.  The JMS client shows the same
> delay when giving up on the defunct server.

Seems to be designed that way, see above.

[...]
>>the right options).  If it is long then the bug is in the telnet code 
>>and you need to debug it to find the cause and whether there might be a 
>>reason for it (or use different code).
> 
> I wonder about that, since it's the stock telnet, and the JMS client
> doesn't use the same code as telnet (maybe the same library though).
> Over our iDirect satellite system everything is fast.  On the LAN
> everything is fast.  This one VSAT network seems to break something
> on Linux, but the ethereal traces look the same to me (except for 
> the timing of course).  As I mentioned before, Windows telnet and
> TCP programs work fine on both VSAT systems and the LAN.

Could it be that the VSAT mangles the RST so that it isn't recognized as 
going with the SYN?  Or you're inadvertently blocking replies from the 
VSAT IP or something.

> 
>>Seems broken that something would wait a long time after being told a 
>>port is closed.  But it seems broken that several SYNs would be sent 
>>after a RST ('course I don't know nuthin bout the TCP specs).
> 
> 
> The /proc/sys/net/ipv4/tcp_syn_retries is set to 5 by default, you can
> set it lower.  Windows does 2 retries (as observed in ethereal).
> 
> I had the bright idea to try iptables -j REJECT with different targets
> on the server host.  I was hoping that maybe if TCP RST wasn't enough
> of a rejection, maybe something else would be stronger/quicker:

[...]
> all of which made for a really boring half hour.  Interestingly
> the DROP took the same amount of time for telnet to exit as the
> others.

Yes, your TCP is behaving as if it doesn't see the RSTs.  It seems not 
to see any of the ICMP errors either.

> When I set the tcp_syn_retries to 1, the telnet process exits in
> 12.3 seconds.  I think there is a good reason to use 2 or more
> SYN retries so I'm not sure that is a good solution.  Also you'll
> note from the timing above that the first retry's RST is received 
> after only 4.4 seconds, but the telnet process takes an additional
> 8 seconds to exit after receiving the second RST.

Like I said, it seems not to see the RSTs.  That's why you get the delay 
after the last RST--waiting for the last SYN to time out.  Where are you 
running ethereal?  Not on the telnet machine I'd guess.
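
Since how long that last timeout runs depends on tcp_syn_retries, it may be worth confirming what the telnet machine is actually using.  A small sketch that reads the /proc entry you mentioned (read_syn_retries is a name I made up, and it simply returns None where the file isn't readable):

```python
from pathlib import Path

SYN_RETRIES = Path("/proc/sys/net/ipv4/tcp_syn_retries")


def read_syn_retries(path=SYN_RETRIES):
    """Return the configured SYN retry count, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


print("tcp_syn_retries =", read_syn_retries())
```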

Dave


