[clue-tech] MySQL server reply packets delayed/lost under network congestion?

Jim Ockers ockers at ockers.net
Mon Nov 9 13:32:59 MST 2009


Hi everyone,

It turns out our MySQL server already had the skip-name-resolve option.  
I ran some packet traces and measured the following delays, both without 
any bandwidth hog and in the presence of a bandwidth hog.  The timestamps 
reported in the server packet trace are within 1-2 milliseconds of the 
timestamps in the client's packet trace.  The delay is the amount of 
elapsed time from the completion of the TCP 3-way handshake (SYN, 
SYN-ACK, ACK) to when the service banner is received by the client (in 
my case the "client" is Windows telnet.exe):

service    no-hog start delay    hog start delay
-------    ------------------    ---------------
ftp        0.1 ms                ~1 ms
mysql      1 ms                  ~10-15 SECONDS
http       0.1 ms                ~1 ms
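
For anyone who wants to reproduce the measurement, a server-side capture 
along these lines is enough (a sketch of our setup; the interface name 
and client address below are placeholders, not our real values):

  # capture all traffic to/from the client; -n disables name resolution
  tcpdump -n -i ppp0 -w server-trace.pcap host 10.0.0.2

The delay is then the gap between the final ACK of the handshake and the 
first data packet carrying the banner.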

The Linux server has a mostly idle CPU, load averages around 0.00, and 
the bandwidth hog is just an FTP download from a well-connected server 
somewhere else, written to the nul: device on the Windows client.  The 
PPP connection has a maximum throughput of about 750 Kbps over the 
921600 bps RS-422 serial link.
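
The hog is easy to reproduce with the stock Windows ftp.exe; discarding 
the download to nul keeps the client's disk out of the picture (the host 
and file names here are placeholders):

  C:\> ftp ftp.example.com
  ftp> binary
  ftp> get bigfile.iso nul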

There is something strange we noticed in the packet trace.  The client 
initiates the TCP 3-way handshake to the server, and then, instead of 
sending the banner, the server tries to connect back to the client on 
port 3306 (it sends a SYN).  Of course the client is not running a MySQL 
server or anything else on port 3306, so it replies with an RST.  
Several seconds then elapse before the server finally sends the banner.
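
A narrow tcpdump filter makes the SYN-RST exchange easy to spot (again a 
sketch; the interface name is a placeholder):

  # show only SYN and RST segments involving port 3306
  tcpdump -n -i ppp0 'port 3306 and tcp[tcpflags] & (tcp-syn|tcp-rst) != 0'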

When the link is not hogged/saturated we don't observe this SYN-RST 
behavior where the server tries to connect to the client.  Google 
searches about this have not helped.  There is nothing in the MySQL 
server config file which would indicate that it should try to connect to 
its clients on port 3306.
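
For reference, the relevant part of the server's my.cnf is roughly this 
(trimmed to the lines that matter; a sketch, not the full file):

  [mysqld]
  port = 3306
  skip-name-resolve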

When we try this test from a Linux system connected in exactly the same 
way, we don't get any of this weird behavior with MySQL.  However, I 
don't think it can be OS-dependent, since the service banner is supposed 
to be sent right after the TCP 3-way handshake.  I don't see how the 
Linux MySQL server would "know" anything about the client.  The 3-way 
handshake contains no specific information about the client operating 
system, and the packets are small, short, and to the point.  The service 
banner also seems to be more or less the same regardless of client OS; 
the difference is timing.

When we add an iptables filter/OUTPUT rule preventing the server from 
connecting to port 3306 on the client's IP address, the hog-start delay 
is more like 3 seconds, which is still a long time but is much shorter 
than without this iptables rule.  I believe the connect() syscall 
returns -1 immediately when an iptables rule in the kernel would prevent 
that connection from being established.  Obviously with this firewall 
rule in place the SYN does not show up in the packet trace at either 
end.
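
The rule was something along these lines (a sketch; the client address 
is a placeholder, and REJECT-versus-DROP was our choice, nothing MySQL 
requires):

  # refuse the server's outbound connection attempts to the client's 3306
  iptables -A OUTPUT -p tcp -d 10.0.0.2 --dport 3306 -j REJECT --reject-with tcp-reset

With REJECT (as opposed to DROP) the failing connect() is reported to 
the caller immediately rather than waiting out a SYN retry timeout.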

Also, because this whole problem seems to be timing-dependent, the 
system does not always behave exactly this way.  For example, if a TCP 
retransmission on the bandwidth-hog connection momentarily slows the 
bandwidth usage, then the MySQL banner at that instant might arrive 
faster than it would have had there been no interruption in the 
bandwidth-hogging data flow.

This is all too weird, and it is giving me a headache.  Anyone have any 
ideas?  Sorry for top-posting and replying to my own post, but I wanted 
to continue the thread.

Thanks,
Jim

Jim Ockers wrote:
> Hi Chris,
>
> Thank you for the out-of-the-box idea.  If MySQL were trying to do 
> reverse DNS it would certainly fail the way you describe, because we 
> have not set it up.  However, I am not sure about that, because when the 
> link is not saturated, MySQL responds right away to the same requests 
> or queries.  I will try adding an /etc/hosts entry on the Linux server 
> end for the client IP address and see if it is any better.
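>
> The entry would just be one line like the following (the address and 
> hostname here are placeholders for the real client):
>
>   10.0.0.2    winclient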
>
> Thanks,
> Jim
>
> chris fedde wrote:
>> Having read through this thread I begin to wonder if it is a DNS
>> issue.  MySQL may be looking up, and failing to find, reverse DNS
>> entries.  The characteristic symptom of this particular problem is a
>> 90-second delay from the TCP connection until the response banner.
>>
>> You can check to see if this is the issue using tcpdump.
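>>
>> Something along these lines would show reverse (PTR) lookups going out
>> right after the connect (the interface name is a guess for your link):
>>
>>   tcpdump -n -i ppp0 udp port 53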
>>
>> On Sun, Nov 8, 2009 at 9:17 PM, Jim Ockers <ockers at ockers.net> wrote:
>>   
>>> Hi Dave,
>>>
>>> David L. Anselmi wrote:
>>>     
>>>> Jim Ockers wrote:
>>>>       
>>>>> Whenever there is sustained high-bandwidth network traffic over the PPP
>>>>> link (such as downloading a large e-mail attachment, or streaming video) the
>>>>> MySQL connections all start to time out because response packets from the
>>>>> server are not received at the client.  I observed this by doing a simple
>>>>> "telnet server 3306" and noting that the MySQL banner response with
>>>>> version number was delayed by several seconds or until the bandwidth-hogging
>>>>> stopped.
>>>>>
>>>>> What I don't understand is why I can still transfer data from other
>>>>> services on the Linux system, such as "net view \\server" and "dir
>>>>> \\server\sharename" (using the Windows redirector to talk to Samba on the
>>>>> Linux system).  Also, the Apache web server on the Linux system responds
>>>>> normally, both from a web browser and in response to "telnet server 80"
>>>>> followed by "GET / HTTP/1.0".
>>>>>         
>>>> I assume that other services are delayed like the MySQL banner.  Most of
>>>> what you tried, though, would use fairly small responses (one packet, even?),
>>>> so maybe the MySQL protocol is more verbose.
>>>>
>>>>       
>>> No, actually the only thing that is delayed is MySQL response packets,
>>> including the banner.  Apache responds right away to a GET / HTTP/1.0
>>> request, Samba responds right away to a Windows redirector request, FTP
>>> responds right away, and so forth.  MySQL is the only one that seems to take
>>> an unreasonably long time to respond when something is hogging the
>>> bandwidth.
>>>     
>>>> It seems that clients have several timeouts available:
>>>>
>>>> http://dev.mysql.com/doc/refman/5.1/en/mysql-options.html
>>>>
>>>> Maybe the protocol takes too long in some cases.
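>>>>
>>>> For example, the stock command-line client lets one raise the connect
>>>> timeout (the value here is arbitrary):
>>>>
>>>>   mysql --connect-timeout=60 -h server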
>>>>       
>>> I'm not too sure about that, but since the banner is delayed I wonder if
>>> MySQL has some sort of built-in congestion management or something.
>>>
>>> Thanks though,
>>> Jim
>>>
>>>
>>>       


