[clue-tech] MySQL server reply packets delayed/lost under network congestion?

Jim Ockers ockers at ockers.net
Wed Nov 25 10:39:12 MST 2009


Hi everyone,

Replying to my own post. :)  We figured it out.  MySQL tries to set the 
IP TOS to a nonzero value.  I made a bogus Python service that makes the 
same system calls as MySQL, including all the same socket options, and it 
behaves the same as MySQL.  Here is the offending system call (from strace):

setsockopt(fd, SOL_IP, IP_TOS, [8], 4)
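
One possible reading of that value, offered as a sketch rather than a confirmed diagnosis: 8 is IPTOS_THROUGHPUT, and tc-prio(8) documents that the default pfifo_fast queue discipline puts "maximize throughput" traffic in its lowest-priority band. The snippet below only reproduces that documented table for the four plain TOS values. It assumes the ppp interface is using the default pfifo_fast qdisc, and the names TOS_TO_PRIORITY, PRIOMAP and band_for_tos are made up for illustration.

```python
# Sketch of the TOS -> band mapping described in tc-prio(8) for pfifo_fast.
# Assumption: the interface uses the default pfifo_fast qdisc and priomap.
# All names here are illustrative, not kernel identifiers.

IPTOS_THROUGHPUT = 0x08  # the value MySQL sets
IPTOS_LOWDELAY = 0x10

# TOS value -> Linux packet priority, per the table in tc-prio(8)
TOS_TO_PRIORITY = {
    0x00: 0,  # normal service  -> best effort
    0x08: 2,  # max throughput  -> bulk
    0x10: 6,  # minimize delay  -> interactive
    0x18: 4,  # both bits set   -> interactive bulk
}

# Default priomap: index is the priority, value is the band.
# Band 0 is dequeued first; band 2 only when bands 0 and 1 are empty.
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


def band_for_tos(tos):
    """Return the pfifo_fast band for one of the four TOS values above."""
    return PRIOMAP[TOS_TO_PRIORITY[tos]]
```

If that mapping applies here, ordinary routed traffic (TOS 0) lands in band 1 while MySQL's replies land in band 2, which would be consistent with the replies waiting until the link goes idle.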

Here is the Python code to handle an incoming connection just like MySQL 
does, including setting the TOS:

#!/usr/bin/python
# Bogus service: accept one connection, make the same socket calls as
# mysqld (in the order seen in strace), send a banner, then exit.

from socket import *

s = socket(AF_INET, SOCK_STREAM)
# Bytes literal so that send() also works on Python 3
senddata1 = b"220 JimO Bogus Service ready\n"
HOST = "0.0.0.0"
PORT = 26
s.setsockopt( SOL_SOCKET, SO_REUSEADDR, 1 )
s.bind((HOST, PORT)) # Bind the socket to an IP Address and Port
s.listen(1) # Have the socket listen for a connection

(incomingsocket, address) = s.accept() # Accept an incoming connection

port = incomingsocket.getsockname()[1] # Same getsockname() call as mysqld
incomingsocket.setsockopt( SOL_IP, IP_TOS, 8 ) # 8 = IPTOS_THROUGHPUT
incomingsocket.setsockopt( SOL_TCP, TCP_NODELAY, 1 )
peer = incomingsocket.getpeername()[1] # Same getpeername() call as mysqld
incomingsocket.setsockopt( SOL_SOCKET, SO_KEEPALIVE, 1 )
incomingsocket.send(senddata1) # Send our banner

incomingsocket.close() # Close the client connection
s.close() # Close the listening socket

If we get rid of that setsockopt( SOL_IP, IP_TOS, 8 ) line of Python 
code then the service banner is printed right away even when the 
connection is busy.  So the questions for the group are:

1. Does anyone know how to make MySQL not do that, either a config file 
option or command line option?
2. Is there some really excellent reason why MySQL would do that?
3. Is there any other way (iptables mangle, for example) that we could 
fix this packet's TOS _BEFORE_ it hits the network 
stack/queues/buffers?  We tried with iptables mangle/OUTPUT and it was 
too late in the network processing to have any effect on the packet.
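
For anyone who wants to compare runs with and without the IP_TOS line, a small client that times the banner may be easier to repeat than typing telnet by hand. This is only a sketch: time_banner is a hypothetical helper name, and the host and port in the usage note are the ones from this thread.

```python
import socket
import time


def time_banner(host, port, timeout=30.0):
    """Connect to host:port and return (seconds_until_first_data, data).

    The clock starts after connect() returns, i.e. after the kernel has
    finished the 3-way handshake, so it measures only the server's reply.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        start = time.time()
        data = sock.recv(1024)  # blocks until the first bytes arrive
        return time.time() - start, data
    finally:
        sock.close()
```

For example, time_banner("10.0.0.2", 3306) against mysqld, or time_banner("10.0.0.2", 26) against the bogus service above, run once on an idle link and once during a big download.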

Thanks,
Jim

PS Here is the MySQL code from vio/viosocket.c of MySQL 5.0.88 which we 
think sets this.  In our experience "fastsend" is not exactly what 
happens when this option is set. :)

int vio_fastsend(Vio * vio __attribute__((unused)))
{
  int r=0;
  DBUG_ENTER("vio_fastsend");

#if defined(IPTOS_THROUGHPUT) && !defined(__EMX__)
  {
    int tos = IPTOS_THROUGHPUT;
    r= setsockopt(vio->sd, IPPROTO_IP, IP_TOS, (void *) &tos, sizeof(tos));
  }
#endif                                    /* IPTOS_THROUGHPUT && !__EMX__ */
  if (!r)
  {
#ifdef __WIN__
    BOOL nodelay= 1;
#else
    int nodelay = 1;
#endif

    r= setsockopt(vio->sd, IPPROTO_TCP, TCP_NODELAY,
                  IF_WIN(const char*, void*) &nodelay,
                  sizeof(nodelay));
  }
  if (r)
  {
    DBUG_PRINT("warning", ("Couldn't set socket option for fast send"));
    r= -1;
  }

  DBUG_PRINT("exit", ("%d", r));
  DBUG_RETURN(r);
}
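
To see from userspace what those two setsockopt calls leave behind on a socket, something like the following could be used. It is a sketch under assumptions: fastsend_options is a hypothetical name, and the idea that Linux also updates the socket's SO_PRIORITY when IP_TOS is set is my understanding of the kernel behaviour, not something verified on the systems in this thread. If it holds, it might explain why rewriting the TOS in iptables mangle/OUTPUT came too late to change the queueing.

```python
import socket


def fastsend_options(sock):
    """Apply the same two options as vio_fastsend() and read them back.

    Returns (tos, nodelay, priority). priority is None on platforms where
    the socket module does not expose SO_PRIORITY (it is Linux-specific).
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, 0x08)  # IPTOS_THROUGHPUT
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    nodelay = bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))

    so_priority = getattr(socket, "SO_PRIORITY", None)
    if so_priority is None:
        priority = None
    else:
        # Assumption: on Linux this reflects the priority derived from TOS
        priority = sock.getsockopt(socket.SOL_SOCKET, so_priority)
    return tos, nodelay, priority
```

Calling fastsend_options(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) and printing the result on the Linux server would show whether the priority changes along with the TOS.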


Jim Ockers wrote:
> OK this is complicated enough that I think I have to top-post.  We 
> have more information so here's what we've found so far.
>
> Recap: We are using MySQL over a 921600 bps serial connection with 
> Linux pppd on one side and Windows RAS on the other side.  The MySQL 
> server is on the Linux side and we are using Windows XP as the 
> client.  The ppp interface txqueuelen is 100 and the MTU is 1500, all 
> other PPP options are standard or default.  Everything works fine 
> until we start a high-bandwidth download in which the Windows system 
> retrieves something from the Internet and the Linux server has to 
> route the packets.  It does not do NAT.  The internet-connected 
> interface is eth0, so the big download comes in the Linux server on 
> eth0 and goes through netfilter and routing and goes out the PPP 
> interface.  The big download can be youtube HD streaming, a big FTP 
> file, a big image or document retrieved via HTTP, or whatever.
>
> It does not matter how big the file is - if there are no dropped 
> packets, then the MySQL initial banner response is delayed until the 
> completion of the download, which could be hundreds of megabytes.  We 
> are asking for the banner response by typing "telnet 10.0.0.2 3306" 
> and are expecting the MySQL banner packet to show up right away, but 
> it doesn't.  There is nothing unusual in the SYN or ACK packets.
>
> The problem is that ALL MySQL response packets generated by mysqld 
> seem not to be inserted into the Linux kernel network queue discipline 
> send buffer until after the transfer is finished, or until there is a 
> dropped packet which causes a momentary delay due to TCP retransmit 
> somewhere in the middle of the big download.  The SYN, SYN-ACK, ACK 3 
> way handshake works immediately because those are generated by the OS, 
> but the MySQL banner is delayed, as are all other MySQL query 
> responses including "select 1".  All other services including httpd 
> (apache), vsftpd, smbd, etc. work as expected and the service banner 
> (or whatever data) is transmitted within 100ms or so of the completion 
> of the 3 way handshake.  The timing of packets as shown in a PCAP 
> packet capture is more or less identical on both sender and receiver.
>
> 1. This problem exists on MySQL 4 on 2.4.28 kernel (Red Hat 7.2 base).
> 2. The exact same problem exists on MySQL 5 on 2.6.18 kernel (CentOS 
> 5.2 base).
> 3. The problem is masked when we change the ppp interface txqueuelen 
> to 3, thus causing dropped packets inside the Linux kernel and TCP 
> retransmits which slows down the connection.
> 4. The problem is masked when we enable tc (traffic shaping) on the 
> 2.6 kernel and set an absolute bandwidth limit on the PPP interface.  
> No specific shaping rules or prioritization is required, all we have 
> to do to change the queueing behavior is to enable a bandwidth limit.
> 5. The problem is masked when we change the Windows serial port RX 
> FIFO to some value that causes UART overruns and thus dropped packets 
> and TCP retransmits.
> 6. The problem is masked if we make the PPP MTU smaller than 1500.
> 7. If we FTP the file directly from the Linux server, the problem does 
> not exist.  Only downloads from "the other side" of the Linux server 
> cause this problem.
> 8. MySQL sets a non-zero TOS in the packet's IP header.  However we 
> mangled it with iptables and this had no effect on the queue discipline.
>
> The problem seems to be related to MySQL's ability (in all versions of 
> MySQL) to write packets to the transmit queue of the Linux kernel.  We 
> think it is something to do with the queue discipline for sockets or 
> socket buffers on a PPP interface, because when we set the interface 
> bandwidth with tc on the 2.6 kernel, the queue discipline seems to 
> change behavior.  We are pretty sure the problem does not exist on 
> ethernet interfaces, and since there is nothing about this in any 
> google search we must be the only people in the world doing MySQL 
> queries over a Linux PPP interface.
>
> Does anyone have any ideas what we could do to make MySQL behave like 
> all the other Linux services and write its packets to the network even 
> when the interface is busy?  Or else does anyone have any ideas what 
> we could change in the networking in the kernel to make it less 
> unfriendly to MySQL packets when the PPP interface is busy with 
> throughput-traffic?
>
> Thanks,
> Jim
>
> chris fedde wrote:
>> "In for a penny, in for a pound."  As they say.
>>
>> I'm sure you've turned on all the mysqld logging available.  The
>> --debug option turns on lots of diagnostics but unfortunately the
>> typical distro packages don't have the --debug  features enabled.
>> you might glean some information running strace on the mysqld running
>> process while the connection problem happens.
>>
>> That the problem is traffic related might be an indication of some
>> priority queueing behaviour  in an intermediate router or firewall
>> device.   Or perhaps the network has some asymmetric bandwidth feature
>> that expresses itself under load.
>>
>> When diagnosing  "heisenbugs" the key to solving them is isolating the
>> moving parts,  can you test connectivity to the sql server from some
>> client that is not across the serial link.  Do packet traces taken on
>> the client match those taken on the server?
>>
>> It does seem perplexing.   But it sounds like you have a reasonable
>> handle on it.
>>
>> chris
>>
>>
>>     
