[CLUE-Tech] Sockets and http question

Sean LeBlanc seanleblanc at americanisp.net
Sun Dec 29 20:56:31 MST 2002


On 12-29 20:01, David Anselmi wrote:
> Sean LeBlanc wrote:
> [...]
> >I'm building the ethereal port right now. I tried capturing packets with
> >ettercap since I already had that, but I wasn't getting the stuff being
> >sent by Lynx, only what was coming back, even though it says it is
> >sniffing both.
> >I ran this: ettercap -Nzs 192.168.1.3 <remotehostip>, FWIW.
> >
> >Well, here's the code:
> 
> Tried it.  I see that your test case prints the google page but never
> exits, while with yahoo it does exit once the page is done.
> 
> I think this has to do with google sending a "Connection: keep-alive"
> header.  The server expects you to reuse the same connection for your
> next request, so google doesn't close it.  Yahoo does.
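To make the header logic concrete, here is a minimal sketch (not from the thread) that decides from a reply's header block whether the server intends to keep the connection open. The class and method names are hypothetical, and it assumes the headers are already in a String; folded header lines and multi-valued Connection headers are not handled. It applies the usual HTTP defaults: 1.1 replies are persistent unless they say "Connection: close", while 1.0 replies close unless they say "Connection: keep-alive" (which is what google was doing).

```java
import java.util.Locale;

public class KeepAliveCheck {
    // Hypothetical helper: true if the server expects to keep the TCP
    // connection open after this response.
    static boolean serverKeepsConnectionOpen(String headerBlock) {
        String[] lines = headerBlock.split("\r\n");
        // Status line looks like "HTTP/1.1 200 OK"
        boolean http11 = lines.length > 0 && lines[0].startsWith("HTTP/1.1");
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) continue;
            String name = lines[i].substring(0, colon).trim();
            if (name.equalsIgnoreCase("Connection")) {
                String value = lines[i].substring(colon + 1).trim().toLowerCase(Locale.ROOT);
                if (value.equals("close")) return false;
                if (value.equals("keep-alive")) return true;
            }
        }
        // No explicit header: HTTP/1.1 defaults to persistent, HTTP/1.0 does not.
        return http11;
    }

    public static void main(String[] args) {
        // Illustrative header blocks, not captured traffic.
        String keepAliveReply = "HTTP/1.0 200 OK\r\nConnection: keep-alive\r\nContent-Type: text/html\r\n";
        String closeReply = "HTTP/1.1 200 OK\r\nConnection: close\r\n";
        System.out.println(serverKeepsConnectionOpen(keepAliveReply)); // true
        System.out.println(serverKeepsConnectionOpen(closeReply));     // false
    }
}
```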
> 
> Ethereal shows you this nicely for allmusic.com.  Sniff a connection to
> yahoo and then to allmusic.  Stop the capture, select the first packet
> with destination {yahoo|allmusic}, right-click, and select Follow
> TCP Stream.
> 
> The yahoo stream shows your GET and yahoo's reply (you can see the cache
> miss from americanisp).  The allmusic stream shows your GET and the
> reply for the main page, then additional GETs and replies for all the
> rest of the junk (images and so on) on the page, all over the same
> connection.  That's what keep-alive does.
> 
> The easy way to fix this, if it works, is to send "GET %s HTTP/1.0\r\n"
> rather than HTTP/1.1.  Supposedly 1.0 doesn't use persistent
> connections by default.  I say supposedly because google replies that it
> is using HTTP/1.0 but sends "Connection: keep-alive" anyway.  You could
> also try sending a "Connection: close" header in your GET.  Failing
> that, I guess you'd have to look for the </html> at the end of the reply.
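Combining the first two suggestions, a request can ask for HTTP/1.0 and also send an explicit "Connection: close", so a server that honours either one should close the socket after the reply. This is a minimal sketch; `RequestBuilder` and `buildGet` are hypothetical names. Note that the header block ends with exactly one blank line.

```java
import java.nio.charset.StandardCharsets;

public class RequestBuilder {
    // Builds a one-shot GET: HTTP/1.0 plus an explicit "Connection: close",
    // so the client's read loop should eventually see EOF (-1).
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // one empty line terminates the headers
    }

    public static void main(String[] args) {
        String req = buildGet("www.google.com", "/");
        // Encode as ASCII rather than the platform default charset.
        byte[] wire = req.getBytes(StandardCharsets.US_ASCII);
        System.out.print(req);
        System.out.println(wire.length + " bytes"); // 59 bytes
    }
}
```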
> 
> You could also set a short timeout on the socket, so reads give up
> quickly when data stops coming, or use a timer in your while loop.  I'm
> not sure how to do that, and it would be troublesome with servers that
> are slow to answer.  Google's keep-alive looks like it lasts about 2
> minutes (seems long, but I'm sure they know what they're doing).
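In Java, the socket-timeout idea maps onto `Socket.setSoTimeout`: a blocking `read()` then throws `SocketTimeoutException` after the given idle time, and the socket stays usable. The sketch below treats a timeout as "reply finished". The names `readUntilIdle` and `demo` are hypothetical, and the demo uses a local stand-in server that sends a reply and then holds the connection open, the way a keep-alive server would. As noted above, a server that pauses longer than the timeout in mid-reply will be cut off.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class IdleTimeoutRead {
    // Reads until the peer closes (-1) or no bytes arrive for idleMillis.
    // setSoTimeout bounds each blocking read(), not the whole transfer.
    static String readUntilIdle(Socket sock, int idleMillis) throws IOException {
        sock.setSoTimeout(idleMillis);
        InputStream is = sock.getInputStream();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = is.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (SocketTimeoutException e) {
            // Idle too long: treat what we have as the complete reply.
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Demo only: a local server sends `reply`, then keeps the connection
    // open for holdMillis instead of closing it.
    static String demo(String reply, int idleMillis, long holdMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread t = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream os = s.getOutputStream();
                    os.write(reply.getBytes(StandardCharsets.ISO_8859_1));
                    os.flush();
                    Thread.sleep(holdMillis);
                } catch (Exception ignored) {
                }
            });
            t.start();
            String got;
            try (Socket sock = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                got = readUntilIdle(sock, idleMillis);
            }
            t.join();
            return got;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(demo("HTTP/1.0 200 OK\r\n\r\n<html></html>", 500, 2000));
    }
}
```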

Sweet! Thanks. I was working on a quickie Java version and it was exhibiting
the same behavior. Sending 1.0 instead of 1.1 makes it exit properly.
It's always something small like that, isn't it? At least it wasn't an
off-by-one error this time...

Allmusic must have upgraded their servers, because it used to work just fine
with 1.1, and then suddenly... didn't.

BTW, about checking for </html>: some of these pages were sending </body> as
the last thing. The specific allmusic.com page I was having trouble with in
the first place choked about 1/5 of the way through.

Here's similar functionality in an equally slapdash java program:

######## test.java #######

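import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class test
{
    // Usage: java test <host> <path>    e.g. java test www.google.com /
    public static void main(String[] args)
    {
        if (args.length < 2)
        {
            System.err.println("usage: java test <host> <path>");
            return;
        }
        // try-with-resources closes the socket even if the read fails
        try (Socket sock = new Socket(args[0], 80))
        {
            OutputStream os = sock.getOutputStream();
            // Buffer the stream so the byte-at-a-time loop isn't a syscall per byte
            InputStream is = new BufferedInputStream(sock.getInputStream());
            // HTTP/1.0 plus "Connection: close" asks the server to close the
            // connection after one reply, so read() eventually returns -1.
            // The header block must end with exactly one blank line.
            String s = "GET " + args[1] + " HTTP/1.0\r\n" +
                "Host: " + args[0] + ":80\r\n" +
                "Accept: text/html, text/*\r\n" +
                "Connection: close\r\n" +
                "\r\n";

            os.write(s.getBytes(StandardCharsets.US_ASCII));
            os.flush();
            int data;
            while ((data = is.read()) != -1)
            {
                System.out.print((char) data);
            }
            System.out.flush();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}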
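The loop above still depends on the server closing the connection. A different technique, not discussed in the thread, is to read the reply's Content-Length header and stop after exactly that many body bytes, which works even when the server keeps the connection open. This is a minimal sketch under stated assumptions: the server sends Content-Length and does not use chunked Transfer-Encoding (which this code does not handle). `ContentLengthRead`, `readLine` and `readBody` are hypothetical names, and a canned byte stream stands in for the socket so the example runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ContentLengthRead {
    // Reads one header line, dropping the CR and stopping at LF or EOF.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Skips the status line, scans headers for Content-Length, then reads
    // exactly that many body bytes. Assumes no chunked Transfer-Encoding.
    static String readBody(InputStream in) throws IOException {
        int length = -1;
        readLine(in); // status line, e.g. "HTTP/1.1 200 OK"
        String h;
        while (!(h = readLine(in)).isEmpty()) {
            int colon = h.indexOf(':');
            if (colon > 0 && h.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                length = Integer.parseInt(h.substring(colon + 1).trim());
            }
        }
        if (length < 0) throw new IOException("no Content-Length header");
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n == -1) throw new IOException("connection closed early");
            off += n;
        }
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Trailing bytes play the part of a next reply on a kept-alive connection.
        String reply = "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\n<html></html>HTTP/1.1 ...";
        InputStream in = new ByteArrayInputStream(reply.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readBody(in)); // <html></html>
    }
}
```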


-- 
Sean LeBlanc:seanleblanc at americanisp.net  
http://users.americanisp.net/~seanleblanc/
Get MLAC at: http://sourceforge.net/projects/mlac/


