[clue-tech] Fun perl script

Tue Aug 1 11:35:03 MDT 2006

On Tue, Aug 01, 2006 at 10:56:31AM -0600, Mike Staver wrote:
> I'll definitely share some stories as soon as I get it working here. I 
> tried what Jeff suggested, and it worked only for google searches. 
> Meaning, if I type in "Windows" under a search on google, the results 
> page is full of linux results. That's cool, but not what I was 
> expecting.  I was hoping to be able to have any word on the page be 
> replaced, so I still have some tweaking to do.  I've tried different 
> variations like:
> 
> #!/usr/bin/perl
> $|=1;
> $count = 0;
> $pid = $$;
> while (<>) {
>         chomp $_;
>         if ($_ =~ /(.*\.jpg)/i) {
>                 $url = $1;
>                 system("/usr/bin/wget", "-q", "-O",
>                        "/space/WebPages/images/$pid-$count.jpg", "$url");
>                 system("/usr/bin/mogrify", "-flip",
>                        "/space/WebPages/images/$pid-$count.jpg");
>                 print "http://10.0.0.16/squid/$pid-$count.jpg\n";
>         }
>         elsif ($_ =~ /(.*\.gif)/i) {
>                 $url = $1;
>                 system("/usr/bin/wget", "-q", "-O",
>                        "/space/WebPages/images/$pid-$count.gif", "$url");
>                 system("/usr/bin/mogrify", "-flip",
>                        "/space/WebPages/images/$pid-$count.gif");
>                 print "http://10.0.0.16/squid/$pid-$count.gif\n";
> 
>         }
>         elsif ($_ =~ /(Clinton|Bush|Reagan|Castro|Gibson)/i) {
>                 print "Staver";
>         }
>         elsif ($_ =~ /(Windows|Microsoft)/i) {
>                 print "Linux";
>         }
>         else {
>                 print "$_\n";
>         }
>         $count++;
> }
> 
> However, my variations to the script seem to have no effect on the 
> output...

It looks like this script only ever sees the name of each file (its
URL), not the contents of the file itself.

It looks like it is passed URLs as input, and if it is a .jpg or
a .gif file, it retrieves the file with wget and saves the file in
/space/WebPages/images/XXXX-YYY.jpg (or .gif), then outputs a
new URL to reference the file from 10.0.0.16 (presumably that server's
IP address).  If the URL isn't a .jpg or .gif, it just passes the URL
to the output.  Some other program actually uses the URL later on.

It would be possible, perhaps, to add a similar branch for .html (or
other URL suffixes) that performs the same kind of wget, but pipes the
result through sed to perform the substitutions before saving the file,
and then generates a new URL for the modified file and prints that.
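
A minimal sketch of that idea, done in Perl rather than sed so it stays
in one script.  The helper names rewrite_text and handle_html are made
up for illustration, the directory and base URL are simply copied from
the quoted script, and it assumes wget can write the page to standard
output with "-O -".  I have not run this against a real proxy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed settings, copied from the quoted script; adjust as needed.
my $dir  = "/space/WebPages/images";
my $base = "http://10.0.0.16/squid";

# Replace the target words inside the page text (stands in for the sed step).
sub rewrite_text {
    my ($text) = @_;
    $text =~ s/(?:Windows|Microsoft)/Linux/gi;
    $text =~ s/(?:Clinton|Bush|Reagan|Castro|Gibson)/Staver/gi;
    return $text;
}

# Fetch $url, rewrite its contents, save them as $dir/$name, and return
# the URL to hand back.  On any failure, fall back to the original URL.
sub handle_html {
    my ($url, $name) = @_;
    open(my $in, "-|", "/usr/bin/wget", "-q", "-O", "-", $url)
        or return $url;
    local $/;                      # slurp the whole page at once
    my $page = <$in>;
    close($in);
    return $url unless defined $page && length $page;

    open(my $out, ">", "$dir/$name") or return $url;
    print $out rewrite_text($page);
    close($out);
    return "$base/$name";
}
```

In the quoted loop, a branch that matches /(.*\.html?)/i could then
print handle_html($1, "$pid-$count.html") followed by a newline.  Note
that a blind substitution like this also changes the words inside tags
and links, so some pages may break.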

I believe that what the above change accomplishes is to replace any URL
that matches "Clinton|Bush|..." with the probably non-functional URL
"Staver", and any URL that matches "Windows|Microsoft" with "Linux".
Note that it does not replace the target word, but the entire URL, so
if somebody requests the URL "http://SecretsOfWindows.html", this
script would return "Linux", which probably would not work in whatever
context a URL is wanted.
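
To make the difference concrete, here is a small sketch comparing the
two behaviours.  The function names replace_whole and replace_word are
hypothetical; the first mirrors what the quoted elsif branch does, the
second substitutes only the matched word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# What the quoted branch does: any URL that mentions the word is
# replaced by the bare string "Linux".
sub replace_whole {
    my ($url) = @_;
    return $url =~ /(Windows|Microsoft)/i ? "Linux" : $url;
}

# Alternative: substitute only the matched word, keep the rest of the URL.
sub replace_word {
    my ($url) = @_;
    (my $new = $url) =~ s/(?:Windows|Microsoft)/Linux/gi;
    return $new;
}

print replace_whole("http://SecretsOfWindows.html"), "\n";  # Linux
print replace_word("http://SecretsOfWindows.html"), "\n";   # http://SecretsOfLinux.html
```

The second form at least returns something shaped like a URL, though it
now points at a different page that may well not exist.  Either way,
the words on the page itself are untouched.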

Marcus Hall
marcus at tuells.org