[clue-tech] Fun perl script
marcus hall
marcus at tuells.org
Tue Aug 1 11:35:03 MDT 2006
On Tue, Aug 01, 2006 at 10:56:31AM -0600, Mike Staver wrote:
> I'll definitely share some stories as soon as I get it working here. I
> tried what Jeff suggested, and it worked only for google searches.
> Meaning, if I type in "Windows" under a search on google, the results
> page is full of linux results. That's cool, but not what I was
> expecting. I was hoping to be able to have any word on the page be
> replaced, so I still have some tweaking to do. I've tried different
> variations like:
>
> #!/usr/bin/perl
> $|=1;
> $count = 0;
> $pid = $$;
> while (<>) {
> chomp $_;
> if ($_ =~ /(.*\.jpg)/i) {
> $url = $1;
> system("/usr/bin/wget", "-q",
> "-O","/space/WebPages/images/$pid-$count.jpg", "$url");
> system("/usr/bin/mogrify",
> "-flip","/space/WebPages/images/$pid-$count.jpg");
> print "http://10.0.0.16/squid/$pid-$count.jpg\n";
> }
> elsif ($_ =~ /(.*\.gif)/i) {
> $url = $1;
> system("/usr/bin/wget", "-q",
> "-O","/space/WebPages/images/$pid-$count.gif", "$url");
> system("/usr/bin/mogrify",
> "-flip","/space/WebPages/images/$pid-$count.gif");
> print "http://10.0.0.16/squid/$pid-$count.gif\n";
>
> }
> elsif ($_ =~ /(Clinton|Bush|Reagan|Castro|Gibson)/i) {
> print "Staver";
> }
> elsif ($_ =~ /(Windows|Microsoft)/i) {
> print "Linux";
> }
> else {
> print "$_\n";;
> }
> $count++;
> }
>
> However, my variations to the script seem to have no effect on the
> output...
It looks like this script is reading the file name, not the contents
of the file itself.
It looks like it is passed URLs as input, one per line. If a URL ends
in .jpg or .gif, the script retrieves the file with wget, saves it as
/space/WebPages/images/XXXX-YYY.jpg (or .gif), flips it with mogrify,
and then outputs a new URL that references the saved file on 10.0.0.16
(presumably that server's IP address). If the URL isn't a .jpg or .gif,
it just passes the URL through to the output unchanged. Some other
program actually uses the URL later on.
It would be possible, perhaps, to put in a similar branch for .html (or
other URL suffixes) that performs a similar wget, but pipes the
result through sed to perform the manipulations and saves the file,
then generates a new URL for the modified file and prints that.
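As a rough sketch of that idea, the helpers below fetch a page with wget,
rewrite the words, and save the result. I have done the substitution in
Perl itself rather than piping through sed, which avoids handing the URL
to a shell. The names rewrite_words, fetch_and_rewrite and @rules are
made up for illustration, the \b word boundaries are my own addition,
and the paths in the commented-out branch are simply copied from the
image branches of your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Word substitutions taken from the quoted script. Keeping them in a
# table (and the \b word boundaries) is an assumption, not the original.
my @rules = (
    [ qr/\b(?:Clinton|Bush|Reagan|Castro|Gibson)\b/i, 'Staver' ],
    [ qr/\b(?:Windows|Microsoft)\b/i,                'Linux'  ],
);

# Replace each target word inside a chunk of page text, leaving the
# rest of the text intact.
sub rewrite_words {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $text =~ s/$pattern/$replacement/g;
    }
    return $text;
}

# Fetch $url with wget, rewrite its contents line by line, save the
# result as $dir/$name.html, and return the URL to hand back instead.
# Returns nothing if the fetch or the output file cannot be opened.
sub fetch_and_rewrite {
    my ($url, $name, $dir, $base) = @_;
    # List-form pipe open: no shell is involved, so the URL is not
    # interpreted as shell syntax.
    open(my $in, '-|', '/usr/bin/wget', '-q', '-O', '-', $url) or return;
    open(my $out, '>', "$dir/$name.html") or return;
    while (my $line = <$in>) {
        print {$out} rewrite_words($line);
    }
    close($in);
    close($out) or return;
    return "$base/$name.html";
}

# Hypothetical extra branch for the existing while loop:
#
#   elsif ($_ =~ /(.*\.html)/i) {
#       my $new = fetch_and_rewrite($1, "$pid-$count",
#           "/space/WebPages/images", "http://10.0.0.16/squid");
#       print defined $new ? "$new\n" : "$_\n";
#   }
```

Note that a plain substitution like this also touches words inside HTML
tags and links, so a page that links to a URL containing "Windows" would
end up with a changed link as well.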
I believe that what your added elsif branches accomplish is to replace
any URL that contains "Clinton|Bush|..." with the probably non-functional
URL "Staver", and any URL that contains "Windows|Microsoft" with "Linux".
Note that they do not replace the target word, but the entire URL, so if
somebody requests the URL "http://SecretsOfWindows.html", this script
would return "Linux", which probably would not work in whatever context
a URL is wanted.
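To make the difference concrete, here is a small sketch comparing the
two behaviours on that example URL. The sub names replace_whole and
replace_word are invented for this illustration. The first mirrors the
match-then-print logic of the quoted elsif branch, the second uses s///
to change only the matched word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mirrors the quoted elsif branch: if the pattern matches anywhere in
# the line, the whole line is thrown away and a fixed string is returned.
sub replace_whole {
    my ($line) = @_;
    return $line =~ /(Windows|Microsoft)/i ? "Linux" : $line;
}

# Substitutes only the matched word and keeps the rest of the line.
sub replace_word {
    my ($line) = @_;
    (my $copy = $line) =~ s/(Windows|Microsoft)/Linux/gi;
    return $copy;
}

my $url = "http://SecretsOfWindows.html";
print replace_whole($url), "\n";   # prints "Linux"
print replace_word($url), "\n";    # prints "http://SecretsOfLinux.html"
```

Even the second form only produces a different URL, which most likely
does not exist either. Neither one changes the text of the page that
the URL points to.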
Marcus Hall
marcus at tuells.org