[clue-tech] [spam?] text processing howto

Bruce Ediger bediger at stratigery.com
Wed Oct 6 18:25:43 MDT 2010


On Wed, 6 Oct 2010, David L. Anselmi wrote:

> I knew AWK before I learned perl.  There's a relatively small set of problems that awk is as good or
> better than perl (like this one).  Beyond that AWK is not nearly as easy to use or read.  So you're

Here's how to find the mean and median of a single-column file of numbers, in a
2-line sh script:

#!/bin/sh
# This costs memory two ways: (1) sort has to buffer up everything somehow, and
# (2) awk has to hold a huge array in memory.
sort -n |
awk 'BEGIN{c=0;sum=0;}/^[^#]/{a[c++]=$1;sum+=$1;}END{ave=sum/c;if((c%2)==1){median=a[int(c/2)];}else{median=(a[c/2]+a[c/2-1])/2;}print sum,"    ",c,"   ",ave," ",median,"  ",a[0],"    ",a[c-1]}'
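
The same idea, spread out so it is easier to read, might look like the sketch below. The function name colstats and the labelled output format are illustrative choices, not part of the one-liner above. The sketch also bails out on empty input, which the one-liner does not (it would divide by zero).

```shell
#!/bin/sh
# colstats: hypothetical helper name, used here only for illustration.
# Reads one number per line on stdin, skips blank lines and lines that
# start with '#', and prints sum, count, mean, median, min and max.
colstats() {
    sort -n |
    awk '
        /^[^#]/ { a[c++] = $1; sum += $1 }   # values arrive sorted; keep them all
        END {
            if (c == 0) { print "no data"; exit 1 }   # avoid dividing by zero
            if (c % 2 == 1)
                median = a[int(c / 2)]                # odd count: middle value
            else
                median = (a[c / 2] + a[c / 2 - 1]) / 2   # even count: mean of middle two
            printf "sum=%s n=%d mean=%s median=%s min=%s max=%s\n",
                   sum, c, sum / c, median, a[0], a[c - 1]
        }'
}
```

Something like "colstats < numbers.txt" would then print one labelled line. The memory cost is the same as before: both sort and the awk array hold the whole input.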

"awk" is a lot better than perl, and sometimes better than "cut" or "sed", in
shell scripts.  And as far as regular expressions go, PCRE is indeed very
powerful.  A PCRE that can find prime numbers got bandied about a few weeks
ago, which shows that PCRE handles expressions that are way more than
"regular".  (That alone doesn't prove PCRE is Turing complete, though.)
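
The post does not quote the pattern, so the following is only a guess at the kind of trick meant: the widely circulated one writes the number in unary and uses a backreference to look for a block of two or more 1s repeated two or more times. The sketch below swaps in a POSIX basic regular expression with grep instead of PCRE, since backreferences are what do the work; the function name is_prime is hypothetical.

```shell
#!/bin/sh
# is_prime: hypothetical helper name. Succeeds (exit 0) if $1 is prime.
# Assumption: this mirrors the commonly circulated unary trick, not
# necessarily the exact pattern the post refers to.
is_prime() {
    n=$1
    [ "$n" -ge 2 ] || return 1
    # Write n in unary: a string of n "1" characters.
    unary=$(printf "%${n}s" '' | tr ' ' '1')
    # n is composite iff the string is a block of 2+ ones repeated 2+ times.
    # \(111*\) captures the block; \1\1* demands at least one more copy.
    if printf '%s\n' "$unary" | grep -q '^\(111*\)\1\1*$'; then
        return 1
    fi
    return 0
}
```

No finite automaton can recognise "a composite number of 1s", so a pattern like this is outside the regular languages in the formal sense.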

However, PCRE can take forever to match certain pathological strings:
http://swtch.com/~rsc/regexp/regexp1.html

I don't know about the GNU awk that shows up in most Linuxes, but the
original awk didn't have that problem.
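
One way to check a particular awk is to feed it the pathological case described in the article linked above: n copies of "a?" followed by n copies of "a", matched against a string of n "a" characters. The sketch below builds that pattern and runs it through whatever awk is first on PATH; the function name pathological_match is hypothetical.

```shell
#!/bin/sh
# pathological_match: hypothetical helper name. Prints "match" and exits 0
# if a string of N "a" characters matches a?a?...a?aa...a (N of each).
pathological_match() {
    n=$1
    text=$(printf "%${n}s" '' | tr ' ' 'a')            # aaa...a (n copies)
    optional=$(printf '%s' "$text" | sed 's/a/a?/g')   # a?a?...a? (n copies)
    printf '%s\n' "$text" |
    awk -v re="^${optional}${text}\$" '
        $0 ~ re { print "match"; found = 1 }
        END     { exit !found }'
}
```

Putting a call such as "pathological_match 25" in a script and timing the script for growing n should show whether the running time blows up. A backtracking matcher would be expected to roughly double its time with each increment of n.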

