[clue-tech] text processing howto

David L. Anselmi anselmi at anselmi.us
Wed Oct 6 19:36:23 MDT 2010


Bruce Ediger wrote:
> Here's how to find the mean and median of a single-column file of numbers, in a
> 2-line sh script:
>
> #!/bin/sh
> # This costs memory two ways: (1) sort has to buffer up everything somehow, and
> # awk has to have a huge array in-memory.
> sort -n |
> awk 'BEGIN{c=0;sum=0;}/^[^#]/{a[c++]=$1;sum+=$1;}END{ave=sum/c;if((c%2)==1){median=a[int(c/2)];}else{median=(a[c/2]+a[c/2-1])/2;}print sum,"    ",c,"   ",ave," ",median,"  ",a[0],"    ",a[c-1]}'

OK, so count statements rather than lines.  It's nonsense to call this 2 lines when a) you 
didn't need a newline after the | (so it could be one line), and b) the awk script has about ten statements.
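Written out one statement per line, the same program is easier to read and to count.  This is a
minimal sketch assuming a POSIX sort and awk; the function name `stats` is hypothetical, the output
is plain space-separated, and it adds an empty-input guard that the one-liner lacks (there, c=0
means a division by zero).

```sh
#!/bin/sh
# stats (hypothetical name): read one number per line on stdin and print,
# separated by spaces: sum, count, mean, median, minimum, maximum.
# Lines starting with '#' and empty lines are skipped, as in the one-liner.
stats() {
    sort -n | awk '
        # Input is already sorted numerically, so a[] ends up sorted too.
        /^[^#]/ {
            a[c++] = $1
            sum += $1
        }
        END {
            # Guard added in this sketch: no data means nothing to average.
            if (c == 0) exit 1
            ave = sum / c
            if (c % 2 == 1) {
                median = a[int(c / 2)]                  # odd count: middle value
            } else {
                median = (a[c / 2] + a[c / 2 - 1]) / 2  # even count: mean of middle two
            }
            print sum, c, ave, median, a[0], a[c - 1]
        }'
}
```

Usage would be something like `stats < numbers.txt`, where numbers.txt stands in for your
single-column file.  It still has the memory costs Bruce mentions (sort buffers everything, awk
keeps the whole array), since the median needs all the values.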

> "awk" is a lot better than perl, and sometimes better than "cut" or "sed", in
> shell scripts.

I agree with this, in shell scripts.  But if your awk script starts to exceed 80 characters it might 
be better to use perl anyway.
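To make the "sometimes better than cut" point concrete: cut splits on every single delimiter
character, while awk's default field splitting treats a run of blanks as one separator.  The
sample line below is made up, in the style of ps or ls -l output.

```sh
#!/bin/sh
# Columns separated by runs of spaces (made-up sample line).
line='alice    1001   /home/alice'

# cut: each space is a delimiter, so field 2 is the empty string
# between the first and second space.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# awk: any run of blanks is one separator, so $2 is the second column
# as a person would read it.
awk_field=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
```

This prints `cut: []` and `awk: [1001]`.  Getting cut to do the same means squeezing the spaces
first (for example with `tr -s ' '`), which is the kind of case where awk is the simpler tool.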

Dave

