[clue-tech] [spam?] text processing howto
Dennis J Perkins
dennisjperkins at comcast.net
Wed Oct 6 21:00:42 MDT 2010
On Wed, 2010-10-06 at 19:36 -0600, David L. Anselmi wrote:
> Bruce Ediger wrote:
> > Here's how to find the mean and median of a single-column file of numbers, in a
> > 2-line sh script:
> >
> > #!/bin/sh
> > # This costs memory two ways: (1) sort has to buffer up everything somehow, and
> > # (2) awk has to hold a huge array in memory.
> > sort -n |
> > awk 'BEGIN{c=0;sum=0;}/^[^#]/{a[c++]=$1;sum+=$1;}END{ave=sum/c;if((c%2)==1){median=a[int(c/2)];}else{median=(a[c/2]+a[c/2-1])/2;}print sum," ",c," ",ave," ",median," ",a[0]," ",a[c-1]}'
>
> OK, so count statements rather than lines. It's nonsense to count this as 2 lines when a) you
> didn't need a new line after the | (so it would be one line), and b) the awk script has 900 statements.
>
> > "awk" is a lot better than perl, and sometimes better than "cut" or "sed", in
> > shell scripts.
>
> I agree with this, in shell scripts. But if your awk script starts to exceed 80 characters it might
> be better to use perl anyway.
>
> Dave
Possibly. The only time I wrote big awk programs was on SCO Unix. I
needed to process data that a very simplistic report language was
pulling out of a database. It had no arrays or subroutines, and I needed them.
So I piped the data to awk to do the heavy lifting and generate the
report. Why not perl? I don't like it. Aesthetics. I could have used
Ruby but I decided to try using awk. I found a few quirks but it did
the job.
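
For anyone trying to read the quoted one-liner, here is a rough sketch of the same mean/median pipeline spread over several lines. The function name "stats" is only for illustration, and the empty-input check is an addition that the quoted script does not have. It also separates the output fields with single spaces rather than the wider spacing of the original.

```shell
#!/bin/sh
# Mean and median of a single column of numbers read from stdin.
# "stats" is a hypothetical name chosen for this sketch.
stats() {
    # sort -n puts the values in numeric order so the median is the middle one
    sort -n |
    awk '
        # Skip blank lines and lines starting with "#"; store everything else
        /^[^#]/ { a[c++] = $1; sum += $1 }
        END {
            # Guard against division by zero (not in the quoted script)
            if (c == 0) { print "no data"; exit 1 }
            if (c % 2 == 1) {
                median = a[int(c / 2)]
            } else {
                median = (a[c / 2] + a[c / 2 - 1]) / 2
            }
            # Fields: sum, count, mean, median, min, max
            print sum, c, sum / c, median, a[0], a[c - 1]
        }'
}
```

Used as "stats < numbers.txt" (numbers.txt being any one-number-per-line file). As the comment in the quoted script says, both sort and the awk array hold the whole data set, so this is only reasonable for input that fits in memory.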