[clue-tech] [spam?] text processing howto

Dennis J Perkins dennisjperkins at comcast.net
Wed Oct 6 21:00:42 MDT 2010


On Wed, 2010-10-06 at 19:36 -0600, David L. Anselmi wrote:
> Bruce Ediger wrote:
> > Here's how to find the mean and median of a single-column file of numbers, in a
> > 2-line sh script:
> >
> > #!/bin/sh
> > # This costs memory two ways: (1) sort has to buffer up everything somehow, and
> > # (2) awk has to hold a huge array in memory.
> > sort -n |
> > awk 'BEGIN{c=0;sum=0;}/^[^#]/{a[c++]=$1;sum+=$1;}END{ave=sum/c;if((c%2)==1){median=a[int(c/2)];}else{median=(a[c/2]+a[c/2-1])/2;}print sum,"    ",c,"   ",ave," ",median,"  ",a[0],"    ",a[c-1]}'
> 
> OK, so count statements rather than lines.  It's nonsense to count this as 2 lines when a) you 
> didn't need a new line after the | (so it would be one line), and b) the awk script has 900 statements.
> 
> > "awk" is a lot better than perl, and sometimes better than "cut" or "sed", in
> > shell scripts.
> 
> I agree with this, in shell scripts.  But if your awk script starts to exceed 80 characters it might 
> be better to use perl anyway.
> 
> Dave

Possibly.  The only time I wrote big awk programs was on SCO Unix.  I
needed to process data that a very simplistic report language was
pulling out of a database.  That language had no arrays or subroutines,
and I needed both, so I piped the data to awk to do the heavy lifting
and generate the report.  Why not perl?  I don't like it; it's a matter
of aesthetics.  I could have used Ruby, but I decided to try awk.  I
found a few quirks, but it did the job.
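
For anyone reading this in the archive, the one-liner quoted above is
easier to follow when spread over several lines.  What follows is only a
sketch of the same idea, not Bruce's exact script: the function name
"stats" is made up for illustration, the empty-input guard is an
addition, and the output fields are separated by single spaces rather
than the original padding.

```shell
#!/bin/sh
# stats (hypothetical name): read one number per line on stdin and print
#   sum count mean median min max
# Blank lines and lines starting with '#' are skipped, as in the quoted script.
stats() {
    sort -n |
    awk '
        # Keep every non-empty line that does not start with "#".
        # Input is already sorted, so a[] ends up in ascending order.
        /^[^#]/ { a[c++] = $1; sum += $1 }

        END {
            # Added guard (not in the quoted script): avoid dividing by zero.
            if (c == 0)
                exit 1

            if (c % 2 == 1)
                median = a[int(c / 2)]               # odd count: middle value
            else
                median = (a[c / 2] + a[c / 2 - 1]) / 2   # even: mean of middle two

            print sum, c, sum / c, median, a[0], a[c - 1]
        }'
}
```

With the function defined, something like "stats < numbers.txt" (any
one-number-per-line file) should work.  For the three input lines 3, 1, 2
it prints "6 3 2 2 1 3".  The memory cost noted in the quoted comment
still applies, since both sort and awk hold the whole data set.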




