[CLUE-Cert] [Fwd: using grep to detect numbers (from Tuesday's meeting)]

Sat Aug 28 11:48:40 MDT 2004

I had time to investigate using grep for detecting numbers instead of 
scanning the string character by character, like you did.  Grep would 
process a string faster than using cut in a loop, but no one would 
notice the difference in the problem we were working on.  It's hard to 
say which solution is better.  The grep version is shorter, but you 
need to understand regular expressions.

The simplest method is the script below.

[0-9] means that if the character being compared is a digit from 0 to 
9, there is a match.

\+ means that one or more characters must match the preceding pattern, 
in this case [0-9], for there to be a match.  If we had used [0-9]*, 
then 0 or more digits would mean a match, so even an empty string 
would match.

Note that \+ is a GNU grep extension.  Some programs and programming 
languages (and grep -E) use + instead of \+.

Because the pattern is not anchored, grep reports a match if the line 
contains a digit anywhere, so "abc1" also counts as a number here.

$? holds the exit status of the last executed command; in this case, it 
comes from grep.  If it is 0, then grep found a match.

#!/bin/sh

echo -n "Number: "
read num

echo $num | grep '[0-9]\+' > /dev/null
if [ $? -eq 0 ]
then
   echo "number"
else
   echo "Nan"
fi
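
As a hedged aside on the + versus \+ remark, here is a minimal sketch of 
the same check written with extended regular expressions (grep -E), 
where + needs no backslash, and with -q in place of the redirect to 
/dev/null.  The function name has_digit is made up for illustration and 
is not part of the script above.

```shell
#!/bin/sh

# has_digit is a hypothetical helper, not part of the original script.
# It succeeds (exit status 0) if its argument contains at least one digit.
has_digit() {
    # -E selects extended regular expressions, so + is written without \
    # -q suppresses output; only the exit status is used
    printf '%s\n' "$1" | grep -Eq '[0-9]+'
}

for value in 123 abc abc1
do
    if has_digit "$value"
    then
        echo "$value: number"
    else
        echo "$value: Nan"
    fi
done
```

Note that abc1 is reported as a number, because the pattern only has to 
match somewhere in the line.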

We can shorten the script and make it more readable by combining two 
lines.  The "if" command tests the exit status of whatever command 
follows it.  If this seems to contradict "if [ $? -eq 0 ]", what is 
actually happening is that [ is itself a command: it evaluates the 
expression between the brackets and sets the exit status accordingly.

if echo $num | grep '[0-9]\+' > /dev/null
then
   echo "number"
else
   echo "Nan"
fi
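
To see that [ really is just another command, a small sketch like this 
prints the exit status that "if" would act on.  The helper name 
status_of is an assumption for illustration only.

```shell
#!/bin/sh

# status_of is a hypothetical helper: it runs the command given as its
# arguments and prints that command's exit status.
status_of() {
    if "$@"
    then
        echo 0
    else
        echo $?    # in the else branch, $? is still the status of "$@"
    fi
}

status_of [ 3 -eq 3 ]             # the test succeeds
status_of [ 3 -eq 4 ]             # the test fails
status_of grep -q x /dev/null     # grep finds nothing in an empty file
```

The first call prints 0 and the other two print a non-zero status, which 
is all that "if" looks at.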

We can allow an optional sign by adding [+-]\? to the front.  [+-] 
matches a + or a -, and \? makes the preceding item optional, so there 
can be a + or - before the number.

if echo $num | grep '[+-]\?[0-9]\+' > /dev/null
then
   echo "number"
else
   echo "Nan"
fi

If you want floating point numbers, it gets more complicated.  The first 
grep can catch all but one pattern.  For example, it handles 12, 10. and 
2.1, but not the leading point in .2  We can fix this by doing a second 
comparison if the first one fails.  We use || for the OR-operation to say

         if   num = pattern1 || num = pattern2

which means

         if   num = pattern1 OR num = pattern2

This line is getting long, so we can break it into two lines to make it 
more readable, but we need to tell the shell to treat both lines as one 
line.  We do this by adding \ to the end of the line.  There cannot be 
any other characters, even spaces, after the backslash or the script 
won't work properly.

if echo $num | grep '[+-]\?[0-9]\+\.\?[0-9]*' > /dev/null   ||  \
  echo $num | grep '[+-]\?\.[0-9]\+' > /dev/null
then
   echo "number"
else
   echo "Nan"
fi
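
The two-grep test can also be wrapped in a function, sketched below with 
a few assumptions that go beyond the script above: the name is_float is 
invented, the patterns use grep -E syntax (+ and ? without backslashes), 
and they are anchored with ^ and $ so that the whole line has to match.  
The sketch also relies on the shell reading on to the next line when a 
line ends in ||, so no trailing backslash is needed at that point.

```shell
#!/bin/sh

# is_float is a hypothetical helper name.
# Succeeds if $1 looks like 12, 10., 2.1 or .2, with an optional sign.
is_float() {
    # First test: digits, an optional point, optional digits.
    # Second test (only run if the first fails): a point, then digits.
    printf '%s\n' "$1" | grep -Eq '^[+-]?[0-9]+\.?[0-9]*$' ||
        printf '%s\n' "$1" | grep -Eq '^[+-]?\.[0-9]+$'
}

for value in 12 10. 2.1 .2 -0.5 1.2.3 abc
do
    if is_float "$value"
    then
        echo "$value: number"
    else
        echo "$value: Nan"
    fi
done
```

With the anchors in place, 1.2.3 and abc are rejected while the first 
five values are accepted.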

We can allow for any leading or trailing spaces that someone might 
accidentally type. 
^ means the beginning of the line, so  ^ * means zero or more spaces at 
the beginning of the line.
$ means the end of the line, so  *$ means zero or more spaces at the 
end of the line.
Anchoring both ends also means that the whole line has to be a number; 
input such as 12abc no longer matches.

if echo $num | grep '^ *[+-]\?[0-9]\+\.\?[0-9]* *$' > /dev/null   ||  \
  echo $num | grep '^ *[+-]\?\.[0-9]\+ *$' > /dev/null
then
   echo "number"
else
   echo "Nan"
fi
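
One caveat: "read num" and an unquoted "echo $num" both discard leading 
and trailing whitespace before grep ever sees it, so the space handling 
in the pattern only matters when the value is passed along quoted.  A 
minimal sketch, assuming grep -E syntax and an invented helper name 
is_padded_int:

```shell
#!/bin/sh

# is_padded_int is a hypothetical helper name.
# Succeeds if $1 is an optionally signed integer, possibly surrounded
# by spaces.
is_padded_int() {
    # Quoting "$1" keeps the spaces intact on their way to grep.
    printf '%s\n' "$1" | grep -Eq '^ *[+-]?[0-9]+ *$'
}

value='   42  '
echo "unquoted: [$(echo $value)]"     # word splitting removes the padding
echo "quoted:   [$(echo "$value")]"   # padding is preserved

if is_padded_int "$value"
then
    echo "number"
else
    echo "Nan"
fi
```

The brackets in the output make the surviving spaces visible.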

There is a POSIX standard for some character classes that we can use 
instead to improve readability, if such a thing can be said about 
regular expressions. :)  [[:digit:]] matches a digit, and [[:space:]] 
matches a space, a tab or other whitespace.

if echo $num | grep \
  '^[[:space:]]*[+-]\?[[:digit:]]\+\.\?[[:digit:]]*[[:space:]]*$' \
  > /dev/null   ||  \
  echo $num | grep '^[[:space:]]*[+-]\?\.[[:digit:]]\+[[:space:]]*$' > /dev/null
then
   echo "number"
else
   echo "Nan"
fi
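
As an alternative to running grep twice, the two patterns can be folded 
into one with alternation.  This is a different technique from the 
script above, sketched under the assumption that grep -E is available; 
the name is_number is made up for illustration.

```shell
#!/bin/sh

# is_number is a hypothetical helper name.
is_number() {
    # ( A | B ) matches either A or B in extended regular expressions:
    #   A = digits, optional point, optional digits   (12, 10., 2.1)
    #   B = point followed by digits                  (.2)
    printf '%s\n' "$1" | grep -Eq \
        '^[[:space:]]*[+-]?([[:digit:]]+\.?[[:digit:]]*|\.[[:digit:]]+)[[:space:]]*$'
}

if is_number " +3.14 "
then
    echo "number"
else
    echo "Nan"
fi
```

The single pattern is harder to read than the two separate ones, so 
which form is better is a matter of taste.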