[CLUE-Cert] [Fwd: using grep to detect numbers (from Tuesday's meeting)]
Dennis J Perkins
dperkins at techangle.com
Sat Aug 28 11:48:40 MDT 2004
I had time to investigate using grep for detecting numbers instead of
scanning the string character by character, like you did. Grep would
process a string faster than using cut in a loop, but the difference
would not be noticeable in the problem we were working on. It's hard to
say which solution is better. The grep version is shorter, but you need
to understand regular expressions.
The simplest method is the script below.
[0-9] means that if the character being compared is a digit from 0 to
9, there is a match.
\+ means that one or more characters must match the preceding pattern,
in this case [0-9], for there to be a match. If we had used [0-9]*,
then zero or more digits would count as a match, so even a line with no
digits at all would match.
Note that some programs and programming languages use + instead of \+.
$? holds the exit status of the last executed command, which in this
case is grep. If it is 0, then grep found a match.
#!/bin/sh
echo -n "Number: "
read num
echo $num | grep '[0-9]\+' > /dev/null
if [ $? -eq 0 ]
then
echo "number"
else
echo "Nan"
fi
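
To make the difference between \+ and * concrete, here is a small sketch
that prints grep's exit status for a few inputs. The match_status helper
is a made-up name for this illustration, and it assumes a grep that
accepts \+ (GNU grep does).

```shell
#!/bin/sh
# match_status PATTERN STRING
# Hypothetical helper (not part of the script above): prints the exit
# status grep returns when STRING is matched against PATTERN.
match_status() {
    status=0
    printf '%s\n' "$2" | grep "$1" > /dev/null || status=$?
    echo "$status"
}

match_status '[0-9]\+' '42'     # 0: one or more digits found
match_status '[0-9]\+' 'abc'    # 1: no digit anywhere
match_status '[0-9]*'  'abc'    # 0: zero digits is still a match
match_status '[0-9]\+' 'abc1'   # 0: a digit anywhere in the line is enough
```

The last call shows that this pattern accepts any line that merely
contains a digit, not only lines that consist of digits.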
We can shorten the script and make it more readable by combining two
lines. The "if" command always tests the exit status of the command that
follows it, which is the same value that $? holds afterwards. If this
seems to contradict writing "if [ $? -eq 0 ]", what is actually
happening is that [ is itself a command: it evaluates the expression
between the brackets and sets its own exit status.
if echo $num | grep '[0-9]\+' > /dev/null
then
echo "number"
else
echo "Nan"
fi
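
A quick way to convince yourself that [ is just another command is to
use it, and its synonym test, the same way grep is used above. The
sketch below also wraps the combined form in a function; check_digits is
a hypothetical name, and grep -q (a standard option that suppresses
output) stands in for the redirection to /dev/null.

```shell
#!/bin/sh
# "[" and "test" are commands; "if" only looks at their exit status.
if [ 3 -eq 3 ]; then echo "[ 3 -eq 3 ] succeeded"; fi
if test 3 -eq 4; then echo "unexpected"; else echo "test 3 -eq 4 failed"; fi

# check_digits is a hypothetical wrapper around the combined if/grep form.
# grep -q prints nothing and only sets the exit status.
check_digits() {
    if printf '%s\n' "$1" | grep -q '[0-9]\+'
    then
        echo "number"
    else
        echo "Nan"
    fi
}

check_digits 42      # number
check_digits hello   # Nan
```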
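We can allow an optional sign by adding [+-]\? in front of the digits.
[+-] matches either a + or a -, and \? means that the preceding pattern
may appear zero or one times, so there can be an optional + or - before
the number.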
if echo $num | grep '[+-]\?[0-9]\+' > /dev/null
then
echo "number"
else
echo "Nan"
fi
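
The effect of the optional sign is easier to see when the pattern must
cover the whole line. The sketch below adds ^ and $ (start and end of
line), which the script above does not use; that anchoring and the name
check_signed are assumptions made for this illustration.

```shell
#!/bin/sh
# check_signed is a hypothetical helper. Unlike the script above, the
# pattern is anchored with ^ and $ so that the sign handling is visible:
# without anchors, any line containing a digit would match.
check_signed() {
    if printf '%s\n' "$1" | grep '^[+-]\?[0-9]\+$' > /dev/null
    then
        echo "number"
    else
        echo "Nan"
    fi
}

check_signed 15     # number: no sign
check_signed +15    # number: optional + present
check_signed -15    # number: optional - present
check_signed +-15   # Nan: \? allows at most one sign
```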
If you want floating point numbers, it gets more complicated. The first
pattern describes all but one form. For example, it covers 12, 10. and
2.1, but not .2, which has no digit before the decimal point. Here \.
matches a literal period; an unescaped . would match any character. We
can fix this by doing a second comparison if the first one fails. We use
|| for the OR operation to say
    if num = pattern1 || num = pattern2
which means
    if num = pattern1 OR num = pattern2
This line is getting long, so we can break it into two lines to make it
more readable, but we need to tell the shell to treat both lines as one
line. We do this by adding \ to the end of the line. There cannot be
any other characters, even spaces, after the backslash or the script
won't work properly.
if echo $num | grep '[+-]\?[0-9]\+\.\?[0-9]*' > /dev/null || \
echo $num | grep '[+-]\?\.[0-9]\+' > /dev/null
then
echo "number"
else
echo "Nan"
fi
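
Wrapped in a function, the two-pattern test can be tried on the sample
values from the text. check_float is a hypothetical name, and the
patterns are copied unchanged from the script above.

```shell
#!/bin/sh
# check_float is a hypothetical wrapper around the two-pattern test.
# The trailing backslash joins the two lines into one "if" condition.
check_float() {
    if printf '%s\n' "$1" | grep '[+-]\?[0-9]\+\.\?[0-9]*' > /dev/null || \
       printf '%s\n' "$1" | grep '[+-]\?\.[0-9]\+' > /dev/null
    then
        echo "number"
    else
        echo "Nan"
    fi
}

for value in 12 10. 2.1 .2 -0.5 abc
do
    echo "$value: $(check_float "$value")"
done
```

Only abc is reported as Nan. Because neither pattern is anchored, a
value such as x.2y would also be reported as a number.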
We can allow for any leading or trailing spaces that someone might
accidentally type.
^ means the beginning of the line, so "^ *" means zero or more spaces at
the beginning of the line.
$ means the end of the line, so " *$" means zero or more spaces at the
end of the line.
With both anchors in place the whole line has to fit the pattern, so
input such as 12abc no longer counts as a number.
if echo $num | grep '^ *[+-]\?[0-9]\+\.\?[0-9]* *$' > /dev/null || \
echo $num | grep '^ *[+-]\?\.[0-9]\+ *$' > /dev/null
then
echo "number"
else
echo "Nan"
fi
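
The anchored version can be exercised the same way. In this sketch
check_anchored is a hypothetical name, both patterns start with "^ *" so
that leading spaces are optional in either branch, and the value is
passed quoted so that the spaces actually reach grep.

```shell
#!/bin/sh
# check_anchored is a hypothetical wrapper around the anchored patterns.
# printf with a quoted "$1" keeps leading and trailing spaces intact.
check_anchored() {
    if printf '%s\n' "$1" | grep '^ *[+-]\?[0-9]\+\.\?[0-9]* *$' > /dev/null || \
       printf '%s\n' "$1" | grep '^ *[+-]\?\.[0-9]\+ *$' > /dev/null
    then
        echo "number"
    else
        echo "Nan"
    fi
}

check_anchored ' 12 '    # number: surrounding spaces are allowed
check_anchored '-3.5'    # number
check_anchored ' .2'     # number: matched by the second pattern
check_anchored '1.2.3'   # Nan: only one decimal point fits
check_anchored '12abc'   # Nan: trailing letters are rejected
```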
POSIX defines named character classes, such as [[:digit:]] for digits
and [[:space:]] for whitespace (spaces, tabs and the like), that we can
use instead to improve readability, if such a thing can be said about
regular expressions. :)
if echo $num | grep \
'^[[:space:]]*[+-]\?[[:digit:]]\+\.\?[[:digit:]]*[[:space:]]*$' \
> /dev/null || \
echo $num | grep '^[[:space:]]*[+-]\?\.[[:digit:]]\+[[:space:]]*$' > /dev/null
then
echo "number"
else
echo "Nan"
fi
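
As noted earlier, some tools write + and ? without the backslash.
grep -E (extended regular expressions) is one of them, and it also
offers grouping with ( ) and alternation with |, so both patterns can be
folded into one grep call. This is a different technique from the
two-grep script above, shown only as a sketch; check_ere is a
hypothetical name.

```shell
#!/bin/sh
# check_ere is a hypothetical helper using one extended regular expression.
# ( A | B ) tries "digits with an optional fraction" or "leading point".
check_ere() {
    pattern='^[[:space:]]*[+-]?([[:digit:]]+\.?[[:digit:]]*|\.[[:digit:]]+)[[:space:]]*$'
    if printf '%s\n' "$1" | grep -E "$pattern" > /dev/null
    then
        echo "number"
    else
        echo "Nan"
    fi
}

check_ere '12'      # number
check_ere '10.'     # number
check_ere ' +.5 '   # number
check_ere '.'       # Nan: a lone point has no digits
check_ere '1e5'     # Nan: exponents are not covered
```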