[clue-tech] recursive grep

marcus hall marcus at tuells.org
Mon Oct 18 14:45:01 MDT 2010


On Mon, Oct 18, 2010 at 02:26:55PM -0600, Greg Knaddison wrote:
> I believe that adding find into the mix can make the process slower
> than grep, unless you are using find to exclude some files.
> 
> It would be interesting to benchmark them.

Yes, I'm sure that it adds overhead, especially with the -exec option, which
executes a separate grep for each file.  That means one process creation per
file, which adds up to *lots* of overhead on a large tree.
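
To make the per-file cost concrete, here is a minimal sketch that counts how
many commands find launches with the "\;" form of -exec.  The temporary
directory and the file names are made up for the demonstration, and "echo run"
simply stands in for grep so that each output line is one launched process.

```shell
# Sketch: count the commands launched by "find -exec ... \;".
# The directory and file names below are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5; do
    echo "needle $i" > "$tmpdir/file$i"
done

# "echo run" stands in for grep; find runs it once per matching file,
# so the number of output lines equals the number of processes started.
per_file=$(find "$tmpdir" -type f -exec echo run \; | wc -l | tr -d ' ')

echo "files: 5, commands launched: $per_file"
rm -rf "$tmpdir"
```

With five files this launches five commands, so the cost should grow linearly
with the number of files searched.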

It's much more efficient to do:
	find /etc/ -type f -print0 | xargs -0 grep -l needle

This will create one grep per bunch of files instead of one for each file.  It
will still be slower than a single recursive grep, but probably not by
much.  It should also help if the recursive grep is getting hung up because it
opened a device file or pipe or something else that blocks, or because it is
caught in a symbolic link loop: the -type f test passes only regular files to
grep, and find does not follow symbolic links by default.
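
A small sketch of both points, assuming a throwaway directory with two regular
files and one named pipe (all names here are made up).  Again "echo run" stands
in for grep when counting how many commands xargs starts.

```shell
# Sketch: batching with xargs, and -type f keeping grep away from a FIFO.
# All file names below are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
echo "needle"   > "$tmpdir/regular.txt"
echo "haystack" > "$tmpdir/other.txt"
mkfifo "$tmpdir/blocker"     # a named pipe that nobody ever writes to

# xargs packs the file names onto as few command lines as it can,
# so a handful of files results in a single command.
batches=$(find "$tmpdir" -type f -print0 | xargs -0 echo run | wc -l | tr -d ' ')

# The FIFO is not a regular file, so it is never handed to grep
# and the search returns instead of blocking on it.
matches=$(find "$tmpdir" -type f -print0 | xargs -0 grep -l needle)

echo "commands launched: $batches"
echo "files containing needle: $matches"
rm -rf "$tmpdir"
```

Opening the pipe directly with grep would block until something wrote to it,
which is the kind of hang described above.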

Note that the -print0 and -0 arguments set find and xargs up to delimit the
file names with a '\0' character instead of whitespace.  That makes them
handle file names containing spaces, newlines, or quote characters properly.
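
Here is a quick sketch of the difference, assuming a made-up file called
"my file.txt" in a temporary directory.  Without the '\0' delimiters, xargs
splits the name at the space and grep is handed two names that don't exist.

```shell
# Sketch: a file name with a space, with and without -print0 / -0.
# The file name below is hypothetical, for illustration only.
tmpdir=$(mktemp -d)
echo "needle" > "$tmpdir/my file.txt"

# Default xargs splits on whitespace, so grep sees ".../my" and "file.txt"
# as two separate (nonexistent) files and finds nothing.
broken=$(find "$tmpdir" -type f | xargs grep -l needle 2>/dev/null) || true

# With '\0' delimiters the whole path arrives as a single argument.
safe=$(find "$tmpdir" -type f -print0 | xargs -0 grep -l needle)

echo "whitespace-delimited result: '$broken'"
echo "NUL-delimited result:        '$safe'"
rm -rf "$tmpdir"
```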

Finally, I will note that if you want to see what the recursive grep is
up to, it might be worth a quick peek at its open files.  You can find the
process ID with ps -fae | grep grep (find the one with the -R argument!),
then execute the following, substituting that process ID for 123:
	ls -l /proc/123/fd

This will show symbolic links to all of the open files of the process.  The
highest numbered file descriptor is likely the one that grep is currently
searching, and if that points to a file that is a named pipe or a device
file, then that is probably what is causing grep to hang.  If you run the
ls a few times and the open files are changing each time, then it looks
like grep is still running, so it's either got a lot of files to search
and is still working diligently, or perhaps the directory structure is
looping somehow (I don't recall if grep -R follows symbolic links, but that
is the most likely way to get into an infinite loop).
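
Running the ls several times can be wrapped up in a small loop.  This is only a
sketch, assuming a Linux-style /proc; watch_fds is a hypothetical helper name,
and the sample count and delay are arbitrary defaults.

```shell
# Sketch: sample a process's open file descriptors a few times.
# Assumes a Linux-style /proc; watch_fds is a hypothetical helper name.
watch_fds() {
    pid=$1              # process ID of the grep to inspect
    samples=${2:-3}     # how many times to list the descriptors
    delay=${3:-1}       # seconds to wait between listings
    i=0
    while [ "$i" -lt "$samples" ]; do
        if [ ! -d "/proc/$pid/fd" ]; then
            echo "process $pid has exited (or /proc is unavailable)"
            return 1
        fi
        echo "--- sample $((i + 1)) ---"
        ls -l "/proc/$pid/fd"
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage: watch_fds 123      (123 is a placeholder process ID)
```

If the highest numbered descriptor is the same in every sample and points at a
pipe or device, that is probably where grep is stuck; if it keeps changing,
grep is still making progress.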

marcus hall
marcus at tuells.org

