I have had a problem that has been plaguing me for eons, and today I finally stopped and dealt with it; I refused to let it plague me anymore. And to my surprise, I was able to solve it pretty quickly.

I use the find command on Linux very often. I also run and support sendmail. I have a few hundred users on my server, and I occasionally need to help one of them find a piece of information in their email. So I ssh into the server, grab my trusty find command, and go to work. This is where the problems typically occur.

Find and Grep, best friends in my tool belt

I love the fact that find can be combined with other commands through -exec, or can be piped to xargs. I use this combination to search through text files for just about anything. Often I am looking for a password that someone has emailed, or for a reference to a database in our PHP include files. Sometimes it’s as simple as finding which of my own scripts contains a particular variable or function.

find /home/username/Maildir/* -exec grep -l password {} \;

or

find /home/username/Maildir/* | xargs grep -l password
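One caveat worth adding here: if any of the file names happen to contain spaces, the plain pipe to xargs will split them apart. A minimal variation, assuming GNU find and xargs, passes null-delimited names instead (-type f also keeps directories out of grep’s way):

find /home/username/Maildir -type f -print0 | xargs -0 grep -l password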

This combination is what I use most often, but I have seen people combine find with rm, mv, cp, zip, and so on. There are limitless possibilities with this combination.
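For instance, a sketch of the rm pairing (the path and pattern here are just placeholders, and I would run the find by itself first to see what it matches):

find /path/to/cleanup -name '*.bak' -exec rm {} \;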

Until I hit the limit

There are limits, I know this: memory limits, file size limits, and so on. But as an average sysadmin, I hardly ever run into them.

Except with find.

I often find myself cd-ing into a directory and executing a command like this:

find * -exec grep -l this-cool-string {} \;

And it works. Most of the time… until the shell responds with:

-bash: /usr/bin/find: Argument list too long

And it drove me nuts. There was some limit, obviously related to the number of files in my file list (5,065 in this directory today). But the quantity of files (emails, in this case) is precisely why I need find and grep in the first place. So I knew there had to be a way around this limit. I searched the find man page: no go. I searched the web: no go.
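(As an aside, the limit itself turns out to be easy to inspect. getconf reports the kernel’s cap on the combined size of arguments and environment; on older kernels this was commonly 131072 bytes, though the exact value varies by system.)

getconf ARG_MAX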

I eventually ran into this discussion on “argument list too long”, which gave me the hint that I was doing something wrong. I should not have to edit and recompile the kernel to search through a list of more than 2,000 files. I mean, this is why I switched to Linux: way more power than the other guy’s OS, with the ability to do superman-like things from the command line. I started in DOS, so I was really eager to switch away from a GUI-only OS. Besides, the command line has been here from, well, er, the beginning.

Power in the wrong hands

I started to think I was doing something wrong, so I began stripping pieces off the command and found that even this would fail:

[root@server]# find *
-bash: /usr/bin/find: Argument list too long

This led me to search for the words wildcard, linux, and find, but to no avail. Then I tried something: I replaced the asterisk with a period. This is when the lights clicked on and I got my searching ability back.

find . -exec grep -l this-cool-string {} \;
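And because find now walks the directory tree itself, the search can be narrowed without ever triggering the shell’s expansion. The pattern below is quoted so it reaches find literally rather than being expanded by bash (the *.php extension is just an illustrative example):

find . -type f -name '*.php' -exec grep -l this-cool-string {} \;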

I knew this had to do with globbing, but I couldn’t prove it on my own. Here is the best explanation I could find:

…there is shell globbing and regular expressions. Some programs in GNU/Linux use regular expressions and some use shell globbing (such as the find utility). The BASH interpreter itself uses globbing, so if you don’t quote an asterisk the interpreter will take all the files in the current directory that match the pattern and send them to the program, not the [filename] string itself…

from: http://www.kbrandt.com/2007/12/finding-files-in-linux-review.html

Essentially, the asterisk was making the shell expand the file list for me: it replaced the * with the name of every file in the directory and tried to pass them all to find as arguments at once, overflowing the kernel’s argument buffer before find ever ran. With the period, find receives a single argument, the current directory, walks the tree itself, and hands grep one file name at a time.
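A quick way to see the difference for yourself from any crowded directory:

echo *     # the shell expands the asterisk into every file name in the directory
echo '*'   # quoted, the asterisk reaches echo as a literal character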

I wish I had known this a year ago, when I first bumped into the problem.