Mastering The Universal Search Tool
The command line is unusually rich and productive.
Its philosophy can’t be explained in a single sentence; it has to be experienced. But at its core is the idea that its power comes more from the interaction between programs than from the programs themselves.
One of the most iconic programs in Unix-like systems is grep. It lets you search for patterns in text, making it a universal tool for many tasks. If you want to master the command line, learning grep is a great start.
Grep is a “filter” program
The most important of the standard Unix programs are the “filters.” They read some input, perform a simple transformation on it, and write some output. They are designed to chain together easily in pipelines (with the | operator).
grep stands out as the most versatile of them: it sits between the simple filters (such as sort) and the more powerful but more complex ones.
A quick example
For now, let’s take
grep for a spin. Suppose we have a file called “poem”, and we want to look for the word “hero” in the poem. Just type “grep hero poem”:
grep can also look for lines that don’t match the pattern with the -v (--invert-match) option. You can think of it as “invert match”:
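To make the two commands above concrete, here is a small sketch. The poem file it creates is a hypothetical stand-in (the text of the article’s poem is not reproduced here), and the variable names are only for illustration.

```shell
# Hypothetical stand-in for the "poem" file.
tmp=$(mktemp -d)
printf '%s\n' 'A hero rises at dawn,' 'the village sleeps,' 'no hero returns unchanged.' > "$tmp/poem"

# Lines that contain "hero".
matching=$(grep 'hero' "$tmp/poem")
# Lines that do NOT contain "hero" (-v inverts the match).
inverted=$(grep -v 'hero' "$tmp/poem")

printf 'grep:\n%s\n\ngrep -v:\n%s\n' "$matching" "$inverted"
rm -rf "$tmp"
```

Every line of the file ends up in exactly one of the two outputs, which is a handy sanity check when a search returns less than expected.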
Grep can also search in several files, and there are options for counting,
numbering, and so on.
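The man page synopsis of the grep command is:
grep [OPTION...] PATTERNS [FILE...]
grep [OPTION...] -e PATTERNS ... [FILE...]
grep [OPTION...] -f PATTERN_FILE ... [FILE...]
The PATTERNS argument contains one or more patterns separated by newlines.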
If you want multiple patterns but find that separating them by newlines is clumsy, you can pass them with multiple -e PATTERNS options. You can also keep your patterns in a separate file and pass them with the -f PATTERN_FILE option.
Pro-tip: you can pass - as the PATTERN_FILE to read patterns from standard input, which is useful in some pipeline situations.
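The three ways of passing several patterns can be compared side by side. This is a minimal sketch; the menu and patterns files and their contents are made up for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' > "$tmp/menu"
printf '%s\n' 'apple' 'cherry' > "$tmp/patterns"

# Two patterns through repeated -e options.
by_e=$(grep -e 'apple' -e 'cherry' "$tmp/menu")
# The same patterns read from a file, one per line.
by_f=$(grep -f "$tmp/patterns" "$tmp/menu")
# The same patterns read from standard input via "-f -".
by_stdin=$(printf '%s\n' 'apple' 'cherry' | grep -f - "$tmp/menu")

printf '%s\n' "$by_e"
rm -rf "$tmp"
```

All three commands select the same lines; a line is printed if it matches any one of the patterns.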
PATTERNS should be quoted when using
grep in a shell because regular expressions’ special characters can overlap with the shell’s special characters.
Note: All examples from now on will wrap the search pattern in single quotes (''), even if we don’t strictly need them. This default is suitable for real-life grep usage because if you get zero results (always possible), you’ll want to modify the search with something that might otherwise clash with the shell, like whitespace or a regex.
You can pass one or more files. A file named
- stands for standard input.
If no FILE is specified and the recursive flag (-r) is used, grep defaults to searching every file under the current directory. If the recursive flag is absent, grep defaults to searching only standard input.
Use -i or --ignore-case to ignore case distinctions.
Match whole words
Use -w or --word-regexp to select only those lines containing matches that form whole words. It works by differentiating between “word constituent characters” (letters, digits and the underscore) and “non-word constituent characters” (all other characters).
It also recognizes words that are preceded and followed by the start or end of a line.
In short, it works similarly to a regular expression surrounded by \< and \>, which mean “start of word” and “end of word” respectively. It’s also similar to \b in regular expressions, which matches any word boundary, whether it’s at the start or end.
Pro-tip: For the full details of the differences between \b, \< and \>, check this guide.
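The effect of whole-word matching is easiest to see next to a plain substring search. The sketch below assumes GNU grep (the \< and \> anchors are GNU extensions) and uses a made-up words file.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'the hero' 'heroes and heroines' 'super_hero' 'HERO' > "$tmp/words"

# Substring search: counts every line that contains "hero" anywhere.
substring_count=$(grep -c 'hero' "$tmp/words")
# Whole-word search: "heroes" and "super_hero" no longer qualify.
whole_word=$(grep -w 'hero' "$tmp/words")
# The same idea spelled out with the word anchors.
anchored=$(grep '\<hero\>' "$tmp/words")
# -i combines with -w, so "HERO" qualifies too.
ignore_case=$(grep -i -w 'hero' "$tmp/words")

printf '%s\n' "$ignore_case"
rm -rf "$tmp"
```

Note that super_hero is rejected because the underscore counts as a word constituent character.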
Match an entire line
Use -x or --line-regexp to select only those matches that exactly match the whole line. Same as a regular expression surrounded by ^ and $, which mean “start of line” and “end of line” respectively.
Count the number of matches
Use -c or --count to suppress normal output; instead, grep will print a count of matching lines for each input file.
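One detail worth seeing in action: the count is a count of matching lines, not of individual occurrences. The sketch below uses a made-up file and, as one possible workaround, combines -o with wc -l to count occurrences.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'hero hero hero' 'no match on this line' 'another hero' > "$tmp/sample"

# Two lines contain "hero", so -c reports 2.
line_count=$(grep -c 'hero' "$tmp/sample")
# Printing each occurrence on its own line and counting lines gives 4.
occurrence_count=$(grep -o 'hero' "$tmp/sample" | wc -l | tr -d ' ')

printf 'lines: %s, occurrences: %s\n' "$line_count" "$occurrence_count"
rm -rf "$tmp"
```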
Add some color: --color
Possible values are never, always, and auto.
auto uses colors only when standard output is a terminal that appears to support them.
never is the default.
In fact, many Linux distros come with grep aliased to grep --color=auto.
The colors are configurable, but the defaults are as shown in this guide:
- Bold red for the matching text
- Magenta for file names
- Green for line numbers and byte offsets
- Cyan for separators
If you want to keep the colors in the output even when you send it somewhere else (like a pipe into less), you can use the option --color=always.
Names of matching files
If you have multiple input files,
grep will prefix the file name for each match by default.
Suppose that the current directory contains our familiar “poem” file and another file with a poem by Charles Bukowski, and we want to look for the word “with” in any of those files. Just type grep 'with' *:
You can pass -l (--files-with-matches) to suppress normal output; instead, grep prints only the name of each input file that contains a match.
You can also pass -L (--files-without-match) to print the name of each input file that contains no match:
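Here is a small sketch of the three behaviours. The two files and their contents are hypothetical stand-ins for the poems mentioned above, and the commands run inside a temporary directory so that the printed names stay short.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'walk with me' 'into the night' > "$tmp/poem"
printf '%s\n' 'alone again' > "$tmp/other"

# Default with several files: each match is prefixed with its file name.
prefixed=$(cd "$tmp" && grep 'with' poem other)
# -l: only the names of files that contain a match.
with_match=$(cd "$tmp" && grep -l 'with' poem other)
# -L: only the names of files that contain no match.
without_match=$(cd "$tmp" && grep -L 'with' poem other)

printf '%s\n%s\n%s\n' "$prefixed" "$with_match" "$without_match"
rm -rf "$tmp"
```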
Stop after a number of matches
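Use -m NUM or --max-count=NUM to stop reading a file after NUM matching lines.
This option adapts sensibly to other options:
- -m NUM -v will stop after the first NUM non-matching lines.
- -m NUM -c will not output a count greater than NUM.
-m 1 also enables an advanced technique: feeding a file on standard input to a shell while loop that runs grep once per match. When standard input is a regular file, GNU Grep leaves the read position just after the last matching line, so the next iteration resumes from there. This can be leaner and faster than a regular while loop when you want to do something complex on every matching line of a log file:
while grep -m 1 'Grep'; do
  echo 'found a match'  # placeholder for the per-match work
done < poem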
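The way -m interacts with -c and -v can be checked with a tiny made-up data file. This is only a sketch of the behaviour described above, not a recipe.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'x1' 'y1' 'x2' 'y2' 'x3' > "$tmp/data"

# Stop after the first two matching lines.
first_two=$(grep -m 2 'x' "$tmp/data")
# With -c, the reported count is capped at NUM.
capped_count=$(grep -m 2 -c 'x' "$tmp/data")
# With -v, stop after the first non-matching line.
first_other=$(grep -m 1 -v 'x' "$tmp/data")

printf '%s\n%s\n%s\n' "$first_two" "$capped_count" "$first_other"
rm -rf "$tmp"
```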
Print only the actual match
Use -o or --only-matching to print only the matched parts of a matching line, with each such match on a separate output line.
This comes in handy when you have regular expressions as your pattern — otherwise, you’re just wasting your time and energy on something you already know the answer to, which is a sure way to anger the Unix gods and invite their wrath! 😱
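As an illustration of -o with a regular expression, the sketch below pulls phone-number-like tokens out of made-up input. The pattern and the sample text are assumptions chosen for the example.

```shell
# -E enables the {3} and {4} repetition syntax; -o prints each match on its own line.
numbers=$(printf '%s\n' 'call 555-1234 or 555-9876' 'no numbers here' \
  | grep -E -o '[0-9]{3}-[0-9]{4}')

printf '%s\n' "$numbers"
```

The first input line contributes two output lines, one per match, and the second contributes none.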
Keep it quiet with -q, --quiet, --silent
Do not write anything to standard output. Exit immediately with a zero status if any match is found.
-s, --no-messages: This flag suppresses error messages about nonexistent or unreadable files. Those messages occur frequently when using the -r (recursive) flag. Sometimes we want to suppress them as they convey little useful information.
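Quiet mode is mostly useful in shell conditionals, where only the exit status matters. The sketch below uses a hypothetical users file and a path that deliberately does not exist; it also shows that -s hides the error message but not the error status.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'root:x:0:0' 'alain:x:1000:1000' > "$tmp/users"

# -q prints nothing; the exit status alone drives the branch.
if grep -q '^alain:' "$tmp/users"; then
  user_state='present'
else
  user_state='absent'
fi

# -s hides the "No such file" message, but the exit status still signals an error.
errors=$(grep -s 'alain' "$tmp/missing" 2>&1)
rc=$?

printf 'user: %s, messages: "%s", exit status: %s\n' "$user_state" "$errors" "$rc"
rm -rf "$tmp"
```

grep exits with 0 when something matched, 1 when nothing matched, and 2 when an error occurred.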
Output Prefix Control
As the name suggests, these flags affect the prefix of each output line.
-n, --line-number: Prefix each line of output with the 1-based line number within its input file.
-H, --with-filename: Print the file name for each match. This is the default when there is more than one file to search.
-h, --no-filename: Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
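A short sketch of the prefix flags, using two made-up files. The commands run inside the temporary directory so the file-name prefixes are predictable.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'alpha beta' > "$tmp/notes"
printf '%s\n' 'beta only' > "$tmp/more"

# -n: line numbers only (a single file, so no file name by default).
numbered=$(cd "$tmp" && grep -n 'alpha' notes)
# -H forces the file name even though there is just one file.
named=$(cd "$tmp" && grep -H -n 'alpha' notes)
# -h drops the file name even though there are two files.
unnamed=$(cd "$tmp" && grep -h 'beta' notes more)

printf '%s\n\n%s\n\n%s\n' "$numbered" "$named" "$unnamed"
rm -rf "$tmp"
```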
“Context lines” are non-matching lines that are near a matching line.
grep can show you these lines to give you a better picture of your search results and how they fit into the text.
Let’s look at an example:
Note the following about context lines:
- Matching lines normally use a : to separate the actual line content from the prefix fields, if any. Context lines use a - instead.
- If the context of two matches overlaps or is adjacent, the groups are merged into one.
- Regardless of how the context line options are set, grep never outputs any given line more than once.
Here are the context line options:
-A NUM, --after-context=NUM: Print NUM lines of context after matching lines.
-B NUM, --before-context=NUM: Print NUM lines of context before matching lines.
-C NUM, --context=NUM: Print NUM lines of context before and after matching lines. So, for an isolated match, -C 2 will print a total of 5 lines: 2 before the match, the match itself, and 2 after.
The -C option is usually more useful than -A or -B, unless you know that the relevant context is either before or after.
You can also pass -NUM. For example, -2 is the same as -C 2.
--group-separator=STRING: When context lines are being shown, print STRING instead of the default of -- between groups of lines.
--no-group-separator: Or, just have no separator at all. Yolo.
Note: These options have no effect if -o (--only-matching) is specified.
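The context options and the separator conventions can be seen in a short sketch. It assumes GNU grep (for --group-separator) and uses made-up files; the ==== separator is an arbitrary choice.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'one' 'two' 'MATCH' 'four' 'five' > "$tmp/lines"
printf '%s\n' 'a1' 'HIT' 'a3' 'a4' 'a5' 'HIT' 'a7' > "$tmp/groups"

# One line of context on each side; -n shows ":" on the match and "-" on context lines.
around=$(grep -n -C 1 'MATCH' "$tmp/lines")
# Two separate groups, divided by the default "--" separator.
default_sep=$(grep -A 1 'HIT' "$tmp/groups")
# The same groups with a custom separator.
custom_sep=$(grep -A 1 --group-separator='====' 'HIT' "$tmp/groups")

printf '%s\n\n%s\n\n%s\n' "$around" "$default_sep" "$custom_sep"
rm -rf "$tmp"
```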
File and Directory Selection Options
Ok, get ready for some advanced stuff.
A file is a sequence of bytes. Most files fall into two categories:
- Binary: a sequence of bytes whose meaning is determined by the program that reads it.
- Text: a sequence of natural-language characters (plus some formatting characters, such as “whitespace”), usually encoded in ASCII or UTF-8. By convention, lines are separated by newline characters (\n).
While Grep works on any file, it usually reads from text files.
If a file’s data or metadata indicates that the file contains binary data (say, because of the presence of non-text bytes), then grep will “suppress” printing matches. But even then, grep will let you know if something matched inside of the binary file: it will print a message (to standard error in recent GNU Grep releases) saying that a binary file matches.
By the way, grep outputs “binary file matches” because it would probably generate too much output or even crash your terminal if it attempted to output the full match. A binary file usually has no newline characters, so a match would be the entire file!
There are some flags that alter grep’s behaviour with binary files:
-I: When grep discovers a binary file, it assumes that the file doesn’t match. This is useful when searching recursively and you just don’t care about binary files.
-a, --text: Process a binary file as if it were text. This is the “yolo” option and it’s usually a bad idea, as your terminal might interpret some of the binary as commands and crash.
A better option is to use the
strings utility to extract the text content from any file and then pipe it through
grep, like this:
strings binary-file | grep pattern
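The binary-file behaviour can be reproduced with a tiny file that contains a NUL byte. This is a sketch under the assumption that grep treats such a file as binary; the blob file is made up, and the exact wording and destination of the notice vary between grep versions, so the example captures both output streams.

```shell
tmp=$(mktemp -d)
# A NUL byte in the middle is enough for the file to count as binary.
printf 'hello\0world\n' > "$tmp/blob"

# Default: no matching line is printed, only a "binary file matches" notice.
notice=$(LC_ALL=C grep 'hello' "$tmp/blob" 2>&1)
# -I: treat the binary file as if it had no match at all.
skipped=$(grep -I 'hello' "$tmp/blob")
skipped_rc=$?
# -a: treat it as text; -o keeps the output down to the matched word.
as_text=$(grep -a -o 'world' "$tmp/blob")

printf '%s\n%s\n' "$notice" "$as_text"
rm -rf "$tmp"
```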
-d ACTION, --directories=ACTION: If an input file is a directory, use ACTION to process it. ACTION can be:
- read: Read the directories as if they were normal files. This is the default, but most operating systems don’t allow you to read directories, so this will print an error.
- skip: Skip the directories.
- recurse: Read all the files under each directory, recursively, following symbolic links only if provided explicitly as arguments.
-r, --recursive: Shorthand for --directories=recurse.
-R, --dereference-recursive: Like --recursive, but it always follows symbolic links.
--exclude=GLOB: Exclude any command line file arguments (or any file found recursively) whose suffix or base name (the part after the last slash) matches the GLOB. As a glob, it can use *, ?, and [...] as wildcards, and \ to escape them.
--exclude-from=FILE: Like --exclude, but it gets one or more GLOBs from the provided FILE.
--exclude-dir=GLOB: Like --exclude, but it only applies to directories. If a directory matches, it is skipped. It ignores redundant trailing slashes in the GLOB.
--include=GLOB: The opposite of --exclude. Both can be used at the same time. If they contradict each other, the last one that matches wins. If no --include or --exclude option matches, the file is included unless the first such option is --include.
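A recursive search with file selection might look like the sketch below. It assumes GNU grep; the directory layout, file names, and the TODO marker are all invented for the example.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules"
printf '%s\n' 'TODO: fix the parser' > "$tmp/src/main.c"
printf '%s\n' 'TODO: write the docs' > "$tmp/src/notes.md"
printf '%s\n' 'TODO: vendored code' > "$tmp/node_modules/lib.c"

# Only C files, and never descend into node_modules.
only_c=$(cd "$tmp" && grep -r -l --include='*.c' --exclude-dir='node_modules' 'TODO' .)
# Everything except Markdown files (sorted, since traversal order is not guaranteed).
no_markdown=$(cd "$tmp" && grep -r -l --exclude='*.md' 'TODO' . | LC_ALL=C sort)

printf '%s\n\n%s\n' "$only_c" "$no_markdown"
rm -rf "$tmp"
```

The globs are quoted so that the shell passes them to grep untouched instead of expanding them itself.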
Gigachad-tip: If you have complex file requirements, the find command might be more versatile than grep’s own file selection options.
--: Ah, a true classic. It marks the end of the options. It’s useful when searching in files whose names begin with -. But who even names files like that in this day and age?
-z, --null-data: Warning, this is an advanced option. It treats the input as a sequence of lines that end with a NUL byte (a binary zero) instead of a newline.
Basically, for ordinary text it turns the entire input into a single line. This allows grep to match across lines, which goes completely against its primary directive.
It’s a bit hacky, so if you really need multi-line matching, you might want to check more advanced tools.
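The “single line” effect of -z is visible when counting matching lines. This sketch assumes GNU grep and uses a made-up three-line input that contains no NUL bytes.

```shell
text=$(printf '%s\n' 'one' 'two' 'three')

# Normal mode: "one" and "two" contain an "o", so two lines match.
per_line=$(printf '%s\n' "$text" | grep -c 'o')
# With -z the whole input is a single NUL-terminated "line", so the count is 1.
whole_input=$(printf '%s\n' "$text" | grep -z -c 'o')

printf 'without -z: %s, with -z: %s\n' "$per_line" "$whole_input"
```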
For regular expression syntax, grep offers four variants:
- Basic Regular Expressions (BREs): Enabled by default, or with the -G or --basic-regexp options. This variant interprets patterns as basic regular expressions, as defined by POSIX.
- Extended Regular Expressions (EREs): Enabled with -E or --extended-regexp. Patterns are interpreted as extended regular expressions, as defined by POSIX. In GNU Grep, BREs and EREs offer the same functionality with different notation. In other grep implementations, BREs are generally more limited. Most of the time you want EREs, as the notation is just better.
- Fixed Strings: With -F or --fixed-strings, grep interprets patterns as fixed strings instead of regular expressions. This can speed up the search.
- Perl-Compatible Regular Expressions (PCREs): With -P or --perl-regexp, patterns are interpreted as PCREs. PCREs are highly versatile, and their syntax has inspired regex implementations in many other languages. They include fancy features like lookaheads, lookbehinds and backreferences. grep is typically compiled with PCRE support. It’s worth noting that grep processes text line by line, so PCRE directives that match line breaks won’t work as intended.
Gigachad note: Unix had egrep and fgrep, historical counterparts to the modern grep -E and grep -F. They are deprecated, and recent GNU Grep releases print a warning when you run them.
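The difference in notation between the variants shows up with alternation and with the dot. The sketch below assumes GNU grep, where \| in a basic regular expression is an extension; the sample file is made up.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'cat' 'dog' 'a.b' 'axb' > "$tmp/sample"

# BRE: alternation needs a backslash (a GNU extension).
bre=$(grep 'cat\|dog' "$tmp/sample")
# ERE: the same alternation without the backslash.
ere=$(grep -E 'cat|dog' "$tmp/sample")
# As a regex, the dot matches any character, so "axb" matches too.
regex_dot=$(grep 'a.b' "$tmp/sample")
# As a fixed string, the dot is just a dot.
fixed_dot=$(grep -F 'a.b' "$tmp/sample")

printf '%s\n\n%s\n\n%s\n' "$ere" "$regex_dot" "$fixed_dot"
rm -rf "$tmp"
```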
How to grep “ps” output like a true Chad?
ps -ef | grep '[c]ron'
The bracket expression looks pointless (cron matches the same text as [c]ron), but if the pattern had been written without the square brackets, it would have matched not only the ps output line for cron, but also the ps output line for this grep command itself.
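Because real ps output changes from machine to machine, the sketch below simulates it with two hypothetical process listings: one as it might look while grep 'cron' is running, and one while grep '[c]ron' is running. The process IDs and command lines are invented.

```shell
# Simulated "ps -ef" output while "grep cron" is running.
listing_plain=$(printf '%s\n' 'root   612    1 /usr/sbin/cron -f' 'me    9001  880 grep cron')
# Simulated output while "grep [c]ron" is running instead.
listing_bracket=$(printf '%s\n' 'root   612    1 /usr/sbin/cron -f' 'me    9001  880 grep [c]ron')

# The plain pattern also matches the grep process's own command line.
plain_hits=$(printf '%s\n' "$listing_plain" | grep -c 'cron')
# The text "[c]ron" does not contain "cron", so only the real process matches.
bracket_hits=$(printf '%s\n' "$listing_bracket" | grep -c '[c]ron')

printf 'plain: %s, bracket: %s\n' "$plain_hits" "$bracket_hits"
```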
You can do “OR” with “|” in a regex, but how do you do “AND”?
Just pipe two greps together:
grep 'foo' file | grep 'bar'
How to match empty lines?
Don’t use '' because that matches literally everything; use '^$' instead.
How to match lines with only spaces?
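One possible answer to the two questions above, as a sketch on a made-up four-line file: '^$' for empty lines, and an anchored run of spaces for lines that contain only spaces. The [[:blank:]] variant, which also accepts tabs and empty lines, is included for comparison.

```shell
tmp=$(mktemp -d)
# Four lines: text, an empty line, a line with three spaces, text.
printf '%s\n' 'text' '' '   ' 'more' > "$tmp/sample"

# The empty pattern matches every line.
everything=$(grep -c '' "$tmp/sample")
# Truly empty lines only.
empty=$(grep -c '^$' "$tmp/sample")
# Lines made of one or more spaces and nothing else.
only_spaces=$(grep -E -c '^ +$' "$tmp/sample")
# Empty lines plus lines made only of spaces or tabs.
blank=$(grep -c '^[[:blank:]]*$' "$tmp/sample")

printf 'all: %s, empty: %s, only spaces: %s, blank: %s\n' \
  "$everything" "$empty" "$only_spaces" "$blank"
rm -rf "$tmp"
```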
How to search both in “stdin” and a file?
cat /etc/passwd | grep 'alain' - /etc/motd
What if you want a faster grep?
You might have heard that ripgrep is the new
grep. But the rumors of GNU Grep’s demise have been greatly exaggerated.
First of all, ask yourself this: Would you use a library that endorses murder? Ripgrep has no problem with assassination, it seems.
All joking aside, it is true that ripgrep is faster, not to mention that it is written in the trendy Rust language and has nice defaults for searching code.
But GNU Grep has been at the forefront of grep-ing for a long time, and its implementation is meant to change as faster algorithms are discovered in academia. Ripgrep might have declared victory too soon, for the slowest horse sometimes wins the race.
Think of ripgrep as the provider of a healthy dose of competition. And a great choice if you need more speed.
For a detailed comparison between GNU Grep and ripgrep, check this guide.
The journey into the world of
grep has unveiled its powers as an indispensable tool with a remarkable ability to search, filter, and manipulate text data.
If the discovery of grep’s functionality has left you craving more powerful text manipulation tools, then expanding your horizons to learn awk is a natural progression.
Armed with the power of
awk there will be no obstacle in the text-processing realm that cannot be tamed.