---
title: Howto grep
subtitle: The classic UNIX command
author: Seth
date: 2023-05-30 08:00
publish_date: 2023-05-30 08:00

hero_classes: text-light title-h1h2 overlay-dark-gradient hero-large parallax
hero_image: tech_gnu-solarized.png

show_sidebar: true
show_breadcrumbs: true
show_pagination: true

taxonomy:
    category: tech
    tag: [ tech, unix, hack, cyberpunk ]
---

One of the classic UNIX commands, developed way back in 1973 by Ken
Thompson, is the Global Regular Expression Print (`grep`) command. It's
so ubiquitous in computing that it's frequently used as a verb
("grepping through a file") and, depending on just how geeky your
audience is, it fits nicely into real-world scenarios, too ("I'll have
to grep my memory banks to recall that information"). In short, grep is
a way to search through a file for a specific pattern of characters. If
that sounds like the modern **Find** function available in any word
processor or text editor, then you've already experienced the effects
that grep has had on the computing industry.

Far from just being a quaint old command that's been supplanted by
modern technology, grep's true power lies in two aspects:

-   Grep works in the terminal and operates on streams of data, so you
    can incorporate it into complex processes. You can not only *find* a
    word in a text file, you can extract the word, send it to another
    command, and so on.

-   Grep uses regular expressions to provide a flexible search capability.

Learning the `grep` command is easy, although it does take some
practice. This article introduces you to some of the features I find
most useful.

## Installing grep {#_installing_grep}

If you're using Linux, you already have `grep` installed.

On macOS, you have the BSD version of `grep` installed. This differs
slightly from the GNU version, so if you want to follow along exactly
with this article then install GNU grep from a project like
Homebrew or MacPorts.

## Basic grep {#_basic_grep}

The basic grep syntax is always the same. You provide `grep` a pattern
and a file you want it to search. In return, grep prints to your
terminal each line with a match.

``` bash
$ grep gnu gpl-3.0.txt
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
<http://www.gnu.org/licenses/>.
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
```

By default, the `grep` command is case-sensitive, so "gnu" is
different from "GNU" or "Gnu". You can make it ignore capitalization
with the `--ignore-case` option.

``` bash
$ grep --ignore-case gnu gpl-3.0.txt
                    GNU GENERAL PUBLIC LICENSE
  The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
[...16 more results...]
<http://www.gnu.org/licenses/>.
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
```

You can also make `grep` return all lines *without* a match by using the
`--invert-match` option:

``` bash
$ grep --invert-match \
--ignore-case gnu gpl-3.0.txt
                      Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
[...648 lines...]
Public License instead of this License.  But first, please read
```
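To compare the three behaviours side by side, here is a minimal sketch that runs them against a throwaway three-line file. The file contents and the variable names are made up for illustration; they are not taken from the GPL text.

``` bash
# Build a small sample file so the example does not depend on gpl-3.0.txt.
sample=$(mktemp)
printf '%s\n' 'GNU General Public License' 'see gnu.org' 'Version 3' > "$sample"

# Case-sensitive search: only the lowercase line matches.
sensitive=$(grep gnu "$sample")

# Case-insensitive search: both the "GNU" line and the "gnu" line match.
insensitive=$(grep --ignore-case gnu "$sample")

# Inverted, case-insensitive search: only the line with no "gnu" at all.
inverted=$(grep --invert-match --ignore-case gnu "$sample")

printf '%s\n' "$sensitive" '---' "$insensitive" '---' "$inverted"
rm -f "$sample"
```

Capturing each result in a variable is only a convenience for printing them together; on the command line you would normally run each `grep` directly.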

## Pipes {#_pipes}

It's useful to be able to find text in a file, but the true power of
POSIX is its ability to chain commands together through "pipes". I find
that my best use of `grep` is when it's combined with other tools, like
`cut`, `tr`, or `curl`.

For instance, assume I have a file that happens to list some technical
papers I would like to download. I could open the file and manually
click on each link, and then click through Firefox options to save each
file to my hard drive, but that's a lot of time and clicking. Instead, I
could grep for the links in the file, printing *only* the matching
string by using the `--only-matching` option:

``` bash
$ grep --only-matching 'http://.*pdf' example.html
http://example.com/linux_whitepaper.pdf
http://example.com/bsd_whitepaper.pdf
http://example.com/important_security_topic.pdf
```

The output is a list of URLs, one per line. This is a natural fit
for how Bash processes data, so instead of having the URLs printed to my
terminal, I can pipe them to `curl`. Because `curl` expects URLs as
arguments rather than on standard input, `xargs` hands them over one at
a time:

    $ grep --only-matching 'http://.*pdf' \
    example.html | xargs -n 1 curl --remote-name

This downloads each file, saving it according to its remote filename
onto my hard drive.
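The same idea works with the other tools mentioned earlier. As a rough sketch that needs no network access, the following pipes the matched URLs into `cut` to keep only the file names. The sample page, the variable names, and the slightly stricter pattern `http://[^"]*\.pdf` are assumptions for illustration.

``` bash
# Hypothetical page with two PDF links, written to a temporary file.
page=$(mktemp)
printf '%s\n' \
  '<a href="http://example.com/linux_whitepaper.pdf">Linux</a>' \
  '<a href="http://example.com/bsd_whitepaper.pdf">BSD</a>' \
  '<p>No link on this line.</p>' > "$page"

# grep emits one URL per line; cut splits each URL on "/" and keeps
# the fourth field, which is the file name.
names=$(grep --only-matching 'http://[^"]*\.pdf' "$page" | cut -d / -f 4)

printf '%s\n' "$names"
rm -f "$page"
```

The pattern stops at the first double quote, which keeps a match from running past the end of the link when a line contains more text after it.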

The search pattern in the `grep` commands above may seem cryptic to you.
That's because it uses regular expressions, a kind of "wildcard" language
that's particularly useful when searching broadly through lots of text.

## Regular expression {#_regular_expression}

Nobody is under the illusion that regular expressions ("regex" for
short) are easy. However, I find regex often has a worse reputation than
it deserves. Admittedly, there's the potential for people to get a little
*too clever* with regex until it's so unreadable and so specifically
broad that it folds in on itself, but you don't have to over-do your
regex. Here's a brief introduction to regex the way I use it.

First, create a file called `example.txt` and enter this text into it:

``` text
Albania
Algeria
Canada
0
1
3
11
```

The most basic element of regex is the humble `.` character. It
represents any single character.

``` bash
$ grep Can.da example.txt
Canada
```

The pattern `Can.da` successfully returned `Canada` because the `.`
character represented any *one* character.

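You can change how many times the `.` wildcard (or any other item)
matches by following it with one of these notations. With GNU grep,
`?`, `+`, and braces only work this way when you add the
`--extended-regexp` option:

-   `?` match the preceding item zero or one time

-   `*` match the preceding item zero or more times

-   `+` match the preceding item one or more times

-   `{4}` match the preceding item exactly four times (or any number you
    enter in the braces); `{0,4}` matches up to four times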
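Here is a minimal sketch of three of these notations at work on the same list as `example.txt`. It uses `--extended-regexp` plus the `^` and `$` anchors (start and end of a line), which this article doesn't otherwise cover; the temporary file and variable names are only for illustration.

``` bash
# Recreate the contents of example.txt in a temporary file.
list=$(mktemp)
printf '%s\n' Albania Algeria Canada 0 1 3 11 > "$list"

# "?": the second 1 is optional, so both 1 and 11 match.
optional=$(grep --extended-regexp '^11?$' "$list")

# "*": any run of characters (even none) between "Al" and a final "a".
star=$(grep --extended-regexp '^Al.*a$' "$list")

# "{6}": lines that are exactly six characters long.
six=$(grep --extended-regexp '^.{6}$' "$list")

printf '%s\n' "$optional" '---' "$star" '---' "$six"
rm -f "$list"
```

Quoting each pattern keeps the shell from interpreting characters such as `*`, `?`, and `{` before `grep` sees them.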

Armed with this knowledge, you can practice regex on `example.txt` all
afternoon, seeing what interesting combinations you come up with. Some
won't work, others will. The important thing is to analyse the results
so you understand why.

For instance, this fails to return any country:

``` bash
$ grep A.a example.txt
```

It fails because the `.` character can only ever match a single
character, unless you level it up. Using the `*` character, you can tell
`grep` to match any character zero or more times, as many as it takes
to reach a later `a` on the line. Because you know the list you're
dealing with, though, you know that *zero times* is useless in this
instance. No entry in this list has an `A` followed immediately by an
`a`. So instead, you can use `+` to match any character at least once,
and then again as many times as necessary. GNU grep treats `+` as a
literal plus sign in its default mode, so add the `--extended-regexp`
option, and quote the pattern to protect it from the shell:

``` bash
$ grep --extended-regexp 'A.+a' example.txt
Albania
Algeria
```
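If a pattern with `+` returns nothing, the regex mode is a likely culprit. As a sketch, assuming GNU grep and the same list as `example.txt`, the following counts matches for the same pattern in basic and extended mode, and for a basic-mode spelling that avoids `+` altogether. The variable names are illustrative.

``` bash
# Recreate the contents of example.txt in a temporary file.
list=$(mktemp)
printf '%s\n' Albania Algeria Canada 0 1 3 11 > "$list"

# Basic mode: "+" is a literal plus sign, so nothing matches.
# grep exits non-zero when it finds no match, hence "|| true".
basic=$(grep --count 'A.+a' "$list" || true)

# Extended mode: "+" means "one or more of the preceding item".
extended=$(grep --count --extended-regexp 'A.+a' "$list")

# Basic-mode equivalent: one "." followed by ".*" (zero or more).
portable=$(grep --count 'A..*a' "$list")

printf 'basic=%s extended=%s portable=%s\n' "$basic" "$extended" "$portable"
rm -f "$list"
```

The `--count` option prints the number of matching lines instead of the lines themselves, which makes the difference easy to compare.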

You can use square brackets to provide a list of letters:

``` bash
$ grep --extended-regexp '[AC].+a' example.txt
Albania
Algeria
Canada
```

This works for numbers, too. The results may surprise you:

``` bash
$ grep '[1-9]' example.txt
1
3
11
```

Are you surprised to see 11 in a search for digits 1 to 9?

What happens if you add 13 to your list?

These numbers are returned because `grep` prints every line that
*contains* a match, not only lines that match in full. Both 11 and 13
include a 1, which is among the digits to match.
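If you only want lines that consist of a single digit, one option is `--line-regexp`, which requires the pattern to match the whole line. A small sketch, using the same list as `example.txt` and illustrative variable names:

``` bash
# Recreate the contents of example.txt in a temporary file.
list=$(mktemp)
printf '%s\n' Albania Algeria Canada 0 1 3 11 > "$list"

# Any line containing a digit from 1 to 9, so 11 is included.
anywhere=$(grep '[1-9]' "$list")

# Only lines made up of exactly one such digit, so 11 is excluded.
whole=$(grep --line-regexp '[1-9]' "$list")

printf '%s\n' "$anywhere" '---' "$whole"
rm -f "$list"
```

Writing the pattern as `'^[1-9]$'` should give the same result, because `^` and `$` anchor it to the start and end of the line.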

As you can see, regex is something of a puzzle, but through
experimentation and practice you can get comfortable with it and use it
to improve the way you grep through your data.

## Grep is good

There are far more options for the `grep` command than demonstrated in
this article. There are options to better format results, list files and
line numbers containing matches, provide context for results by printing
the lines surrounding a match, and much more.
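As a starting point, here is a brief sketch of three of those options: `--line-number`, `--context`, and `--files-with-matches`. The temporary directory, the second file `other.txt`, and the variable names are made up for illustration.

``` bash
# Two small files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' Albania Algeria Canada 0 1 3 11 > "$dir/example.txt"
printf '%s\n' Denmark Estonia > "$dir/other.txt"

# Prefix each match with its line number.
numbered=$(grep --line-number Canada "$dir/example.txt")

# Print one line of context before and after each match.
context=$(grep --context=1 Canada "$dir/example.txt")

# List only the names of the files that contain a match.
files=$(cd "$dir" && grep --files-with-matches Canada example.txt other.txt)

printf '%s\n' "$numbered" '---' "$context" '---' "$files"
rm -rf "$dir"
```

The `grep` man page describes the rest, including short forms of these options such as `-n`, `-C`, and `-l`.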

<div class="mxs_attribution">
<p><a href="https://linux.pictures/" target="_blank">Solarized GNU</a>
by Linux Pictures under the 
<a href="https://linux.pictures/about" target="_blank">idgaf</a> license.
</p></div>