
Searching your file system can be tricky. For example, do you sometimes find it difficult to be specific or exact? Or perhaps the results are too noisy? Regex can solve these issues and more. It’s powerful, universal, and flexible, and the basics will carry you a very long way.
What is Regex?
Regex is a pattern-matching language; it’s a way to expressively describe patterns that match strings (e.g., words or sentences). For example, say you’re searching your hard drive for an image called foo, but you cannot remember if it’s a JPEG or a PNG. You can use regex with fd like this: fd 'foo\.(jpg|png)'.
Many utilities make use of regex for searching, transforming, and interacting with text. For example, grep -E [regex], find -regex [regex], or fd [regex]. Using regex means that you can be very precise.
Regex is used everywhere. Most websites on the internet use it in one form or another. Regex is also common in utilities and applications, like ripgrep, Vim, Neovim, Emacs, and lots more.
The Different Flavors of Regex
Regex comes in different flavors, which essentially means different rules (aka syntax). There are many flavors, but they differ only in small ways. If you stick to the concepts that I cover later, they will work across most flavors and Linux utilities. You don’t need to think too hard about it.
PCRE (Perl Compatible Regular Expressions) is the most full-featured flavor. All the examples try to be compatible with PCRE.
When you become a pro, you can refer to the Wikipedia article that compares regex flavors or a comprehensive regex comparison table. Keep in mind that the concepts that you will learn should apply everywhere.
A Quick Overview of Regex to Dispel Any Mystery
The following table will introduce you gently to common regex features:
| Concept | Description |
|---|---|
| Character classes | A list of specific characters that you want to match, e.g., [abc]. |
| Match groups | Brackets around related parts of the expression, like brackets in mathematics, e.g., (foo). |
| Modifiers | Change how the expression functions, e.g., case sensitivity. |
| Anchors | Define the start and end of a string, e.g., ^foo$. |
| Quantifiers | Indicate quantity, e.g., foo+, foo{3}, etc. |
| Alternation | Simply an or statement, e.g., foo|bar. |
| DOTALL metacharacter | Match anything, like a wildcard; it’s just a single period. |
You will mix and match these to describe a pattern.
These features only apply across the board if you use them with the command flags mentioned later. For example, grep -P foo.
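The DOTALL metacharacter, more commonly just called the dot, is like a wildcard: it matches any single character (in most tools, any character except a newline by default). It’s simply a period. You will use it often in places where the pattern should accept any character.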
Character Classes: Match Specific Characters in Any Order
Character classes are a list of characters, enclosed in square brackets, that you wish to match. For example, the following expression matches a, b, z, 1, 2, or 9:
[abz129]
This matches any alphanumeric character, upper or lowercase:
[a-zA-Z0-9]
The hyphen (-) has a special meaning in a character class, so if you want to match it literally, you must place it first [-a-z] or escape it [a-z\-].
It’s important to understand that a character class matches exactly one character, unless you use a quantifier (covered later).
If you try a character class in a regex tester with global mode (g) enabled, you will see multiple characters highlighted in the results. Each highlight is a separate match, and each match corresponds to exactly one of the characters in the character class. Global mode is responsible for this: it means that regex does not stop at the first match but instead keeps going and creates multiple matches.
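To see this on the command line, here is a small sketch using grep's -o flag, which prints every match on its own line. The sample input and the variable names are my own, chosen only for illustration.

```shell
# -o prints each match separately; grep keeps searching after the first
# match, which is similar to global mode (g) in a regex tester.
class_matches=$(printf 'a1-xyz-9b\n' | grep -o '[abz129]')
printf '%s\n' "$class_matches"

# A hyphen placed first in the class is matched literally.
hyphen_matches=$(printf 'a-b\n' | grep -o '[-a-z]')
printf '%s\n' "$hyphen_matches"
```

Each printed line is a single character, because a character class without a quantifier matches exactly one character at a time.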
Match Groups: Draw Boundaries Around Sub Expressions
In some ways, match groups are similar to brackets in mathematics. For example, when you write a mathematical expression like 1 + (2 / 2), it differs from (1 + 2) / 2. The calculation begins with the innermost brackets, which alters the result.
Brackets in regex work like boundaries; they group parts of the expression together. For example, foo(bar|baz) is not the same as foobar|baz, because the former will match foobar or foobaz; the latter will match foobar or baz.
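The difference is easy to check with grep. In this sketch, the word list and variable names are made up for the demonstration; -E enables groups and alternation, and -x requires the whole line to match.

```shell
# Three candidate lines to test both expressions against.
words=$(printf 'foobar\nfoobaz\nbaz\n')

# With the group: foo followed by either bar or baz.
grouped=$(printf '%s\n' "$words" | grep -E -x 'foo(bar|baz)')

# Without the group: either foobar, or baz on its own.
ungrouped=$(printf '%s\n' "$words" | grep -E -x 'foobar|baz')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```

The grouped expression selects foobar and foobaz, while the ungrouped one selects foobar and baz.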
Quantifiers: How to Specify Exact and Variable Amounts
Quantifiers allow us to define quantities. When we match a character with DOTALL or character classes, we use quantifiers to say how many. We can also apply quantifiers to match groups, so we can define quantities for entire expressions.
Match Zero or More Things With the Asterisk
The asterisk (*) metacharacter will match zero or more things. The following matches any run of a, b, or z characters (e.g., a, abba, or zzz), or an empty string:
[abz]*
Match One or More Things With the Plus Sign
The plus (+) metacharacter will match one or more things. The following matches one or more a, b, or z characters:
[abz]+
Make Things Optional With the Question Mark
The question mark (?) metacharacter makes the previous item optional. The following will match exactly one a, b, z, or nothing at all:
[abz]?
Define Exact Quantities With Curly Brackets
Curly brackets allow us to define an exact number. For example, the following will match exactly two consecutive characters, each of which is a, b, or z (e.g., ab or zz):
[abz]{2}
The following will match a, b, or z between 2 and 4 times:
[abz]{2,4}
A Summary of Quantifiers
- ?: Optional.
- *: Zero or more (zero means an empty string).
- +: One or more.
- {n}: Match exactly n items.
- {n,m}: Match between n and m items.
The plus (+) and question mark (?) metacharacters don’t work with most Linux utilities unless you use appropriate command flags. Flags are covered later.
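Here is a rough sketch of the quantifiers in action, using grep -E so that +, ?, and the curly brackets behave as described. The sample string and the variable names are assumptions of mine.

```shell
sample='zab-a-bbbbb'

# One or more characters from the class, as long a run as possible.
one_or_more=$(printf '%s\n' "$sample" | grep -E -o '[abz]+')

# Exactly two characters from the class per match.
exactly_two=$(printf '%s\n' "$sample" | grep -E -o '[abz]{2}')

# Between two and four characters from the class per match.
two_to_four=$(printf '%s\n' "$sample" | grep -E -o '[abz]{2,4}')

# An optional b after a: whole lines that are either a or ab.
optional=$(printf 'a\nab\nabb\n' | grep -E -x 'ab?')

printf '%s\n' "$one_or_more" "$exactly_two" "$two_to_four" "$optional"
```

Notice that the lone a in the middle of the sample shows up only for the plus sign, because the curly-bracket versions need at least two characters in a row.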
Match as Much or as Little as You Want With Lazy and Greedy Quantifiers
Some quantifiers allow us to define an unspecified amount. For example, the plus sign (+) means one or more—anything greater than 0. The plus (+) and asterisk (*) metacharacters are greedy by default, which means that they try to match as much as possible. In contrast, we can make them lazy so that they match as little as possible.
Appending a question mark (?) makes them lazy. For example, the following will match a, b, or z, but each match will stop after the first character (it’s lazy).
[abz]+?
The asterisk (*) metacharacter is similar except for one small detail: it matches zero or more items. The laziest possible match is zero items, so on its own the following will match only an empty string.
[abz]*?
Making a quantifier lazy can sometimes be more performant because the engine may not need to process the entire string, although the effect depends on the pattern and the engine. If you’re searching millions of strings, matching only the first few characters can save time and resources.
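As a quick comparison, this sketch runs the greedy and lazy versions against the same input. It assumes a GNU grep built with PCRE support, because lazy quantifiers need the -P flag; the input string and variable names are my own.

```shell
# Greedy: the plus sign grabs the whole run in a single match.
greedy=$(printf 'abz\n' | grep -o -P '[abz]+')

# Lazy: each match stops after one character, so there are three matches.
lazy=$(printf 'abz\n' | grep -o -P '[abz]+?')

printf 'greedy:\n%s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy version prints abz on one line, while the lazy version prints a, b, and z on separate lines.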
Anchors Match the Start and End of Lines
Anchors are simple to understand. There are two, one that indicates the start of a line (^) and one that indicates the end ($). The following pattern matches foo exactly and nothing else:
^foo$
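The following sketch shows how each anchor narrows the results. The three input lines and the variable names are made up for the demonstration.

```shell
lines=$(printf 'foo\nfoobar\nbarfoo\n')

# Both anchors: only the line that is exactly foo.
exact=$(printf '%s\n' "$lines" | grep -E '^foo$')

# Start anchor only: lines that begin with foo.
starts=$(printf '%s\n' "$lines" | grep -E '^foo')

# End anchor only: lines that finish with foo.
ends=$(printf '%s\n' "$lines" | grep -E 'foo$')

printf '%s\n' "$exact" "$starts" "$ends"
```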
Modifiers: Flags That Change How Regex Works
Modifiers are a way to change how regex works. For example, we can make it case-insensitive. We’ve already looked at the global modifier, but it’s worth restating that they are flags that typically live at the end of an expression.
The Global Modifier
The global modifier (g) allows regex to continue searching for matches after it finds the first one, resulting in multiple matches. In contrast, disabling the global modifier causes regex to stop after finding the first match.
For example, an expression such as [A-Z0-9] matches individual uppercase letters and numbers, and it ignores lowercase letters. When the global modifier (g) is active, it matches multiple items.

The Case Insensitivity Modifier
The case-insensitivity modifier (i) will match against both uppercase and lowercase when active.
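Command-line tools usually expose this modifier as a flag rather than as a trailing letter. As a small sketch, grep's -i flag plays the same role; the input lines and variable names here are my own.

```shell
lines=$(printf 'foo\nFOO\nFoo\n')

# Default behavior is case-sensitive: only the lowercase line matches.
sensitive=$(printf '%s\n' "$lines" | grep -E 'foo')

# With -i, uppercase and mixed-case lines match as well.
insensitive=$(printf '%s\n' "$lines" | grep -E -i 'foo')

printf 'sensitive:\n%s\n' "$sensitive"
printf 'insensitive:\n%s\n' "$insensitive"
```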
The Multiline Modifier
Anchors define the start (^) and end ($) of a string. When the multiline modifier (m) is active, the anchors match the start and end of each line. When it’s inactive, the anchors match only the start and end of the entire string.
This is how the anchors behave when multiline mode is active:
^foo$
^bar$
And when it’s inactive:
^foo
bar$
To evaluate all lines when the multiline modifier (m) is active, you must also enable the global modifier (g). But remember that doing so will also create multiple matches.
Putting It All Together: Using Regex With Commands
So now for the grand finale: how do we put what we’ve learned to good use? As mentioned at the beginning, find, fd, grep, ripgrep, and sed all support regex. Pay attention to the command flags that I use; I chose them so that they use similar flavors.
For each command, I will use the following expression:
^.+/[fo]+\.(jpg|png)$
This pattern matches a POSIX path for a JPG or PNG file. For example:
/foo/bar/baz/foo.jpg
This expression covers everything that we’ve learned: anchors, character classes, quantifiers, alternation, match groups, and the DOTALL metacharacter. Here’s a summary of the expression (in order of appearance):
| Segment | Note |
|---|---|
| ^ | Match the start of the line. |
| .+ | Match any character, one or more times. |
| / | Match a forward slash just before the file name. This literal slash defines a clear boundary for our file name, and the previous DOTALL will match all other path characters (including slashes). |
| [fo]+ | Match the letters f and o one or more times, e.g., fo, foo, ffoo, fffooo, fofofof. |
| \. | Match a literal period (not a DOTALL). |
| (jpg|png) | A match group; I’ve used it here to group these two patterns together. The pipe is called alternation, and using it like this means jpg or png. |
| $ | Match the end of the line. |
I will match all the expressions against a file (called examples) with the following contents:
/foo/bar/baz/foo.jpg
/one/two/thr/foo.png/this/should/not/match.jpg
For the find commands, I will create such files in my file system and search for them.
Using grep With Regex
For grep, we must use the -P flag, which enables its (limited) PCRE engine. PCRE is the most extensive flavor and supports almost every feature you can think of.
If you’re unfamiliar with grep, see this detailed guide on how to use it.
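A minimal sketch of the grep command looks like this. It recreates the examples file in a temporary directory so that it can run anywhere; the workdir and grep_matches names are my own, and the -P flag assumes a GNU grep built with PCRE support.

```shell
# Recreate the examples file in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' \
  '/foo/bar/baz/foo.jpg' \
  '/one/two/thr/foo.png/this/should/not/match.jpg' > "$workdir/examples"

# -P selects the PCRE flavor; only the first path fits the pattern.
grep_matches=$(grep -P '^.+/[fo]+\.(jpg|png)$' "$workdir/examples")
printf '%s\n' "$grep_matches"
```

Only /foo/bar/baz/foo.jpg is printed. The second path ends in match.jpg, and match contains letters other than f and o.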
Using find With Regex
The find command supports multiple regex flavors. You can see a list of them with the following command:
find -regextype help
The flavor that most closely matches PCRE is posix-extended, aka POSIX ERE (Extended Regular Expressions). POSIX ERE is missing many advanced features, but it supports all the features that we’ve covered.
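A sketch of the find command might look like the following. It assumes GNU find (the -regextype option is a GNU extension) and builds the sample files in a temporary directory; the workdir and find_matches names are my own. I’ve also added -type f, which is my addition, because one of the sample paths contains a directory named foo.png that would otherwise match too.

```shell
# Build the sample paths inside a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/foo/bar/baz" "$workdir/one/two/thr/foo.png/this/should/not"
touch "$workdir/foo/bar/baz/foo.jpg"
touch "$workdir/one/two/thr/foo.png/this/should/not/match.jpg"

# -regextype must come before -regex; find tests the regex against the whole path.
find_matches=$(find "$workdir" -type f -regextype posix-extended \
  -regex '^.+/[fo]+\.(jpg|png)$')
printf '%s\n' "$find_matches"
```

Because find matches against the full path, the leading .+ in the expression also absorbs the temporary directory name.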
If this command seems long-winded, then you should probably use an alias to set the regex options by default.
Using fd With Regex
The fd command uses the regex Rust crate (a Rust package). For the features covered here, the regex crate is nearly compatible with PCRE, so we can use it without much concern. However, by default, fd only matches against file names, so we must use the --full-path flag if we want to match against the entire file path.
Using ripgrep With Regex
Using regex with the ripgrep command is straightforward because it uses the regex Rust crate. As mentioned earlier, the Rust crate closely matches PCRE.
Using sed With Regex
The sed command is different from the others; it allows us to search and replace strings. If you’re unfamiliar with it, you should check out this great guide on how to use the sed command. The general form of the command is as follows:
s/pattern/replacement/
It’s not necessary to use forward slashes, because sed accepts almost any character as the delimiter. Picking a delimiter that appears in neither the pattern nor the replacement means that nothing needs escaping. A path is full of forward slashes, so another character is more convenient here. Keep in mind that a pipe only works as the delimiter when the pattern contains no alternation, because sed would treat the alternation pipe as the end of the pattern.
For sed, we need to use the GNU ERE regex flavor, and we do that with the -E flag. Everything that works in GNU ERE will work in PCRE, which is good for us.
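Here is a rough sketch of such a command. It uses # as the delimiter, because our expression already contains both forward slashes and a pipe. The replacement path /replaced/path.txt and the variable names are hypothetical values of mine.

```shell
# Recreate the examples file in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' \
  '/foo/bar/baz/foo.jpg' \
  '/one/two/thr/foo.png/this/should/not/match.jpg' > "$workdir/examples"

# -E selects extended regex; # separates the pattern from the replacement.
sed_output=$(sed -E 's#^.+/[fo]+\.(jpg|png)$#/replaced/path.txt#' "$workdir/examples")
printf '%s\n' "$sed_output"
```

The first line is replaced with the new value, and the second line is printed unchanged because it does not match. The file itself is not modified.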
The first part of the expression is the pattern, and the second part is the value that replaces each matched line. Lines that match are printed with the new value, while lines that do not match are left unmodified.
In the example, sed printed the results to the terminal with the matching file paths replaced, similar to how cat and grep print their output, but sed also supports in-place editing via the -i flag.
So that’s it; those are the basics of regex. The basics will carry you very far. The important thing to remember is that many utilities use different regex flavors, and so, if you stray from the mentioned flags, you may find that some features do not work.
In addition to that, regex is something that you need to get your hands on before it truly makes sense. You can practice your skills, get insights, and learn more via regex101, a powerful online playground that provides tips and guidance.
