Interactive RegEx Tutorial


This hands-on tutorial walks you through regular expressions step by step. Each section builds on the last, and completing the exercises advances your progress. Don’t worry about memorizing everything; focus on understanding the patterns.

Three exercise types appear throughout:

  • Build it (Parsons): drag and drop regex fragments into the correct order.
  • Write it (Free): type a regex from scratch.
  • Fix it (Fixer Upper): a broken regex is given — debug and repair it.

Your progress is saved in your browser automatically.

Literal Matching

The simplest regex is just the text you want to find. The pattern cat matches the exact characters c, a, t — in that order, wherever they appear. This means it matches inside words too: cat appears in “education” and “scatter”.

Key points:

  • RegEx is case-sensitive by default: cat does not match “Cat” or “CAT”.
  • The engine scans left to right. A find-all (global) search reports every non-overlapping match, so the pattern aa finds two matches in “aaaa”, not three.
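To try literal matching outside the browser, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample sentence are assumptions for illustration; the same pattern behaves similarly in most engines.

```python
import re

# Literal pattern: the characters c, a, t in that order.
pattern = re.compile(r"cat")

text = "The cat sat; education and scatter also contain it. Cat does not."

# findall scans left to right and returns every non-overlapping match.
matches = pattern.findall(text)            # ['cat', 'cat', 'cat']

# span() gives the (start, end) position of each match in the text.
positions = [m.span() for m in pattern.finditer(text)]

# Case matters: the capitalized "Cat" is not among the matches.
capital_found = re.search(r"cat", "Cat") is not None   # False

# Non-overlapping: "aaaa" contains two matches of "aa", not three.
pairs = re.findall(r"aa", "aaaa")          # ['aa', 'aa']
```

The three hits come from “cat”, “education”, and “scatter”, which shows that a literal pattern does not care about word edges.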

Character Classes

A character class [...] matches any single character listed inside the brackets. For example, [aeiou] matches any one lowercase vowel.

You can also use ranges: [a-z] matches any lowercase ASCII letter, [0-9] matches any digit, and [A-Za-z] matches any ASCII letter regardless of case. Accented letters such as “é” fall outside these ranges.

To negate a class, place ^ right after the opening bracket: [^a-z] matches any character that is not a lowercase letter — digits, punctuation, spaces, etc.
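The classes above can be compared side by side. This is a small sketch in Python's re module; the sample string is made up for illustration.

```python
import re

text = "Regex 101: a-z!"

# Each class matches exactly one character per match.
vowels = re.findall(r"[aeiou]", text)         # ['e', 'e', 'a']
lowercase = re.findall(r"[a-z]", text)        # ['e', 'g', 'e', 'x', 'a', 'z']
digits = re.findall(r"[0-9]", text)           # ['1', '0', '1']

# A leading ^ inside the brackets negates the class:
# everything that is NOT a lowercase letter.
not_lowercase = re.findall(r"[^a-z]", text)
```

Note that not_lowercase includes the uppercase “R”, the spaces, the digits, and the punctuation, because a negated class matches any character that is not listed.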

Shorthand Classes & the Dot

Writing out full character classes every time gets tedious. RegEx provides shorthand escape sequences:

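Shorthand   Meaning                   Equivalent class (ASCII)
\d          Any digit                 [0-9]
\D          Any non-digit             [^0-9]
\w          Any “word” character      [a-zA-Z0-9_]
\W          Any non-word character    [^a-zA-Z0-9_]
\s          Any whitespace            [ \t\n\r\f\v]
\S          Any non-whitespace        [^ \t\n\r\f\v]

These equivalents are the ASCII definitions. In Unicode-aware engines or modes, \d, \w, and \s can also match non-ASCII digits, letters, and whitespace.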

The dot . is a wildcard that matches any single character (except newline). Because the dot matches almost everything, it is powerful but easy to overuse. When you actually need to match a literal period, escape it: \.
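Here is a short sketch of the shorthands and the dot in Python's re module. One Python-specific detail, stated as a fact about that engine only: for text strings, \d, \w, and \s are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges. The sample strings are invented for illustration.

```python
import re

text = "Order #42 shipped on 3.14.2024"

digit_runs = re.findall(r"\d+", text)     # ['42', '3', '14', '2024']
word_runs = re.findall(r"\w+", text)
pieces = re.split(r"\s+", "a b\tc\nd")    # ['a', 'b', 'c', 'd']

# The unescaped dot accepts any character in the middle position;
# the escaped dot accepts only a literal period.
candidates = ["3.14", "3x14", "3-14"]
loose = [s for s in candidates if re.fullmatch(r"3.14", s)]
strict = [s for s in candidates if re.fullmatch(r"3\.14", s)]

# Unicode vs ASCII behavior of \w in Python.
unicode_word = re.fullmatch(r"\w+", "café") is not None                # True
ascii_word = re.fullmatch(r"\w+", "café", flags=re.ASCII) is not None  # False
```

The loose list keeps all three candidates while strict keeps only “3.14”, which is why an unescaped dot in a number or file extension pattern is a common source of false matches.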

Anchors & Boundaries

So far every pattern matches anywhere inside a string. Anchors constrain where a match can occur without consuming characters:

Anchor   Meaning
^        Start of string (or of each line in multiline mode)
$        End of string (or of each line in multiline mode)
\b       Word boundary: the position between a “word” character (\w) and a “non-word” character (\W), in either order, or between a word character and the start or end of the string

Anchors are critical for validation. Without them, the pattern \d+ would match the 42 inside "hello42world". Adding anchors — ^\d+$ — ensures the entire string must be digits.

Word boundaries (\b) let you match whole words. \bgo\b matches the standalone word “go” but not “goal” or “cargo”.
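The validation and whole-word examples can be sketched in Python as follows. The helper name is_all_digits is hypothetical. One engine-specific caveat: in Python, $ also matches just before a single trailing newline, so re.fullmatch is the stricter choice when that matters.

```python
import re

def is_all_digits(s: str) -> bool:
    """Return True only if the whole string consists of digits."""
    return re.search(r"^\d+$", s) is not None

# Without anchors, the pattern finds digits anywhere in the string.
unanchored = re.search(r"\d+", "hello42world").group()   # '42'

# With anchors, the same input is rejected.
anchored_ok = is_all_digits("42")                 # True
anchored_bad = is_all_digits("hello42world")      # False

# Multiline mode: ^ and $ match at the start and end of each line.
per_line = re.findall(r"^\d+$", "12\nab\n34", flags=re.MULTILINE)

# \b keeps "go" from matching inside "goal" or "cargo".
whole_words = re.findall(r"\bgo\b", "go to the goal in the cargo, then go")
```

Only the two standalone occurrences of “go” end up in whole_words; the anchors and \b add constraints on position without consuming any characters themselves.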

Quantifiers

Quantifiers control how many times the preceding element must appear:

Quantifier   Meaning
*            Zero or more times
+            One or more times
?            Zero or one time (optional)
{n}          Exactly n times
{n,}         n or more times
{n,m}        Between n and m times, inclusive

Common misconception: * vs +

Students frequently confuse these two. The key difference:

  • a*b matches b, ab, aab, aaab, … — the a is optional (zero or more).
  • a+b matches ab, aab, aaab, … — at least one a is required.

If you want “one or more”, reach for +. If you genuinely mean “zero or more”, use *. Getting this wrong is one of the most common sources of regex bugs.
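The difference between * and + is easy to check by running both against the same candidates. A minimal Python sketch follows; the candidate strings mirror the examples above, and the remaining inputs are made up for illustration.

```python
import re

candidates = ["b", "ab", "aab", "aaab"]

# a*b: zero or more a's, so the bare "b" is accepted.
star = [s for s in candidates if re.fullmatch(r"a*b", s)]

# a+b: at least one a is required, so the bare "b" is rejected.
plus = [s for s in candidates if re.fullmatch(r"a+b", s)]

# ? makes the preceding element optional.
optional = [s for s in ["color", "colour", "colouur"]
            if re.fullmatch(r"colou?r", s)]

# Counted repetition: {n} is exact, {n,m} is an inclusive range.
exact = re.fullmatch(r"\d{3}", "123") is not None
ranged = [s for s in ["1", "12", "1234", "12345"]
          if re.fullmatch(r"\d{2,4}", s)]
```

Here star keeps all four candidates and plus drops “b”. That single dropped input is often the whole bug when * is used where + was intended, for example accepting an empty field.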

Alternation & Combining

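The pipe | works like a logical OR: cat|dog matches either “cat” or “dog”. Alternation has the lowest precedence of any regex operator, so gray|grey is read as “gray” or “grey” and needs no parentheses. The same rule means ^cat|dog$ is read as (^cat)|(dog$); wrap the alternatives in a group, as in ^(cat|dog)$, when an anchor or quantifier should apply to all of them.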

When you combine multiple regex features, patterns become expressive:

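  • gr[ae]y uses a character class for the spelling variant.
  • \d{2}:\d{2} matches two digits, a colon, and two digits (a time format).
  • ^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])$ checks the MM/DD shape of a date (month 01 to 12, day 01 to 31). It has no year and still accepts impossible dates such as 02/31, so it is a format check rather than a full date validator.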

Start simple and add complexity only when tests demand it.
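The combined patterns above can be exercised with a few inputs. This is a sketch in Python; the test strings are invented, and the month/day pattern is copied as written, so it checks only the shape of the input.

```python
import re

# Two ways to express the same spelling variant.
gray_alt = re.findall(r"gray|grey", "gray grey gruy")     # ['gray', 'grey']
gray_class = re.findall(r"gr[ae]y", "gray grey gruy")     # ['gray', 'grey']

# Low precedence: ^cat|dog$ means (^cat) OR (dog$).
loose = re.search(r"^cat|dog$", "catalog") is not None     # True
grouped = re.search(r"^(cat|dog)$", "catalog") is not None # False

times = re.findall(r"\d{2}:\d{2}", "Meet at 09:30 or 14:05")

month_day = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])$")
results = {s: month_day.search(s) is not None
           for s in ["12/25", "13/01", "1/5", "02/31"]}
```

In results, “12/25” passes while “13/01” and “1/5” fail. “02/31” also passes, because the pattern knows nothing about month lengths; catching that would take a real date parser rather than a longer regex.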

Greedy vs. Lazy

By default, quantifiers are greedy — they match as much text as possible. This often surprises beginners.

Consider matching HTML tags with <.*> against the string <b>bold</b>:

  • Greedy <.*> matches <b>bold</b> — the entire string! The .* gobbles everything up, then backtracks just enough to find the last >.
  • Lazy <.*?> matches <b> and then </b> separately. Adding ? after the quantifier makes it match as little as possible.

The lazy versions: *?, +?, ??, {n,m}?
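The tag example can be reproduced directly. A minimal sketch in Python, using the same input string as above:

```python
import re

html = "<b>bold</b>"

# Greedy: .* runs to the end of the string, then backtracks to the last ">".
greedy = re.findall(r"<.*>", html)     # ['<b>bold</b>']

# Lazy: .*? expands one character at a time and stops at the first ">".
lazy = re.findall(r"<.*?>", html)      # ['<b>', '</b>']

# The same idea applies to +: greedy takes all digits, lazy takes one.
greedy_plus = re.search(r"\d+", "12345").group()    # '12345'
lazy_plus = re.search(r"\d+?", "12345").group()     # '1'
```

Both modes find a valid match; they differ only in which one the engine tries first, so the choice matters most when a delimiter can appear more than once in the input.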

Use the step-through visualizer in the first exercise below to see exactly how the engine behaves differently in each mode.

Groups & Capturing

Parentheses (...) serve two purposes:

  1. Grouping: Treat multiple characters as a single unit for quantifiers. (na){2,} means “the sequence na repeated 2 or more times” — matching nana, nanana, etc.

  2. Capturing: The engine remembers what each group matched. You can reuse that text later in the same pattern with a backreference such as \1, or in search-and-replace text as \1 or $1, depending on the engine.

If you only need grouping without capturing, use a non-capturing group: (?:...)
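Grouping, capturing, and non-capturing groups can be sketched as follows in Python. The sample strings are made up for illustration, and Python writes replacement backreferences as \1 rather than $1.

```python
import re

# Grouping: the quantifier applies to the whole sequence "na".
repeated = re.fullmatch(r"(na){2,}", "nanana") is not None   # True
too_short = re.fullmatch(r"(na){2,}", "na") is not None      # False

# Capturing: each pair of parentheses stores what it matched.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released 2024-03-15")
year, month, day = date.groups()

# Backreference in replacement text: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# Backreference inside the pattern: find a word repeated twice in a row.
doubled = re.findall(r"\b(\w+) \1\b", "it is is fine")

# Non-capturing group: groups the sequence but stores nothing.
plain = re.search(r"(?:na){2,}", "banana")
```

After this runs, plain.group() is the matched text and plain.groups() is empty, which is the practical difference between (?:...) and (...).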

Lookaheads & Lookbehinds

Lookaround assertions check what comes before or after the current position without including it in the match. They are “zero-width” — they don’t consume characters.

Syntax Name Meaning
(?=...) Positive lookahead What follows must match ...
(?!...) Negative lookahead What follows must NOT match ...
(?<=...) Positive lookbehind What precedes must match ...
(?<!...) Negative lookbehind What precedes must NOT match ...

A classic use case: password validation. To require at least one digit AND one uppercase letter, you can chain lookaheads at the start: ^(?=.*\d)(?=.*[A-Z]).+$. Each lookahead checks a condition independently, and the .+ at the end actually consumes the string.
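The chained-lookahead pattern can be wrapped in a small check. This is a sketch in Python; the function name is_valid_password and the sample passwords are hypothetical, and the rule (one digit plus one uppercase letter) is only the one described above, not a recommended password policy.

```python
import re

# Each lookahead scans ahead from the start without consuming anything;
# the final .+ is what actually matches the characters.
PASSWORD = re.compile(r"^(?=.*\d)(?=.*[A-Z]).+$")

def is_valid_password(password: str) -> bool:
    """True if the password has at least one digit and one uppercase letter."""
    return PASSWORD.search(password) is not None

checks = {p: is_valid_password(p)
          for p in ["Secret1", "secret1", "SECRET", "A1", ""]}
```

Because the lookaheads are independent, the digit and the uppercase letter can appear in any order, and adding another requirement means adding one more lookahead.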

Lookbehinds are useful for extracting values after a known prefix. For example, (?<=\$)\d+ matches the digits of a dollar amount without including the $ itself. The $ is escaped here because an unescaped $ is the end-of-string anchor.
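A lookbehind for dollar amounts might look like the sketch below, written for Python's re module. The sample text is invented, and the optional cents part is an assumption about the amount format. Python's built-in re also requires lookbehind patterns to have a fixed width, which a single escaped $ satisfies.

```python
import re

text = "Coffee $4.50, bagel $3, tip 2"

# (?<=\$) requires a "$" immediately before the match but leaves it out.
# (?:\.\d{2})? optionally accepts a two-digit cents part.
amounts = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", text)   # ['4.50', '3']

# For comparison, matching the "$" directly includes it in the result.
with_sign = re.findall(r"\$\d+(?:\.\d{2})?", text)      # ['$4.50', '$3']
```

The trailing “2” is skipped in both cases because no $ precedes it; the lookbehind version simply saves the step of stripping the sign afterwards.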