Regular Expressions for Beginners: Learn Regex in 10 Minutes
A gentle introduction to regular expressions. Learn the most important regex syntax, how to think in patterns, and practical examples for common tasks like email validation and text extraction.
December 24, 2024
Regular expressions — often called regex or regexp — are one of the most powerful and most intimidating tools in a developer's toolkit. A regex that validates an email address might look like an alien language: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. But once you understand the building blocks, regex becomes intuitive — a concise language for describing patterns in text.
What is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. When you apply a regex to a string, it finds all parts of the string that match the pattern. You can then extract those matches, replace them, split the string on them, or simply test whether a match exists.
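As a rough illustration of those four operations (test, extract, replace, split), here is a minimal JavaScript sketch. The sample text and variable names are made up for this example and are not part of any library.

```javascript
// Hypothetical sample text, chosen only for illustration.
const text = "apples: 3, pears: 12, plums: 7";
const numberPattern = /[0-9]+/g; // one or more digits, every occurrence

// Test whether a match exists
const hasNumber = /[0-9]+/.test(text); // true

// Extract every match
const numbers = text.match(numberPattern); // ["3", "12", "7"]

// Replace every match
const masked = text.replace(numberPattern, "#"); // "apples: #, pears: #, plums: #"

// Split the string on a pattern (a comma followed by a space)
const parts = text.split(/, /); // ["apples: 3", "pears: 12", "plums: 7"]

console.log(hasNumber, numbers, masked, parts);
```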
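Regex is built into almost every programming language and many tools. The same core syntax works, with small differences between engines, in JavaScript, Python, Java, PHP, Ruby, Go, Perl, grep, sed, VS Code's find-and-replace, and many more. For example, grep and sed default to POSIX basic syntax, where + and ? only act as quantifiers if you pass the -E option, and shorthand classes such as \d are not supported by every tool.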
The Building Blocks
Literal Characters
The simplest regex is just a literal string. The pattern cat matches the text "cat" wherever it appears in the input, including inside longer words such as "concatenate". Matching is case-sensitive by default.
The Dot (.)
The dot matches any single character except a newline. The pattern c.t matches "cat", "cut", "cot", "c9t" — any character in the middle position.
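A small sketch of literals and the dot in JavaScript, assuming the built-in RegExp test method. The word list is invented for the example.

```javascript
// Literal pattern: matches "cat" anywhere, including inside longer words
const literalHit = /cat/.test("concatenate"); // true
const caseMiss = /cat/.test("Cat");           // false, matching is case-sensitive

// Dot: exactly one character (but not a newline) in the middle position
const dotWords = ["cat", "cut", "c9t", "ct", "c\nt"];
const dotResults = dotWords.map((word) => /c.t/.test(word));
// [true, true, true, false, false]

console.log(literalHit, caseMiss, dotResults);
```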
Character Classes [ ]
Square brackets match any one character from the set. [aeiou] matches any lowercase vowel. [0-9] matches any digit. [a-zA-Z] matches any ASCII letter. [^0-9] (caret inside brackets) matches any single character that is NOT a digit.
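The character classes above could be tried out like this. It is only a sketch, and the input strings are illustrative.

```javascript
// Collect every lowercase vowel in a string
const vowels = "regular expression".match(/[aeiou]/g);
// ["e", "u", "a", "e", "e", "i", "o"]

// Negated class: strip every character that is not a digit
const digitsOnly = "(555) 123-4567".replace(/[^0-9]/g, ""); // "5551234567"

// Ranges can be combined inside one pair of brackets
const isHexDigit = (ch) => /[0-9a-fA-F]/.test(ch);

console.log(vowels, digitsOnly, isHexDigit("b"), isHexDigit("g"));
```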
Shorthand Character Classes
\d: Any digit (equivalent to [0-9])
\D: Any non-digit
\w: Any word character: letters, digits, underscore ([a-zA-Z0-9_])
\W: Any non-word character
\s: Any whitespace (space, tab, newline)
\S: Any non-whitespace
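A brief sketch of the shorthand classes in JavaScript. The sample strings are hypothetical, and other engines may treat \d and \w as Unicode-aware rather than ASCII-only.

```javascript
const sample = "Order #42 ships in 3 days";

// \d+ : runs of digits
const digitRuns = sample.match(/\d+/g); // ["42", "3"]

// \w+ : runs of word characters (letters, digits, underscore)
const words = sample.match(/\w+/g); // ["Order", "42", "ships", "in", "3", "days"]

// \s+ : split on any run of whitespace (spaces, tabs, newlines)
const tokens = "a  b\tc\nd".split(/\s+/); // ["a", "b", "c", "d"]

// \W : remove every non-word character
const stripped = "user_name!".replace(/\W/g, ""); // "user_name"

console.log(digitRuns, words, tokens, stripped);
```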
Anchors
Anchors match positions, not characters:
^: Start of the string (or start of each line in multiline mode)
$: End of the string (or end of each line in multiline mode)
\b: Word boundary (the position between a word character and a non-word character)
^\d+$ matches a string that consists entirely of digits (start, one or more digits, end). Without anchors, \d+ would match the digits inside "abc123def" too.
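To make the effect of anchors concrete, here is a minimal sketch comparing anchored and unanchored patterns. The test strings are examples only.

```javascript
// Without anchors, a match anywhere in the string is enough
const unanchored = /\d+/.test("abc123def"); // true

// With ^ and $, the whole string must consist of digits
const anchored = /^\d+$/.test("abc123def"); // false
const allDigits = /^\d+$/.test("123");      // true

// \b restricts a match to whole words
const insideWord = /\bcat\b/.test("concatenate"); // false
const wholeWord = /\bcat\b/.test("the cat sat");  // true

console.log(unanchored, anchored, allDigits, insideWord, wholeWord);
```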
Quantifiers
Quantifiers specify how many times the preceding element must appear:
*: Zero or more times
+: One or more times
?: Zero or one time (makes it optional)
{n}: Exactly n times
{n,}: n or more times
{n,m}: Between n and m times
\d{3}-\d{4} matches a phone number format like "555-1234".
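The quantifiers can be compared side by side in a short sketch like the one below. The candidate strings are invented for illustration.

```javascript
// ? makes the preceding character optional
const spellings = ["colr", "color", "colour", "colouur"];
const optionalU = spellings.filter((w) => /^colou?r$/.test(w)); // ["color", "colour"]

// * allows zero occurrences, + requires at least one
const starMatches = /^ab*$/.test("a"); // true
const plusMatches = /^ab+$/.test("a"); // false

// {n} and {n,m} set exact counts and ranges
const phone = "call 555-1234 now".match(/\d{3}-\d{4}/)[0]; // "555-1234"
const twoOrThreeDigits = ["1", "12", "123", "1234"].filter((s) => /^\d{2,3}$/.test(s));
// ["12", "123"]

console.log(optionalU, starMatches, plusMatches, phone, twoOrThreeDigits);
```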
Alternation (|)
The pipe character acts like an OR operator. cat|dog matches either "cat" or "dog".
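One detail worth seeing in code is that the pipe splits the whole pattern unless parentheses (covered in the Groups section) limit its scope. The following sketch uses made-up inputs.

```javascript
// Find either alternative anywhere in the text
const pets = "cat, dog, bird".match(/cat|dog/g); // ["cat", "dog"]

// ^cat|dog$ reads as "starts with cat" OR "ends with dog"
const loose = /^cat|dog$/.test("catalog"); // true, because it starts with "cat"

// Parentheses limit the alternation, so the anchors apply to both options
const strict = /^(cat|dog)$/.test("catalog"); // false
const exact = /^(cat|dog)$/.test("dog");      // true

console.log(pets, loose, strict, exact);
```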
Groups ( )
Parentheses group parts of a pattern and capture the matched text. (cat|dog)s? matches "cat", "cats", "dog", or "dogs". Captured groups can be referenced in replacement strings (e.g., swap first and last name).
Non-capturing groups (?:...) group without capturing — useful for applying quantifiers without creating a capture group.
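A minimal sketch of capturing and non-capturing groups in JavaScript, where $1 and $2 in a replacement string refer to the first and second captured groups. The name used is just an example.

```javascript
// Capture two words and swap them in the replacement
const swapped = "Ada Lovelace".replace(/(\w+) (\w+)/, "$2, $1"); // "Lovelace, Ada"

// A capturing group exposes the text it matched at index 1
const captured = "cats".match(/(cat|dog)s?/);
// captured[0] is "cats" (whole match), captured[1] is "cat" (group 1)

// A non-capturing group only groups, so the result holds just the whole match
const uncaptured = "dogs".match(/(?:cat|dog)s?/);
// uncaptured[0] is "dogs", uncaptured.length is 1

// Grouping lets a quantifier apply to a whole sequence
const repeated = /^(?:ab){2}$/.test("abab"); // true

console.log(swapped, captured[1], uncaptured.length, repeated);
```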
Common Regex Patterns Explained
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown: Start (^), one or more word chars/dots/symbols before the @, the literal @, domain name characters, a literal dot, TLD of 2+ letters, end ($).
Note: No regex perfectly validates all valid email addresses per RFC 5321. For strict validation, send a confirmation email.
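Here is one way the pattern might be wrapped in a helper. The function name looksLikeEmail is hypothetical, and the last sample shows the kind of malformed address that still slips through, which is why a confirmation email remains the stricter check.

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: a quick plausibility check, not full RFC validation
function looksLikeEmail(input) {
  return emailPattern.test(input);
}

const emailChecks = {
  "jane.doe+news@example.co.uk": looksLikeEmail("jane.doe+news@example.co.uk"), // true
  "jane@example": looksLikeEmail("jane@example"),                 // false, no dot and TLD
  "jane doe@example.com": looksLikeEmail("jane doe@example.com"), // false, contains a space
  "@example.com": looksLikeEmail("@example.com"),                 // false, empty local part
  "jane@example..com": looksLikeEmail("jane@example..com"),       // true, although malformed
};

console.log(emailChecks);
```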
Phone Number (flexible)
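^[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4,6}$
Matches common 10-to-12-digit formats such as 555-123-4567, (555) 123-4567, and 555.123.4567, with an optional leading +. A country code separated by a space, as in +1 555 123 4567, will not match.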
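Running the pattern over a handful of sample numbers shows roughly where its limits are. The numbers below are fictional and chosen only to exercise each branch.

```javascript
const phonePattern = /^[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4,6}$/;

// Fictional sample inputs
const phoneSamples = [
  "555-123-4567",    // dashes
  "(555) 123-4567",  // parentheses and a space
  "555.123.4567",    // dots
  "5551234567",      // no separators
  "+1 555 123 4567", // separate country code: not handled by this pattern
  "123-4567",        // too few digits
];

const phoneResults = phoneSamples.map((s) => phonePattern.test(s));
// [true, true, true, true, false, false]

console.log(phoneResults);
```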
URL
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
Hex Color Code
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Regex Flags (Modifiers)
Flags modify how the regex engine behaves:
i: Case-insensitive. /cat/i matches "Cat", "CAT"
g: Global. Find all matches, not just the first
m: Multiline. ^ and $ match start/end of each line
s: Dotall. . also matches newlines
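The effect of each flag can be seen by running the same pattern with and without it. This is a small sketch in JavaScript, where flags follow the closing slash. The s flag assumes an ES2018-capable engine.

```javascript
// i: ignore case (combined with g to collect every match)
const anyCase = "Cat cat CAT".match(/cat/gi); // ["Cat", "cat", "CAT"]

// g: without it, match() stops at the first match
const firstOnly = "cat bat".match(/\w+at/)[0]; // "cat"
const allMatches = "cat bat".match(/\w+at/g);  // ["cat", "bat"]

// m: ^ matches at the start of every line, not just the start of the string
const lines = "one\ntwo\nthree";
const withoutM = lines.match(/^\w/g); // ["o"]
const withM = lines.match(/^\w/gm);   // ["o", "t", "t"]

// s: the dot also matches a newline
const dotDefault = /a.b/.test("a\nb"); // false
const dotAll = /a.b/s.test("a\nb");    // true

console.log(anyCase, firstOnly, allMatches, withoutM, withM, dotDefault, dotAll);
```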
Regex in JavaScript
// Test if a string matches a pattern
const isDigitOnly = /^\d+$/.test("12345"); // true
// Find all matches
const matches = "cat sat on mat".match(/\w+at/g); // ["cat", "sat", "mat"]
// Replace matches
const result = "2024-01-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
// → "15/01/2024"
Summary
Regular expressions are a mini-language for pattern matching in text. The key building blocks are: literal characters, the dot (any char), character classes ([...]), shorthand classes (\d \w \s), anchors (^ $), quantifiers (* + ? {n}), alternation (|), and groups ((...)). Start with simple patterns and build complexity gradually. Use a regex tester to experiment with patterns against real input before using them in production code.