Regular Expressions
Regular expressions (regex) let you describe text patterns, then search, validate, or transform strings that match them. Java provides built-in regex support through the java.util.regex package.
What Is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. You write a pattern like \d{3}-\d{4} and Java checks whether a string (say, a phone number) fits that shape.
Java’s regex engine is a backtracking NFA (non-deterministic finite automaton) in the Perl tradition, not a POSIX one. Its syntax closely resembles the flavours used by Perl and Python, so experience in those languages transfers almost directly.
The Core Classes
Three classes do almost all the work:
| Class | Role |
|---|---|
| Pattern | Compiles a regex string into an efficient internal form |
| Matcher | Applies a compiled Pattern against a specific input string |
| PatternSyntaxException | Thrown when your regex syntax is invalid |
You never instantiate Pattern with new. Instead, use the factory method Pattern.compile(regex).
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexBasics {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\d+"); // one or more digits
        Matcher matcher = pattern.matcher("Order 42 ships in 3 days");
        // find() advances to the next match on each call
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group()
                    + " at index " + matcher.start());
        }
    }
}
Output:
Found: 42 at index 6
Found: 3 at index 18
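The third class in the table, PatternSyntaxException, is an unchecked exception, so the compiler never forces you to handle it. The following is a minimal sketch of catching it when the pattern text comes from outside the program. The class name SafeCompile and the method tryCompile are hypothetical names chosen for this illustration.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration; not part of java.util.regex.
class SafeCompile {
    // Returns the compiled pattern, or Optional.empty() if the syntax is invalid.
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            // getDescription() explains the problem; getIndex() is the
            // approximate error position in the pattern, or -1 if unknown.
            System.out.println("Invalid regex: " + e.getDescription()
                    + " near index " + e.getIndex());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("\\d+").isPresent());  // true
        System.out.println(tryCompile("(\\d+").isPresent()); // false: unclosed group
    }
}
```

Returning an Optional is only one possible design; rethrowing a domain-specific exception would work equally well.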
Quick Pattern Syntax Reference
Character Classes
| Syntax | Matches |
|---|---|
| [abc] | a, b, or c |
| [^abc] | anything except a, b, or c |
| [a-z] | any lowercase ASCII letter |
| [a-zA-Z0-9] | ASCII alphanumeric |
| . | any character except a line terminator (unless DOTALL is set) |
| \d | digit (0–9) |
| \D | non-digit |
| \w | word character ([a-zA-Z0-9_]) |
| \W | non-word character |
| \s | whitespace (space, tab, newline…) |
| \S | non-whitespace |
Note: In a Java string literal you must escape the backslash, so \d in a regex is written "\\d" in Java source code.
Quantifiers
| Syntax | Meaning |
|---|---|
| ? | 0 or 1 times |
| * | 0 or more |
| + | 1 or more |
| {n} | exactly n times |
| {n,} | at least n times |
| {n,m} | between n and m times (inclusive) |
Append ? to make a quantifier lazy (match as few characters as possible): +?, *?, {n,m}?.
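The difference between greedy and lazy matching is easiest to see side by side. This is a small sketch; the class GreedyVsLazy and the helper firstMatch are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class GreedyVsLazy {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ grabs as much as possible, so the match runs to the last '>'
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b> and <i>italic</i>
        // Lazy: .+? grabs as little as possible, so the match stops at the first '>'
        System.out.println(firstMatch("<.+?>", html)); // <b>
    }
}
```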
Anchors and Boundaries
| Syntax | Matches |
|---|---|
| ^ | start of input (or start of line with MULTILINE) |
| $ | end of input (or end of line with MULTILINE) |
| \b | word boundary |
| \B | non-word boundary |
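A short sketch of how \b and ^ behave in practice, including the effect of MULTILINE. The class AnchorDemo and its findAll helper are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class AnchorDemo {
    // Collects every match of p in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "cat scatter\ncat concat";
        // \b keeps "cat" from matching inside "scatter" or "concat"
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), text)); // [cat, cat]
        // Without MULTILINE, ^ matches only at the very start of the input
        System.out.println(findAll(Pattern.compile("^cat"), text).size()); // 1
        // With MULTILINE, ^ also matches right after each line terminator
        System.out.println(
                findAll(Pattern.compile("^cat", Pattern.MULTILINE), text).size()); // 2
    }
}
```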
Checking for a Full Match
matches() on Matcher (or the shorthand String.matches()) checks whether the entire string matches the pattern — useful for input validation.
public class EmailValidator {
    public static void main(String[] args) {
        // Deliberately simple pattern for illustration, not a full address validator
        String emailRegex = "^[\\w.+-]+@[\\w-]+\\.[a-zA-Z]{2,}$";
        // Placeholder addresses chosen for illustration
        String[] emails = {"alice@example.com", "bad@", "user.name+tag@example.org"};
        for (String email : emails) {
            boolean valid = email.matches(emailRegex);
            System.out.println(email + " -> " + (valid ? "valid" : "invalid"));
        }
    }
}
Output:
alice@example.com -> valid
bad@ -> invalid
user.name+tag@example.org -> valid
Tip: String.matches(regex) succeeds only if the entire string matches, so it behaves as if the pattern were anchored at both ends. It is equivalent to Pattern.matches(regex, input), which makes an explicit ^ and $ redundant there. Use Matcher.find() when you only want to locate a match anywhere in the string.
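The sketch below contrasts the three Matcher methods that decide how much of the input must match: matches() (the whole input), lookingAt() (a prefix), and find() (anywhere). The class MatchModes and its method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the entire string consists of digits.
    static boolean whole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True if the string starts with digits.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // True if digits occur anywhere in the string.
    static boolean anywhere(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123abc", "abc123"}) {
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
    }
}
```

Only "123" satisfies all three checks; "123abc" fails matches(), and "abc123" passes find() alone.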
Capturing Groups
Wrap part of a pattern in ( ) to capture that portion separately. Groups are numbered left to right by their opening parenthesis.
import java.util.regex.*;

public class DateParser {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        Matcher m = p.matcher("Invoice date: 2024-03-15");
        // group(0) is the whole match; groups 1..3 are the parenthesised parts
        if (m.find()) {
            System.out.println("Year: " + m.group(1));
            System.out.println("Month: " + m.group(2));
            System.out.println("Day: " + m.group(3));
        }
    }
}
Output:
Year: 2024
Month: 03
Day: 15
Named Groups (Java 7+)
Use (?<name>...) to give a group a readable name instead of a number:
Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = p.matcher("2024-03-15");
if (m.matches()) {
    System.out.println("Year: " + m.group("year"));
}
Find and Replace
Matcher.replaceAll() and replaceFirst() let you swap matched text:
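import java.util.regex.Pattern;

public class Censor {
    // Compiled once and reused for every call
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    public static void main(String[] args) {
        String text = "The price is 100 and 200 dollars.";
        // Replace all numbers with ***
        String censored = NUMBER.matcher(text).replaceAll("***");
        System.out.println(censored);
    }
}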
Output:
The price is *** and *** dollars.
String.replaceAll() is a convenient shorthand for the same operation, but it compiles the pattern on every call, so prefer a precompiled Pattern when you reuse the same pattern.
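Replacement strings are not plain text: $1 or ${name} refers to a captured group, and a literal $ or \ must be escaped. The sketch below shows both cases. The class ReplaceWithGroups and its method names are hypothetical; Matcher.quoteReplacement() is the standard way to make a replacement literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class ReplaceWithGroups {
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Rewrites yyyy-MM-dd dates as dd/MM/yyyy using named group references.
    static String toDayFirst(String text) {
        return ISO_DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    // Inserts the replacement literally, even if it contains '$' or '\'.
    static String replaceNumbersLiterally(String text, String literal) {
        return NUMBER.matcher(text).replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("Due 2024-03-15, paid 2024-04-01"));
        // prints: Due 15/03/2024, paid 01/04/2024
        System.out.println(replaceNumbersLiterally("Total: 100", "$5"));
        // prints: Total: $5
    }
}
```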
Splitting Strings
Pattern.split() (or String.split()) divides a string wherever the pattern matches:
public class SplitDemo {
    public static void main(String[] args) {
        String csv = "apple , banana, cherry , date";
        // Split on comma + optional surrounding whitespace
        String[] fruits = csv.split("\\s*,\\s*");
        for (String fruit : fruits) {
            // trim() guards against whitespace at the very start or end of the input
            System.out.println(fruit.trim());
        }
    }
}
Output:
apple
banana
cherry
date
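Splitting has a few variants worth knowing: a limit argument caps the number of pieces, a negative limit keeps trailing empty strings (which are dropped by default), and Pattern.splitAsStream() (Java 8+) feeds the pieces into a stream. The following is a sketch under those assumptions; SplitVariants and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class for illustration.
class SplitVariants {
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits into at most `limit` pieces; the last piece keeps the remainder.
    static List<String> splitAtMost(String input, int limit) {
        return Arrays.asList(COMMA.split(input, limit));
    }

    // A negative limit keeps trailing empty strings instead of dropping them.
    static List<String> keepTrailingEmpties(String input) {
        return Arrays.asList(COMMA.split(input, -1));
    }

    // Streams the pieces and filters out empty ones.
    static List<String> nonEmptyFields(String input) {
        return COMMA.splitAsStream(input)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitAtMost("a, b, c, d", 2));        // [a, b, c, d] as 2 pieces
        System.out.println(keepTrailingEmpties("a,b,,").size()); // 4
        System.out.println(nonEmptyFields("a,,b,c"));            // [a, b, c]
    }
}
```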
Pattern Flags
Pass flags as a second argument to Pattern.compile() to change matching behaviour:
| Flag constant | Shorthand | Effect |
|---|---|---|
| Pattern.CASE_INSENSITIVE | (?i) | Ignore letter case |
| Pattern.MULTILINE | (?m) | ^/$ match line boundaries |
| Pattern.DOTALL | (?s) | . matches newline too |
| Pattern.COMMENTS | (?x) | Ignore whitespace and # comments in pattern |
Pattern p = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("Hello World").find()); // true
You can also embed flags inline: "(?i)hello" is equivalent.
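Flags are integer bit masks, so several can be combined with the | operator. The sketch below combines COMMENTS and CASE_INSENSITIVE to write a pattern across several annotated lines. The class FlagDemo and the "product code" format it checks are hypothetical, made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class FlagDemo {
    // In COMMENTS mode, whitespace in the pattern is ignored and
    // everything from # to the end of the line is treated as a comment.
    private static final Pattern PRODUCT_CODE = Pattern.compile(
            "[a-z]+   # one or more letters\n"
          + "\\d{2}   # followed by exactly two digits\n",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    static boolean isProductCode(String s) {
        return PRODUCT_CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProductCode("ABC42"));  // true: case is ignored
        System.out.println(isProductCode("abc4"));   // false: only one digit
        System.out.println(isProductCode("abc 42")); // false: pattern whitespace is ignored, input whitespace is not
    }
}
```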
Non-Capturing Groups and Lookaheads
Sometimes you want to group without capturing. Use (?:...):
// Match "color" or "colour"; (?:u)? groups the optional 'u' without creating a capture group
Pattern p = Pattern.compile("colo(?:u)?r");
Lookaheads and lookbehinds (together called lookarounds) let you assert context without consuming characters:
| Syntax | Meaning |
|---|---|
| (?=...) | Positive lookahead: must be followed by the sub-pattern |
| (?!...) | Negative lookahead: must NOT be followed by the sub-pattern |
| (?<=...) | Positive lookbehind: must be preceded by the sub-pattern |
| (?<!...) | Negative lookbehind: must NOT be preceded by the sub-pattern |
// Find "Java" only when followed by " 21"
Pattern p = Pattern.compile("Java(?= 21)");
Matcher m = p.matcher("Java 8 and Java 21 are LTS");
while (m.find()) {
    System.out.println("Matched at: " + m.start()); // only the Java 21 occurrence
}
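Lookbehind works the same way in the other direction. This sketch extracts amounts that are preceded by a dollar sign without including the sign in the match. The class LookbehindDemo and the method amounts are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class LookbehindDemo {
    // Digits (with optional two-digit decimals) that directly follow a '$'.
    // The '$' is asserted by the lookbehind but is not part of the match.
    private static final Pattern DOLLAR_AMOUNT =
            Pattern.compile("(?<=\\$)\\d+(?:\\.\\d{2})?");

    static List<String> amounts(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = DOLLAR_AMOUNT.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(amounts("Was $19.99, now $15 or 12 EUR")); // [19.99, 15]
    }
}
```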
Under the Hood
How Pattern.compile() Works
Pattern.compile(regex) tokenises the regex string and constructs an internal NFA graph. Each node in the graph represents one regex “state” and holds a reference to the next possible states. This compilation step is relatively expensive — that is why you should store Pattern instances in a static field when the same pattern is used repeatedly rather than recompiling on every method call.
// Good practice: compile once, reuse everywhere
private static final Pattern PHONE =
        Pattern.compile("\\+?\\d[\\d\\s()-]{7,14}\\d");
Backtracking and Catastrophic Backtracking
Java’s NFA engine uses backtracking: when a path fails it rewinds and tries an alternative. Most patterns are fine, but nested quantifiers like (a+)+b applied to an input that almost matches (a long run of a with no b after it) can trigger exponential backtracking, which attackers can exploit as a regular-expression denial of service (ReDoS). The fix is to use possessive quantifiers (++, *+) or atomic groups ((?>...)), both of which Java supports, or to restructure the pattern.
Warning: Never pass user-supplied regex strings directly to Pattern.compile() without validation. A malicious pattern can cause catastrophic backtracking and hang your application.
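Possessive quantifiers and atomic groups avoid the problem by refusing to give back what they have consumed. The sketch below illustrates that semantics on short inputs; it does not run a catastrophic pattern. The class PossessiveDemo and its helper are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class PossessiveDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy a+ backtracks by one 'a' so the trailing "ab" can match.
        System.out.println(fullMatch("a+ab", "aaab"));     // true
        // Possessive a++ keeps every 'a' it consumed, so "ab" can never match.
        System.out.println(fullMatch("a++ab", "aaab"));    // false
        // An atomic group behaves the same way as the possessive quantifier.
        System.out.println(fullMatch("(?>a+)ab", "aaab")); // false
        // Rewriting (a+)+b as a++b still accepts the valid inputs,
        // but leaves no alternative paths to explore on failure.
        System.out.println(fullMatch("a++b", "aaab"));     // true
        System.out.println(fullMatch("a++b", "aaaa"));     // false
    }
}
```

The trade-off is visible in the second and third lines: because nothing is given back, a possessive pattern can reject input that its greedy counterpart would accept, so check that the rewritten pattern still matches what you intend.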
Thread Safety
Pattern objects are immutable and thread-safe — share them freely across threads. Matcher objects are not thread-safe — create a new Matcher per thread (or per call) via pattern.matcher(input).
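In code, that rule usually comes down to one shared static Pattern and a fresh Matcher inside each call. The sketch below uses a parallel stream to exercise the pattern from several threads; the class SharedPatternDemo and the "ORD-" order ID format are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class SharedPatternDemo {
    // One immutable Pattern shared by all threads.
    private static final Pattern ORDER_ID = Pattern.compile("ORD-\\d{4}");

    // Each call creates its own Matcher, so concurrent callers never share matcher state.
    static boolean isOrderId(String s) {
        return ORDER_ID.matcher(s).matches();
    }

    // parallelStream() may invoke isOrderId on several threads at once.
    static long countValid(List<String> ids) {
        return ids.parallelStream().filter(SharedPatternDemo::isOrderId).count();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("ORD-0001", "ORD-12", "ord-0002", "ORD-9999");
        System.out.println(countValid(ids)); // 2
    }
}
```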
Practical Example: Password Strength Validator
import java.util.regex.*;

public class PasswordValidator {
    // At least 8 chars, one uppercase, one lowercase, one digit, one special char
    private static final Pattern STRONG =
            Pattern.compile("^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!]).{8,}$");

    public static boolean isStrong(String password) {
        return STRONG.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStrong("Weak1"));        // false
        System.out.println(isStrong("Str0ng@Pass!")); // true
    }
}
Output:
false
true
Related Topics
- Strings: understand Java’s String class before applying regex to it
- String Methods: matches(), replaceAll(), and split() are the bridge between String and regex
- StringTokenizer: a simpler (but less flexible) alternative for splitting strings by delimiter
- Pattern Matching: Java 16+ instanceof pattern matching, a different but related “matching” concept
- Stream API: combine regex with streams to filter and transform collections of strings elegantly
- Custom Exceptions: handle PatternSyntaxException gracefully in user-facing validation code