StringTokenizer
StringTokenizer is a classic Java utility that breaks a string into smaller pieces — called tokens — based on one or more delimiter characters. It lives in java.util and has been part of Java since version 1.0, making it one of the oldest string-processing tools in the standard library.
Why StringTokenizer Exists
Before String.split() arrived in Java 1.4, StringTokenizer was the go-to way to parse delimited text like CSV lines, command strings, or configuration values. Today it is considered a legacy class, but it still appears in older codebases and is occasionally useful when you need a lightweight, allocation-friendly tokenizer without regex overhead.
Note: The official Java documentation describes StringTokenizer as a legacy class and recommends String.split() or the java.util.regex package for new code. Understanding StringTokenizer is still valuable, both for reading older code and for the rare cases where its simpler model fits perfectly.
Creating a StringTokenizer
StringTokenizer has three constructors:
import java.util.StringTokenizer;
// 1. Default delimiter: whitespace (\t, \n, \r, \f, and space)
StringTokenizer st1 = new StringTokenizer("hello world java");
// 2. Custom delimiter string
StringTokenizer st2 = new StringTokenizer("red,green,blue", ",");
// 3. Custom delimiter + return delimiters as tokens (true = include them)
StringTokenizer st3 = new StringTokenizer("a:b:c", ":", true);
The second argument is a set of delimiter characters, not a regex and not a literal multi-character separator. Every character in that string is treated as an independent delimiter, so ",;" means "split on a comma or on a semicolon", not "split on the two-character sequence ,;".
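To make the "set of characters" rule concrete, here is a minimal sketch. The class name DelimiterSetDemo and the helper tokens() are made up for this illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Hypothetical helper: collects every token into a list.
    static List<String> tokens(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // "." is an ordinary delimiter character here, not a regex wildcard.
        System.out.println(tokens("www.example.com", "."));
        // "->" is the set {'-', '>'}, so the lone '-' in "b-c" also splits.
        System.out.println(tokens("a->b-c", "->"));
    }
}
```

This should print [www, example, com] and then [a, b, c]. The second result shows why a two-character string cannot be used as a single separator.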
Iterating Over Tokens
The primary API is a small set of methods:
| Method | Returns | Description |
|---|---|---|
| hasMoreTokens() | boolean | true if more tokens remain |
| nextToken() | String | Returns the next token; throws NoSuchElementException if none remain |
| nextToken(String delim) | String | Switches to a new delimiter set, then returns the next token |
| countTokens() | int | Counts the remaining tokens without consuming them |
| hasMoreElements() | boolean | Same as hasMoreTokens() (implements Enumeration) |
| nextElement() | Object | Same as nextToken(), but typed as Object (implements Enumeration) |
The classic usage pattern:
import java.util.StringTokenizer;
public class TokenExample {
    public static void main(String[] args) {
        String sentence = "Java is fun to learn";
        StringTokenizer st = new StringTokenizer(sentence);
        System.out.println("Token count: " + st.countTokens());
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
Output:
Token count: 5
Java
is
fun
to
learn
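Because countTokens() reports how many tokens remain, it can size an array before the loop starts. The following is a small sketch of that idea; TokensToArray and toArray() are illustrative names, not JDK APIs.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class TokensToArray {
    // Copies all remaining tokens into an array sized by countTokens().
    static String[] toArray(StringTokenizer st) {
        String[] result = new String[st.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = st.nextToken();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = toArray(new StringTokenizer("Java is fun to learn"));
        System.out.println(Arrays.toString(words)); // [Java, is, fun, to, learn]
    }
}
```

The count is taken once, before any token is consumed, so the array length matches the number of nextToken() calls exactly.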
Using a Custom Delimiter
import java.util.StringTokenizer;
public class CsvTokenizer {
    public static void main(String[] args) {
        String csv = "Alice,30,Engineer";
        StringTokenizer st = new StringTokenizer(csv, ",");
        String name = st.nextToken();
        int age = Integer.parseInt(st.nextToken());
        String role = st.nextToken();
        System.out.println(name + " is " + age + " and works as " + role);
    }
}
Output:
Alice is 30 and works as Engineer
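The example above assumes the line always has exactly three fields. If a field is missing, nextToken() throws NoSuchElementException, and a non-numeric age makes Integer.parseInt() throw NumberFormatException. A slightly more defensive variant might look like the sketch below; SafeRecordParser, describe(), and the error message wording are all assumptions made for illustration.

```java
import java.util.StringTokenizer;

class SafeRecordParser {
    // Returns a description of the record, or an error message if the
    // line does not have exactly three fields or the age is not numeric.
    static String describe(String line) {
        StringTokenizer st = new StringTokenizer(line, ",");
        int count = st.countTokens();
        if (count != 3) {
            return "Expected 3 fields but found " + count + ": " + line;
        }
        String name = st.nextToken();
        String ageText = st.nextToken();
        String role = st.nextToken();
        try {
            int age = Integer.parseInt(ageText.trim());
            return name + " is " + age + " and works as " + role;
        } catch (NumberFormatException e) {
            return "Age is not a number: " + ageText;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("Alice,30,Engineer"));
        System.out.println(describe("Bob,Designer"));
        System.out.println(describe("Carol,thirty,Manager"));
    }
}
```

Checking countTokens() up front keeps the happy path free of exception handling for missing fields.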
Multiple Delimiter Characters
Every character you include in the delimiter string acts as its own delimiter. You can split on both commas and semicolons at once:
import java.util.StringTokenizer;
public class MultiDelim {
    public static void main(String[] args) {
        String data = "apple,banana;cherry,date";
        StringTokenizer st = new StringTokenizer(data, ",;");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
Output:
apple
banana
cherry
date
Returning Delimiters as Tokens
Pass true as the third constructor argument to include delimiter characters in the token stream. This is handy when you need to know which delimiter separated two values:
import java.util.StringTokenizer;
public class DelimAsToken {
    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("a+b-c", "+-", true);
        while (st.hasMoreTokens()) {
            System.out.println("[" + st.nextToken() + "]");
        }
    }
}
Output:
[a]
[+]
[b]
[-]
[c]
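One place where delimiter tokens are useful is a tiny expression evaluator, because the operator between two operands decides what to do with them. The sketch below is only an illustration: TinyCalculator and evaluate() are hypothetical names, and the code assumes well-formed input made of non-negative integers joined by + or - with no spaces.

```java
import java.util.StringTokenizer;

class TinyCalculator {
    // Evaluates expressions such as "10+20-5" from left to right.
    // Assumes well-formed input: operand, operator, operand, ...
    static int evaluate(String expr) {
        StringTokenizer st = new StringTokenizer(expr, "+-", true);
        int total = Integer.parseInt(st.nextToken()); // first operand
        while (st.hasMoreTokens()) {
            String op = st.nextToken();               // "+" or "-"
            int operand = Integer.parseInt(st.nextToken());
            total = op.equals("+") ? total + operand : total - operand;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("10+20-5")); // 25
    }
}
```

Without the third constructor argument the operators would be discarded, and "10+20-5" would be indistinguishable from "10-20+5".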
Changing the Delimiter Mid-Stream
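Calling nextToken(String newDelim) switches to a new delimiter set before fetching the next token. The change is not limited to that one call: the new set stays in effect for all later calls until you change it again. Note that the character that ended the previous token is not consumed, so it becomes part of the next token unless the new delimiter set also contains it.
import java.util.StringTokenizer;
public class ChangeDelim {
    public static void main(String[] args) {
        // First token split by whitespace, rest by comma
        StringTokenizer st = new StringTokenizer("section1 a,b,c");
        String section = st.nextToken();        // uses whitespace, gets "section1"
        // The space after "section1" was not consumed and is no longer a
        // delimiter, so the raw token is " a"; trim() drops the leading space.
        String rest = st.nextToken(",").trim(); // switches to comma, gets "a"
        String b = st.nextToken();              // still comma, gets "b"
        String c = st.nextToken();              // still comma, gets "c"
        System.out.println(section + " | " + rest + " | " + b + " | " + c);
    }
}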
Output:
section1 | a | b | c
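When the delimiter changes, the character that ended the previous token is still waiting at the current position. One way to handle that is to include the old delimiter in the new set, as in this sketch. The "key=v1,v2,..." line format, the class KeyValuesParser, and parse() are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class KeyValuesParser {
    // Parses a hypothetical "key=v1,v2,..." line into [key, v1, v2, ...].
    // Assumes the line contains a key, an '=', and at least one value.
    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "=");
        out.add(st.nextToken());     // key; scanning stops at the '='
        // The new set contains '=' as well as ',', so the leftover '='
        // is skipped instead of becoming part of the first value.
        out.add(st.nextToken("=,"));
        while (st.hasMoreTokens()) {
            out.add(st.nextToken()); // still using "=,"
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("colors=red,green,blue"));
    }
}
```

This should print [colors, red, green, blue]. With nextToken(",") alone, the first value would come back as "=red".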
StringTokenizer vs String.split() vs Scanner
Choosing the right tool matters. Here is a quick comparison:
| Feature | StringTokenizer | String.split() | Scanner |
|---|---|---|---|
| Regex support | No | Yes | Yes |
| Returns array | No (iterator style) | Yes | No (stream style) |
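| Empty tokens | Skipped silently | Kept, except trailing ones by default | Depends on the delimiter pattern (skipped with the default whitespace pattern) |
| Performance | Fastest (no regex) | Moderate | Slowest (regex-based) |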
| Recommended for new code | No (legacy) | Yes (simple cases) | Yes (flexible parsing) |
| Java version | 1.0+ | 1.4+ | 5+ |
Warning: StringTokenizer silently skips consecutive delimiters, so it never gives you an empty token. If you have "a,,b" and split on ",", you get "a" and "b" with no indication of the missing middle field. String.split(",") returns ["a", "", "b"], preserving the empty slot, which is usually what you want for structured data.
For most new code, prefer String.split() for simple splitting or Scanner for interactive / stream-based parsing.
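The difference in empty-token handling is easy to check side by side. The sketch below uses illustrative names (EmptyTokenComparison, withTokenizer, withSplit, withSplitKeepAll) and also shows the limit argument of String.split(), which controls whether trailing empty strings are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class EmptyTokenComparison {
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Default split: inner empty strings kept, trailing ones dropped.
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    // A negative limit keeps trailing empty strings too.
    static List<String> withSplitKeepAll(String s) {
        return Arrays.asList(s.split(",", -1));
    }

    public static void main(String[] args) {
        String line = "a,,b,,";
        System.out.println(withTokenizer(line).size());    // 2
        System.out.println(withSplit(line).size());        // 3
        System.out.println(withSplitKeepAll(line).size()); // 5
    }
}
```

For fixed-width records where every column matters, split(",", -1) is usually the variant that preserves the field count.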
Under the Hood
StringTokenizer is intentionally simple. Internally, its scanning state comes down to three pieces:
- currentPosition: index into the original string where scanning should resume.
- maxPosition: the length of the string (end boundary).
- delimiters: the delimiter string you provided (or the default whitespace set).
When you call nextToken(), it:
- Skips forward past any delimiter characters starting at currentPosition.
- Scans forward until it hits the next delimiter or the end of the string.
- Returns the substring between those two positions and advances currentPosition.
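The steps above can be sketched as a stripped-down tokenizer. This is a simplified illustration, not the JDK source: MiniTokenizer is a made-up class, and it leaves out returned delimiters, delimiter changes, and supplementary (surrogate-pair) characters.

```java
import java.util.NoSuchElementException;

class MiniTokenizer {
    private final String text;
    private final String delimiters;
    private final int maxPosition;
    private int currentPosition = 0;

    MiniTokenizer(String text, String delimiters) {
        this.text = text;
        this.delimiters = delimiters;
        this.maxPosition = text.length();
    }

    private boolean isDelimiter(char c) {
        return delimiters.indexOf(c) >= 0;
    }

    // Step 1: skip past any run of delimiter characters.
    private int skipDelimiters(int pos) {
        while (pos < maxPosition && isDelimiter(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    boolean hasMoreTokens() {
        return skipDelimiters(currentPosition) < maxPosition;
    }

    String nextToken() {
        int start = skipDelimiters(currentPosition);
        if (start >= maxPosition) {
            throw new NoSuchElementException();
        }
        // Step 2: scan until the next delimiter or the end of the string.
        int end = start;
        while (end < maxPosition && !isDelimiter(text.charAt(end))) {
            end++;
        }
        // Step 3: remember where to resume and return the slice.
        currentPosition = end;
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        MiniTokenizer t = new MiniTokenizer(",,a,b;;c", ",;");
        while (t.hasMoreTokens()) {
            System.out.println(t.nextToken()); // a, then b, then c
        }
    }
}
```

Note how the skip step is the reason empty tokens never appear: a run of delimiters is consumed as a whole before any token starts.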
Because it works directly on the original String and uses String.substring() internally, there is no regex compilation, no array allocation, and no Pattern/Matcher overhead. For tight loops that parse millions of simple delimited lines, this can be measurably faster than split().
However, modern JVMs have closed most of that gap. Since Java 7, String.split() takes a fast path that avoids the regex engine entirely when the delimiter is a single character with no special regex meaning.
StringTokenizer also implements the Enumeration<Object> interface (a legacy precursor to Iterator), which is why it has hasMoreElements() and nextElement() alongside the more readable hasMoreTokens() / nextToken() pair.
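One practical consequence of the Enumeration interface is that utilities written for Enumeration accept a StringTokenizer directly. The sketch below uses Collections.list(), which drains an Enumeration into an ArrayList; EnumerationBridge and toList() are illustrative names. Because the element type is Object, a cast back to String is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

class EnumerationBridge {
    static List<String> toList(String text, String delims) {
        // StringTokenizer is an Enumeration<Object>, so the list is untyped.
        ArrayList<Object> raw = Collections.list(new StringTokenizer(text, delims));
        List<String> out = new ArrayList<>(raw.size());
        for (Object token : raw) {
            out.add((String) token); // every element is really a String
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toList("red,green,blue", ",")); // [red, green, blue]
    }
}
```

In new code a plain hasMoreTokens() loop is usually clearer; this bridge mainly helps when working with older APIs that expect an Enumeration.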
Common Pitfalls
- Missing empty tokens. As noted above, consecutive delimiters produce no empty token. This will silently corrupt structured data with optional fields.
- Not thread-safe. Each StringTokenizer instance is stateful; never share one across threads without external synchronization.
- Delimiter characters, not strings. new StringTokenizer(s, "->") treats - and > as two separate one-character delimiters, not the literal two-character sequence ->. Use String.split("->") if you need a multi-character delimiter.
- countTokens() reports remaining tokens, not the total. It counts from currentPosition to the end using the current delimiter set, so its value decreases as you consume tokens. Calling it before iterating gives the total; calling it mid-loop gives only what is left.
import java.util.StringTokenizer;
public class PitfallDemo {
    public static void main(String[] args) {
        // Consecutive delimiters: empty field is lost!
        StringTokenizer st = new StringTokenizer("Alice,,Engineer", ",");
        System.out.println(st.countTokens()); // 2, not 3!
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
Output:
2
Alice
Engineer
The age field disappears entirely. With String.split(",") you would get ["Alice", "", "Engineer"] and could detect the missing value.
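If you are stuck with StringTokenizer, for example in a legacy codebase, one possible workaround is to ask for delimiters as tokens and insert an empty field whenever two delimiters appear back to back. The following is a sketch under that assumption; EmptyFieldAwareSplit and fields() are hypothetical names, and the approach has not been tuned for performance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class EmptyFieldAwareSplit {
    static List<String> fields(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims, true);
        boolean lastWasDelimiter = true; // treat line start like a delimiter
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            // Delimiters come back as one-character tokens.
            boolean isDelimiter = tok.length() == 1 && delims.indexOf(tok.charAt(0)) >= 0;
            if (isDelimiter) {
                if (lastWasDelimiter) {
                    out.add(""); // nothing between two delimiters
                }
                lastWasDelimiter = true;
            } else {
                out.add(tok);
                lastWasDelimiter = false;
            }
        }
        if (lastWasDelimiter) {
            out.add(""); // trailing delimiter (or empty line): final empty field
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = fields("Alice,,Engineer", ",");
        System.out.println(result.size()); // 3
        System.out.println(result);        // [Alice, , Engineer]
    }
}
```

In most cases String.split(",", -1) is the simpler way to get the same result; this variant is mainly useful when the surrounding code already depends on a tokenizer.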
Related Topics
- String Methods: the full reference for String instance methods including split(), indexOf(), and substring()
- Scanner: a flexible, regex-powered alternative for parsing strings, files, and streams token by token
- Strings: the foundation page covering String creation, immutability, and the string pool
- Regular Expressions: Pattern and Matcher for powerful pattern-based splitting and matching
- StringBuilder: for building strings efficiently when constructing output from parsed tokens