Character classes
Character classes distinguish kinds of characters, for example letters from digits.
Types
Characters | Meaning |
---|---|
[xyz]
|
Character class: Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character.
For example, [abcd] is the same as [a-d], while [abcd-] and [-abcd] also match a literal hyphen. |
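[^xyz]
|
Negated character class:
Matches any character that is not enclosed in the square brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first character after the ^ or as the last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character.
Note: The ^ character may also indicate the beginning of input. |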
. |
Wildcard:
Matches any single character except line terminators: \n, \r, \u2028 or \u2029.
|
\d |
Digit character class escape:
Matches any digit (Arabic numeral). Equivalent to [0-9]. |
\D |
Non-digit character class escape:
Matches any character that is not a digit (Arabic numeral). Equivalent
to [^0-9]. |
\w |
Word character class escape:
Matches any alphanumeric character from the basic Latin alphabet,
including the underscore. Equivalent to [A-Za-z0-9_]. |
\W |
Non-word character class escape:
Matches any character that is not a word character from the basic
Latin alphabet. Equivalent to [^A-Za-z0-9_]. |
\s |
White space character class escape:
Matches a single white space character, including space, tab, form
feed, line feed, and other Unicode spaces. Equivalent to
[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
|
\S |
Non-white space character class escape:
Matches a single character other than white space. Equivalent to
[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
|
\t |
Matches a horizontal tab. |
\r |
Matches a carriage return. |
\n |
Matches a line feed. |
\v |
Matches a vertical tab. |
\f |
Matches a form feed. |
[\b] |
Matches a backspace. If you're looking for the word-boundary assertion (\b), see Assertions.
|
\0 |
Matches a NUL character. Do not follow this with another digit. |
\cX
|
Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to code points U+0001–U+001A). For example, /\cM\cJ/ matches "\r\n".
|
\xhh
|
Matches the character with the code hh (two
hexadecimal digits).
|
\uhhhh
|
Matches a UTF-16 code unit with the value
hhhh (four hexadecimal digits).
|
\u{hhhh} or \u{hhhhh}
|
(Only when the u flag is set.) Matches the character with
the Unicode value U+hhhh or U+hhhhh
(hexadecimal digits).
|
\p{UnicodeProperty} ,
\P{UnicodeProperty}
|
Unicode character class escape: Matches a character based on its Unicode character properties, for example emoji characters, Japanese katakana characters, or Chinese/Japanese Han/Kanji characters. |
\ |
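Indicates that the following character should be treated specially, or "escaped". It behaves one of two ways.
For characters that are usually treated literally, it indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character "b", while /\b/ is a word-boundary assertion.
For characters that are usually treated specially, it indicates that the next character is not special and should be interpreted literally. For example, "*" is normally a quantifier, while /a\*/ matches the literal string "a*".
Note: To match this character literally, escape it with itself. In other words, to search for \ use /\\/. |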
x|y
|
Disjunction:
Matches either "x" or "y". Each component, separated by a pipe (|), is called an alternative. For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".
Note: A disjunction is another way to specify "a set of choices", but it's not a character class. Disjunctions are not atoms; you need to use a group to make one part of a bigger pattern. |
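The rows on character classes, ranges, negation, and disjunction can be tried out with a short sketch. The sample strings and variable names below are illustrative assumptions and are not part of the reference table.

```javascript
// Illustrative sample string; all variable names here are hypothetical.
const food = "brisket chop";

// A list and a range describe the same character class.
const listed = food.match(/[abcd]/g);
const ranged = food.match(/[a-d]/g);
console.log(listed, ranged); // ['b', 'c'] ['b', 'c']

// A hyphen placed first or last in the class is a literal hyphen.
const hyphens = "non-profit".match(/[abcd-]/g);
console.log(hyphens); // ['-']

// A negated class matches any character that is not listed.
const nonDigits = "a1b2".match(/[^0-9]/g);
console.log(nonDigits); // ['a', 'b']

// A disjunction needs a group to act as one part of a bigger pattern.
const grouped = "gray grey".match(/gr(?:a|e)y/g);
const ungrouped = "gray grey".match(/gra|ey/g);
console.log(grouped); // ['gray', 'grey']
console.log(ungrouped); // ['gra', 'ey']
```

Without the group, the pipe splits the whole pattern into the alternatives "gra" and "ey", which is why the last result differs from the grouped one.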
Examples
Looking for a series of digits
In this example, we match a sequence of 4 digits with \d{4}. \b indicates a word boundary (i.e. do not start or end matching in the middle of a number sequence).
const randomData = "015 354 8787 687351 3512 8735";
const regexpFourDigits = /\b\d{4}\b/g;
console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']
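To see what the word boundaries contribute, the following sketch runs the same data without \b, and then with [0-9] in place of \d. The names unanchored and withRange are hypothetical and only used for illustration.

```javascript
const randomData = "015 354 8787 687351 3512 8735";

// Without \b, the pattern may match inside a longer run of digits.
const unanchored = randomData.match(/\d{4}/g);
console.log(unanchored); // ['8787', '6873', '3512', '8735']

// \d is equivalent to [0-9], so this gives the same result as /\b\d{4}\b/g.
const withRange = randomData.match(/\b[0-9]{4}\b/g);
console.log(withRange); // ['8787', '3512', '8735']
```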
See more examples in the character class escape reference.
Looking for a word (from the Latin alphabet) starting with A
In this example, we match a word starting with the letter A. \b indicates a word boundary (i.e. do not start matching in the middle of a word). [aA] indicates the letter "a" or "A". \w+ indicates one or more word characters (basic Latin letters, digits, or the underscore); + is a quantifier. Note that because \w+ already matches until there are no more word characters, an end \b boundary is not necessary.
const aliceExcerpt =
"I'm sure I'm not Ada,' she said, 'for her hair goes in such long ringlets, and mine doesn't go in ringlets at all.";
const regexpWordStartingWithA = /\b[aA]\w+/g;
console.table(aliceExcerpt.match(regexpWordStartingWithA));
// ['Ada', 'and', 'at', 'all']
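One consequence of \w+ is that the pattern needs at least one word character after the initial "a", so the one-letter word "a" is skipped. The sketch below, with hypothetical sample strings and names, compares it with \w* and shows that \w also accepts digits and underscores.

```javascript
// Hypothetical sample text, chosen to include the one-letter word "a".
const sample = "a cat and an apple";

// \w+ requires at least one more word character after [aA].
const oneOrMore = sample.match(/\b[aA]\w+/g);
console.log(oneOrMore); // ['and', 'an', 'apple']

// \w* also accepts zero further characters, so "a" itself is matched.
const zeroOrMore = sample.match(/\b[aA]\w*/g);
console.log(zeroOrMore); // ['a', 'and', 'an', 'apple']

// \w is not limited to letters: digits and underscores count too.
const identifier = "a1_b".match(/\b[aA]\w+/g);
console.log(identifier); // ['a1_b']
```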
See more examples in the character class escape reference.
Looking for a word (from Unicode characters)
Instead of the Latin alphabet, we can use a range of Unicode characters to identify a word (thus being able to deal with text in other languages like Russian or Arabic). The "Basic Multilingual Plane" of Unicode contains most of the characters used around the world and we can use character classes and ranges to match words written with those characters.
const nonEnglishText = "Приключения Алисы в Стране чудес";
const regexpBMPWord = /([\u0000-\u001F\u0021-\uFFFF])+/gu;
// BMP goes through U+0000 to U+FFFF but space is U+0020, so the class skips only that code point
console.table(nonEnglishText.match(regexpBMPWord));
// ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']
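As an alternative sketch, the Unicode character class escape \p{L} (any letter, available when the u flag is set) can stand in for the code point range. The second sample string and the names regexpLetters and regexpNonSpace are assumptions made for illustration.

```javascript
const nonEnglishText = "Приключения Алисы в Стране чудес";

// \p{L} matches any Unicode letter; it requires the u flag.
const regexpLetters = /\p{L}+/gu;
console.log(nonEnglishText.match(regexpLetters));
// ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']

// A BMP range that excludes only the space keeps punctuation attached to words.
const regexpNonSpace = /[\u0000-\u001F\u0021-\uFFFF]+/gu;
const withPunctuation = "Алиса, привет!";
console.log(withPunctuation.match(regexpNonSpace)); // ['Алиса,', 'привет!']
console.log(withPunctuation.match(regexpLetters)); // ['Алиса', 'привет']
```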
See more examples in the Unicode character class escape reference.
Counting vowels
In this example, we count the number of vowels (A, E, I, O, U, Y) in a text. The g flag is used to match all occurrences of the pattern in the text. The i flag is used to make the pattern case-insensitive, so it matches both uppercase and lowercase vowels.
const aliceExcerpt =
"There was a long silence after this, and Alice could only hear whispers now and then.";
const regexpVowels = /[aeiouy]/gi;
console.log("Number of vowels:", aliceExcerpt.match(regexpVowels).length);
// Number of vowels: 26
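Note that match() returns null when nothing matches, so reading .length directly would throw for a text without vowels. A minimal sketch of a safer variant follows; the helper name countVowels is hypothetical.

```javascript
// Hypothetical helper: counts vowels without throwing when there are none.
function countVowels(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = text.match(/[aeiouy]/gi) ?? [];
  return matches.length;
}

console.log(countVowels("Alice could only hear whispers")); // 11
console.log(countVowels("Hmm, shh!")); // 0
```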
See also
- Regular expressions guide
- Assertions guide
- Quantifiers guide
- Groups and backreferences guide
- RegExp
- Regular expressions reference
- Character class: [...], [^...]
- Character class escape: \d, \D, \w, \W, \s, \S
- Character escape: \n, \u{...}
- Disjunction: |
- Unicode character class escape: \p{...}, \P{...}
- Wildcard: .