Regular expression syntax cheat sheet
This page provides an overall cheat sheet of all the capabilities of RegExp
syntax by aggregating the content of the articles in the RegExp
guide. If you need more information on a specific topic, please follow the link on the corresponding heading to access the full article or head to the guide.
Character classes
Character classes distinguish kinds of characters, for example letters from digits.
Characters | Meaning |
---|---|
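[xyz] | Character class: Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen and included in the character class as a normal character. For example, [abcd] is the same as [a-d]; both match the "b" in "brisket" and the "c" in "chop". For example, [abcd-] and [-abcd] match the "b" in "brisket", the "c" in "chop", and the "-" (hyphen) in "non-profit". For example, [\w-] is the same as [A-Za-z0-9_-]; both match the "b" in "brisket", the "c" in "chop", and the "n" in "non-profit". |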
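[^xyz] | Negated character class: Matches any character that is not enclosed in the square brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first character after the ^ or as the last character enclosed in the square brackets, it is taken as a literal hyphen and included in the character class as a normal character. For example, [^abc] is the same as [^a-c]; both initially match the "o" in "bacon" and the "h" in "chop". Note: The ^ character may also indicate the beginning of input. |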
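. | Wildcard: Matches any single character except line terminators: \n, \r, \u2028, or \u2029. For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day", because no character precedes the "y" in "yes". If the dotAll (s) flag is enabled, it also matches line terminators. |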
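\d | Digit character class escape: Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ matches "2" in "B2 is the suite number". |
\D | Non-digit character class escape: Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ matches "B" in "B2 is the suite number". |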
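\w | Word character class escape: Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches "a" in "apple", "5" in "$5.28", and "3" in "3D". |
\W | Non-word character class escape: Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. For example, /\W/ matches "%" in "50%". |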
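\s | White space character class escape: Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches " bar" in "foo bar". |
\S | Non-white space character class escape: Matches a single character other than white space. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches "foo" in "foo bar". |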
\t | Matches a horizontal tab. |
\r | Matches a carriage return. |
\n | Matches a linefeed. |
\v | Matches a vertical tab. |
\f | Matches a form-feed. |
[\b] | Matches a backspace. If you're looking for the word-boundary assertion (\b), see Assertions. |
\0 | Matches a NUL character. Do not follow this with another digit. |
\cX | Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to code points U+0001–U+001A). For example, /\cM\cJ/ matches "\r\n". |
\xhh | Matches the character with the code hh (two hexadecimal digits). |
\uhhhh | Matches a UTF-16 code unit with the value hhhh (four hexadecimal digits). |
\u{hhhh} or \u{hhhhh} | (Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits). |
\p{UnicodeProperty}, \P{UnicodeProperty} | Unicode character class escape: Matches a character based on its Unicode character properties (for example, emoji characters, Japanese katakana characters, or Chinese/Japanese Han/Kanji characters). |
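\ | Indicates that the following character should be treated specially, or "escaped". It behaves in one of two ways. For characters that are usually treated literally, it indicates that the next character is special and not to be interpreted literally; for example, /b/ matches the character "b", while /\b/ matches a word boundary. For characters that are usually treated specially, it indicates that the next character is not special and should be interpreted literally; for example, /a*/ matches zero or more "a"s, while /a\*/ matches "a*". Note: To match this character literally, escape it with itself. In other words, to search for \ use /\\/. |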
x|y | Disjunction: Matches either "x" or "y". Each component, separated by a pipe (|), is called an alternative. For example, /green|red/ matches "green" in "green apple" and "red" in "red apple". Note: A disjunction is another way to specify "a set of choices", but it's not a character class. Disjunctions are not atoms; you need to use a group to make one part of a bigger pattern. |
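
The rows above can be tried out with a short script. This is a minimal sketch, assuming a JavaScript runtime such as Node.js; the variable names and the "green pear" string are illustrative and not part of the guide.

```javascript
// Character class escapes: the first digit and the first non-digit.
const firstDigit = "B2 is the suite number".match(/\d/)[0];    // "2"
const firstNonDigit = "B2 is the suite number".match(/\D/)[0]; // "B"

// A hyphen between two characters forms a range; first or last, it is literal.
const inRange = "brisket".match(/[a-d]/)[0];            // "b"
const literalHyphen = "non-profit".match(/[abcd-]/)[0]; // "-"

// A leading ^ negates the class.
const negated = "bacon".match(/[^abc]/)[0];             // "o"

// A disjunction is not an atom. Without a group, the alternatives are
// "^green" and "red apple$", so any string starting with "green" matches.
const ungrouped = /^green|red apple$/.test("green pear");  // true
const grouped = /^(green|red) apple$/.test("green pear");  // false
const groupedHit = /^(green|red) apple$/.test("red apple"); // true

console.log({ firstDigit, firstNonDigit, inRange, literalHyphen, negated });
console.log({ ungrouped, grouped, groupedHit });
```

The last three lines show why a group is needed when a disjunction is only one part of a larger pattern.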
Assertions
Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).
Boundary-type assertions
Characters | Meaning |
---|---|
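^ | Input boundary beginning assertion: Matches the beginning of input. If the multiline (m) flag is enabled, it also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A". Note: This character has a different meaning when it appears at the start of a character class. |
$ | Input boundary end assertion: Matches the end of input. If the multiline (m) flag is enabled, it also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat". |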
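\b | Word boundary assertion: Matches a word boundary. This is the position where a word character is not followed or preceded by another word character, such as between a letter and a space. A matched word boundary is not included in the match; in other words, its length is zero. Examples: /\bm/ matches the "m" in "moon". /oo\b/ does not match the "oo" in "moon", because "oo" is followed by "n", which is a word character. /oon\b/ matches the "oon" in "moon", because "oon" is the end of the string and so is not followed by a word character. /\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character. To match a backspace character ([\b]), see Character classes. |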
\B | Non-word-boundary assertion: Matches a non-word boundary. This is a position where the previous and next characters are of the same type: either both are word characters or both are non-word characters, for example between two letters or between two spaces. The beginning and end of a string are considered non-words. Like a matched word boundary, a matched non-word boundary is not included in the match. For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday". |
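
The boundary assertions in this table can be checked with a few one-liners. This is a minimal sketch, assuming a JavaScript runtime such as Node.js; the variable names and the "an\nA" string are illustrative.

```javascript
// ^ and $ anchor to the whole input unless the m flag is set.
const startNoFlag = /^A/.test("an A");       // false
const startMultiline = /^A/m.test("an\nA");  // true: "A" follows a line break
const endHit = /t$/.test("eat");             // true
const endMiss = /t$/.test("eater");          // false

// \b is zero-width: the match holds "m" only, not the boundary itself.
const wordStart = "moon".match(/\bm/)[0];    // "m"
const notAtEnd = /oo\b/.test("moon");        // false: "oo" is followed by "n"
const atEnd = "moon".match(/oon\b/)[0];      // "oon"

// \B requires the same kind of character on both sides of the position.
const inside = "at noon".match(/\Bon/);      // matches "on" at index 5

console.log({ startNoFlag, startMultiline, endHit, endMiss });
console.log({ wordStart, notAtEnd, atEnd, inside: inside[0], index: inside.index });
```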
Other assertions
Note: The ? character may also be used as a quantifier.
Characters | Meaning |
---|---|
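x(?=y) | Lookahead assertion: Matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". "Sprat" is not part of the match result. |
x(?!y) | Negative lookahead assertion: Matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3". |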
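(?<=y)x | Lookbehind assertion: Matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack". "Jack" is not part of the match result. |
(?<!y)x | Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') finds no match because the number is preceded by the minus sign. |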
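
The four lookaround forms can be compared side by side. This is a minimal sketch, assuming a JavaScript runtime that supports lookbehind (recent Node.js versions do); the variable names and the "JackFrost" string are illustrative.

```javascript
// Lookahead: "Sprat" must follow, but it is not part of the match.
const ahead = "JackSprat".match(/Jack(?=Sprat)/)[0];   // "Jack"
const aheadMiss = /Jack(?=Sprat)/.test("JackFrost");   // false

// Negative lookahead: skip a number that is followed by a decimal point.
const negAhead = /\d+(?!\.)/.exec("3.141")[0];         // "141"

// Lookbehind: "Jack" must precede, but it is not part of the match.
const behind = "JackSprat".match(/(?<=Jack)Sprat/)[0]; // "Sprat"

// Negative lookbehind: reject a number preceded by a minus sign.
const negBehindHit = /(?<!-)\d+/.exec("3")[0];         // "3"
const negBehindMiss = /(?<!-)\d+/.exec("-3");          // null

console.log({ ahead, aheadMiss, negAhead, behind, negBehindHit, negBehindMiss });
```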
Groups and backreferences
Groups and backreferences indicate groups of expression characters.
Characters | Meaning |
---|---|
(x) | Capturing group: Matches "x" and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar". A regular expression may have multiple capturing groups. In results, matches to capturing groups are typically stored in an array whose members are in the same order as the left parentheses of the capturing groups. This is usually just the order of the capturing groups themselves, which becomes important when capturing groups are nested. Matches are accessed using the index of the result's elements ([1], …, [n]) or from the predefined RegExp object's properties ($1, …, $9). Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below). |
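(?<Name>x) | Named capturing group: Matches "x" and stores it on the groups property of the returned matches under the name specified by <Name>. The angle brackets (< and >) are required around the group name. For example, to extract the United States area code from a phone number, we could use /\((?<area>\d\d\d)\)/. The resulting number would appear under matches.groups.area. |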
(?:x) | Non-capturing group: Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements ([1], …, [n]) or from the predefined RegExp object's properties ($1, …, $9). |
(?flags:x), (?flags-flags:x) | Modifier: Enables or disables the specified flags only for the enclosed pattern. Only the i, m, and s flags can be used in a modifier. |
\n | Backreference: Where "n" is a positive integer. Matches the same substring matched by the nth capturing group in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach". |
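\k<Name> | Named backreference: A backreference to the last substring matching the named capturing group specified by <Name>. For example, /(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!". Note: \k is used literally here to indicate the beginning of a backreference to a named capturing group. |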
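
The difference between numbered, named, and non-capturing groups shows up in the array returned by exec(). This is a minimal sketch, assuming a JavaScript runtime such as Node.js; the variable names and the phone number are illustrative. Modifiers are left out because they need a very recent engine.

```javascript
// Numbered groups are indexed by the order of their left parentheses.
const numbered = /(foo) (bar)/.exec("foo bar");
// numbered[0] is the whole match; numbered[1] and numbered[2] are the groups.

// A named group is also exposed on the groups property.
const phone = /\((?<area>\d\d\d)\)/.exec("(555) 123-4567");
const area = phone.groups.area; // "555"

// A non-capturing group does not occupy a slot in the result array.
const nonCapturing = /(?:foo) (bar)/.exec("foo bar");
// nonCapturing[1] is "bar" and nonCapturing.length is 2.

// \1 repeats whatever the first group matched (here, the comma).
const backref = "apple, orange, cherry, peach".match(/apple(,)\sorange\1/)[0];

// \k<title> repeats whatever the group named "title" matched.
const named = "Do you copy? Sir, yes Sir!".match(/(?<title>\w+), yes \k<title>/)[0];

console.log(numbered[1], numbered[2], area, nonCapturing[1], backref, named);
```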
Quantifiers
Quantifiers indicate numbers of characters or expressions to match.
Note: In the following, item refers not only to singular characters, but also includes character classes and groups and backreferences.
Characters | Meaning |
---|---|
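x* | Matches the preceding item "x" 0 or more times. For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted". |
x+ | Matches the preceding item "x" 1 or more times. Equivalent to {1,}. For example, /a+/ matches the "a" in "candy" and all the "a"s in "caaaaaaandy". |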
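x? | Matches the preceding item "x" 0 or 1 times. For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle". If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times). |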
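x{n} | Where "n" is a non-negative integer, matches exactly "n" occurrences of the preceding item "x". For example, /a{2}/ doesn't match the "a" in "candy", but it matches all of the "a"s in "caandy" and the first two "a"s in "caaandy". |
x{n,} | Where "n" is a non-negative integer, matches at least "n" occurrences of the preceding item "x". For example, /a{2,}/ doesn't match the "a" in "candy", but matches all of the "a"s in "caandy" and in "caaaaaaandy". |
x{n,m} | Where "n" and "m" are non-negative integers and m >= n, matches at least "n" and at most "m" occurrences of the preceding item "x". For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"s in "caandy", and the first three "a"s in "caaaaaaandy". |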
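x*?, x+?, x??, x{n}?, x{n,}?, x{n,m}? | By default quantifiers like * and + are "greedy", meaning that they try to match as much of the string as possible. The ? character after the quantifier makes the quantifier "non-greedy", meaning that it stops as soon as it finds a match. For example, given the string "some <foo> <bar> new </bar> </foo> thing", /<.*>/ matches "<foo> <bar> new </bar> </foo>", while /<.*?>/ matches "<foo>". |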
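
The quantifiers above, including the greedy and non-greedy forms, can be compared on short inputs. This is a minimal sketch, assuming a JavaScript runtime such as Node.js; the variable names and some input strings (for example "caaaaandy") are illustrative.

```javascript
// * allows zero repetitions, so a lone "b" still matches.
const star = "A ghost booooed".match(/bo*/)[0];    // "boooo"
const starZero = "A bird warbled".match(/bo*/)[0]; // "b"

// + needs at least one repetition.
const plus = "caaandy".match(/a+/)[0];             // "aaa"

// ? makes the preceding item optional.
const optional = "angel".match(/e?le?/)[0];        // "el"

// Counted repetition: exactly n, at least n, and between n and m.
const exact = "caaandy".match(/a{2}/)[0];          // "aa"
const atLeast = "caaandy".match(/a{2,}/)[0];       // "aaa"
const range = "caaaaandy".match(/a{1,3}/)[0];      // "aaa"

// Greedy takes as much as possible; adding ? stops at the first ">".
const html = "some <foo> <bar> new </bar> </foo> thing";
const greedy = html.match(/<.*>/)[0];              // "<foo> <bar> new </bar> </foo>"
const lazy = html.match(/<.*?>/)[0];               // "<foo>"

console.log({ star, starZero, plus, optional, exact, atLeast, range, greedy, lazy });
```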