
13.1 Metacharacters

^
 
The “hat” character matches the start of a string or the start of a line of text.

$
 
The dollar character matches the end of a string or the end of a line of text.

.
 
The full stop character matches any single character of any sort.
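As a rough illustration of the three metacharacters above, the following sketch uses Python's re module. The choice of Python is an assumption made for this example, and details differ between implementations. In Python, the hat and dollar characters match at the start and end of each line only when the re.MULTILINE flag is given, and by default the full stop matches any character except a newline.

```python
import re

text = "cat\ncot\ncut"

# By default, ^ anchors only at the start of the whole string
# and $ only at the end of the whole string.
starts_default = re.findall(r"^c.t", text)   # ['cat']
ends_default = re.findall(r"c.t$", text)     # ['cut']

# With re.MULTILINE, ^ and $ also anchor at the start and end of each line.
starts_multiline = re.findall(r"^c.t", text, flags=re.MULTILINE)
# ['cat', 'cot', 'cut']

# The full stop stands for exactly one character, whatever it is.
# "ct" does not match because there is nothing between the c and the t.
dot_matches = re.findall(r"c.t", "cat c-t c9t ct")   # ['cat', 'c-t', 'c9t']
```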

( and )
 
Parentheses can be used to define a subpattern within a regular expression. This is useful for applying a modifier to more than a single character (see Section 13.1.2). It is also useful for retaining original portions of a string when performing a search-and-replace operation (see Section 13.2).

In some implementations of regular expressions, parentheses are literal and must be escaped in order to have their special meaning.

|
 
The vertical bar character subdivides a regular expression into alternative subpatterns. A match is made if either the pattern to the left of the vertical bar or the pattern to the right of the vertical bar is found.

Pattern alternatives can be made a subpattern within a larger regular expression by enclosing the vertical bar and the alternatives within parentheses.
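The behaviour of parentheses and the vertical bar can be sketched as follows, again assuming Python's re module purely for illustration. In that module, parentheses are special without being escaped, and a replacement string refers to a retained subpattern as \1, \2, and so on. The sample strings are made up for the example.

```python
import re

sample = "gray grey groy"

# A bare vertical bar splits the whole pattern into two alternatives:
# either "gray" or "grey".
whole = re.findall(r"gray|grey", sample)   # ['gray', 'grey']

# Inside parentheses, the alternatives become a subpattern of a larger
# pattern: "gr", then either "a" or "e", then "y".
grouped = [m.group(0) for m in re.finditer(r"gr(a|e)y", sample)]
# ['gray', 'grey']

# Parentheses also retain the text they matched, so that it can be
# reused in a search-and-replace. Here the two parts on either side
# of the comma are swapped.
swapped = re.sub(r"(.+), (.+)", r"\2 \1", "Lovelace, Ada")   # 'Ada Lovelace'
```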

13.1.1 Ranges

[ and ]
 
Square brackets in a regular expression are used to indicate a character range. A character range matches any single character in the range.

Within square brackets, common ranges may be specified by start and end characters, with a dash in between (e.g., 0-9).

If a hat character appears as the first character within square brackets, the range is inverted, so that a match occurs for any single character that is not in the range specified within the square brackets.

Within square brackets, most metacharacters revert to their literal meaning. For example, [.] means a literal full stop.
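A minimal sketch of these range rules, assuming Python's re module for illustration (the sample strings are invented for the example):

```python
import re

text = "Room 42, floor 3.5"

# A range given by start and end characters matches one character at a time.
digits = re.findall(r"[0-9]", text)   # ['4', '2', '3', '5']

# A leading hat inverts the range: match anything that is NOT
# a lowercase letter or a space.
inverted = re.findall(r"[^a-z ]", "a1 b2.c")   # ['1', '2', '.']

# Inside square brackets the full stop is literal ...
literal_dot = re.findall(r"[.]", text)   # ['.']

# ... whereas outside square brackets it matches any character.
any_char = re.findall(r".", "a.b")   # ['a', '.', 'b']
```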

In POSIX regular expressions, common character ranges can be specified using special character sequences of the form [:keyword:]. The advantage of this approach is that the regular expression will work with text written in different human languages. For example, [a-z] will not capture all characters in languages that include accented characters, but [[:alpha:]] will.


13.1.2 Modifiers

Modifiers specify how many times a subpattern can occur in succession. The modifier relates to the subpattern that immediately precedes it in the regular expression. By default, this is just the previous character, but if the preceding character is a closing parenthesis then the modifier relates to the entire subpattern within the parentheses.

?
 
The question mark means that the subpattern can be missing or it can occur exactly once.

*
 
The asterisk character means that the subpattern can occur zero or more times.

+
 
The plus character means that the subpattern can occur one or more times.
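The three modifiers can be compared side by side with a small sketch, assuming Python's re module for illustration. The helper function matching() is hypothetical and exists only for this example. It keeps the candidate strings that a pattern matches in full.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["ct", "cat", "caat"]

# ? : the "a" may be missing or may occur exactly once.
optional = matching(r"ca?t", candidates)       # ['ct', 'cat']

# * : the "a" may occur zero or more times.
zero_or_more = matching(r"ca*t", candidates)   # ['ct', 'cat', 'caat']

# + : the "a" must occur one or more times.
one_or_more = matching(r"ca+t", candidates)    # ['cat', 'caat']

# A modifier applies to the previous character only, unless that
# character is a closing parenthesis, in which case it applies to
# the whole subpattern within the parentheses.
char_only = matching(r"ab+", ["abbb", "ababab"])       # ['abbb']
whole_group = matching(r"(ab)+", ["abbb", "ababab"])   # ['ababab']
```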

Paul Murrell

Creative Commons License
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.