In some implementations of regular expressions, parentheses are literal and must be escaped in order to have their special meaning.
Pattern alternatives can be made a subpattern within a larger regular expression by enclosing the vertical bar and the alternatives within parentheses.
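As a minimal sketch of grouping alternatives, the following uses Python's re module, which is an assumption here since the text does not name an implementation. In Python, parentheses have their special meaning without being escaped. The pattern and the sample words are purely illustrative.

```python
import re

# Alternatives grouped as a subpattern: "gr", then "a" or "e", then "y".
pattern = re.compile(r"gr(a|e)y")
grouped = [w for w in ["gray", "grey", "groy"] if pattern.fullmatch(w)]

# Without parentheses the bar splits the whole pattern: "gra" OR "ey".
ungrouped = [w for w in ["gra", "ey", "gray"] if re.fullmatch(r"gra|ey", w)]

print(grouped)    # ['gray', 'grey']
print(ungrouped)  # ['gra', 'ey']
```

The second pattern shows why the parentheses matter: without them, the vertical bar separates everything to its left from everything to its right.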
Within square brackets, common ranges may be specified by start and end characters, with a dash in between (e.g., 0-9).
If a hat character (^) appears as the first character within square brackets, the set is inverted, so that a match occurs for any character other than those specified within the square brackets.
Within square brackets, most metacharacters revert to their literal meaning. For example, [.] means a literal full stop.
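The behaviour of ranges, inverted sets, and literal metacharacters within square brackets can be sketched as follows. This again assumes Python's re module, and the sample strings are made up for illustration.

```python
import re

text = "Room 42, floor 3."

# A range: runs of characters between 0 and 9.
digits = re.findall(r"[0-9]+", text)

# An inverted set: runs of anything except digits and spaces.
non_digits = re.findall(r"[^0-9 ]+", text)

# Inside square brackets, "." is a literal full stop.
literal_dot = re.findall(r"[.]", "a.b.c")

# Outside square brackets, "." matches any character.
any_char = re.findall(r".", "a.b")

print(digits)       # ['42', '3']
print(non_digits)   # ['Room', ',', 'floor', '.']
print(literal_dot)  # ['.', '.']
print(any_char)     # ['a', '.', 'b']
```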
In POSIX regular expressions, common character ranges can be specified using special character sequences of the form [:keyword:]. The advantage of this approach is that the regular expression will work for text in different human languages. For example, [a-z] will not capture all characters in languages that include accented characters, but [[:alpha:]] will.
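The limitation of [a-z] can be seen in a short sketch. Note that Python's re module does not support the POSIX [:keyword:] syntax, so the example below substitutes the set [^\W\d_] (word characters that are neither digits nor underscores) as a rough stand-in for [[:alpha:]]. That substitution is an assumption made for illustration, not an exact equivalent, and POSIX tools additionally depend on the current locale.

```python
import re

word = "caf\u00e9"  # "café", with an accented final letter

# The range a-z covers only unaccented lowercase ASCII letters.
ascii_only = re.findall(r"[a-z]+", word)

# Stand-in for [[:alpha:]]: any word character except digits and "_".
letters = re.findall(r"[^\W\d_]+", word)

print(ascii_only)  # ['caf']
print(letters)     # ['café']
```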
Modifiers specify how many times a subpattern can occur in succession. The modifier relates to the subpattern that immediately precedes it in the regular expression. By default, this is just the previous character, but if the preceding character is a closing parenthesis then the modifier relates to the entire subpattern within the parentheses.
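The difference between a modifier applied to a single character and one applied to a parenthesised subpattern can be sketched as follows, assuming Python's re module and using the + modifier (one or more occurrences) as the example. The sample strings are illustrative only.

```python
import re

# "+" follows "b", so it applies to "b" alone: "a" then one or more "b".
single_char = re.fullmatch(r"ab+", "abbb")
not_repeated = re.fullmatch(r"ab+", "abab")

# "+" follows ")", so it applies to the whole subpattern "ab".
whole_group = re.fullmatch(r"(ab)+", "abab")

print(single_char is not None)   # True
print(not_repeated is not None)  # False
print(whole_group is not None)   # True
```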
Paul Murrell
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.