====POSIX basic and extended====
In the [[POSIX]] standard, Basic Regular Expression syntax ('''BRE''') requires that the [[metacharacter]]s <code>(&nbsp;)</code> and <code>{&nbsp;}</code> be written as <code>\(\)</code> and <code>\{\}</code>, whereas Extended Regular Expression syntax ('''ERE''') does not.
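The notational difference can be made concrete with a small translator. The following is a minimal Python sketch of a hypothetical helper, <code>ere_to_bre</code>, that rewrites a restricted ERE in BRE notation. It assumes that anchors appear only at the ends of the pattern and that the ERE-only operators <code>+</code>, <code>?</code> and <code>|</code> are absent, because POSIX BRE has no equivalent for them (vendor extensions such as GNU's <code>\+</code> are deliberately not assumed). Bracket expressions are copied unchanged, since the escaping rules do not apply inside them.

```python
def ere_to_bre(ere: str) -> str:
    """Rewrite a restricted POSIX ERE in BRE notation (hypothetical helper).

    Assumes anchors appear only at the ends of the pattern and that no
    ERE-only operator (+ ? |) is used, since POSIX BRE has no equivalent.
    """
    out = []
    i, n = 0, len(ere)
    while i < n:
        c = ere[i]
        if c == "\\":
            if i + 1 == n:
                raise ValueError("trailing backslash")
            nxt = ere[i + 1]
            # An escaped ( ) { } + ? | is a literal in ERE; these characters
            # are already literal in BRE, so the backslash is dropped.
            out.append(nxt if nxt in "(){}+?|" else c + nxt)
            i += 2
        elif c == "[":
            # Copy the bracket expression verbatim. A leading ']' (after an
            # optional '^') is a member, and backslash is not an escape here.
            j = i + 1
            if j < n and ere[j] == "^":
                j += 1
            if j < n and ere[j] == "]":
                j += 1
            while j < n and ere[j] != "]":
                if ere[j] == "[" and j + 1 < n and ere[j + 1] in ":.=":
                    # Skip over [:class:], [.coll.] or [=equiv=].
                    close = ere.find(ere[j + 1] + "]", j + 2)
                    if close == -1:
                        raise ValueError("unterminated [: :], [. .] or [= =]")
                    j = close + 2
                else:
                    j += 1
            if j >= n:
                raise ValueError("unterminated bracket expression")
            out.append(ere[i : j + 1])
            i = j + 1
        elif c in "(){}":
            # Grouping and interval delimiters need a backslash in BRE.
            out.append("\\" + c)
            i += 1
        elif c in "+?|":
            raise ValueError(f"{c!r} has no POSIX BRE equivalent")
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(ere_to_bre("(ab)*c{2,3}"))  # \(ab\)*c\{2,3\}
```

Rejecting <code>+</code>, <code>?</code> and <code>|</code> outright keeps the sketch honest: in a POSIX BRE those characters are ordinary literals, so silently copying them would change the pattern's meaning.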

{| class="wikitable"
|-
! Metacharacter
! Description
|- valign="top"
!<code>^</code>
|Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
|- valign="top"
!<code>.</code>
|Matches any single character (many applications exclude [[newline]]s, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, <code>a.c</code> matches "abc", etc., but <code>[a.c]</code> matches only "a", ".", or "c".
|- valign="top"
!<code>[&nbsp;]</code>
|A bracket expression. Matches a single character that is contained within the brackets. For example, <code>[abc]</code> matches "a", "b", or "c". <code>[a-z]</code> specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: <code>[abcx-z]</code> matches "a", "b", "c", "x", "y", or "z", as does <code>[a-cx-z]</code>.
The <code>-</code> character is treated as a literal character if it is the last or the first (after the <code>^</code>, if present) character within the brackets: <code>[abc-]</code>, <code>[-abc]</code>, <code>[^-abc]</code>. Backslash escapes are not recognized: within a bracket expression, <code>\</code> matches a literal backslash. The <code>]</code> character can be included in a bracket expression if it is the first (after the <code>^</code>, if present) character: <code>[]abc]</code>, <code>[^]abc]</code>.
|- valign="top"
!<code>[^&nbsp;]</code>
|Matches a single character that is not contained within the brackets. For example, <code>[^abc]</code> matches any character other than "a", "b", or "c". <code>[^a-z]</code> matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
|- valign="top"
!<code>$</code>
|Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
|- valign="top"
!<code>( )</code>
|Defines a marked subexpression, also called a capturing group, which is used to extract the desired part of the text (see also the next entry, <code>\''n''</code>). ''BRE mode requires {{nowrap|<code>\(&nbsp;\)</code>}}.''
|- valign="top"
!<code>\''n''</code>
|Matches what the ''n''th marked subexpression matched, where ''n'' is a digit from 1 to 9. This construct is defined in the POSIX standard.<ref>{{cite book |section-url=https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_06 |publisher=The Open Group |title=The Open Group Base Specifications Issue 7, 2018 edition |section=9.3.6 BREs Matching Multiple Characters |year=2017 |access-date=December 10, 2023}}</ref> Some tools allow referencing more than nine capturing groups. Also known as a back-reference, this feature is supported in BRE mode.
|- valign="top"
!<code>*</code>
|Matches the preceding element zero or more times. For example, <code>ab*c</code> matches "ac", "abc", "abbbc", etc. <code>[xyz]*</code> matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. <code>(ab)*</code> matches "", "ab", "abab", "ababab", and so on.
|- valign="top"
!{{nowrap|<code>{''m'',''n''}</code>}}
|Matches the preceding element at least ''m'' and not more than ''n'' times. For example, <code>a{3,5}</code> matches only "aaa", "aaaa", and "aaaaa". A few older regex implementations do not support this construct. BRE mode requires <code>{{nowrap|\{''m'',''n''\}}}</code>.

|}
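The behavior described in the table can be checked interactively. The sketch below exercises the metacharacters with Python's <code>re</code> module. It is an illustration under stated assumptions: <code>re</code> is a backtracking, Perl-style engine rather than a POSIX implementation. For the constructs shown, its syntax coincides with ERE (no backslashes before <code>(&nbsp;)</code> or <code>{&nbsp;}</code>), but, unlike POSIX, it treats a backslash inside brackets as an escape. The flags <code>re.DOTALL</code> and <code>re.MULTILINE</code> are Python-specific.

```python
import re

# '.' matches any single character except a newline (Python's default);
# re.DOTALL lifts that exclusion. Inside brackets, '.' is a literal dot.
assert re.fullmatch(r"a.c", "abc")
assert re.fullmatch(r"a.c", "a\nc") is None
assert re.fullmatch(r"a.c", "a\nc", re.DOTALL)
assert re.fullmatch(r"[a.c]", ".")
assert re.fullmatch(r"[a.c]", "b") is None

# Bracket expressions: mixed ranges, '-' literal at either edge,
# and ']' literal when it comes first (after an optional '^').
assert all(re.fullmatch(r"[abcx-z]", ch) for ch in "abcxyz")
assert re.fullmatch(r"[abc-]", "-") and re.fullmatch(r"[-abc]", "-")
assert re.fullmatch(r"[]abc]", "]")
assert re.fullmatch(r"[^]abc]", "]") is None
assert re.fullmatch(r"[^a-z]", "Q")

# '^' and '$' anchor to the string; re.MULTILINE makes them line-based.
assert re.findall(r"^x", "x\nx") == ["x"]
assert re.findall(r"^x", "x\nx", re.MULTILINE) == ["x", "x"]
assert re.search(r"end$", "the end\n")  # '$' also matches before a final newline

# '*' and the interval {m,n} (written without backslashes, as in ERE).
candidates = ["ac", "abc", "abbbc", "abd"]
assert [s for s in candidates if re.fullmatch(r"ab*c", s)] == ["ac", "abc", "abbbc"]
runs = ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]
assert [s for s in runs if re.fullmatch(r"a{3,5}", s)] == ["aaa", "aaaa", "aaaaa"]

# A capturing group, and a back-reference \1 to what it matched.
assert re.fullmatch(r"(ab)*", "ababab")
assert re.search(r"(.)\1", "hello").group() == "ll"
assert re.search(r"(.)\1", "hello").group(1) == "l"
```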

'''Examples:'''
* <code>.at</code> matches any three-character string ending with "at", including "hat", "cat", "bat", "4at", "#at" and " at" (starting with a space).
* <code>[hc]at</code> matches "hat" and "cat".
* <code>[^b]at</code> matches all strings matched by <code>.at</code> except "bat".
* <code>[^hc]at</code> matches all strings matched by <code>.at</code> other than "hat" and "cat".
* <code>^[hc]at</code> matches "hat" and "cat", but only at the beginning of the string or line.
* <code>[hc]at$</code> matches "hat" and "cat", but only at the end of the string or line.
* <code>\[.\]</code> matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]", "[b]", "[7]", "[@]", "[]]", and "[ ]" (bracket space bracket).
* <code>s.*</code> matches "s" followed by zero or more characters, for example: "s", "saw", "seed", "s3w96.7", and "s6#h%(>>>m n mQ".
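The examples above can be verified mechanically. The Python sketch below uses a hypothetical helper, <code>matches</code>, that tests each pattern against whole candidate strings with <code>re.fullmatch</code>; the anchored examples use <code>re.search</code> inside longer text instead. Python's <code>re</code> is not a POSIX engine, but for these particular patterns its syntax and results coincide with ERE.

```python
import re


def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


words = ["hat", "cat", "bat", "4at", "#at", " at", "at", "chat"]

assert matches(r".at", words) == ["hat", "cat", "bat", "4at", "#at", " at"]
assert matches(r"[hc]at", words) == ["hat", "cat"]
assert matches(r"[^b]at", words) == ["hat", "cat", "4at", "#at", " at"]
assert matches(r"[^hc]at", words) == ["bat", "4at", "#at", " at"]

# Anchors: search inside longer text instead of matching the whole string.
assert re.search(r"^[hc]at", "hat trick") and not re.search(r"^[hc]at", "that")
assert re.search(r"[hc]at$", "top hat") and not re.search(r"[hc]at$", "hats")
# With re.MULTILINE, ^ and $ apply to each line (the line-based behavior).
assert re.findall(r"^[hc]at", "hat\ncat\nbat", re.MULTILINE) == ["hat", "cat"]

# Escaped brackets are literals; the '.' between them matches any character.
assert matches(r"\[.\]", ["[a]", "[7]", "[]]", "[ ]", "[ab]"]) == ["[a]", "[7]", "[]]", "[ ]"]
assert matches(r"s.*", ["s", "saw", "s3w96.7", "s6#h%(>>>m n mQ", "as"]) == [
    "s", "saw", "s3w96.7", "s6#h%(>>>m n mQ"
]
```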

According to Russ Cox, the POSIX specification requires ambiguous subexpressions to be handled differently from Perl. The POSIX committee replaced Perl's rules with ones that are simple to explain, but the new "simple" rules are actually more complex to implement: they were incompatible with pre-existing tooling and made it essentially impossible to define a "lazy match" (see below) extension. As a result, very few programs actually implement the POSIX subexpression rules (even when they implement other parts of the POSIX syntax).<ref>{{cite web |title=Regular Expression Matching: the Virtual Machine Approach |url=https://swtch.com/~rsc/regexp/regexp2.html |author=Russ Cox |year=2009 |website=swtch.com |quote=Digression: POSIX Submatching}}</ref>
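The difference shows up on the pattern <code>(a|ab)(c|bcd)(d*)</code> applied to "abcd". Python's <code>re</code> module, a backtracking engine, follows the Perl-style rule of trying alternatives in order. The second function is a hypothetical brute-force helper, <code>posix_style_groups</code>, that applies the POSIX rule: take the leftmost match, then the longest overall match there, then let each subexpression, from left to right, match the longest string consistent with that. It handles only the special case of a pattern that is a plain concatenation of groups, and it runs in exponential time. It is meant to illustrate the rule, not to describe how any real engine implements it.

```python
import re
from typing import List, Optional, Tuple


def perl_style_groups(parts: List[str], text: str) -> Optional[Tuple[str, ...]]:
    """Submatches chosen by Python's backtracking re (Perl-style, leftmost-first)."""
    m = re.search("".join(f"({p})" for p in parts), text)
    return m.groups() if m else None


def posix_style_groups(parts: List[str], text: str) -> Optional[Tuple[str, ...]]:
    """Hypothetical brute-force POSIX-style submatching.

    Only handles a pattern that is a plain concatenation of groups, one per
    entry of `parts`. Exponential time: an illustration, not an engine.
    """
    compiled = [re.compile(p) for p in parts]

    def assign(i: int, s: str) -> Optional[Tuple[str, ...]]:
        # Give group i the longest prefix of s that still lets the
        # remaining groups match the rest of s exactly.
        if i == len(compiled):
            return () if s == "" else None
        for cut in range(len(s), -1, -1):
            if compiled[i].fullmatch(s[:cut]):
                rest = assign(i + 1, s[cut:])
                if rest is not None:
                    return (s[:cut],) + rest
        return None

    # Leftmost starting position first, then the longest overall match there.
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            groups = assign(0, text[start:end])
            if groups is not None:
                return groups
    return None


parts = ["a|ab", "c|bcd", "d*"]
print(perl_style_groups(parts, "abcd"))   # ('a', 'bcd', '')
print(posix_style_groups(parts, "abcd"))  # ('ab', 'c', 'd')
```

Both rules match the same overall string, "abcd", but they divide it differently among the groups. The nested search over every split point also suggests why the POSIX rule is harder to implement efficiently than simply taking the first alternative that succeeds.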