Chapter 19 - Regular Expressions
What are regular expressions?
Let's quote William E. Shotts, Jr., the author of the book "The Linux Command Line":
"Simply put, regular expressions are symbolic notations used to identify patterns in text. In some ways, they resemble the shell’s wildcard method of matching file and pathnames but on a much grander scale. Regular expressions are supported by many command-line tools and by most programming languages to facilitate the solution of text manipulation problems. However, to further confuse things, not all regular expressions are the same; they vary slightly from tool to tool and from programming language to language. For our discussion, we will limit ourselves to regular expressions as described in the POSIX standard (which will cover most of the command-line tools), as opposed to many programming languages (most notably Perl), which use slightly larger and richer sets of notations."
List of Regular Expression Metacharacters
(each can be escaped with a backslash to be treated literally)
. * ? + ^ $ [ ] { } - ( ) | \
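As a small illustration of escaping, the sketch below uses Python's re module as a stand-in for a command-line tool. That choice is an assumption made for the example: the chapter targets POSIX tools, and Python's dialect is somewhat richer, but the dot and the backslash behave the same way here. The sample strings are made up.

```python
import re

# Unescaped: "." is a metacharacter and matches any single character.
unescaped = re.compile(r"a.c")
# Escaped: "\." matches only a literal dot.
escaped = re.compile(r"a\.c")

samples = ["a.c", "abc"]  # made-up sample strings

unescaped_hits = [s for s in samples if unescaped.fullmatch(s)]
escaped_hits = [s for s in samples if escaped.fullmatch(s)]

print(unescaped_hits)  # ['a.c', 'abc']
print(escaped_hits)    # ['a.c']
```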
Note:
As we can see, many of the regular-expression metacharacters are also characters that have meaning to the shell when expansion is performed. When we pass regular expressions containing metacharacters on the command line, it is vital that they be enclosed in quotes to prevent the shell from attempting to expand them.
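To make the quoting advice concrete, here is a minimal sketch, assuming the command line is being assembled from Python. It uses the standard library's shlex.quote to wrap a pattern in single quotes, then shlex.split to tokenize the result the way a POSIX shell would. The pattern and the file name words.txt are placeholders, not taken from the book.

```python
import shlex

# Hypothetical pattern containing characters the shell treats specially.
pattern = "^[bg]zip|z*"

# shlex.quote wraps the pattern in single quotes so the shell leaves it alone.
quoted = shlex.quote(pattern)

# words.txt is a placeholder file name.
command = f"grep -E {quoted} words.txt"
print(command)  # grep -E '^[bg]zip|z*' words.txt

# Tokenizing with POSIX shell rules gives the pattern back as one argument.
args = shlex.split(command)
print(args)
```

Typing the pattern at the prompt follows the same idea: enclosing it in single quotes keeps the shell from interpreting |, *, [ and ].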
A brief explanation:
. matches any single character.
* matches zero or more occurrences of the element that immediately precedes it (a single character or a character specified by a regular expression).
? matches zero or one occurrences of the preceding regular expression.
+ matches one or more occurrences of the preceding regular expression.
^ as the first character of a regular expression anchors the match to the beginning of the line.
$ as the last character of a regular expression anchors the match to the end of the line.
[] matches any one of the characters enclosed between the brackets. A circumflex (^) as the first character inside the brackets reverses the match, so that it matches any character except newline and those listed in the class. A hyphen (-) is used to indicate a range of characters. A close bracket (]) as the first character in the class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.
[^bg]zip matches zip preceded by any single character other than b or g. Because a preceding character is required, zip at the very start of a line does not match.
{n} matches the preceding element if it occurs exactly n times.
{n,m} matches the preceding element if it occurs at least n times, but no more than m times.
{n,} matches the preceding element if it occurs n or more times.
{,m} matches the preceding element if it occurs no more than m times (this form is a GNU extension rather than part of POSIX).
- is used inside brackets to denote a range of characters (e.g. [a-z] matches a through z).
() groups an expression so that an operator such as * or | applies to the whole group.
| acts as logical or (alternation).
\ treats the following metacharacter as a literal character and NOT as a metacharacter.
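The metacharacters above can be tried out with a short sketch. It uses Python's re module as a stand-in for a POSIX tool, which is an assumption: Python's dialect is richer, but the constructs shown here behave the same way. The helper name matches and the word list are made up for the illustration.

```python
import re

def matches(pattern, words):
    """Return the words in which the pattern is found anywhere, much like grep does per line."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]

# Made-up word list for the illustration.
words = ["zip", "bzip", "gzip", "unzip", "zipper", "zoo", "zo", "z"]

# Anchors: ^ ties the match to the beginning, $ to the end.
starts_with_zip = matches(r"^zip", words)   # ['zip', 'zipper']
ends_with_zip = matches(r"zip$", words)     # ['zip', 'bzip', 'gzip', 'unzip']

# Bracket expressions: [bg] is b or g; [^bg] is any one character except b or g.
bg_zip = matches(r"[bg]zip", words)         # ['bzip', 'gzip']
not_bg_zip = matches(r"[^bg]zip", words)    # ['unzip']

# Quantifiers applied to the preceding element "o".
zo_star = matches(r"^zo*$", words)          # ['zoo', 'zo', 'z']
zo_plus = matches(r"^zo+$", words)          # ['zoo', 'zo']
zo_optional = matches(r"^zo?$", words)      # ['zo', 'z']
zo_exactly_two = matches(r"^zo{2}$", words) # ['zoo']

# Grouping and alternation: the group limits | to "b or g".
b_or_g_zip = matches(r"^(b|g)zip$", words)  # ['bzip', 'gzip']

print(not_bg_zip)
print(zo_star)
print(b_or_g_zip)
```

Note that plain zip is absent from the [^bg]zip result: the bracket expression must consume one character, so there has to be something in front of zip.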
The lab is in the next post!