Timing Diagram Editing and Analysis

B.3 Pattern Matching (Regular Expressions)

Regular expressions are patterns that a string can be matched against. They are the most common way to search and change text (strings) in Perl, and many other programs use them as well.

The simplest regular expression (or regex) is just a string of alphanumeric text. This will match anything that contains that text anywhere. For example, the regex 'SIG' will match 'SIG0', 'SIGNAL', or 'ASSIGN'.
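In Perl, a string is tested against a regex with the =~ binding operator, and the s/// operator replaces the matched text. The following is a minimal sketch of both; the list of names (including 'CLK') and the variable names are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical signal names, used only for illustration.
my @names = ('SIG0', 'SIGNAL', 'ASSIGN', 'CLK');

# grep keeps every name that contains the text 'SIG' anywhere.
my @matched = grep { /SIG/ } @names;
print "$_\n" for @matched;    # prints SIG0, SIGNAL, and ASSIGN, one per line

# s/// changes the first occurrence of the matched text.
(my $renamed = 'SIG0') =~ s/SIG/BUS/;
print "$renamed\n";           # prints BUS0
```

Because the pattern is not anchored, 'ASSIGN' is kept even though 'SIG' appears in the middle of the name.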

Certain characters have a special meaning in a regular expression. Here's a table explaining the most common ones:

.

This is a stand-in for any character. For example, 'SIG.' will match 'SIG0' or 'SIG1', but not something that ends in 'SIG' (since it doesn't have any characters after the letter G).

[ ]

Defines a character class, which is a group of characters that a regular expression will match any one of. 'p[aeiou]t' will match 'pat', 'pet', 'pit', 'pot', or 'put', but not 'pout', since the class matches only one character.

-

Specifies a range of characters within a character class. '[h-o]' is equivalent to '[hijklmno]', and '[a-zA-Z]' will match any single uppercase or lowercase letter.

^

Putting this at the beginning of a character class inverts its meaning: the regular expression will match any character not in the set. 'si[^g]' will match 'silo' but not 'signal'.

*

Allows the preceding character (or character class) to be repeated any number of times, including zero. 'gro*ve' matches 'grve', 'grove', 'groove', or 'groooooove'.

+

Allows the preceding character (or character class) to be repeated one or more times. 'es+h' will match 'mesh' and 'governessship' but not 'beehive'.

?

Makes the preceding character (or character class) optional: it can be found either zero or one times. 'b[ou]?y' matches 'by', 'boy', and 'buy', but not 'buoy'.

^

Outside a character class, this matches the very beginning of a string. '^S' will match 'SIG0', but '^I' won't.

$

This matches the very end of a string. '0+$' matches 'SIG0', 'BUS 200', and 'CLK 3000', but not 'CLK 102'.

\

Use this before a special character to remove its meaning and treat it literally. 'p\.S' will match 'top.SIG0'. You can match a literal backslash with '\\'.
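The special characters described above can be checked directly in Perl. The sketch below is only an illustration: the @cases table and the $failures counter are hypothetical names, and the strings are sample inputs chosen to show one match or one non-match per pattern.

```perl
use strict;
use warnings;

# Each entry: [ pattern, string, expected result (1 = match, 0 = no match) ].
my @cases = (
    [ 'SIG.',      'SIG0',          1 ],
    [ 'SIG.',      'ASSIG',         0 ],  # nothing follows the G
    [ 'p[aeiou]t', 'pit',           1 ],
    [ 'p[aeiou]t', 'pout',          0 ],  # two vowels, the class allows one
    [ '[h-o]',     'k',             1 ],
    [ 'si[^g]',    'silo',          1 ],
    [ 'si[^g]',    'signal',        0 ],
    [ 'gro*ve',    'grve',          1 ],  # zero repetitions of 'o'
    [ 'es+h',      'governessship', 1 ],
    [ 'es+h',      'beehive',       0 ],  # no 's' at all
    [ 'b[ou]?y',   'by',            1 ],  # the optional character is absent
    [ 'b[ou]?y',   'buoy',          0 ],
    [ '^I',        'SIG0',          0 ],  # 'I' is not the first character
    [ '0+$',       'CLK 3000',      1 ],
    [ '0+$',       'CLK 102',       0 ],  # the string does not end in 0
    [ 'p\.S',      'top.SIG0',      1 ],  # escaped dot matches a literal '.'
);

my $failures = 0;
for my $case (@cases) {
    my ($pattern, $string, $expected) = @$case;
    my $got = ($string =~ /$pattern/) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-10s %-14s %s\n", $pattern, $string, $got ? 'match' : 'no match';
}
print "unexpected results: $failures\n";    # prints "unexpected results: 0"
```

The patterns are kept in single-quoted strings so that Perl passes characters such as '$' and '\' to the regex engine unchanged.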

Complete documentation of Perl's regular expression engine is beyond the scope of this manual; Perl's own documentation contains a great deal of information on regular expressions. Since Perl adds some enhancements to normal regular expressions, you should probably look at that documentation even if you are already familiar with regular expressions.

Here are a few web pages with more extensive information about regular expressions:

http://search.cpan.org/dist/perl/pod/perlre.pod#Regular_Expressions

http://www.regular-expressions.info/

http://www.developer.com/lang/article.php/3330231

http://www.ternent.com/tech/regexp.html

Note: A complex regular expression is often far easier to write than it is to read and interpret later, so it's a good idea to document the purpose of each regular expression.
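One way to document a regex is Perl's /x modifier, which lets a pattern be spread over several lines with a comment beside each part. The sketch below assumes a hypothetical naming scheme (a bus name followed by a bit index in square brackets, such as 'DATA[7]'); the names $bus_bit and is_bus_bit are illustrative only.

```perl
use strict;
use warnings;

# Purpose: recognize a single bit of a bus, written as NAME[index].
# With /x, whitespace in the pattern is ignored and '#' starts a comment.
my $bus_bit = qr/
    ^              # start of the string
    [A-Za-z_]+     # bus name: one or more letters or underscores
    \[             # a literal opening bracket
    [0-9]+         # bit index: one or more digits
    \]             # a literal closing bracket
    $              # end of the string
/x;

# Returns 1 if the name looks like a bus bit, 0 otherwise.
sub is_bus_bit {
    my ($name) = @_;
    return $name =~ $bus_bit ? 1 : 0;
}

for my $name ('DATA[7]', 'CLK') {
    printf "%s: %s\n", $name, is_bus_bit($name) ? 'bus bit' : 'not a bus bit';
}
```

Without /x the same pattern would be written as '^[A-Za-z_]+\[[0-9]+\]$', which is shorter but much harder to interpret later.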