Introduction to Pattern Matching in Perl: Concepts and Techniques
Explore the basics of pattern matching in Perl as discussed in Chapter 7. Learn how to scan strings for substrings, utilizing regular expressions and character classes. Discover how to apply quantifiers, alternation, and the greedy/non-greedy matching modes. This chapter covers practical examples and provides insights on using pattern matching in various applications, including file searching and data validation. Mastering these concepts will enhance your programming capabilities in Perl and facilitate effective data manipulation.
Perl Chapter 7 Pattern Matching
Introduction
• Scanning strings for substrings is useful in many applications
• e.g. grep, finding files, compilers, …
• Pattern matching comes from UNIX tools (egrep) and awk
• The basis is regular expressions, from the theory of computation
• A pattern match is a boolean expression: true or false
• Patterns can remember the parts they matched (returned as a list)
Syntax
• m<dl>pattern<dl>[modifiers], where <dl> is the delimiter
• m is the match operator
• using / ... / as the delimiters makes the m optional
• Examples: m~pattern~ (pick a delimiter such as ~ if the pattern contains /) or /pattern/
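This is a minimal sketch of the delimiter choices. The path string is made-up sample data. With / as the delimiter, every / inside the pattern must be escaped. With m and another delimiter (~ or {}), no escaping is needed. A modifier such as /i goes after the closing delimiter.

```perl
use strict;
use warnings;

# Hypothetical sample data containing slashes.
my $path = "/usr/local/bin";

# / as the delimiter: m is optional, but each / in the pattern is escaped.
my $slashes = ($path =~ /\/usr\/local/) ? 1 : 0;

# m with another delimiter: the slashes need no escaping.
my $tildes = ($path =~ m~/usr/local~) ? 1 : 0;
my $braces = ($path =~ m{/usr/local}) ? 1 : 0;

# A modifier follows the closing delimiter; /i ignores case.
my $nocase = ("PERL" =~ /perl/i) ? 1 : 0;

print "$slashes $tildes $braces $nocase\n";   # prints: 1 1 1 1
```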
Simple Patterns
• Match individual chars or character classes
• 3 categories
• normal chars, which match themselves
• metachars, which have special meanings in patterns (\ | ( ) [ { ^ $ * + ? .)
• a backslash turns a metachar into a normal char: \? matches a literal ?
• the period (.)
• Escape sequences such as \t can appear in a pattern, where they match the character they stand for (\t matches a tab)
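This small sketch, using hypothetical strings, contrasts an escaped metachar with an unescaped one. It also shows that an escape sequence in a pattern matches the character it stands for.

```perl
use strict;
use warnings;

my $question = "Why?";
my $tabbed   = "a\tb";

# ? is a metachar (a quantifier); \? matches a literal question mark.
my $has_qmark = ($question =~ /y\?/) ? 1 : 0;

# Unescaped, /y?/ means "zero or one y", so it succeeds even on "Hello"
# by matching the empty string at offset 0.
my $loose = ("Hello" =~ /y?/) ? 1 : 0;

# The escape sequence \t in a pattern matches a tab character.
my $has_tab = ($tabbed =~ /a\tb/) ? 1 : 0;

print "$has_qmark $loose $has_tab\n";   # prints: 1 1 1
```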
Default string to match is $_
    if (/snow/) { print "snow in \$_\n"; }
• /snow/ returns true or false
• the period matches any char except a newline
• /a../ matches an a followed by 2 non-newline chars
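This is a sketch of matching against the default $_, using a made-up list of lines. The foreach loop and grep both alias $_ to each element, so the bare /snow/ and /a../ patterns need no =~.

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @lines = ("let it snow", "sunny day", "a\nbc", "abc");

my @found;
for (@lines) {
    # No =~ here: /snow/ is matched against $_.
    push @found, $_ if /snow/;
}

# /a../ needs an 'a' followed by two non-newline chars,
# so "a\nbc" fails (newline) and "sunny day" fails (only one char after a).
my @dotted = grep { /a../ } @lines;

print scalar(@found), " @dotted\n";   # prints: 1 abc
```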
Matching Character Classes
• defined by placing chars in [ ]s
• [A-Za-z] any letter
• [0-7] an octal digit
• [aeiou] a lowercase vowel
• [^A-Za-z] any char NOT in the class (^ negates only as the first char inside [ ])
Common character classes
• \d [0-9] a digit
• \D [^0-9]
• \w [A-Za-z0-9_] a word char
• \W [^A-Za-z0-9_]
• \s [ \r\t\n\f] whitespace
• \S [^ \r\t\n\f]
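This sketch uses a hypothetical input string. It shows a bracketed class, a negated class, and the point that \w also covers digits and the underscore. The /g modifier in list context returns every match. That modifier is an addition here and is not on these slides.

```perl
use strict;
use warnings;

my $text = "user_42 says hi";

# [aeiou] with /g in list context returns every vowel found.
my @vowels = ($text =~ /[aeiou]/g);          # u e a i

# \w+ keeps digits and _ inside a word.
my @words = ($text =~ /\w+/g);               # user_42 says hi

# [^A-Za-z] matches any char that is not a letter.
my @nonletters = ($text =~ /[^A-Za-z]/g);    # _ 4 2 and two spaces

print scalar(@vowels), " [@words] ", scalar(@nonletters), "\n";
# prints: 4 [user_42 says hi] 5
```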
• /[A-Z]"\s/ matches an uppercase letter, a double quote, and a whitespace char
• /[\dA-Fa-f]/ matches one hex digit
• a pattern can be held in a variable, which is interpolated into the match:
    $pattern = "slkdjfsdf";
    if (/$pattern/) { … }
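This sketch uses hypothetical sample strings. It shows the hex-digit class, the quote-then-whitespace pattern, and variable interpolation. The ^ and $ anchors and the \Q...\E quoting are additions not shown on the slides. An interpolated "." is still a metachar unless it is quoted.

```perl
use strict;
use warnings;

# ^ and $ require the whole string to be exactly one hex digit.
my @hex_chars = grep { /^[\dA-Fa-f]$/ } qw(7 b F g);

# Uppercase letter, then a double quote, then whitespace.
my $quoted = ('He said "A" twice' =~ /[A-Z]"\s/) ? 1 : 0;

# A pattern stored in a variable is interpolated; its . is still a metachar.
my $pattern = "sn.w";
my @hits = grep { /$pattern/ } qw(snow snaw show);

# \Q...\E quotes metachars, so the . must match a literal period.
my @literal = grep { /\Q$pattern\E/ } qw(snow sn.w);

print scalar(@hex_chars), " $quoted @hits | @literal\n";
# prints: 3 1 snow snaw | sn.w
```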
Quantifiers
• {n} exactly n reps
• {m,} at least m reps
• {m,n} at least m, but not more than n
• /a{1,3}b/ matches ab, aab, aaab
• /(cats){3}/ matches catscatscats
• /[abc]{1,2}/ matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
• * 0 or more, including the empty string
• + 1 or more
• ? 0 or 1
• . with no quantifier matches exactly 1 char
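This sketch filters hypothetical candidate strings to show each quantifier's limits. The ^ and $ anchors are an addition. They force the whole string to match, so any extra repetitions cause a rejection.

```perl
use strict;
use warnings;

my @ab     = grep { /^a{1,3}b$/ }    qw(b ab aab aaab aaaab);
my @cats   = grep { /^(cats){3}$/ }  qw(cats catscats catscatscats);
my @abc    = grep { /^[abc]{1,2}$/ } qw(a ab aa cb abc d);
my @colour = grep { /^colou?r$/ }    qw(color colour colouur);   # ? : 0 or 1
my @xs     = grep { /^x*$/ }         ("", "x", "xxx", "y");      # * : 0 or more
my @ys     = grep { /^y+$/ }         ("", "y", "yyy");           # + : 1 or more

print "@ab | @cats | @abc | @colour\n";
# prints: ab aab aaab | catscatscats | a ab aa cb | color colour
```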
• /\w+/ matches 1 or more word chars
• /\d+\.\d+/ matches 1 or more digits, a decimal point, and 1 or more digits (i.e., a real decimal number); note that \. matches a literal decimal point
• /\$?\d+\.\d\d/ matches a price with or without a $
• /ba(ll)*/ matches ba followed by 0 or more occurrences of the string ll
• /\d{3}-\d{2}-\d{4}/ matches the format of a US Social Security number (SSN)
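This sketch shows how the slide's patterns could support simple data validation, using made-up sample inputs. The ^ and $ anchors are an addition. Without them, a pattern also succeeds on any string that merely contains a matching substring.

```perl
use strict;
use warnings;

my @prices = grep { /^\$?\d+\.\d\d$/ }      ('$4.99', '12.50', '3.5', '$.99');
my @ssns   = grep { /^\d{3}-\d{2}-\d{4}$/ } ('123-45-6789', '12-345-6789');
my @balls  = grep { /^ba(ll)*$/ }           qw(ba ball balll ballll);
my @reals  = grep { /^\d+\.\d+$/ }          qw(3.14 42 .5 10.0);

print "@prices | @ssns | @balls | @reals\n";
# prints: $4.99 12.50 | 123-45-6789 | ba ball ballll | 3.14 10.0
```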
Questions
Assume $_ = "Tommie";
• Which m in Tommie does /m/ match? The leftmost one
• What do these match?
• /m*/ matches the empty string at the beginning
• /m+/ matches mm
• /m*i/ matches mmi
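This sketch checks the answers above. It relies on the special array @-, where $-[0] holds the offset at which the last successful match started. A match in list context returns the captured group.

```perl
use strict;
use warnings;

$_ = "Tommie";

my $first_m  = /m/ ? $-[0] : -1;   # 2: the leftmost m
my ($star)   = /(m*)/;             # "": zero m's, at offset 0
my ($plus)   = /(m+)/;             # "mm"
my ($star_i) = /(m*i)/;            # "mmi"

print "$first_m '$star' $plus $star_i\n";   # prints: 2 '' mm mmi
```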
Matching
• .* in greedy mode (the default) matches the maximum possible number of non-newline chars
    $_ = "Bob Bobcat Bobolink";
• /.*Bob/ matches up to the Bob in Bobolink
• .* first matches the whole string, then backs up one character at a time until the rest of the pattern ("Bob") matches, so it finds the rightmost occurrence
• All greedy quantifiers (*, +, ?, {m,n}) work this way
Matching
    $_ = "Freddie's hot dogs are really hot!";
• /Fred+/ matches Fredd
• /Fred+?/ matches Fred; a ? after a quantifier selects minimal (non-greedy) mode
• /.*hot/ matches up to the last hot
• /.*?hot/ matches up to the first hot
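This sketch runs both example strings from the two slides above. The parentheses around each pattern are an addition, so the exact matched text can be captured and compared in greedy and non-greedy mode.

```perl
use strict;
use warnings;

my $birds = "Bob Bobcat Bobolink";
my ($greedy_prefix) = $birds =~ /(.*)Bob/;    # backs off to the last Bob
my ($lazy_prefix)   = $birds =~ /(.*?)Bob/;   # stops at the first Bob

my $dogs = "Freddie's hot dogs are really hot!";
my ($fred_greedy)  = $dogs =~ /(Fred+)/;      # "Fredd"
my ($fred_lazy)    = $dogs =~ /(Fred+?)/;     # "Fred"
my ($to_last_hot)  = $dogs =~ /(.*hot)/;
my ($to_first_hot) = $dogs =~ /(.*?hot)/;

print "[$greedy_prefix] [$lazy_prefix]\n";   # prints: [Bob Bobcat ] []
print "$fred_greedy $fred_lazy\n";           # prints: Fredd Fred
print "$to_last_hot\n$to_first_hot\n";
```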
Alternation
• /a|e|i|o|u/ is equivalent to /[aeiou]/
• /Fred|Mike|Dracula/
• alternatives are tried left to right
• /Tom|Tommie/ matches only Tom in Tommie, because the leftmost alternative succeeds first
• /to|too|two/ never matches all of too (it matches the to)
• Can use ( ) to group alternatives
• /t(oo?|wo)/ matches to, too, or two
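This sketch captures what each alternation actually matched, using hypothetical inputs. It shows that alternatives are tried left to right, so reordering them changes the result. The ^ and $ anchors on the last pattern are an addition.

```perl
use strict;
use warnings;

my ($short) = "Tommie" =~ /(Tom|Tommie)/;   # "Tom": first alternative wins
my ($long)  = "Tommie" =~ /(Tommie|Tom)/;   # "Tommie": longer one tried first
my ($to)    = "too"    =~ /(to|too|two)/;   # "to"

# Grouping keeps the shared t outside the alternation.
my @t_words = grep { /^t(oo?|wo)$/ } qw(to too two tw tooo);

print "$short $long $to @t_words\n";   # prints: Tom Tommie to to too two
```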
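Precedence
• From highest to lowest
• ()
• Quantifiers
• char sequence: /belly|belts|bells/ means (belly)|(belts)|(bells)
• Alternation
• Careful mixing alternation with a char class
• [belly|belts|bells] is a class matching a single char, equivalent to [belyts|]; inside [ ], | is a normal char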
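This sketch filters hypothetical candidates to contrast alternation with the look-alike character class. It also shows that a quantifier binds tighter than a char sequence. The ^ and $ anchors are an addition.

```perl
use strict;
use warnings;

my @candidates = qw(belly belts bells b |);

# Sequence binds tighter than |: belly or belts or bells.
my @words = grep { /^(belly|belts|bells)$/ } @candidates;

# Inside [ ] this is one single-char class, the same as [belyts|].
my @chars = grep { /^[belly|belts|bells]$/ } @candidates;

# Quantifiers bind tighter than sequence: b+ repeats only the b.
my $plain   = ("abab" =~ /^ab+$/)   ? 1 : 0;
my $grouped = ("abab" =~ /^(ab)+$/) ? 1 : 0;

print scalar(@words), " ", scalar(@chars), " $plain $grouped\n";
# prints: 3 2 0 1
```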
Binding operators
• a pattern can be matched against any string, not just $_
• =~ connects a string to a pattern
• $stringvar =~ /[,;:]/; is true if the pattern is found in $stringvar
• $string !~ /[,;:]/; inverts the logic: true if the pattern is NOT found
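This is a minimal sketch of =~ and !~ on two made-up strings. It shows that !~ returns the opposite truth value of =~ for the same pattern.

```perl
use strict;
use warnings;

my $csv   = "a,b;c";
my $plain = "abc";

my $csv_has   = ($csv   =~ /[,;:]/) ? 1 : 0;   # a comma is found
my $plain_has = ($plain =~ /[,;:]/) ? 1 : 0;   # nothing found
my $plain_not = ($plain !~ /[,;:]/) ? 1 : 0;   # !~ inverts: true

print "$csv_has $plain_has $plain_not\n";   # prints: 1 0 1
```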
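Remembering matches
    $s = "TD ran for 305 yards today";
    $s =~ /(\d+)\s+(\w+)\s+(\w+)/;
    print "$1 $2 $3\n";
• prints 305 yards today
• the \s+ between the groups is needed: without it, /(\d+)(\w+)(\w+)/ captures 3, 0 and 5
• Matching parentheses are numbered by their opening parenthesis, left to right
    $s =~ /((\d+)\s+(\w+)\s+(\w+))/;
• $1 305 yards today
• $2 305
• $3 yards
• $4 today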
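This sketch contrasts the pattern without whitespace and the corrected one. It also assigns the captures in list context, which is an addition to the slide's use of $1, $2, and so on.

```perl
use strict;
use warnings;

my $s = "TD ran for 305 yards today";

# No whitespace between groups: backtracking splits the digits.
my @broken = $s =~ /(\d+)(\w+)(\w+)/;                          # 3 0 5

# With \s+ between groups, each group stops at a space.
my ($yards, $unit, $when) = $s =~ /(\d+)\s+(\w+)\s+(\w+)/;

# Groups are numbered by their opening parenthesis, left to right.
my @nested;
if ($s =~ /((\d+)\s+(\w+)\s+(\w+))/) {
    @nested = ($1, $2, $3, $4);
}

print "@broken | $yards $unit $when | $nested[0]\n";
# prints: 3 0 5 | 305 yards today | 305 yards today
```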
Split with a pattern
    $s = "Betty, Bert, Bart, Bartholomew";
    @names = split /, /, $s;
    $s = "Betty:778:Bert:222:Bart:43297:Bartholomew";
    @names = split /:\d+:/, $s;
• in both cases $names[0] = Betty, $names[1] = Bert, $names[2] = Bart, $names[3] = Bartholomew
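This sketch runs both splits from the slide. It shows that the pattern describes the separator, and that the separator is discarded from the resulting list.

```perl
use strict;
use warnings;

my $s1 = "Betty, Bert, Bart, Bartholomew";
my @names1 = split /, /, $s1;         # separator: comma then space

my $s2 = "Betty:778:Bert:222:Bart:43297:Bartholomew";
my @names2 = split /:\d+:/, $s2;      # separator: colon, digits, colon

print join("|", @names1), "\n";   # prints: Betty|Bert|Bart|Bartholomew
print join("|", @names2), "\n";   # prints: Betty|Bert|Bart|Bartholomew
```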
Substitutions
    $x = "no more apples!";
    $x =~ s/apples/applets/;
• $x changes to "no more applets!"
    $x = "12034005";
    $x =~ s/0//g;
• $x changes to "12345"
• the g modifier replaces every occurrence, not just the first
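This sketch compares s/// with and without /g on the slide's sample string. It also uses the fact that s/// returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $x = "no more apples!";
$x =~ s/apples/applets/;            # replaces the first "apples"

my $once = "12034005";
$once =~ s/0//;                     # no /g: only the first 0 is removed

my $all = "12034005";
my $count = ($all =~ s/0//g);       # /g: every 0 is removed; returns 3

print "$x | $once | $all | $count\n";
# prints: no more applets! | 1234005 | 12345 | 3
```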
Translating characters
• tr/search-list/replacement-list/
• tr/a-z/A-Z/; replaces all lowercase letters with uppercase and returns the number replaced
• tr/\./\./; replaces every . with . and returns the number of replacements, so in effect it counts periods
    $s = "Hello";
    $s =~ tr/a-z/A-Z/;
• $s changes to HELLO and the expression returns 4 (which is true)
• without =~, tr works on $_
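This sketch checks tr's return value on hypothetical strings. It shows uppercasing, counting periods with the same-list trick, and tr on the default $_. Inside tr, the period is a literal character, not a metachar.

```perl
use strict;
use warnings;

my $s = "Hello";
my $upcased = ($s =~ tr/a-z/A-Z/);   # $s becomes "HELLO"; returns 4

my $ip = "192.168.0.1";
my $dots = ($ip =~ tr/\./\./);       # . replaced by . : counts, no change

# Without =~, tr works on $_.
$_ = "perl";
tr/a-z/A-Z/;

print "$s $upcased $dots $ip $_\n";   # prints: HELLO 4 3 192.168.0.1 PERL
```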