
Regular Expressions


cady

Presentation Transcript


  1. Regular Expressions

  2. Concepts of Regular Expressions • Allow for fast, flexible and reliable string handling • A pattern that matches (or doesn’t match) a string • A program within a program - with its own language • Can be found in sed, awk, grep, procmail, vi and others

  3. Regular Expressions • Using Simple Patterns: $_ = "yabba dabba doo"; if (/abba/) { print "match!"; }

  4. Metacharacters • . matches any single character except newline • /bet.y/ matches betty, betsy, bet=y and bet.y, but not bety or betsey • /bet\.y/ matches only bet.y

  5. Quantifiers • * matches the preceding item 0 or more times: /ab*a/ matches aa, aba, abba…; /a.*a/ matches aa, azya, a346a… • + matches the preceding item 1 or more times: /ab+a/ matches aba but not aa; /a.+a/ matches a1a and aza, but not aa • ? makes the preceding item optional (it matches once or not at all): /ab?a/ matches aa and aba only
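
The three quantifiers above can be compared side by side. This is a minimal sketch; the `^` and `$` anchors are an addition (not on the slide) so that the whole string has to match, which makes the differences easier to see.

```perl
use strict;
use warnings;

# Each case pairs a pattern with a test string.
# ^ and $ are added here so the entire string must match.
my @cases = (
    [ qr/^ab*a$/, 'aa'   ],  # * allows zero b's
    [ qr/^ab*a$/, 'abba' ],  # * allows several b's
    [ qr/^ab+a$/, 'aa'   ],  # + needs at least one b
    [ qr/^ab+a$/, 'aba'  ],
    [ qr/^ab?a$/, 'aba'  ],  # ? allows zero or one b
    [ qr/^ab?a$/, 'abba' ],  # two b's is one too many for ?
);

my @results;
for my $case (@cases) {
    my ($re, $text) = @$case;
    my $verdict = $text =~ $re ? 'match' : 'no match';
    push @results, $verdict;
    print "'$text' against $re: $verdict\n";
}
```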

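  6. Parentheses • Use parentheses for grouping: /fred+/ matches freddddd (only the d repeats) • /(fred)+/ matches fredfredfred (the whole group repeats) • /(fred)*/ matches even hello world!, because zero repetitions is a successful (empty) match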
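
To see what each grouping pattern actually consumes, one option is to look at the matched text. The helper name `matched_text` below is hypothetical, introduced only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text the pattern matched ($&),
# or undef when the pattern does not match at all.
sub matched_text {
    my ($text, $re) = @_;
    return $text =~ $re ? $& : undef;
}

my $d_repeat     = matched_text('freddddd',     qr/fred+/);    # only the d repeats
my $group_repeat = matched_text('fredfredfred', qr/(fred)+/);  # the whole word repeats
my $zero_repeat  = matched_text('hello world!', qr/(fred)*/);  # succeeds with zero repeats

print "fred+    matched '$d_repeat'\n";
print "(fred)+  matched '$group_repeat'\n";
print "(fred)*  matched '$zero_repeat' (length ", length($zero_repeat), ")\n";
```

The last case is the one to watch: the match succeeds, but the matched text is the empty string at the start of the input.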

  7. Alternatives • | means "or" • /fred|barney|betty/ matches a string containing any of those three names • Example using whitespace: /fred( |\t)barney/

  8. Character Classes • A list of possible characters to match: [abc]+ matches one or more characters, each of which is a, b or c • Use "-" to specify ranges: [a-zA-Z] • Generally used as part of a larger regular expression: if (/HAL-[0-9]+/)

  9. Negating Character Classes • Sometimes it’s easier to specify the characters you don’t want: [^abc] matches any character except a, b or c • [^n\-z] matches any character except n, hyphen or z

  10. Character Class Shortcuts • For frequently used character classes: \d is [0-9] • \w is [A-Za-z0-9_] • \s is [\f\t\n\r ] • \D is [^\d] • Examples: /HAL-\d+/ [\dA-Fa-f]+ [\d\D]
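
The shortcut examples can be tried out as follows. This is a sketch only; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample text is hypothetical, chosen to exercise each shortcut.
my $text = 'HAL-9000 color #1aF3';

my ($model) = $text =~ /HAL-(\d+)/;          # run of digits after "HAL-"
my ($hex)   = $text =~ /#([\dA-Fa-f]+)/;     # run of hexadecimal digits
my @non_digits = 'a1b2' =~ /\D/g;            # every non-digit character

# "." skips the newline, while [\d\D] (digit or non-digit) matches anything.
my $dot_count = () = "x\ny" =~ /./g;
my $any_count = () = "x\ny" =~ /[\d\D]/g;

print "model: $model, hex: $hex\n";
print "non-digits: @non_digits\n";
print "chars matched by . : $dot_count, by [\\d\\D] : $any_count\n";
```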

  11. Delimiters • // is actually a shortcut for m// • Like qw//, you can use any pair of delimiters: m(fred) m<fred> m{fred} m!fred! • If you use / as the delimiter, you can omit the "m": /fred/

  12. Delimiters continued… • Choose a delimiter that doesn’t appear in your pattern: /^http:\/\// is easier to read as m#^http://#
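
A short sketch showing that the choice of delimiter changes only how the pattern is written, not what it matches. The URL is an assumed example value.

```perl
use strict;
use warnings;

# Hypothetical input; any string starting with "http://" would do.
my $url = 'http://example.com/index.html';

# The same pattern written with three different delimiters.
my $with_slashes = $url =~ /^http:\/\// ? 1 : 0;   # slashes must be escaped
my $with_hashes  = $url =~ m#^http://#  ? 1 : 0;   # no escaping needed
my $with_braces  = $url =~ m{^http://}  ? 1 : 0;   # paired delimiters also work

print "slashes: $with_slashes, hashes: $with_hashes, braces: $with_braces\n";
```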

  13. Optional Modifiers • Case insensitive: /fred/i # matches Fred, FRED, fred, fReD • Newline matching: "." doesn’t normally match \n; use the s modifier to change this: $_ = "fred\nbarney"; if (/fred.*barney/s) { …
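
The effect of both modifiers can be checked directly. A minimal sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my $text = "fred\nbarney";

# Without /s the "." cannot cross the newline between the two names.
my $without_s = $text =~ /fred.*barney/  ? 1 : 0;
# With /s the "." also matches "\n", so the pattern spans both lines.
my $with_s    = $text =~ /fred.*barney/s ? 1 : 0;
# /i ignores case differences.
my $mixed_case = 'fReD' =~ /fred/i ? 1 : 0;

print "without /s: $without_s, with /s: $with_s, /i on fReD: $mixed_case\n";
```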

  14. Anchors • Ensure the pattern only matches at a certain spot: ^ is the start of the string; $ is the end of the string, or just before a newline at the end of the string • /^fred/ # matches "frederick" but not "manfred" • /rock$/ # matches "rock\n" but not "rocks" • /^\s*$/ # an empty line or a line of only whitespace
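
The anchor examples can be verified with a small table of checks. The helper `hit` is a hypothetical name used only in this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the text, otherwise 0.
sub hit {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

my %anchor = (
    frederick  => hit('frederick', qr/^fred/),   # fred at the start
    manfred    => hit('manfred',   qr/^fred/),   # fred, but not at the start
    rock_nl    => hit("rock\n",    qr/rock$/),   # $ tolerates a trailing newline
    rocks      => hit('rocks',     qr/rock$/),   # extra "s" before the end
    whitespace => hit("  \t",      qr/^\s*$/),   # only whitespace
    empty      => hit('',          qr/^\s*$/),   # \s* also allows nothing at all
);

print "$_: $anchor{$_}\n" for sort keys %anchor;
```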

  15. Word Anchors • \b is the word boundary anchor: /\bfred\b/ • It matches once at the beginning of a word and once at the end • Words here are \w words: letters, digits and underscores

  16. Word Anchors continued • \B matches at any position that is not a word boundary • Word boundaries make sure we don’t find: cat in delicatessen, fish in selfishness
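
A sketch of the two word anchors using the words from the slide. The sentence "the cat sat" and the helper `hit` are additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the text, otherwise 0.
sub hit {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

my $cat_in_deli   = hit('delicatessen', qr/\bcat\b/);   # cat is buried inside a word
my $cat_alone     = hit('the cat sat',  qr/\bcat\b/);   # cat stands as its own word
my $fish_in_self  = hit('selfishness',  qr/\bfish\b/);  # fish is buried inside a word
my $cat_non_bound = hit('delicatessen', qr/\Bcat\B/);   # \B requires cat to be buried

print "\\bcat\\b in delicatessen: $cat_in_deli\n";
print "\\bcat\\b in 'the cat sat': $cat_alone\n";
print "\\bfish\\b in selfishness: $fish_in_self\n";
print "\\Bcat\\B in delicatessen: $cat_non_bound\n";
```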

  17. The Binding Operator • By default, Perl matches regular expressions against $_ • =~ tells it to match against another variable: $var =~ /blah/ • This is not an assignment operator! The value of $var is unchanged.

  18. Binding Operator cont • The result of a match is a true/false value that can be stored: my $likes_perl = (<STDIN> =~ /\byes\b/i);
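
The following sketch illustrates both points about =~: the bound variable is left untouched, and the match result can be stored. The string `$answer` stands in for a line read from `<STDIN>` so that the example runs without input; it is an assumption for illustration.

```perl
use strict;
use warnings;

my $var   = 'some blah text';
my $found = ($var =~ /blah/) ? 1 : 0;   # match against $var instead of $_

# $answer is a stand-in for a line that would come from <STDIN>.
my $answer     = "Yes, please\n";
my $likes_perl = ($answer =~ /\byes\b/i);   # true when the word "yes" appears

print "found: $found, \$var is still '$var'\n";
print "likes perl: ", ($likes_perl ? 'yes' : 'no'), "\n";
```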

  19. Interpolating Into Patterns • Variables are interpolated as they are in a double-quoted string: my $what = "fred"; while (<>) { if (/^($what)/) …

  20. More interpolations • Watch out for metacharacters in the interpolated value: my $what = shift @ARGV; if (/^($what)/) • If $what contains "fred(barney", the regex becomes /^(fred(barney)/, which has an unmatched parenthesis and fails to compile
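
One common way to guard against this, not covered on the slide, is Perl's `\Q...\E` escape (equivalent to calling `quotemeta`), which makes the interpolated text match literally. A minimal sketch, with the value hard-coded rather than taken from @ARGV:

```perl
use strict;
use warnings;

# Hard-coded here for the example; the slide takes it from @ARGV.
my $what = 'fred(barney';

# Interpolating the raw value produces an invalid pattern, so the
# compile is wrapped in eval to catch the resulting error.
my $raw_pattern = eval { qr/^($what)/ };
my $raw_failed  = defined $raw_pattern ? 0 : 1;

# \Q...\E escapes the metacharacters, so "(" is matched literally.
my $safe_pattern = qr/^(\Q$what\E)/;
my $captured = 'fred(barney rubble' =~ $safe_pattern ? $1 : undef;

print "raw pattern failed to compile: $raw_failed\n";
print "escaped pattern captured: $captured\n";
```

Escaping is only appropriate when the input is meant as literal text; if the user is supposed to supply a regular expression, the compile error needs to be handled instead.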

  21. The Match Variables • One match variable for each pair of parentheses in the pattern: $1, $2, $3… • $_ = "username = alison"; if (/username = (\w+)/) { print $1; }

  22. Persistence • Match variables keep their values until the next successful pattern match • Don’t use the memory variables unless the match worked! Risky: $wilma =~ /(\w+)/; print $1; Safe: if ($wilma =~ /(\w+)/) { print $1 } • Use the variables within a few lines of the match
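
The stale-value problem can be reproduced in a few lines. This is a sketch with made-up strings; it shows a failed match leaving $1 from an earlier match in place.

```perl
use strict;
use warnings;

# First match succeeds and sets $1.
'first' =~ /(\w+)/;

# Second match fails (no word characters), so $1 is NOT reset.
'!!!' =~ /(\w+)/;
my $stale = $1;   # still holds the value from the earlier match

# Safe form: only read $1 when the match is known to have succeeded.
my $fresh;
if ('wilma flintstone' =~ /(\w+)/) {
    $fresh = $1;
}

print "after failed match, \$1 is still '$stale'\n";
print "checked match gives '$fresh'\n";
```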

  23. The Automatic Match Variables • $& is the part of the string that was matched • $` is the part of the string before the match • $' is the part of the string after the match • if ("Nine Inch Nails" =~ /\s(\w+)/) then $& is " Inch", $` is "Nine" and $' is " Nails"
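
The slide's example can be run as written to confirm the three values, along with $1 for comparison. Variable names are chosen for this sketch.

```perl
use strict;
use warnings;

my ($before, $match, $after, $group);

if ('Nine Inch Nails' =~ /\s(\w+)/) {
    # Copy the special variables right away, before another match changes them.
    ($before, $match, $after, $group) = ($`, $&, $', $1);
}

print "before: '$before'\n";
print "match:  '$match'\n";    # includes the leading space matched by \s
print "after:  '$after'\n";
print "group:  '$group'\n";    # $1 holds only the parenthesized part
```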

  24. General Quantifiers • We have seen the quantifiers *, + and ? • For more control over quantifiers, use curly braces: /a{5,15}/ # matches 5 to 15 repetitions of a • /a{2,4}/ # aa, aaa, aaaa • If you omit the second number but keep the comma, there is no upper limit: /a{5,}/
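
A sketch of the curly-brace quantifiers. The `^` and `$` anchors are added here so that the counts apply to the whole string, and the `{3}` form (exactly three) is an extra case not shown on the slide. The helper `hit` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the text, otherwise 0.
sub hit {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

my $three_in_range = hit('aaa',     qr/^a{2,4}$/);  # 3 is within 2..4
my $one_too_few    = hit('a',       qr/^a{2,4}$/);  # 1 is below the minimum
my $five_too_many  = hit('aaaaa',   qr/^a{2,4}$/);  # 5 is above the maximum
my $open_ended     = hit('aaaaaaa', qr/^a{5,}$/);   # no upper limit
my $exactly_three  = hit('aaa',     qr/^a{3}$/);    # exact count

print "{2,4} on aaa: $three_in_range, on a: $one_too_few, on aaaaa: $five_too_many\n";
print "{5,} on aaaaaaa: $open_ended, {3} on aaa: $exactly_three\n";
```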

  25. Precedence in Regular Expressions (highest to lowest) • Parentheses • Quantifiers (*, +, ?, {m,n}) • Anchors (^, $, \b, \B) • Alternatives ( | )

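  26. Examples of Precedence • /^fred|barney$/ # fred at the start, or barney at the end • /^(fred|barney)$/ # the whole string is fred or barney • /(wilma|pebbles?)/ # wilma, pebble or pebbles • /^(\w+)\s+(\w+)$/ # exactly two words • Use parentheses to clarify precedence, but don’t forget to renumber memory refs!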
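
The difference between the first two patterns is easiest to see on a string that only one of them accepts. A sketch, with test strings made up for illustration and a hypothetical helper `hit`:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the text, otherwise 0.
sub hit {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

# Without parentheses, ^ belongs to fred and $ belongs to barney.
my $loose_start = hit('frederick',        qr/^fred|barney$/);
my $loose_end   = hit('wilma and barney', qr/^fred|barney$/);

# With parentheses, both anchors apply to each alternative.
my $strict_long  = hit('frederick', qr/^(fred|barney)$/);
my $strict_exact = hit('barney',    qr/^(fred|barney)$/);

# The ? applies only to the final "s" of pebbles.
my ($who) = 'one pebble' =~ /(wilma|pebbles?)/;

# Two capture groups fill $1 and $2, returned here as a list.
my ($first, $second) = 'fred flintstone' =~ /^(\w+)\s+(\w+)$/;

print "loose: $loose_start $loose_end, strict: $strict_long $strict_exact\n";
print "who: $who, words: $first / $second\n";
```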

  27. patterntest.pl • http://www.ece.curtin.edu.au/~rtos204/patterntest.pl
      #!/usr/bin/perl
      while (<>) {
          chomp;
          if (/REGULAR EXPRESSION/) {
              print "Matched: |$`<$&>$'|\n";
          }
      }

  28. patterntest.pl in action (with /abba/ as the pattern)
      bauhaus# ./patterntest.pl
      yabba dabba doo
      Matched: |y<abba> dabba doo|
      I just love abba
      Matched: |I just love <abba>|
