Regular Expressions in Perl - Part II

By Andrew Dougherty
Overview
  • Grouping and hierarchical matching
  • Extracting matches
  • Matching repetitions
Grouping & Hierarchical Matching
  • Grouping allows parts of a regular expression to be treated as a single unit.
    • Example – /house(cat|keeper)/; # matches 'housecat' or 'housekeeper'
  • () – groups
  • | – separates alternatives (or)
  • [] – a character class (a set of characters, any one of which matches)
  • Examples
    • /(a|b)b/; # matches 'ab' or 'bb'
    • /(ac|b)b/; # matches 'acb' or 'bb'
    • /(^a|b)c/; # matches 'ac' at the start of the string or 'bc' anywhere in the string
    • /(a|[bc])d/; # matches 'ad', 'bd' or 'cd'
    • /house(cat(s|)|)/; # matches 'housecats', 'housecat' or 'house'. Groups can be nested.
  • Backtracking – the process of trying one alternative, seeing if it matches, and moving on to the next alternative if it doesn't.
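A small runnable sketch of the grouping examples above. The word list is made up for illustration, and the patterns are wrapped in ^ and $ anchors (not on the slide) so that each word must match in full rather than merely contain a match.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for illustration.
my @candidates = qw(house housecat housecats housekeeper doghouse);

# Alternation inside a group: 'house' followed by either 'cat' or 'keeper'.
my @alt = grep { /^house(cat|keeper)$/ } @candidates;

# Nested groups: (s|) makes the 's' optional, and the outer (cat(s|)|)
# makes the whole 'cat...' part optional.
my @nested = grep { /^house(cat(s|)|)$/ } @candidates;

print "alternation: @alt\n";      # alternation: housecat housekeeper
print "nested:      @nested\n";   # nested:      house housecat housecats
```

For 'house', the engine first tries the 'cat' alternative, fails, and backtracks to the empty alternative, which lets $ match at the end of the word.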
Extracting Matches
  • () – also allow the parts of a string that matched to be extracted.
  • $1, $2, … $n are special variables in which Perl stores the text matched by the 1st, 2nd, … nth group of the regular expression.
    • Example

# extract hours, minutes, seconds
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {   # match hh:mm:ss format
    $hours   = $1;
    $minutes = $2;
    $seconds = $3;
}

When the match succeeds, this statement has the same effect as

($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
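A self-contained sketch of both extraction forms. The sample time string and the added ^ and $ anchors are assumptions for illustration; the sketch also shows that a failed match in list context returns an empty list.

```perl
use strict;
use warnings;

my $time = "09:45:30";   # sample value, chosen for illustration

# Form 1: test the match, then copy the numbered capture variables.
my ($hours, $minutes, $seconds);
if ($time =~ /^(\d\d):(\d\d):(\d\d)$/) {
    ($hours, $minutes, $seconds) = ($1, $2, $3);
}

# Form 2: in list context the match returns the captured groups directly.
my ($h, $m, $s) = $time =~ /^(\d\d):(\d\d):(\d\d)$/;

# A failed match in list context returns an empty list.
my @none = "9:45" =~ /^(\d\d):(\d\d):(\d\d)$/;

print "$hours h $minutes m $seconds s\n";                 # 09 h 45 m 30 s
printf "failed match returned %d values\n", scalar @none; # failed match returned 0 values
```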

Extracting Matches Cont.
  • Backreferences – \1, \2, etc. are matching variables that can be used inside the regular expression itself; \1 matches the same text that group 1 matched.
    • Example: finding doubled words separated by whitespace, like 'the the'.

/(\w\w\w)\s\1/; # the three word characters are captured by group 1, and \1 then requires the same three characters to appear again after the whitespace character

    • Finding words made of a 4-, 3-, 2- or 1-letter pattern repeated twice.

% simple_grep '^(\w\w\w\w|\w\w\w|\w\w|\w)\1$' /usr/dict/words

beriberi, booboo, coco, aa # all match the grep pattern

  • @- and @+ hold the positions of what was matched: $-[n] is the offset where group n's match begins and $+[n] is the offset just past where it ends ($-[0] and $+[0] describe the whole match).

$x = "Mmm...donut, thought Homer"; # string stored in $x
$x =~ /^(Mmm|Yech)\.\.\.(donut|peas)/; # regular expression to be matched

foreach $expr (1..$#-) {
    # ${$expr} is a symbolic reference to $1, $2, ...; it fails under "use strict 'refs'"
    print "Match $expr: '${$expr}' at position ($-[$expr],$+[$expr])\n";
}

Prints

Match 1: 'Mmm' at position (0,3)
Match 2: 'donut' at position (6,11)
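The sketch below runs the backreference and position examples under use strict. Because the slide's ${$expr} is a symbolic reference that strict mode rejects, the captured text is recovered here with substr and the @- / @+ offsets instead. The sentence and the short word list are made-up sample data, not /usr/dict/words.

```perl
use strict;
use warnings;

# Backreference: \1 must match exactly the text group 1 captured.
my $text = "Paris in the the spring";          # sample sentence
my ($doubled) = $text =~ /\b(\w+)\s+\1\b/;

# Repeated-pattern words, as in the simple_grep example (sample list only).
my @words   = qw(beriberi booboo coco aa banana);
my @repeats = grep { /^(\w\w\w\w|\w\w\w|\w\w|\w)\1$/ } @words;

# @- and @+ hold start/end offsets; substr recovers each group's text
# without the symbolic ${$expr} reference, so this works under strict.
my $x = "Mmm...donut, thought Homer";
my @report;
if ($x =~ /^(Mmm|Yech)\.\.\.(donut|peas)/) {
    for my $n (1 .. $#-) {
        my ($start, $end) = ($-[$n], $+[$n]);
        push @report, sprintf("Match %d: '%s' at position (%d,%d)",
                              $n, substr($x, $start, $end - $start), $start, $end);
    }
}

print "doubled word: $doubled\n";     # doubled word: the
print "repeats: @repeats\n";          # repeats: beriberi booboo coco aa
print "$_\n" for @report;
```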

Matching Repetitions
  • Quantifier metacharacters – determine how many repeats of a portion of a regular expression count as a match.
    • ? a? = match 'a' 1 or 0 times.
    • * a* = match 'a' 0 or more times (any number of times).
    • + a+ = match 'a' 1 or more times (at least once).
    • {n,m} a{n,m} = match 'a' at least n times, but not more than m times.
    • {n,} a{n,} = match 'a' n or more times.
    • {n} a{n} = match 'a' exactly n times.
      • /[a-z]+\s+\d*/; # match a lowercase word, at least one whitespace character, and any number of digits
      • /(\w+)\s+\1/; # match doubled words of arbitrary length (like 'the the')
      • /y(es)?/i; # matches 'y', 'Y', or a case-insensitive 'yes'
      • $year =~ /^\d{4}$|^\d{2}$/; # makes sure year is exactly 2 or 4 digits long (like 10 or 2010)
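A sketch that runs the quantifier examples on small sample inputs (all made up for illustration). The ^ and $ anchors on the {n,m} and y(es)? patterns are additions, used so each whole string is tested; the unanchored year check is included to show why the anchors matter.

```perl
use strict;
use warnings;

# Hypothetical sample years.
my @years = qw(10 2010 201 99999 abcd);

# Anchored: the whole string must be exactly 4 digits or exactly 2 digits.
my @valid = grep { /^\d{4}$|^\d{2}$/ } @years;

# Unanchored: any string that merely contains 2 digits is accepted.
my @loose = grep { /\d{4}|\d{2}/ } @years;

# {n,m}: between 2 and 4 'a's and nothing else (anchors added here).
my @a_runs = grep { /^a{2,4}$/ } qw(a aa aaaa aaaaa);

# ? on a group plus /i: 'y' optionally followed by 'es', in any case.
my @yes = grep { /^y(es)?$/i } qw(y Y yes YES Yes no);

print "valid: @valid\n";    # valid: 10 2010
print "loose: @loose\n";    # loose: 10 2010 201 99999
print "a_runs: @a_runs\n";  # a_runs: aa aaaa
print "yes: @yes\n";        # yes: y Y yes YES Yes
```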
Matching Repetitions Cont.
  • Maximal match/greedy quantifiers – quantifiers that grab as much of the string as possible.
    • $x = "the cat in the hat";
    • $x =~ /^(.*)(cat)(.*)$/; # $1 = 'the '
# $2 = 'cat'
# $3 = ' in the hat'
    • $x =~ /^(.*)(at)(.*)$/; # $1 = 'the cat in the h' (.* backs off only as far as the last 'at')
# $2 = 'at'
# $3 = '' (matches the empty string)
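A quick check of the greedy examples. The input string "the cat in the hat" is inferred from the captured values shown on the slide.

```perl
use strict;
use warnings;

my $x = "the cat in the hat";

# In list context each match returns ($1, $2, $3).
my @cat = $x =~ /^(.*)(cat)(.*)$/;
my @at  = $x =~ /^(.*)(at)(.*)$/;

printf "cat: '%s' '%s' '%s'\n", @cat;  # cat: 'the ' 'cat' ' in the hat'
printf "at:  '%s' '%s' '%s'\n", @at;   # at:  'the cat in the h' 'at' ''
```

In the second match the greedy (.*) first takes the whole string, then gives characters back one at a time until (at) can match, which happens at the last 'at'.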

Matching Repetitions Cont.
  • Principle 0: Taken as a whole, any regular expression will be matched at the earliest possible position in the string.

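$x = "The programming republic of Perl";

$x =~ /^(.+)(e|r)(.*)$/; # $1 = 'The programming republic of Pe'
# $2 = 'r'
# $3 = 'l'

The match starts at the earliest possible position, 'T'. Although e is the leftmost alternative, r is used because it lets the greedy (.+) match the longest string.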

  • Principle 1: In an alternation (a|b|c...), the leftmost alternative that allows a match for the whole regular expression will be the one used.

$x =~ /(m{1,2})(.*)$/; # $1 = 'mm' (the earliest match starts at the first 'm' in 'programming')
# $2 = 'ing republic of Perl'
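A sketch that checks the slide's results and adds two small cases isolating Principles 0 and 1. The strings 'aXXXb' and 'cats' are illustrative additions, not from the slides.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

# Slide examples: each match starts at the earliest position that can succeed.
my @er = $x =~ /^(.+)(e|r)(.*)$/;
my @mm = $x =~ /(m{1,2})(.*)$/;

# Principle 0: at position 0, (X*) already succeeds by matching zero X's,
# so the engine never moves on to the 'XXX' later in the string.
my ($xs) = "aXXXb" =~ /(X*)/;

# Principle 1: 'cat' is tried first and lets the whole regex match,
# so the 'cats' alternative is never used.
my ($alt) = "cats" =~ /^(cat|cats)/;

printf "er: '%s' '%s' '%s'\n", @er;        # er: 'The programming republic of Pe' 'r' 'l'
printf "mm: '%s' '%s'\n", @mm;             # mm: 'mm' 'ing republic of Perl'
printf "xs: '%s' alt: '%s'\n", $xs, $alt;  # xs: '' alt: 'cat'
```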

Matching Repetitions Cont.
  • Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole expression to match.

$x =~ /.*(m{1,2})(.*)$/; # $1 = 'm' (.* takes as much as it can, leaving one 'm')
# $2 = 'ing republic of Perl'

  • Principle 3: If there are two or more elements in a regular expression, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole expression to match. The next leftmost greedy quantifier then matches as much of the remaining string as possible while still allowing a match, and so on until all elements of the expression are satisfied.

$x =~ /(.?)(m{1,2})(.*)$/; # $1 = 'a'
# $2 = 'mm'
# $3 = 'ing republic of Perl'
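The Principle 2 and 3 examples can be checked the same way. The 'aaab' case is an added illustration, not from the slides, of a greedy quantifier giving back characters so the rest of the pattern can match.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

# Principle 2: .* grabs as much as it can, leaving just one 'm' for m{1,2}.
my @p2 = $x =~ /.*(m{1,2})(.*)$/;

# Principle 3: the leftmost quantifier .? takes one character ('a') at the
# earliest workable position, and m{1,2} then takes both m's.
my @p3 = $x =~ /(.?)(m{1,2})(.*)$/;

# Greedy but cooperative: a* would like all three a's, but gives one back
# so that the literal 'ab' can still match.
my ($as) = "aaab" =~ /^(a*)ab$/;

printf "p2: '%s' '%s'\n", @p2;       # p2: 'm' 'ing republic of Perl'
printf "p3: '%s' '%s' '%s'\n", @p3;  # p3: 'a' 'mm' 'ing republic of Perl'
printf "as: '%s'\n", $as;            # as: 'aa'
```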

Matching Repetitions Cont.
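  • Minimal match/non-greedy quantifiers – match as little of the string as possible.
    • ?? a?? = match 'a' 0 or 1 times. Try 0 first, then 1.
    • *? a*? = match 'a' 0 or more times, but as few times as possible.
    • +? a+? = match 'a' 1 or more times, but as few times as possible.
    • {n,m}? a{n,m}? = match 'a' at least n times and not more than m times, but as few times as possible.
    • {n,}? a{n,}? = match 'a' at least n times, but as few times as possible.
    • {n}? a{n}? = match 'a' exactly n times (the same as a{n}).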

$x = "The programming republic of Perl";

$x =~ /^(.+?)(e|r)(.*)$/; # $1 = 'Th'
# $2 = 'e'
# $3 = ' programming republic of Perl'
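A sketch comparing greedy and non-greedy versions of the same pattern. The quoted-string input and the 'aaa' case are hypothetical examples added to show typical uses of *? and ??.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

my @greedy = $x =~ /^(.+)(e|r)(.*)$/;    # (.+) takes as much as possible
my @lazy   = $x =~ /^(.+?)(e|r)(.*)$/;   # (.+?) takes as little as possible

# a?? tries zero 'a's first, so the greedy (a*) after it gets all three.
my ($q, $rest) = "aaa" =~ /^(a??)(a*)$/;

# Hypothetical input: grabbing the text between the first pair of quotes.
my $s = q{say "hi" and "bye"};
my ($g) = $s =~ /"(.*)"/;    # greedy: runs to the last quote
my ($l) = $s =~ /"(.*?)"/;   # non-greedy: stops at the next quote

printf "greedy: '%s' '%s' '%s'\n", @greedy;  # greedy: 'The programming republic of Pe' 'r' 'l'
printf "lazy:   '%s' '%s' '%s'\n", @lazy;    # lazy:   'Th' 'e' ' programming republic of Perl'
printf "a??: '%s' then '%s'\n", $q, $rest;   # a??: '' then 'aaa'
printf "quoted: greedy '%s', non-greedy '%s'\n", $g, $l;  # quoted: greedy 'hi" and "bye', non-greedy 'hi'
```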

Works Cited
  • https://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf
  • http://www.cs.tut.fi/~jkorpela/perl/regexp.html