
LING/C SC/PSYC 438/538




  1. LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong

  2. Administrivia • Reminder • Perl homework on repeated word detection due Thursday!

  3. Chapter 2: JM • Today • Let’s use your newly acquired Perl skills on Regular Expressions (section 2.1 of the textbook) • Online tutorials • http://perldoc.perl.org/perlrequick.html • http://perldoc.perl.org/perlretut.html

  4. Pattern Matching JM, Chapter 2, pg 17 Merriam-Webster online

  5. Chapter 2: JM • Perl regular expression (re) matching: • $a =~ /foo/ • /…/ contains a regular expression • evaluates to true or false depending on whether the pattern matches somewhere in $a • Perl regular expression (re) match and substitute: • $a =~ s/foo/bar/ • s/…match…/…substitute…/ contains two expressions • modifies $a by replacing the first occurrence of match with substitute • s/…match…/…substitute…/g performs a global match and substitute (every occurrence)
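A minimal sketch of the three forms above, assuming a made-up sample string (the variable names and the text are illustrative, not from the lecture). The substitutions are applied to copies so the original string stays available for comparison.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the lecture.
my $text = "foo and more foo";

# Match: true if the pattern occurs anywhere in $text.
my $found = ($text =~ /foo/) ? 1 : 0;

# Substitute the first occurrence only (on a copy of $text).
(my $first_only = $text) =~ s/foo/bar/;

# Substitute every occurrence with the /g modifier (on another copy).
(my $all = $text) =~ s/foo/bar/g;

print "$found\n";        # 1
print "$first_only\n";   # bar and more foo
print "$all\n";          # bar and more bar
```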

  6. Chapter 2: JM • Most useful with the code template for reading in a file line-by-line:
       open(my $txtfile, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
       while (my $line = <$txtfile>) {
           # do RE stuff with $line
       }
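As one possible way to fill in the template, the sketch below collects every line that contains a digit. To keep it self-contained, it reads from an in-memory filehandle instead of $ARGV[0]; that substitution, the sample data, and the name @hits are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for $ARGV[0].
my $data = "no numbers here\nroom 438\nlecture 7\n";
open(my $txtfile, '<', \$data) or die "cannot open in-memory file: $!\n";

my @hits;
while (my $line = <$txtfile>) {
    chomp $line;                              # drop the trailing newline
    push @hits, $line if $line =~ /[0-9]/;    # keep lines containing a digit
}
close($txtfile);

print "$_\n" for @hits;
```

With a real file, replacing \$data by $ARGV[0] in the open call gives back the template from the slide.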

  7. Chapter 2: JM character class: Perl lingo

  8. Chapter 2: JM • backslash + lowercase letter for a class (e.g. \d, \w, \s) • uppercase variant (\D, \W, \S) for everything but that class
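A short sketch of the lowercase/uppercase convention, using \d, \D, \w and \s on a made-up string (the string and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

my $s = "Room 438, Tuesday";           # hypothetical sample string

my @digits = $s =~ /\d/g;              # \d: each digit character
my @words  = $s =~ /\w+/g;             # \w: runs of letters, digits, underscore
(my $only_digits = $s) =~ s/\D//g;     # \D: delete everything that is not a digit
my @fields = split /\s+/, $s;          # \s: split on runs of whitespace

print "@digits\n";        # 4 3 8
print "@words\n";         # Room 438 Tuesday
print "$only_digits\n";   # 438
```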

  9. Chapter 2: JM

  10. Chapter 2: JM Sheeptalk

  11. Chapter 2: JM

  12. Chapter 2: JM • Precedence of operators • Example: Column 1 Column 2 Column 3 … • /Column [0-9]+ */ • /(Column [0-9]+ *)*/ • /house(cat(s|)|)/ • Perl: • in a regular expression, the text matched by the pattern within a pair of parentheses is stored in designated variables $1 (and $2 and so on) • Precedence Hierarchy: space
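The three patterns on this slide can be tried out directly. The sketch below is an illustration under assumed inputs: the header string follows the slide's example, while the word list and variable names are made up. It uses Perl's $& (the whole matched text) to show how far each pattern reaches.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";

my ($single, $repeated, $last_group);

# Without parentheses, * applies only to the space before it.
if ($header =~ /Column [0-9]+ */) {
    $single = $&;                 # "Column 1 " (one column, plus the space)
}

# With parentheses, the outer * repeats the whole group.
if ($header =~ /(Column [0-9]+ *)*/) {
    $repeated   = $&;             # the entire header
    $last_group = $1;             # $1 holds the last repetition: "Column 3"
}

# Nested optional groups: house, housecat, housecats (but not "houses").
my @accepted = grep { /^house(cat(s|)|)$/ } qw(house housecat housecats houses);

print "<$single>\n<$repeated>\n<$last_group>\n@accepted\n";
```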

  13. Chapter 2: JM • http://perldoc.perl.org/perlretut.html • In scalar context, a match returns 1 (true) or "" (the empty string, if false) • A shortcut: in list context, a match returns a list of the captured groups ($1, $2 and so on)
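A small sketch of the two contexts, assuming a made-up time string in the style of the perlretut examples (the variable names and the sample values are illustrative):

```perl
use strict;
use warnings;

my $time = "09:41:07";    # hypothetical sample value

# Scalar context: the match itself yields 1 or "".
my $ok     = ($time =~ /(\d\d):(\d\d):(\d\d)/);
my $failed = ("no digits" =~ /\d/);

# List context: the match returns the captured groups directly,
# so there is no need to copy $1, $2, $3 by hand.
my ($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/;

print "ok=<$ok> failed=<$failed>\n";      # ok=<1> failed=<>
print "$hours h $minutes m $seconds s\n"; # 09 h 41 m 07 s
```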

  14. Chapter 2: JM • s/([0-9]+)/<\1>/ • What does this do? • Backreferences give Perl regexps more expressive power than finite state automata (FSA)
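One way to explore the question is to run the substitution on a sample string. In the sketch below the replacement is written with $1 rather than \1, since Perl's warnings recommend $1 on the replacement side; the sample strings are made up. The second pattern, recalled from the textbook's discussion of backreferences, uses \1 inside the pattern itself, which is where the extra expressive power comes from: the second blank must repeat whatever the first one matched.

```perl
use strict;
use warnings;

# Wrap the first run of digits in angle brackets (hypothetical input).
(my $tagged = "Column 12 of 30") =~ s/([0-9]+)/<$1>/;

# A backreference inside the pattern: \1 must repeat what (.*) matched.
my $pattern = qr/the (.*)er they were, the \1er they will be/;

my $same  = ("the bigger they were, the bigger they will be" =~ $pattern) ? 1 : 0;
my $mixed = ("the bigger they were, the faster they will be" =~ $pattern) ? 1 : 0;

print "$tagged\n";          # Column <12> of 30
print "$same $mixed\n";     # 1 0
```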

  15. Shortest vs. Greedy Matching • default behavior • in Perl RE match: longest possible matching string • aka “greedy matching” • This behavior can be changed, see following slide • RE search is supposed to be fast • but search time is not necessarily proportional to the length of the input being searched • in fact, Perl RE matching can take exponential time (in length) • non-deterministic • may need to backtrack (revisit) if it matches incorrectly part of the way through • [Figure: time vs. length, showing linear and exponential growth]

  16. Shortest vs. Greedy Matching from http://www.perl.com/doc/manual/html/pod/perlre.html • Example:
        $_ = "The food is under the bar in the barn.";
        if ( /foo(.*)bar/ ) {
            print "matched <$1>\n";
        }
      • Notes: • $_ is the default variable, and also the default target for matching: /…/ alone is an implicit $_ =~ /…/ • variable $1 refers to the parenthesized part of the match (.*) • Output: • matched <d is under the bar in the >

  17. Shortest vs. Greedy Matching from http://www.perl.com/doc/manual/html/pod/perlre.html • Example:
        $_ = "The food is under the bar in the barn.";
        if ( /foo(.*?)bar/ ) {
            print "matched <$1>\n";
        }
      • Notes: • ? immediately following a repetition operator like * makes the operator work in non-greedy mode • Output: • matched <d is under the >
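The greedy and non-greedy examples can be compared side by side. This sketch reuses the sentence from the slides but stores it in a lexical variable and captures in list context; those two choices, and the names $greedy and $lazy, are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "The food is under the bar in the barn.";

# Greedy: .* extends to the last "bar" (the one inside "barn").
my ($greedy) = $text =~ /foo(.*)bar/;

# Non-greedy: .*? stops at the first "bar".
my ($lazy) = $text =~ /foo(.*?)bar/;

print "greedy: <$greedy>\n";   # greedy: <d is under the bar in the >
print "lazy:   <$lazy>\n";     # lazy:   <d is under the >
```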
