Regular expressions

Presentation Transcript


  1. Regular expressions CS201 Fall 2004 Week 11

  2. Problem • input is very untrustworthy • stack smashing, for example • lots of data display patterns • can we combine these two insights? • yes: regular expressions

  3. Example • command line: dir *.java • Boo.java Fred.java PainfulClass.java • displays all the Java programs in the directory • * - in the shell, a wildcard for any run of characters; its regular-expression counterpart .* uses the Kleene closure
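
The shell wildcard on this slide can be approximated with a regular expression. The sketch below is an illustration, not part of the original slides: the class name GlobDemo and the file name notes.txt are made up, and the pattern `.*\.java` is one assumed translation of `*.java`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class GlobDemo {
    // Assumed regex counterpart of the glob *.java:
    // any characters (.*), then a literal dot (\.), then "java".
    static final Pattern JAVA_FILE = Pattern.compile(".*\\.java");

    // Keeps only the names that match the pattern from start to end.
    static List<String> javaFiles(List<String> names) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            // matches() requires the whole name to fit, unlike find()
            if (JAVA_FILE.matcher(name).matches()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList(
                "Boo.java", "Fred.java", "notes.txt", "PainfulClass.java");
        System.out.println(javaFiles(dir));
    }
}
```

The dot has to be escaped in the regex because an unescaped dot matches any character, whereas in the shell glob it is an ordinary character.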

  4. RE and pattern matching • Web searches • email filtering • text-manipulation (Word) • Perl

  5. How do we use it? • import java.util.regex.*; • specify a pattern • compile it • match • iterate

  6. Specifying Patterns • strings: "To: cwm2n@spamgourmet.com" • can match case exactly • or match case-insensitively • Range • [01234567] – any one symbol inside the [] • [0-9] – the same idea, written as a range • [^j] – caret means "anything BUT j" • one symbol: • . – period matches any character • \\d – a digit, i.e. [0-9] • \\D – a non-digit, i.e. [^0-9] • \\w – a word character, i.e. [a-zA-Z_0-9]
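
The pattern pieces on this slide can be tried one at a time. This is a minimal sketch: the class CharClassDemo, its helper methods, and the sample inputs are made up for illustration. Pattern.CASE_INSENSITIVE is the standard java.util.regex flag for case-insensitive matching.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // True if the regex matches somewhere in the input (case-sensitive).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Same as found(), but ignoring upper/lower case.
    static boolean foundIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("[0-7]", "room 9"));   // false: 9 is outside the range
        System.out.println(found("[^j]", "jjj"));       // false: every symbol is a j
        System.out.println(found("\\d", "Week 11"));    // true: 1 is a digit
        System.out.println(found("\\D", "2004"));       // false: only digits
        System.out.println(found("\\w", "?!"));         // false: no word characters
        // Plain strings match case exactly unless the flag is given.
        System.out.println(found("to: cwm2n", "To: cwm2n@spamgourmet.com"));             // false
        System.out.println(foundIgnoringCase("to: cwm2n", "To: cwm2n@spamgourmet.com")); // true
    }
}
```

The doubled backslash is Java string escaping: the source text `"\\d"` reaches the regex engine as `\d`.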

  7. Patterns • quantifier: how many times • * – any number of times (including zero) • .* – any character, any number of times • ? – zero or one time • A? – A zero or one time • + – one or more times • A+ – must find at least one A • others (p. 476)
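
The three quantifiers can be checked against whole strings. This is a small sketch with a made-up class name (QuantifierDemo) and made-up inputs. It uses Pattern.matches, which succeeds only if the entire input fits the pattern.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // True only if the entire input fits the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("A?", ""));     // true: zero A's is allowed
        System.out.println(whole("A?", "A"));    // true: one A
        System.out.println(whole("A?", "AA"));   // false: more than one A
        System.out.println(whole("A+", ""));     // false: needs at least one A
        System.out.println(whole("A+", "AAA"));  // true
        System.out.println(whole("A*", ""));     // true: zero repetitions counts
        System.out.println(whole(".*", "anything at all")); // true
    }
}
```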

  8. examples • find subject line of email • "Subject: .*" • finds: Subject: weather • finds: Subject: [POSSIBLE SPAM] get a degree! • Problem • also finds • How to be a British Subject: marry into the Royal
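
The false positive on this slide is easy to reproduce. In the sketch below, the class UnanchoredDemo and the helper firstMatch are made up for illustration; the sample lines are taken from the slide as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnanchoredDemo {
    // Returns the first piece of the line that matches, or null if none does.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = "Subject: .*";
        // The two intended hits:
        System.out.println(firstMatch(regex, "Subject: weather"));
        System.out.println(firstMatch(regex, "Subject: [POSSIBLE SPAM] get a degree!"));
        // The unintended hit: find() is happy to start in the middle of a line.
        System.out.println(firstMatch(regex, "How to be a British Subject: marry into the Royal"));
    }
}
```

The last call returns the tail of the line starting at "Subject:", because nothing in the pattern says where on the line the match has to begin.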

  9. Anchors • tell us where to find what we are looking for • ^ – beginning of line • ^Subject: .* • $ – end of line • com$ • others on page 478
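
A quick way to see the anchors at work is to test single lines. In this sketch the class AnchorDemo and the inputs are made up; the pattern `com$` is an assumed illustration of the `$` anchor, shown next to `^com` for contrast.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // True if the regex matches somewhere in the (single-line) input.
    static boolean found(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        // ^ pins the match to the start, so the mid-line "Subject:" no longer counts.
        System.out.println(found("^Subject: .*", "Subject: weather"));                   // true
        System.out.println(found("^Subject: .*", "How to be a British Subject: marry")); // false
        // $ pins the match to the end.
        System.out.println(found("com$", "cwm2n@spamgourmet.com")); // true
        System.out.println(found("com$", "compile it"));            // false
        // ^com means "starts with com", which is a different question.
        System.out.println(found("^com", "compile it"));            // true
    }
}
```

By default Java treats the whole input as one line for the purposes of ^ and $. Matching at every line of a longer buffer needs the Pattern.MULTILINE flag.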

  10. Alternation • subject line contains either SPAM or Rolex • ^Subject:.*(SPAM.*|Rolex.*) • no spaces around the | (a space in a pattern is matched literally)
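
Alternation can be sketched as follows, assuming the intent is "a Subject line that mentions SPAM or Rolex". The class AlternationDemo and the sample subject lines are made up. The second pattern is included only to show that a space typed around `|` becomes part of each alternative in Java, unless the Pattern.COMMENTS flag is used.

```java
import java.util.regex.Pattern;

final class AlternationDemo {
    // Alternatives written with no spaces around the bar.
    static final Pattern TIGHT = Pattern.compile("^Subject:.*(SPAM.*|Rolex.*)");
    // Same idea with spaces around the bar: the alternatives are now
    // "SPAM.* " (must be followed by a space) and " Rolex.*" (must follow a space).
    static final Pattern SPACED = Pattern.compile("^Subject:.*(SPAM.* | Rolex.*)");

    static boolean flagged(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(flagged(TIGHT, "Subject: cheap Rolex"));  // true
        System.out.println(flagged(TIGHT, "Subject: SPAM"));         // true
        System.out.println(flagged(TIGHT, "Subject: weather"));      // false
        // The stray spaces make this one miss a subject that ends in SPAM.
        System.out.println(flagged(SPACED, "Subject: SPAM"));        // false
    }
}
```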

  11. How to use it, really • Form a pattern • Pattern p = Pattern.compile("^Subject: .*"); • Create a Matcher • Matcher m = p.matcher(someBuffer); • iterate • while(m.find()) System.out.println("Found text: "+m.group()); • find() – boolean, true if a next occurrence was found • group() – the String that matched
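
The steps on this slide can be put together into a runnable sketch. Everything beyond the slide's own lines is an assumption: the class SubjectScan, the contents of someBuffer, and the Pattern.MULTILINE flag. The flag is added here because without it ^ matches only at the very start of the buffer, so the loop would find a Subject line only if it came first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubjectScan {
    // MULTILINE lets ^ match at the start of every line in the buffer.
    static final Pattern SUBJECT = Pattern.compile("^Subject: .*", Pattern.MULTILINE);

    // Collects every Subject line found in the buffer, in order.
    static List<String> subjects(String buffer) {
        List<String> found = new ArrayList<>();
        Matcher m = SUBJECT.matcher(buffer);
        while (m.find()) {          // true while another occurrence exists
            found.add(m.group());   // the text that matched
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical buffer standing in for an email read from disk.
        String someBuffer = "To: cwm2n@spamgourmet.com\n"
                + "Subject: weather\n"
                + "\n"
                + "How to be a British Subject: marry\n";
        for (String s : subjects(someBuffer)) {
            System.out.println("Found text: " + s);
        }
    }
}
```

Because . does not match a line break by default, each .* stops at the end of its own line, so every match is exactly one header line.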

  12. example

      package edu.virginia.cs.cs201.fall04;

      import java.util.regex.*;

      public class Tryout {
          String text = "A horse is a horse, of course of course..";
          String pattern = "horse|course";

          public static void main(String args[]) {
              Tryout t = new Tryout();
              t.go();
          }

          public void go() {
              Pattern p = Pattern.compile(pattern);
              Matcher m = p.matcher(text);
              // print each match followed by the index where it starts
              while (m.find()) {
                  System.out.println(m.group() + m.start());
              }
          }
      }

      Output:

      horse2
      horse13
      course23
      course33
