DIG 3563: Lecture 2a: Regular Expressions Michael Moshell University of Central Florida


Presentation Transcript


  1. Information Management DIG 3563: Lecture 2a: Regular Expressions Michael Moshell University of Central Florida

  2. DO NOT BE A RABBIT! If you don’t know how to do something, don’t hide under a bush. Tell me or come see me. Naturphoto.cz

  3. Regular Expressions • A "grammar" for validating input • Useful for many kinds of pattern recognition • The basic built-in matching function in PHP is called 'preg_match'. • It takes two or three arguments: • the pattern, like "cat" • the test string, like "catastrophe" • and an (optional) array variable, which we can ignore for now • It returns 1 (which PHP treats as TRUE) if the pattern matches the test string, and 0 (treated as FALSE) if it does not.

  4. Perl-Compatible Regular Expressions (PCRE) Patterns always begin with "/ and end with /" (for today's lesson) $instring = "catastrophe"; if (preg_match("/cat/",$instring)) { print "I found a cat!"; } else { print "No cat here."; }

  5. Regular Expressions $instring = "catastrophe"; if (preg_match("/cat/",$instring)) { print "I found a cat!"; } else { print "No cat here."; } I found a cat!
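
The slides so far use preg_match only inside an if. As a minimal sketch (assuming PHP 7 or later; the variable names $found, $missing and $matches are made up for this illustration), the following shows the value preg_match actually returns and what the optional third argument receives:

```php
<?php
// preg_match returns 1 on a match, 0 on no match (and false on an error).
$instring = "catastrophe";

// The optional third argument is filled with the text that matched.
$found = preg_match("/cat/", $instring, $matches);
$missing = preg_match("/dog/", $instring);

print "cat: $found\n";                  // 1, which an if treats as TRUE
print "dog: $missing\n";                // 0, which an if treats as FALSE
print "matched text: {$matches[0]}\n";  // the part of the string that matched
```

Element 0 of the array holds the whole matched text, which becomes useful later when a pattern can match strings of different lengths.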

  6. PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now.

  7. PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now. Yes, I mean YOU. Where is your paper and pencil? (You can use your laptop if that’s what you have…)

  8. PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now. Answer: "/dog/" Yep, it’s that simple. But I gotta get you STARTED.

  9. Regular Expressions Wild cards: period . matches any single character $instring = "cotastrophe"; if (preg_match("/c.t/",$instring)) { print "I found a c.t!"; } else { print "No c.t here."; }

  10. Regular Expressions Wild cards: period . matches any single character $instring = "cotastrophe"; if (preg_match("/c.t/",$instring)) { print "I found a matching string!"; } else { print "No c.t here."; } I found a matching string!

  11. Regular Expressions Wild cards: a* matches any number of "a" characters, including none at all (the empty string)! $instring = "caaaatastrophe"; if (preg_match("/ca*t/",$instring)) { print "I found a match!"; } else { print "No ca*t here."; } I found a match!

  12. Regular Expressions Wild cards: .* matches any string of characters, including the empty string! $instring = "cotastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; } I found a c.*t!

  13. Regular Expressions Wild cards: .* matches any string of characters, including the empty string! $instring = "cflippingmonstroustastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; }

  14. Regular Expressions Wild cards: .* matches any string of characters, including the empty string! $instring = "cflippingmonstroustastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; } I found a c.*t!
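
The three wild card forms can be compared side by side. This is only a sketch (assuming PHP 7.1 or later; the helper name hasMatch and the extra test string "ctastrophe" are invented for this illustration, not taken from the lecture):

```php
<?php
// Hypothetical helper: TRUE if $pattern matches anywhere inside $subject.
function hasMatch(string $pattern, string $subject): bool {
    return preg_match($pattern, $subject) === 1;
}

$tests = [
    ["/c.t/",  "cotastrophe"],                  // . stands for exactly one character
    ["/c.t/",  "ctastrophe"],                   // nothing between c and t, so . has no character to use
    ["/ca*t/", "caaaatastrophe"],               // a* accepts four a's
    ["/ca*t/", "ctastrophe"],                   // a* also accepts zero a's
    ["/c.*t/", "cflippingmonstroustastrophe"],  // .* accepts any run of characters
];

foreach ($tests as [$pattern, $subject]) {
    $verdict = hasMatch($pattern, $subject) ? "match" : "no match";
    print $pattern . " on " . $subject . ": " . $verdict . "\n";
}
```

Only the second row fails: a period must stand for one real character, while a* and .* are both happy with nothing at all.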

  15. PRACTICE 2: "/c.t/" ← that is a model RE for you. "/c.*t/" ← that is a model RE for you. "/ca*t/" ← that is a model RE for you. Make up a Regular Expression to recognize Rob or Rb or Roob or Rooob, etc. But to REJECT Reb and Rab and Rats and Mike…

  16. PRACTICE 2: "/c.t/" ← that is a model RE for you. "/c.*t/" ← that is a model RE for you. "/ca*t/" ← that is a model RE for you. Answer: "/Ro*b/"

  17. Quantification Multiple copies of something: a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means exactly 33 instances of a

  18. Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means exactly 33 instances of a How to recognize “Rob” or “Robb”?

  19. Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means exactly 33 instances of a How to recognize “Rob” or “Robb”? "/Robb?/"

  20. Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means exactly 33 instances of a How to recognize “Rob” or “Robb”? Another way: "/Rob{1,2}/"
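
The quantifier rules can be checked mechanically. A minimal sketch, assuming PHP 7.1 or later; the table of cases and the names $cases and $allAsPredicted are made up for this illustration, and the third column is simply what the rules on the slide predict:

```php
<?php
// Each row: pattern, test string, and whether the slide's rules predict a match.
$cases = [
    ['/fa+ther/',  'father',   true],
    ['/fa+ther/',  'fther',    false],  // + needs at least one a
    ['/fa*ther/',  'fther',    true],   // * allows zero a's
    ['/flavou?r/', 'flavor',   true],
    ['/flavou?r/', 'flavour',  true],
    ['/flavou?r/', 'flavouur', false],  // ? allows at most one u
    ['/Rob{1,2}/', 'Robb',     true],
    ['/Rob{1,2}/', 'Ro',       false],  // {1,2} needs at least one b
];

$allAsPredicted = true;
foreach ($cases as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject) === 1;
    if ($actual !== $expected) {
        $allAsPredicted = false;
    }
    print "$pattern on $subject: " . ($actual ? "match" : "no match") . "\n";
}
print $allAsPredicted ? "Every prediction held.\n" : "Some prediction was wrong.\n";
```

The patterns are written in single quotes here, which is a choice of this sketch; the slides use double quotes, and both work for these patterns.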

  21. Escaping Backslash means "don't interpret this:" \. is just a dot \* is just an asterisk.
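
To see the difference a backslash makes, compare the same pattern with and without it. This is a sketch only (assuming PHP 7 or later; the test strings and the variable names are invented for this illustration):

```php
<?php
// Single-quoted PHP strings keep the backslash exactly as typed.
$unescaped = '/a.c/';   // here the dot is a wild card
$escaped   = '/a\.c/';  // here the dot is just a dot

$r1 = preg_match($unescaped, 'abc');  // the wild card accepts the b
$r2 = preg_match($escaped, 'abc');    // a real dot is required, so no match
$r3 = preg_match($escaped, 'a.c');    // a real dot is present
$r4 = preg_match('/2\*3/', '2*3');    // an escaped asterisk matches a real *

print "unescaped on abc: $r1\n";
print "escaped on abc: $r2\n";
print "escaped on a.c: $r3\n";
print "escaped asterisk on 2*3: $r4\n";
```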

  22. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbb"; Would this be accepted or not? preg_match($t,$s): true or false?

  23. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbb"; Would this be accepted or not? preg_match($t,$s): true or false? TRUE, because $s matches the pattern string $t: three a's, one dot, and between one and four b characters.

  24. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; Would this be accepted or not? preg_match($t,$s): true or false?

  25. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; Would this be accepted or not? preg_match($t,$s): true or false? Perhaps surprisingly, TRUE: because $s contains "aaa.bbbb" (three a's, a dot and four b's), and a match anywhere inside the string is enough.

  26. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; Would this be accepted or not? preg_match($t,$s): true or false? Perhaps surprisingly, TRUE: because $s contains "aaa.bbbb" (three a's, a dot and four b's), and a match anywhere inside the string is enough. If you have $1.00 and I ask you “do you have 75 cents?” the answer would be YES.

  27. The concept: $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; Would this be accepted or not? preg_match($t,$s): true or false? Perhaps surprisingly, TRUE: because $s contains "aaa.bbbb" (three a's, a dot and four b's), and a match anywhere inside the string is enough. If you want an EXACT match, I'll show you how in a bit.
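
The optional third argument of preg_match makes the "75 cents" point visible, because it reports which part of the string was used. A minimal sketch, assuming PHP 7 or later ($result and $matches are names chosen for this illustration):

```php
<?php
$t = '/a{3}\.b{1,4}/';
$s = 'aaa.bbbbb';   // five b's, one more than the pattern allows

// $matches[0] receives the part of $s that the pattern actually used.
$result = preg_match($t, $s, $matches);

print "result: $result\n";
print "matched part: {$matches[0]}\n";
```

The matched part is aaa.bbbb: the pattern took four of the five b's and ignored the last one, which is why the result is still 1.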

  28. Grouping Multiple copies of something: (abc)+ means ONE OR MORE copies of the string abc (abc)* means ZERO OR MORE copies of the string abc, like abcabcabc SETS: [0-9] matches any single digit [A-Z] matches any single uppercase letter [AZ] matches A or Z [AZ]? (i.e. 0 or 1 of the previous) matches the empty string, A or Z
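
Groups and sets can be tried out the same way. This is a sketch under stated assumptions (PHP 7 or later; the test strings and variable names are invented for this illustration):

```php
<?php
// (abc)+ : one or more copies of abc, taken as a unit.
preg_match('/(abc)+/', 'xxabcabcabcxx', $group);

// [0-9] : any single digit.
$digit = preg_match('/[0-9]/', 'room 7', $d);

// [A-Z] : any single uppercase letter; 'hello' has none.
$upper = preg_match('/[A-Z]/', 'hello');

// [AZ] : only the letter A or the letter Z, not the range between them.
$aOrZ    = preg_match('/[AZ]/', 'Zebra', $z);
$neither = preg_match('/[AZ]/', 'Bob');

print "group matched: {$group[0]}\n";
print "digit found: {$d[0]}\n";
print "uppercase in hello: $upper\n";
print "A or Z in Zebra: {$z[0]}\n";
print "A or Z in Bob: $neither\n";
```

"Bob" contains an uppercase B, so [A-Z] would accept it, but [AZ] does not: without the dash the set lists individual characters.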

  29. Starting and Ending preg_match("/cat/","abunchofcats") is TRUE but preg_match("/^cat/","abunchofcats") is FALSE because ^ means the match must start at the first character of the string. preg_match("/cats$/","abunchofcats") is TRUE but preg_match("/cats$/","mycatsarelazy") is FALSE So, ^ marks the head and $ marks the tail.

  30. Exact Matching with ^ and $ $t="/^a{3}\.b{1,4}$/"; $s= "aaa.bbbbb"; Would this be accepted or not? preg_match($t,$s): true or false? FALSE, because the ending $ in the pattern says "no more input is acceptable" but a fifth b follows. This would also reject $s="aaa.bbbbAndMoreText";
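
The effect of adding ^ and $ shows up clearly when the same inputs are run through both versions of the pattern. A minimal sketch, assuming PHP 7 or later; the names $loose, $exact and $table and the input "xaaa.b" are made up for this illustration:

```php
<?php
$loose = '/a{3}\.b{1,4}/';     // may match anywhere inside the string
$exact = '/^a{3}\.b{1,4}$/';   // must account for the whole string

$inputs = ['aaa.bbbb', 'aaa.bbbbb', 'aaa.bbbbAndMoreText', 'xaaa.b'];

$table = [];
foreach ($inputs as $s) {
    $table[$s] = [
        'loose' => preg_match($loose, $s) === 1,
        'exact' => preg_match($exact, $s) === 1,
    ];
    print $s . ": loose=" . ($table[$s]['loose'] ? "yes" : "no")
        . ", exact=" . ($table[$s]['exact'] ? "yes" : "no") . "\n";
}
```

The loose pattern accepts all four inputs, while the anchored one accepts only aaa.bbbb.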

  31. Alternatives - the 'or' mark | $t="/flav(o|ou)r/"; This will match 'flavor' and 'flavour'. And (yes!) there is often more than one way to do things; for instance our good old ? mark: "/flavou?r/"
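
To check that the two spellings of the pattern really behave alike, both can be run over the same words. This is only a sketch (assuming PHP 7 or later; the word list and the names $viaOr, $viaQuestion and $agree are invented for this illustration), and agreeing on four words is evidence, not proof:

```php
<?php
$words = ['flavor', 'flavour', 'flavr', 'flavouur'];

$agree = true;
foreach ($words as $w) {
    $viaOr       = preg_match('/flav(o|ou)r/', $w) === 1;  // the | version
    $viaQuestion = preg_match('/flavou?r/', $w) === 1;     // the ? version
    if ($viaOr !== $viaQuestion) {
        $agree = false;
    }
    print $w . ": " . ($viaOr ? "match" : "no match") . "\n";
}
print $agree ? "Both patterns agree on every word.\n" : "The patterns disagree.\n";
```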

  32. Sets - Examples [A-E]{3} matches AAA, ABA, ADD, ... EEE [PQX]{2,4} matches PP, PQ, PX ... up to XXXX [A-Za-z]+ matches any alphabetic string with 1 or more characters [A-Z][a-z]* matches a capital letter followed by zero or more lowercase letters [a-z0-9]+ matches any string of 1 or more lowercase letters and digits
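
Because an unanchored pattern is satisfied by any matching piece of the input, the set examples are easiest to test with ^ and $ added. A minimal sketch, assuming PHP 7 or later; the helper name isWhole and the test strings are made up for this illustration:

```php
<?php
// Hypothetical helper: wraps a bare RE in /^...$/ so the whole string must fit.
function isWhole(string $re, string $subject): bool {
    return preg_match('/^' . $re . '$/', $subject) === 1;
}

$checks = [
    ['[A-E]{3}',     'ABA'],     // three letters, all from A to E
    ['[A-E]{3}',     'ABF'],     // F is outside the set
    ['[PQX]{2,4}',   'PQ'],
    ['[PQX]{2,4}',   'XXXXX'],   // five characters is one too many
    ['[A-Z][a-z]*',  'Rob'],
    ['[A-Z][a-z]*',  'rob'],     // first letter is not a capital
    ['[a-z0-9]+',    'abc123'],
    ['[a-z0-9]+',    'Abc'],     // capital A is not in the set
];

foreach ($checks as [$re, $subject]) {
    print $re . " on " . $subject . ": " . (isWhole($re, $subject) ? "match" : "no match") . "\n";
}
```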

  33. Practice in class Write a RE that recognizes any string that begins with "sale". Here's an example to help you remember: ^cat From now on, the RE is just ^cat. You don't need to write the other stuff (preg_match, "/, etc.)

  34. Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale

  35. Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale 2) Write a RE that recognizes a string that begins with "smith" and a two digit integer, like smith23 or smith99. Here's an example from your recent past: a{3}\.b{1,4}

  36. Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale 2) Write a RE that recognizes a string that begins with "smith" and a two digit integer, like smith23 or smith99. Answer: ^smith[0-9]{2}

  37. 3) Write a RE that recognizes Social Security numbers in the form like 123-45-6789 Helpers from the recent past: ^smith[0-9]{2} a{3}\.b{1,4}

  38. 3) Write a RE that recognizes Social Security numbers in the form like 123-45-6789 Answer: [0-9]{3}\-[0-9]{2}\-[0-9]{4}

  39. 3) Write a RE that recognizes Social Security numbers in the form like 123-45-6789 Answer: [0-9]{3}\-[0-9]{2}\-[0-9]{4} NOTE: That's a conservative answer. It turns out that the dash character is not a special symbol outside sets, and so you could also write [0-9]{3}-[0-9]{2}-[0-9]{4} But I don't like to remember stuff, so I use \ a lot.
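
Both forms of the answer can be compared in PHP, along with an anchored variant. This is a sketch under stated assumptions (PHP 7 or later; the anchored pattern and the test strings are additions for this illustration, not part of the slide's answer):

```php
<?php
$escaped   = '/[0-9]{3}\-[0-9]{2}\-[0-9]{4}/';   // the conservative answer
$unescaped = '/[0-9]{3}-[0-9]{2}-[0-9]{4}/';     // same thing without the backslashes
$anchored  = '/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/';   // nothing allowed before or after

$inputs = ['123-45-6789', '123456789', '1234-56-78901'];

$results = [];
foreach ($inputs as $s) {
    $results[$s] = [
        preg_match($escaped, $s),
        preg_match($unescaped, $s),
        preg_match($anchored, $s),
    ];
    print $s . ": " . implode(" ", $results[$s]) . "\n";
}
```

The escaped and unescaped patterns give the same answer on every input. The last input shows why anchors matter for validation: it contains 234-56-7890, so the unanchored patterns accept it even though the whole string is not in the 123-45-6789 form.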

  40. How to study this stuff? • Practice making up REs for problems like these: • The UCF NID • French telephone numbers like (+33 5 23 46 22 91) • Dollars and cents, like $942.73 • A field that may contain only lowercase strings with exactly ONE vowel. • How do you know if they're good? If you know PHP, you can test them. Otherwise, check out each other's work. • (OR come see me in office hours!) (Or by appointment!) • 407 694 6763
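
For those who do know PHP, one possible way to test a pattern is to list strings it should accept and strings it should reject, then let a loop do the checking. A minimal sketch, assuming PHP 7 or later; the function checkPattern and the sample strings are hypothetical, and the pattern comes from the earlier smith practice problem rather than from the study list:

```php
<?php
// Hypothetical helper: returns a list of the strings the pattern got wrong.
function checkPattern(string $pattern, array $shouldMatch, array $shouldReject): array {
    $failures = [];
    foreach ($shouldMatch as $s) {
        if (preg_match($pattern, $s) !== 1) {
            $failures[] = "should match but did not: " . $s;
        }
    }
    foreach ($shouldReject as $s) {
        if (preg_match($pattern, $s) === 1) {
            $failures[] = "should be rejected but matched: " . $s;
        }
    }
    return $failures;
}

// The practice answer passes these checks...
$ok = checkPattern('/^smith[0-9]{2}/', ['smith23', 'smith99'], ['smith', 'smit23', 'mrsmith23']);

// ...but is flagged if three digits are supposed to be rejected (there is no $ at the end).
$strict = checkPattern('/^smith[0-9]{2}/', ['smith23'], ['smith234']);

print count($ok) . " problem(s) in the first check\n";
print implode("\n", $strict) . "\n";
```

Writing the reject list is usually the harder half: it forces a decision about what the pattern is supposed to refuse.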
