CSC 4630


  1. CSC 4630, Meeting 21, April 4, 2007

  2. Return to Perl
     • Where are we?
     • What is confusing?
     • What practice do you need?

  3. Ray’s Problem
     Given a string of the form:
       1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 b 9 = 100
     replace the 8 b’s with
     • one plus sign
     • two minus signs
     • five empty strings, signifying close up the spacing to make a number
     and find which replacements yield a true statement.

  4. Ray’s Problem (2)
     Thoughts on the answer:
     • 1234-56-78+9 = 100 is an example of a candidate string (a false one: its left side evaluates to 1109)
     • How many possible strings are there?
     • Proof by exhaustion may be the best approach
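The proof by exhaustion suggested above can be sketched in Perl. This is only one possible approach, not the instructor's solution; the helper names `arrangements`, `candidates`, and `value_of` are made up for this illustration.

```perl
use strict;
use warnings;

# Return every distinct ordering of a multiset of tokens.
# Argument: a hash of token => how many copies remain.
# (Hypothetical helper, not from the slides.)
sub arrangements {
    my (%count) = @_;
    my $total = 0;
    $total += $_ for values %count;
    return ([]) if $total == 0;          # one way to arrange nothing
    my @results;
    for my $tok (sort keys %count) {
        next if $count{$tok} == 0;
        my %rest = %count;
        $rest{$tok}--;
        push @results, map { [ $tok, @$_ ] } arrangements(%rest);
    }
    return @results;
}

# Build every left-hand side: digits 1..9 with the 8 b's replaced.
sub candidates {
    my @out;
    for my $ops (arrangements('+' => 1, '-' => 2, '' => 5)) {
        my $expr = '1';
        $expr .= $ops->[ $_ - 2 ] . $_ for 2 .. 9;
        push @out, $expr;
    }
    return @out;
}

# Evaluate a string such as "1234-56-78+9" without using eval.
sub value_of {
    my ($expr) = @_;
    my $sum = 0;
    # Each term is an optional sign followed by a run of digits.
    while ($expr =~ /([+-]?)(\d+)/g) {
        $sum += ($1 eq '-' ? -1 : 1) * $2;
    }
    return $sum;
}

my @true = grep { value_of($_) == 100 } candidates();
print "$_ = 100\n" for @true;
```

Counting the elements returned by `candidates()` answers the "how many possible strings" question, and the printed lines are the replacements that yield a true statement.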

  5. Regular Expressions Revisited
     Returning to a fundamental structure
     • Theoretically defined
     • Implemented in grep, egrep
     • Implemented in awk, gawk, nawk
     • Implemented in Perl

  6. RE(2)
     • Theoretically, a RE defines a set of strings on an alphabet.
     • In implementation, matching with a RE checks whether the current string is an element of a set of strings that is constructed from the strings defined theoretically.

  7. RE(3)
     • A single character c
       • Theoretically defines the set of strings {c}
       • Which generates the set of matching lines {ScT}, where S and T are arbitrary, possibly empty strings.
     • In implementation,
       • grep c somelines returns ______________
       • awk "/c/" somelines returns ______________
       • if (/c/) {print $_;} returns ______________

  8. RE(4)
     so
       grep c somelines
     is equivalent to
       perl re1 <somelines
     where re1 is the Perl program
       while (<STDIN>) { if (/c/) {print $_;} }

  9. RE(5)
     • Theoretically, if r and s are regular expressions defining languages L and M respectively, then
       • rs defines the language LM, meaning concatenate a string in L with a string in M
     • Hence,
       • grep abc somelines
       • awk "/abc/" somelines
       • while (<STDIN>) { if (/abc/) {print $_;} }

  10. RE(6)
      all return the lines that are contained in the set {SabcT}, where S and T are arbitrary, possibly empty strings.
      Details:
        /a/ defines {a}, /b/ defines {b}, /c/ defines {c}
        /abc/ defines {abc} by concatenation
        Lines matching /abc/ are in {SabcT}

  11. RE(7)
      • The * operator shows that the previous simple regular expression is repeated 0 or more times.
      • /ab*c/ defines the language formed as the union of the languages defined by /ac/, /abc/, /abbc/, /abbbc/, etc. This is the set {ab^n c | n = 0, 1, 2, …} (an infinite set), where b^n means n copies of b.
      • Hence /ab*c/ matches any string of the form S ab^n c T
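A small sketch of the * operator in action, assuming a made-up list of sample lines; `matching_lines` is a hypothetical helper name, not something from the course.

```perl
use strict;
use warnings;

# Return the lines that /ab*c/ matches, i.e. those that contain
# an a, then zero or more b's, then a c.
sub matching_lines {
    my @lines = @_;
    return grep { /ab*c/ } @lines;
}

my @sample = ("ac", "xabcx", "abbbbc", "abd", "a c", "cab");
print "$_\n" for matching_lines(@sample);   # prints ac, xabcx, abbbbc
```

The last three sample lines are rejected because none of them contains an a followed by only b's and then a c.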

  12. RE(8)
      • The symbol . designates any character in the alphabet (What is the alphabet we’re using?) except \n, which stands for newline. (A Perl definition; check for the various shells and the various awks.)
      • Thus . defines the language A-{\n}
      • And . matches any line that contains at least one character. Officially an empty line looks like \n, and every line ends with \n
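The behavior of . on empty and non-empty lines can be checked with a short Perl sketch; `has_some_char` is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Return 1 when /./ matches the line, 0 otherwise.
# Without the /s modifier, . does not match the newline character.
sub has_some_char {
    my ($line) = @_;
    return $line =~ /./ ? 1 : 0;
}

print has_some_char("x\n"), "\n";   # 1: the line contains one character
print has_some_char("\n"),  "\n";   # 0: an empty line is only the newline
print has_some_char(" \n"), "\n";   # 1: a blank counts as a character
```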

  13. RE(9)
      Exercise: Construct all possible lines of text that will not be matched by /a./
      Exercise: Construct all possible lines of text that will be matched by /.a.b./
      Exercise: Regardless of their content, what lines of text will not be matched by /.a.b./?

  14. RE(10) Character Classes
      • Any set of characters enclosed in brackets
        • The vowels [aeiou]
      • Any range of consecutive ASCII coded characters enclosed in brackets
        • The lower case letters [a-z]
        • The digits [0-9]
        • The hex digits [0-9A-F]

  15. RE(12)
      • Including special characters in the set
        • To get ], use \] or []a-z] (Think about reading this string character by character to learn its meaning.)
        • To get -, use \- or [a-z-]
      • Complementing (not complimenting) a set
        • Use ^ as leading character, [^0-9] or [^aeiou]
      • More special characters
        • To get ^, use \^ or place it away from the first position [a-z^_]
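The placement rules above can be tried out with a short Perl sketch. The table of cases and the helper `admits` are made up for this illustration.

```perl
use strict;
use warnings;

# Return 1 when the single character $char is matched by the
# character class written in $class, 0 otherwise.
sub admits {
    my ($class, $char) = @_;
    return $char =~ /$class/ ? 1 : 0;
}

# Each entry pairs a class with a character to test against it.
my @cases = (
    [ '[]a-z]',  ']' ],   # ] placed first is literal
    [ '[a-z\]]', ']' ],   # ] escaped
    [ '[a-z-]',  '-' ],   # - placed last is literal
    [ '[a-z\-]', '-' ],   # - escaped
    [ '[a-z^_]', '^' ],   # ^ away from the first position is literal
    [ '[\^]',    '^' ],   # ^ escaped
    [ '[^0-9]',  '5' ],   # leading ^ complements the set, so 5 is rejected
);

for my $case (@cases) {
    my ($class, $char) = @$case;
    printf "%-10s %s %s\n", $class,
        (admits($class, $char) ? 'admits' : 'rejects'), $char;
}
```

Only the last case should report "rejects", since [^0-9] is the complement of the digits.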

  16. RE(13) The Matching Game:
      • [0123456789]
      • [0-9]
      • [0-9\-]
      • [a-z0-9]
      • [a-zA-Z0-9_]
      • [^0-7]
      • [^A-M.,;]
      • [^\^]
      • [0 - 9]
      • [.]

  17. RE(14) Short character set names
      • \d means [0-9]
      • \D means [^0-9]
      • \w means [a-zA-Z0-9_] (identifier characters)
      • \W means [^a-zA-Z0-9_]
      • \s means [ \r\t\n\f]
      • \S means [^ \r\t\n\f]

  18. RE(15) More repetition symbols
      • b* means zero or more repetitions of b, as does b{0,}
      • b+ means one or more repetitions of b, as does b{1,}
      • b? means zero or one repetitions of b, as does b{0,1}
      • b{5,8} means five, six, seven or eight repetitions of b
      • b{4} means exactly four repetitions of b
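The equivalences among the repetition symbols can be checked empirically. The sketch below anchors each pattern with ^ and $ (an addition for this illustration, not something the slide uses) so that the whole string must consist of the repetitions; `accepted_lengths` is a hypothetical helper.

```perl
use strict;
use warnings;

# For runs of 0 to 10 b's, return the run lengths that the
# pattern accepts when it must match the entire string.
sub accepted_lengths {
    my ($pattern) = @_;
    return grep { ('b' x $_) =~ /^(?:$pattern)$/ } 0 .. 10;
}

for my $p ('b*', 'b{0,}', 'b+', 'b{1,}', 'b?', 'b{0,1}', 'b{5,8}', 'b{4}') {
    printf "%-7s %s\n", $p, join(' ', accepted_lengths($p));
}
```

Patterns described as equivalent on the slide should print identical rows. Without the anchors, every one of these patterns except b+, b{1,}, b{5,8} and b{4} would match any string at all, because zero repetitions is allowed.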

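  19. RE(16)
      • Splitting a string
        split(/:/,$line) divides $line into substrings at the colons and places the substrings in a list (array).
        Note: Two adjacent colons :: produce an empty string.
        split(/:+/,$line) treats each run of colons as one separator, so adjacent colons no longer produce empty substrings (a line that begins with a colon still yields an empty first substring).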
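A brief sketch contrasting the two separators on an invented sample line; the helper names `fields_single` and `fields_runs` are hypothetical.

```perl
use strict;
use warnings;

# Split at every single colon.
sub fields_single {
    my ($line) = @_;
    my @fields = split(/:/, $line);
    return @fields;
}

# Split at runs of one or more colons.
sub fields_runs {
    my ($line) = @_;
    my @fields = split(/:+/, $line);
    return @fields;
}

my $line = "root::0:0:/root";    # made-up sample with an empty second field
print join('', map { "[$_]" } fields_single($line)), "\n";   # [root][][0][0][/root]
print join('', map { "[$_]" } fields_runs($line)),   "\n";   # [root][0][0][/root]
```

Which form is appropriate depends on whether an empty field carries meaning in the data: collapsing runs of separators shifts every later field to a lower position.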

  20. Andy’s Problem
      Lines from a text file look like
      • 105028|Adam Mrugalski|AJM Residential|1067 Shoecraft rd|Webster|NY|14580||||||ajmresidential@yahoo.com||No||No|||Thu Dec 21 21:23:23 2006|
      • 105029|robert ritchey|robert industries|po box 472|crockett |ca|94525|510-787-7290|||||send2rr@gmail.com||No||No|||Fri Dec 22 02:54:54 2006|
      • 105030|Jack Still|WISE TV|PO BOX 280|Coeburn|VA|24230|2763959339|||||wisetv19@msn.com||No||No||9feet 1inch floor to floor. Connects to balcony. Need oak 4 feet round with landing at top. Send me a quote. J. Still WISE TV |Fri Dec 22 03:18:19 2006|

  21. Andy (2)
      The lines need to be cleaned and parsed into several reports:
      • Phone contact information
      • Email contact information
      • Address labels
      • Full data base, checking for unique entries
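A first step toward these reports might look like the sketch below. The field positions and names are guesses inferred from the three sample lines, not a documented layout, and the cleaning rules (trimming blanks, upper-casing the state, keeping only the digits of the phone number) are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Field positions guessed from the sample lines (an assumption).
my %FIELD = (
    id => 0, name => 1, company => 2, street => 3, city => 4,
    state => 5, zip => 6, phone => 7, email => 12, date => 19,
);

# Split one record at the | characters and return a hash of cleaned fields.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    # | is special in a regular expression, so it is escaped;
    # the limit -1 keeps trailing empty fields.
    my @f = split /\|/, $line, -1;
    my %rec;
    for my $key (keys %FIELD) {
        my $v = $f[ $FIELD{$key} ];
        $v = '' unless defined $v;
        $v =~ s/^\s+|\s+$//g;          # trim stray blanks such as "crockett "
        $rec{$key} = $v;
    }
    $rec{state} = uc $rec{state};      # assumed cleaning rule: ca becomes CA
    $rec{phone} =~ tr/0-9//cd;         # assumed cleaning rule: digits only
    return %rec;
}

my $sample = '105029|robert ritchey|robert industries|po box 472|crockett |ca|94525|510-787-7290|||||send2rr@gmail.com||No||No|||Fri Dec 22 02:54:54 2006|';
my %r = parse_record($sample);
print "$r{name} <$r{email}> $r{phone}\n";
print "$r{street}, $r{city}, $r{state} $r{zip}\n";
```

Checking for unique entries could then be done by storing each record in a hash keyed on whichever field is taken to identify a customer, a choice the slides leave open.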