
CSE467/567 Computational Linguistics

Carl Alphonce (cse-467-alphonce@cse.buffalo.edu), Computer Science & Engineering, University at Buffalo.

Presentation Transcript


  1. CSE467/567 Computational Linguistics
     Carl Alphonce
     cse-467-alphonce@cse.buffalo.edu
     Computer Science & Engineering
     University at Buffalo

  2. Levels of processing
     • phonetics/phonology – sounds
     • morphology – word structure
     • syntax – sentence structure
     • semantics – meaning
     • pragmatics – goals of language use
     • discourse – utterances in context

  3. Words: the building blocks of sentences

  4. Words have internal structure
     • readable = read + able
     • readability = read + able + ity
     • the structure of words can be described using a regular grammar

  5. Chomsky hierarchy

  6. Problem
     • I often need to find an e-mail, but I have thousands of e-mails in my various folders. Suppose I want to find an e-mail about geese. The e-mail may mention “geese” or “goose”; also, if it appears at the start of a sentence, its initial letter will be capitalized.
     • Need to match “goose”, “geese”, “Goose” or “Geese”.

  7. Regular expressions (in Perl)
     “a regular expression is an algebraic notation for characterizing a set of strings” [p. 22]
     Regular expressions are commonly used to specify search strings. For example, the UNIX utility program grep lets the user specify a pattern to search for in files.

  8. Sequences of characters
     Matching a sequence of characters: /…/
     Examples:
     /a/ matches the character ‘a’
     /fred/ matches the string ‘fred’
     Note: /fred/ does not match the string ‘Fred’! In other words, patterns are case-sensitive.
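
The case-sensitivity point can be checked with a short sketch. It uses Python's standard `re` module rather than Perl (an assumption made for illustration; the pattern syntax is the same for this case), and the sample strings are invented.

```python
import re

# re.search looks for the pattern anywhere in the string,
# which is how a bare /fred/ search behaves.
lower = re.search(r"fred", "alfred the great")
upper = re.search(r"fred", "Fred is home")

print(lower is not None)  # True: 'fred' occurs inside 'alfred'
print(upper is not None)  # False: 'F' and 'f' are different characters
```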

  9. Character disjunction (character classes)
     Square brackets are used to indicate disjunction of characters.
     Examples:
     /[Ff]/ matches either ‘f’ or ‘F’
     /[Ff]red/ matches either ‘fred’ or ‘Fred’
     This form of disjunction applies only at the character level.
     A set of characters in square brackets is sometimes referred to as a character class.
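
As a minimal sketch of a character class in use, the following filters a small, made-up word list with /[Ff]red/. Python's `re` module stands in for Perl here; the bracket syntax is the same.

```python
import re

pattern = re.compile(r"[Ff]red")
candidates = ["fred", "Fred", "FRED", "bred"]

# Keep only the words in which the pattern is found.
hits = [w for w in candidates if pattern.search(w)]
print(hits)  # ['fred', 'Fred']
```

‘FRED’ is rejected because the class covers only the first character; ‘red’ must still match literally.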

  10. Ranges
      Sometimes it is useful to specify “any digit” or “any letter”.
      “Any digit” can be written as /[0123456789]/, since any of the ten digits satisfies the pattern.
      An alternative is to use a special range notation: /[0-9]/
      Any letter can be specified as /[A-Za-z]/
      Range notation does not extend the power of regular expressions, but gives us a convenient way to express them.
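
The claim that range notation adds convenience but no power can be illustrated by comparing the two spellings of “any digit” on the same input. This is a sketch in Python's `re` module (not Perl), with an invented sample string.

```python
import re

text = "Room 467, floor 5"

explicit = re.findall(r"[0123456789]", text)  # every digit listed out
ranged = re.findall(r"[0-9]", text)           # the same set as a range

print(ranged)              # ['4', '6', '7', '5']
print(explicit == ranged)  # True

letters = re.findall(r"[A-Za-z]", "CSE 467")
print(letters)             # ['C', 'S', 'E']
```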

  11. Complementing character classes
      To search for a character that is not in a character class, place a caret (^) immediately after the opening square bracket of the class.
      Examples:
      /[^a]/ matches any single character except ‘a’
      /[^0-9]/ matches any single character except a digit
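
A small sketch of complemented classes, again using Python's `re` module in place of Perl, with made-up input strings:

```python
import re

# [^0-9] matches any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b22c")
# [^a] matches any single character other than 'a'.
not_a = re.findall(r"[^a]", "banana")

print(non_digits)  # ['a', 'b', 'c']
print(not_a)       # ['b', 'n', 'n']
```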

  12. Matching 0 or 1 occurrence
      The ‘?’ matches zero or one occurrence of the preceding expression.
      Examples:
      /a?/ matches ‘a’ or ‘’ (the empty string)
      /cats?/ matches ‘cat’ or ‘cats’
      Note that the “preceding expression”, in these examples, is a single letter. We’ll see how to form longer expressions later.
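
To see that ‘?’ applies only to the single preceding letter, the sketch below tests /cats?/ against whole strings. It uses `re.fullmatch` from Python's `re` module (not Perl), which succeeds only when the entire string matches.

```python
import re

pattern = re.compile(r"cats?")

singular = pattern.fullmatch("cat") is not None
plural = pattern.fullmatch("cats") is not None
doubled = pattern.fullmatch("catss") is not None

print(singular, plural, doubled)  # True True False
```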

  13. The Kleene star and plus
      The Kleene star (*) matches zero or more occurrences of the preceding expression.
      Examples:
      /a*/ matches ‘’, ‘a’, ‘aa’, ‘aaa’, etc.
      /[ab]*/ matches ‘’, ‘a’, ‘b’, ‘aa’, ‘ab’, ‘ba’, ‘bb’, etc.
      The Kleene plus (+) matches one or more occurrences of the preceding expression.
      + is not strictly necessary: /[ab]+/ is equivalent to /[ab][ab]*/
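
The equivalence of /[ab]+/ and /[ab][ab]*/ can be spot-checked on a few sample strings. This is only a sketch over an invented list, written with Python's `re` module rather than Perl; it illustrates the equivalence and does not prove it.

```python
import re

samples = ["", "a", "b", "ab", "bba", "abc"]

# fullmatch requires the whole string to match the pattern.
star = [s for s in samples if re.fullmatch(r"[ab]*", s)]
plus = [s for s in samples if re.fullmatch(r"[ab]+", s)]
expanded = [s for s in samples if re.fullmatch(r"[ab][ab]*", s)]

print(star)              # ['', 'a', 'b', 'ab', 'bba']
print(plus)              # ['a', 'b', 'ab', 'bba']
print(plus == expanded)  # True
```

The only difference between the star and plus results is the empty string, which * accepts and + rejects.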

  14. Wildcard
      The period (.) matches any single character except the newline (\n).
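
A brief sketch of the wildcard and its newline exception, using Python's `re` module (whose default behaviour for ‘.’ is the same as described here) and an invented test string:

```python
import re

# The third candidate has a newline between 'b' and 'g',
# so the wildcard does not match it.
text = "bag big b\ng bug"
found = re.findall(r"b.g", text)

print(found)  # ['bag', 'big', 'bug']
```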

  15. Anchors
      Anchors are used to restrict a match to a particular position within a string.
      ^ anchors to the start of a string
      $ anchors to the end of a string
      /[Ff]red/ matches both ‘Fred’ and ‘Fred is home’
      /^[Ff]red$/ matches ‘Fred’ but not ‘Fred is home’
      \b anchors to a word boundary
      \B anchors to a non-boundary
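
The anchor examples can be reproduced with a short sketch. It is written with Python's `re` module instead of Perl, and the sentence used for the \b and \B cases is made up for illustration.

```python
import re

unanchored = re.search(r"[Ff]red", "Fred is home")
anchored = re.search(r"^[Ff]red$", "Fred is home")
exact = re.search(r"^[Ff]red$", "Fred")

print(unanchored is not None)  # True
print(anchored is not None)    # False: text follows 'Fred'
print(exact is not None)       # True

# \b requires a word boundary on each side; \B requires the opposite.
whole_words = re.findall(r"\bthe\b", "the other theme, the end")
inside_words = re.findall(r"\Bthe\B", "the other theme")

print(whole_words)   # ['the', 'the']
print(inside_words)  # ['the'] (the one inside 'other')
```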

  16. Conjunction (concatenation)
      Two regular expressions are conjoined (concatenated) by juxtaposition, that is, by placing the expressions side by side.
      Examples:
      /a/ matches ‘a’
      /m/ matches ‘m’
      /am/ matches ‘am’ but not ‘a’ or ‘m’ alone

  17. Disjunction
      We have already seen disjunction of characters using the square bracket notation.
      General disjunction is expressed using the vertical bar (|), also called the pipe symbol.
      This form of disjunction allows us to match any one of several alternative patterns, not just single characters as with the [ ] form.
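
As a minimal sketch of general disjunction, the pattern below lists two whole words as alternatives. The example sentence is invented, and Python's `re` module is used in place of Perl; the | operator is written the same way.

```python
import re

text = "one goose, two geese, no moose"

# Each side of | is a complete alternative pattern, not a single character.
birds = re.findall(r"goose|geese", text)

print(birds)  # ['goose', 'geese']
```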

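  18. Grouping
      • Parentheses, ‘(’ and ‘)’, are used to group subpatterns of a larger pattern.
      • Ex: /[Gg](ee|oo)se/
      • The parentheses must enclose the whole alternation: | has the lowest precedence, so /[Gg](ee)|(oo)se/ would mean “[Gg]ee” or “oose”.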
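
Where the parentheses go matters, because | binds more loosely than juxtaposition. The sketch below contrasts two placements on a made-up word list, using Python's `re` module rather than Perl (the precedence rules are the same): with the alternation enclosed in one group, exactly the four goose/geese forms match; with each alternative parenthesized separately, the pattern reads as “[Gg]ee” or “oose”.

```python
import re

words = ["goose", "geese", "Goose", "Geese", "gee", "moose"]

grouped = re.compile(r"[Gg](ee|oo)se")      # alternation inside one group
ungrouped = re.compile(r"[Gg](ee)|(oo)se")  # parsed as [Gg]ee OR oose

# fullmatch succeeds only if the whole word matches.
grouped_hits = [w for w in words if grouped.fullmatch(w)]
ungrouped_hits = [w for w in words if ungrouped.fullmatch(w)]

print(grouped_hits)    # ['goose', 'geese', 'Goose', 'Geese']
print(ungrouped_hits)  # ['gee']
```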

  19. Replacement
      In addition to matching, we can do replacements when a match is found.
      Example: To replace the British spelling ‘colour’ with the American spelling ‘color’, we can write:
      s/colour/color/
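
A rough Python counterpart of the substitution, offered as a sketch rather than a translation of Perl semantics: `re.sub` replaces every occurrence by default, whereas Perl's s/// without the g modifier replaces only the first, so `count=1` is passed to mimic that. The sample sentence is invented.

```python
import re

text = "The colour of the sky; colours fade."

# Replace every occurrence (comparable to s/colour/color/g in Perl).
all_replaced = re.sub(r"colour", "color", text)
# Replace only the first occurrence (comparable to s/colour/color/).
first_only = re.sub(r"colour", "color", text, count=1)

print(all_replaced)  # The color of the sky; colors fade.
print(first_only)    # The color of the sky; colours fade.
```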

  20. Registers – saving matches
      • To save a match from part of a pattern, to reuse it later on, Perl provides registers
      • Registers are named \#, where # is the number of the register
      • Ex.
        DE DO DO DO DE DA DA DA
        IS ALL I WANT TO SAY TO YOU
        /(D[AEO].)*/ will match the first line
        /(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/ matches it more specifically
        This pattern also matches strings like DA DE DE DE DA DO DO DO
        \s matches a whitespace character
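
The register pattern can be tried out as follows. This sketch assumes the pattern is written with no spaces between the backreferences, since groups 2 and 3 each capture their own leading separator character. It uses Python's `re` module, which supports the same \1, \2, \3 notation inside a pattern; the third test string is an invented counterexample.

```python
import re

pattern = re.compile(r"(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3")

lyric = pattern.fullmatch("DE DO DO DO DE DA DA DA")
variant = pattern.fullmatch("DA DE DE DE DA DO DO DO")
broken = pattern.fullmatch("DE DO DO DA DE DA DA DA")

print(lyric.groups())       # ('DE', ' DO', ' DA')
print(variant is not None)  # True: same shape with different syllables
print(broken is not None)   # False: the third repeat differs from register 2
```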

  21. For more information
      • PERL Regular Expression TUTorial
      • http://perldoc.perl.org/perlretut.html
      • PERL Regular Expression reference page
      • http://perldoc.perl.org/perlre.html
