The Simplest NL Applications: Text Searching and Pattern Matching

leola

Presentation Transcript


  1. The Simplest NL Applications: Text Searching and Pattern Matching
     Read J & M Chapter 2

  2. Searching for a Single String Using a Nondeterministic FSM
     [Figure: FSM with states 1 through 8 in a chain, with transitions labeled c, o, c, o, n, u, t]
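The machine on this slide can be simulated by tracking the set of states that are active after each input character. The following is a minimal sketch, not the slide author's code: the function name `nfa_find` is hypothetical, states are numbered from 0 rather than 1, and the start state is assumed to loop on every character so that a match may begin anywhere in the text.

```python
def nfa_find(pattern: str, text: str) -> list[int]:
    """Return the end position (exclusive) of every occurrence of pattern in text."""
    # State i means "the first i characters of pattern have just been matched".
    # Assumption: state 0 has a self-loop on every character, so it stays active.
    active = {0}
    hits = []
    for pos, ch in enumerate(text):
        nxt = {0}
        for state in active:
            # Follow the single labeled transition out of this state, if it fits.
            if state < len(pattern) and pattern[state] == ch:
                nxt.add(state + 1)
        if len(pattern) in nxt:
            hits.append(pos + 1)  # the final state was reached at this character
        active = nxt
    return hits


print(nfa_find("coconut", "a cococonut tree"))
```

Keeping a set of active states is what makes the simulation nondeterministic in spirit: after reading "coco" the machine is simultaneously four characters into one attempt and two characters into another.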

  3. Searching for a Single String Using the Boyer Moore Algorithm
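The slide only names the algorithm, so here is a hedged sketch of its central idea: compare the pattern against the text from right to left and use a precomputed table to skip ahead on a mismatch. This is the Horspool simplification, which keeps only a bad-character shift and omits the good-suffix rule of the full Boyer-Moore algorithm. The name `horspool_find` is hypothetical.

```python
def horspool_find(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each character except the pattern's last, the distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        j = m - 1
        # Compare right to left within the current window.
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Shift according to the text character under the window's last position;
        # characters that do not occur in the pattern allow a full-length jump.
        i += shift.get(text[i + m - 1], m)
    return -1


print(horspool_find("coconut", "a cococonut tree"))
```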

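  4. Searching for Multiple Strings
     [Figure: nondeterministic FSM with two branches: states 1 through 8 with transitions c, o, c, o, n, u, t, and a second branch with states 2 through 6 and transitions l, o, c, o, s]
     Example: lococonut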
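Searching for several strings at once extends the same simulation: one partial match ("thread") is kept per pattern and position. The sketch below assumes the two patterns are coconut and locos; the second is read off the figure residue and may not be exactly what the slide showed. The name `multi_find` is hypothetical, and patterns are assumed to be non-empty.

```python
def multi_find(patterns: list[str], text: str) -> list[tuple[int, str]]:
    """Return (end_position, pattern) for every occurrence of any pattern."""
    # A thread (k, i) means: the first i characters of patterns[k] are matched.
    threads = set()
    hits = []
    for pos, ch in enumerate(text):
        # Assumption: the start state loops on every character, so every
        # pattern may begin a fresh match at this position.
        candidates = threads | {(k, 0) for k in range(len(patterns))}
        threads = set()
        for k, i in sorted(candidates):
            if patterns[k][i] == ch:
                if i + 1 == len(patterns[k]):
                    hits.append((pos + 1, patterns[k]))
                else:
                    threads.add((k, i + 1))
    return hits


print(multi_find(["locos", "coconut"], "lococonut"))
```

On the slide's example input, the locos thread dies at the fifth character while a coconut thread that started at the third character runs to completion.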

  5. Converting to a Deterministic FSM
     [Figure: the two-branch machine from the previous slide, with transitions l, o, c, o, s and c, o, c, o, n, u, t]
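One standard way to make such a machine deterministic is the subset construction, in which each deterministic state is a set of nondeterministic states. The sketch below is an illustration under assumptions, not necessarily the construction shown on the slide: the names `build_dfa` and `dfa_find` are hypothetical, the patterns coconut and locos are taken from the figure residue, and any character outside the patterns' alphabet is assumed to send the machine back to the start state.

```python
def build_dfa(patterns, alphabet):
    """Subset construction for the multi-pattern search machine."""
    # NFA state (k, i): the first i characters of patterns[k] are matched.
    start = frozenset((k, 0) for k in range(len(patterns)))
    table = {}  # (dfa_state, character) -> dfa_state
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = set(start)  # the start state loops on every character
            for k, i in state:
                if i < len(patterns[k]) and patterns[k][i] == ch:
                    nxt.add((k, i + 1))
            nxt = frozenset(nxt)
            table[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, table


def dfa_find(patterns, text):
    """Run the deterministic machine: one table lookup per input character."""
    start, table = build_dfa(patterns, set("".join(patterns)))
    state, hits = start, []
    for pos, ch in enumerate(text):
        state = table.get((state, ch), start)  # unknown characters reset the match
        for k, i in sorted(state):
            if i == len(patterns[k]):
                hits.append((pos + 1, patterns[k]))
    return hits


print(dfa_find(["locos", "coconut"], "lococonut"))
```

The table is built once, before any text is read, so the search itself no longer needs to track a set of alternatives.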

  6. Regular Expressions
     • Two different (but related) uses of the term:
       • Expressions that define all and only the regular languages
         (aa ∪ ab ∪ ba ∪ bb)*
       • Expressions in a useful pattern language
         Matching IP addresses: s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!
         Finding doubled words: \<([A-Za-z]+)\s+\1\>
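The two pattern-language examples can be tried in Python's `re` module with small notational changes: Python has no `\<` and `\>` word anchors, so `\b` is used for both, and the replacement string refers to the first group as `\1` rather than `$1`. The function names `tag_ip` and `doubled_words` are hypothetical.

```python
import re

# Digits, then exactly three more ".digits" groups, inside <emphasis> tags.
IP_IN_EMPHASIS = re.compile(r"<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>")

# A word, whitespace, and the same word again, bounded by word boundaries.
DOUBLED_WORD = re.compile(r"\b([A-Za-z]+)\s+\1\b")


def tag_ip(text: str) -> str:
    """Rewrite <emphasis>IP</emphasis> as <inet>IP</inet>."""
    return IP_IN_EMPHASIS.sub(r"<inet>\1</inet>", text)


def doubled_words(text: str) -> list[str]:
    """Return each word that is immediately repeated."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]


print(tag_ip("host <emphasis>192.168.0.1</emphasis> is up"))
print(doubled_words("this is is a test of the the parser"))
```

Note that the backreference `\1` takes the second pattern beyond the regular languages in the first, formal sense of the term.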

  7. REs: Syntax and Semantics
     Syntax
     The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained as follows:
     1. ∅ and each member of Σ is a regular expression.
     2. If α, β are regular expressions, then so is αβ.
     3. If α, β are regular expressions, then so is α ∪ β.
     4. If α is a regular expression, then so is α*.
     5. If α is a regular expression, then so is (α).
     6. Nothing else is a regular expression.

  8. REs: Syntax and Semantics
     Regular expressions define languages via a semantic interpretation function we'll call L:
     1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ
     2. If α, β are regular expressions, then L(αβ) = L(α) L(β) = all strings that can be formed by concatenating to some string from L(α) some string from L(β).
     3. If α, β are regular expressions, then L(α ∪ β) = L(α) ∪ L(β)
     4. If α is a regular expression, then L(α*) = L(α)*
     5. If (α) is a regular expression, then L((α)) = L(α)
     A language is regular if and only if it can be described by a regular expression.
     Note: L is compositional.
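Because the interpretation function is compositional, it can be written as a recursive function with one case per syntax rule. The sketch below is an illustration under assumptions: expressions are encoded as nested tuples (a hypothetical encoding, with nesting standing in for parentheses), and since languages under star are infinite, only strings up to a length bound `max_len` are enumerated.

```python
def lang(expr, max_len):
    """Return the strings of length <= max_len in the language of expr."""
    op = expr[0]
    if op == "empty":   # the empty-set expression denotes no strings
        return set()
    if op == "sym":     # a single alphabet symbol denotes itself
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if op == "cat":     # concatenation: every left string followed by every right string
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if op == "union":   # union of the two sub-languages
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if op == "star":    # zero or more concatenations of the sub-language
        base = lang(expr[1], max_len)
        result = frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result = result | frontier
        return result
    raise ValueError(f"unknown operator: {op}")


# Encoding of (aa U ab U ba U bb)*, the even-length strings over {a, b}.
a, b = ("sym", "a"), ("sym", "b")
pairs = ("union", ("union", ("union", ("cat", a, a), ("cat", a, b)),
                   ("cat", b, a)), ("cat", b, b))
even = ("star", pairs)
print(sorted(lang(even, 2)))
```

Each case computes the meaning of an expression from the meanings of its parts and nothing else, which is what compositionality amounts to here.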

  9. The Importance of Compositionality
     What is the meaning of:
     Mary cooked the yujutes.
     Mary tyroked the yujutes.

  10. Morphological Analysis
      • Read J & M Chapter 3
      • Recognize words
      • Parse words

  11. Morphological Parsing
      Goal: to represent the facts declaratively so that a single representation can be used for both recognition and generation.
      Note: ^ marks morpheme boundaries. # marks word boundaries.

  12. From Lexical to Intermediate
      Note: All the transducers in the book are described as lexical:intermediate, but they can also run in the other direction.

  13. Where Did reg-noun-stem Come From?

  14. We Can Cascade or Compose

  15. From Intermediate to Surface
      For text, we need spelling rules.
      ε → e / {x, s, z} ^ ___ s #
      Read this as “Replace ε with e in the context after the /.”
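The effect of this spelling rule (insert an e after a morpheme boundary that follows x, s, or z and precedes s and a word boundary) can be approximated with a regular-expression substitution. This is a stand-in for illustration, not a finite-state transducer, and the function names `e_insertion` and `to_surface` are hypothetical.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Insert e at a ^ that is preceded by x, s, or z and followed by s#."""
    return re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)


def to_surface(intermediate: str) -> str:
    """Apply the rule, then erase the boundary symbols ^ and #."""
    return e_insertion(intermediate).replace("^", "").replace("#", "")


print(e_insertion("fox^s#"))
print(to_surface("fox^s#"))
print(to_surface("cat^s#"))
```

The lookbehind and lookahead play the role of the rule's left and right context: they must be present for the rule to fire, but they are not themselves rewritten.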

  16. Turning the Rule into a Transducer
      Examples: foxes, xerox, fox#sat

  17. Disambiguation - Local Local ambiguities: # s# asses luxury

  18. Disambiguation - Harder
      Sometimes additional knowledge is necessary:
      foxes: fox +N +PL or fox +V +SG
      Can we think of nouns that cannot also be verbs?

  19. Search
      • For FSMs, we can build a deterministic machine.
      • In other cases, we will have to search:
        • Depth-first
        • Breadth-first – chart parsing
      [Figure: parse trees with nodes S, VP, NP, PP, V, PR, N, DET, PREP for the sentence “I hit the boy with a bat.”]
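The difference between the two search strategies comes down to which pending alternative is explored next: the newest one (depth-first, a stack) or the oldest one (breadth-first, a queue). The sketch below shows this on a toy state graph that is purely hypothetical; it is not a parser, and the function name `search` is an assumption.

```python
from collections import deque


def search(start, successors, is_goal, depth_first=True):
    """Return the states in the order visited, stopping at the first goal."""
    frontier = deque([start])
    seen = {start}
    visited = []
    while frontier:
        # Stack discipline gives depth-first; queue discipline gives breadth-first.
        state = frontier.pop() if depth_first else frontier.popleft()
        visited.append(state)
        if is_goal(state):
            return visited
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return visited


# Hypothetical toy graph: S branches to A and B, and only B leads to the goal G.
graph = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": [], "G": []}
print(search("S", graph.__getitem__, lambda s: s == "G", depth_first=True))
print(search("S", graph.__getitem__, lambda s: s == "G", depth_first=False))
```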
