Applications of Sequence Learning CMPT 825 Mashaal A. Memon

Presentation Transcript


  1. Applications of Sequence Learning CMPT 825 Mashaal A. Memon

  2. What We Know of Sequence Learning • Part Of Speech (POS) Tagging is a sequence learning problem. • 3 approaches to solving the problem: • Noisy-Channel • Classification • Rule-Based

  3. What We Know About POS Tagging • A part of speech (POS) explains not what the word is, but how it is used. • Problem: Which POS does each word represent? • Tags: POS tags (e.g. NN = noun, VB = verb) • Training: Word sequences with corresponding POS tags. • Input: Word sequences.

  4. What We Know About POS Tagging Continued… • Examples (word/tag): Anoop/NN is/VBZ a/DT great/JJ professor/NN ./. • I/PRP am/VBP kissing/VBG butt/NN right/RB now/RB ./.

  5. What Is My Point? • Other interesting and important problems can be represented as tagging problems. • The same three approaches can be used. • 4 such applications will be briefly introduced: • Chunking • Named Entity Recognition • Cascaded Chunking • Word Segmentation

  6. (1) Chunking • A chunk is a syntactically correlated part of a sentence (e.g. noun phrase, verb phrase, etc.) • Problem: Which type of chunk does each word or group of words belong to? • Note: Chunks of the same type can sometimes be directly adjacent ("kiss" each other), so the tags must mark where each chunk begins.

  7. (1) Chunking Continued… Noun-Phrase (NP) Chunking • Only look for noun phrase chunks. • Tags: B = beginning of noun phrase, I = inside noun phrase, O = other • Training: Word sequences with corresponding POS and NP tags. • Input: Word sequences and POS tags.

  8. (1) Chunking Continued… Noun-Phrase (NP) Chunking • Examples (word/tag): The/B student/I talked/O to/O Anoop/B ./O • The/B guy/I he/B talked/O to/O was/O smelly/O ./O
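
The B/I/O tags above can be turned back into noun-phrase chunks with a single left-to-right pass. The following is a minimal sketch, not part of the original slides; the function name `np_chunks` is hypothetical, and a stray I tag with no open chunk is simply ignored.

```python
def np_chunks(words, tags):
    """Group words into noun-phrase chunks from B/I/O tags (illustrative helper)."""
    chunks, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            # A new chunk starts; close any chunk that is still open.
            if current:
                chunks.append(current)
            current = [word]
        elif tag == "I" and current:
            # Continue the chunk that is currently open.
            current.append(word)
        else:
            # "O" (or an I with no open chunk) ends the current chunk.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]


words = "The guy he talked to was smelly .".split()
tags = "B I B O O O O O".split()
print(np_chunks(words, tags))  # ['The guy', 'he']
```

Note that "The guy" and "he" are adjacent chunks of the same type; without a separate B tag they could not be told apart.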

  9. (1) Chunking Continued… General Chunking • Look for other syntactic constructs as well as noun phrases. • Tags: a B or I prefix attached to each chunk type (NP = noun phrase, VP = verb phrase, PP = prepositional phrase), plus O = other • Training: Word sequences with corresponding POS and chunk tags. • Input: Word sequences and POS tags.

  10. (1) Chunking Continued… General Chunking • Examples (word/tag): Anoop/B-NP should/B-VP give/I-VP me/B-NP an/B-NP A+/I-NP ./O • His/B-NP presentation/I-NP is/B-VP boring/I-VP me/B-NP to/B-PP death/B-NP ./O
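
With typed tags, decoding works the same way as for NP chunking, except that each chunk also carries its type. A minimal sketch, assuming tags of the form `B-XX`, `I-XX` or `O`; the name `typed_chunks` is hypothetical and not from the slides.

```python
def typed_chunks(words, tags):
    """Decode B-XX / I-XX / O tags into (chunk_type, text) pairs (illustrative)."""
    chunks, current = [], None
    for word, tag in zip(words, tags):
        prefix, _, ctype = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == ctype:
            # Same type as the open chunk: extend it.
            current[1].append(word)
            continue
        # Anything else closes the open chunk.
        current = None
        if prefix == "B":
            current = (ctype, [word])
            chunks.append(current)
    return [(ctype, " ".join(ws)) for ctype, ws in chunks]


words = "Anoop should give me an A+ .".split()
tags = "B-NP B-VP I-VP B-NP B-NP I-NP O".split()
print(typed_chunks(words, tags))
# [('NP', 'Anoop'), ('VP', 'should give'), ('NP', 'me'), ('NP', 'an A+')]
```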

  11. (2) Named Entity Recognition • A named entity (NE) is a phrase that contains the name of a person, organization or location. • Problem: Does a word or group of words represent a named entity or not? • Tags: a B or I prefix attached to each NE type (PER = person, ORG = organization, LOC = location), plus O = other • Training: Word sequences with corresponding POS and NE tags. Sometimes lists of known named entities are also used (cheating!) • Input: Word sequences with POS tags.

  12. (2) Named Entity Recognition Continued… • Example (word/tag): The/O United/B-LOC States/I-LOC of/I-LOC America/I-LOC has/O an/O intelligent/O leader/O in/O D.C./B-LOC ,/O Dick/B-PER Cheney/I-PER of/O Halliburton/B-ORG ./O
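
Training data for a tagger is often prepared in the opposite direction: annotated entity spans are converted into one tag per word. The sketch below shows one way to do this; the function `spans_to_bio` and the `(start, end, type)` span format with an exclusive end index are assumptions for illustration, not something defined in the slides.

```python
def spans_to_bio(n_tokens, spans):
    """Convert non-overlapping (start, end, type) spans into B-/I-/O tags.

    `end` is exclusive. This span format is an assumption for illustration.
    """
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags


tokens = ", Dick Cheney of Halliburton .".split()
spans = [(1, 3, "PER"), (4, 5, "ORG")]
print(spans_to_bio(len(tokens), spans))
# ['O', 'B-PER', 'I-PER', 'O', 'B-ORG', 'O']
```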

  13. (3) Cascaded Chunking • Cascaded chunking gives us back the parse tree of the sentence. • Think of it as a chunker that takes the initial input and then keeps working on its OWN output until a pass makes no more changes. • Difference: Chunks may contain other chunks as well as POS tags.

  14. (3) Cascaded Chunking Continued…
      CHUNKER (W = {w1..wn}, T = {t1..tn}) → T’ = {t’1..t’n};
      CASCADE (W = {w1..wn}, T = {t1..tn}) {
          OutputBefore = {Ø};
          OutputAfter = CHUNKER (W, T);
          while (OutputBefore != OutputAfter) do {
              OutputBefore = OutputAfter;
              OutputAfter = CHUNKER (W, OutputBefore);
              /* Output result of current iteration */
          }
      }
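
The CASCADE loop is a fixed-point iteration: the chunker is applied to its own output until nothing changes. A runnable sketch in Python follows. The `toy_chunker` and its `TOY_RULES` table are purely hypothetical stand-ins (a real chunker would be a trained or rule-based model), used here only so that the loop has something to run on.

```python
# Hypothetical one-step relabelling rules; NOT a real chunker.
TOY_RULES = {"DT": "NP", "NN": "NP", "VBZ": "VP", "JJ": "VP", "NP": "S", "VP": "S"}


def toy_chunker(words, tags):
    """Stand-in for CHUNKER: returns one new label per word."""
    return [TOY_RULES.get(t, t) for t in tags]


def cascade(chunker, words, tags):
    """Apply `chunker` to its own output until the output stops changing.

    Returns the list of distinct outputs, one per pass.
    """
    levels = []
    before = None
    after = chunker(words, tags)
    while before != after:
        levels.append(after)  # output result of the current iteration
        before = after
        after = chunker(words, before)
    return levels


words = ["The", "effort", "is", "unnecessary"]
tags = ["DT", "NN", "VBZ", "JJ"]
for level in cascade(toy_chunker, words, tags):
    print(level)
# ['NP', 'NP', 'VP', 'VP']
# ['S', 'S', 'S', 'S']
```

A chunker that never converges would make this loop run forever, so a practical version would probably add an iteration limit.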

  15. (3) Cascaded Chunking Continued… • Example: The effort to establish such a conclusion is unnecessary . • POS tags: DT NN TO VB PDT DT NN VBZ JJ . • First chunking pass: DT NP IP VP PDT DT NP AP • Second chunking pass: DP CP DP CP • ... • Final pass: S • Chunking is an intermediate step to a full parse.

  16. (4) Word Segmentation • When written, some languages like Chinese don’t have obvious word boundaries. • Problem: Does a character or group of characters form a single word? • Tags: B = beginning of word, I = inside word • Training: Character sequences with corresponding word segmentation (WS) tags. • Input: Character sequences.

  17. (4) Word Segmentation Continued… • Example (character/tag): 參/B 賽/I 者/I 並/B 未/I 參/B 加/I 任/B 何/I 賓/B 大/B 語/B 料/I 之/B 競/B 賽/I
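
Once every character has a B or I tag, recovering the words is mechanical: start a new word at each B and append to the current word at each I. A minimal sketch using the example above; the function name `segment` is hypothetical.

```python
def segment(chars, tags):
    """Rebuild words from per-character B/I tags (illustrative helper)."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            # B starts a new word (so does a leading I, as a fallback).
            words.append(ch)
        else:
            # I extends the word currently being built.
            words[-1] += ch
    return words


text = "參賽者並未參加任何賓大語料之競賽"
tags = "B I I B I B I B I B B B I B B I".split()
print(segment(text, tags))
# ['參賽者', '並未', '參加', '任何', '賓', '大', '語料', '之', '競賽']
```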

  18. Conclusion • The problems differ in their goals, but because they share the same tagging representation, they can all be solved with the same approaches. • We all LOVE sequence learning. THE END

  19. Questions?!

  20. References • Manning, C. D. and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. • CoNLL shared task on Chunking 2000. Website: http://cnts.uia.ac.be/conll2000/chunking/ • CoNLL shared task on NER 2003. Website: http://cnts.uia.ac.be/conll2003/ner/ • CoNLL shared task on NER 2002. Website: http://cnts.uia.ac.be/conll2002/ner/ • Abney, S. Parsing By Chunks. In Journal of Psychological Research, 18(1), 1989. • Chinese Word Segmentation Bakeoff 2003. Website: http://www.sighan.org/bakeoff2003
