  1. Language Technology (A Machine Learning Approach) Antal van den Bosch & Walter

  2. Programme
     • 24/2 [Antal] Intro ML for NLP & WEKA / Decision Trees
     • 3/3 [Walter] ML for shallow parsing
     • 10/3 [Antal] ML for morphology and phonology
     • 17/3 [Antal] ML for information extraction
     • 24/3 [Antal] ML for discourse
     • 31/3 [Véronique] ML for coreference
     • 21/4 [Antal] Memory & representation
     • 28/4 [Antal] Modularity / more data
     • 5/5 [Walter] ML for document classification

  3. Text Applications and LT Components
     • Applications: OCR, spelling error correction, grammar checking, information retrieval, document classification, information extraction, summarization, question answering, ontology extraction and refinement, dialogue systems, machine translation
     • LT components: lexical/morphological analysis, tagging, chunking, grammatical relation finding, syntactic analysis, word sense disambiguation, named entity recognition, semantic analysis, reference resolution, discourse analysis, meaning

  4. Outline
     • Shallow parsing: formalisms / shallow versus deep parsing
     • POS tagging
     • Chunking
     • Relation finding
     • Demo
     • Assignment 1. Upload service:

  5. Formalisms for Computational Linguistics
     • Orthography: finite-state (spelling rules)
     • Phonology: finite-state (text-to-speech)
     • Morphology: finite-state (synthesis / analysis); context-free (compounds)
     • Syntax: context-free + extensions (parsing)
     • Semantics: FOPC / CD (interpretation)
     • Pragmatics

  6. Classes of grammars are differentiated by means of a number of restrictions on the type of production rule
     • Type-0 grammar (unrestricted rewrite system): rules have the form α → β
     • Type-1 grammar (context-sensitive): rules are of the type α → β, where |α| ≤ |β|
     • Type-2 grammar (context-free): rules are of the form A → α, where α ≠ ε
     • Type-3 grammar (regular, finite): rules are of the form A → a or A → aB
     • A grammar generates the strings of L(G); an automaton accepts the strings of L(M). Structure may be assigned as a side-effect.

  7. Why parsing?
     • The structure of a sentence determines (partially) its meaning: "John eats pizza with a spoon" → Name(John), Pizza(p), Spoon(s), Eat-action(a1), Agent(a1,John), Patient(a1,p), Instrument(a1,s)
     • [Parse tree of "John eats pizza with a spoon": S over NP and VP, with a PNP attached; leaf tags VBZ, NN, IN, DT, NN]

  8. HPSG / CFG: S → NP VP, NP → DET N, VP → V NP
     • Search methods: bottom-up / top-down, DF / BF, backtracking, chart parsing
     • [Diagram: lexicon + grammar + search method applied to an input string yields a tree]
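As a minimal sketch of the toy CFG on this slide, a naive top-down recognizer can be written in a few lines. The grammar is exactly the three rules above; the small lexicon is a hypothetical addition for illustration.

```python
# Toy CFG from the slide, recognized top-down. The lexicon below is an
# illustrative assumption, not part of the original slide.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "DET", "cat": "N", "dog": "N", "sees": "V"}

def parse(symbol, tags, i):
    """Return all positions reachable after expanding `symbol` from index i."""
    if symbol in GRAMMAR:                      # non-terminal: try each rule
        ends = []
        for rule in GRAMMAR[symbol]:
            positions = [i]
            for part in rule:
                positions = [k for j in positions for k in parse(part, tags, j)]
            ends.extend(positions)
        return ends
    # terminal (a POS tag): consume one word if it matches
    return [i + 1] if i < len(tags) and tags[i] == symbol else []

def accepts(sentence):
    tags = [LEXICON[w] for w in sentence.lower().split()]
    return len(tags) in parse("S", tags, 0)

print(accepts("the dog sees the cat"))   # True
print(accepts("the dog the cat"))        # False
```

A chart parser, as named on the slide, would memoize `parse(symbol, i)` to avoid re-deriving the same spans; this sketch omits that for brevity.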

  9. The problem with full parsing • A vicious trade-off between coverage and ambiguity: the larger the grammar (more coverage), the more spurious ambiguity

  10. Shallow parsing • Approximate the expressive power of a CFG and feature-extended CFG by means of a cascade of simple transformations • Advantages • deterministic (no recursion) • efficient (in one typical comparison, 1600 words per second vs. 1 word per second) • accurate • robust (unrestricted text, partial solutions) • can be learned

  11. Shallow Parsing • Steve Abney 1991 (FST) • Ramshaw & Marcus 1995 (TBL) • CoNLL shared tasks 1999, 2000, 2001 • JMLR special issue 2002

  12. Cascade • POS tagging • NP chunking • XP chunking • Grammatical relation assignment • Function assignment • Parsing

  13. Approaches
     • Deductive: CASS parser (Abney, 1991; finite-state), Fidditch (Hindle, 1994; rule-based)
     • Inductive: transformation rules (Ramshaw & Marcus, 1995), memory-based (Daelemans/Buchholz/Veenstra, 1999; Tjong Kim Sang, 2000)

  14. Abney (1991): CASS-parser • Chunk = a maximal, continuous, non-recursive syntactic segment around a head • Comparable to a morphologically complex word in synthetic languages • Motivation • linguistic (incorporate syntactic restrictions) • psycholinguistic • prosodic (phonological phrases)

  15. Levels and Transformations
     • Words and their part-of-speech tags
     • Chunks (kernel NP, VP, AP, AdvP): NP → D? N* N; VP → V-tns | Aux V-ing
     • Simple phrases (transforming embedding into iteration): PP → P NP
     • Complex phrases: S → PP* NP PP* VP PP*

  16. [Figure: level-by-level derivation of "The woman in the lab coat thought you were sleeping", from the tag level L0 (D N P D N N V-tns Pro Aux V-ing) through transformations T1–T3 and levels L1–L3 up to the full S]

  17. Pattern = category + regular expression • Each regular expression is translated into an FSA • For each Ti we take the union of the FSAs to construct a recognizer for level Li • In case of more than one end state for the same input, choose the longest match • In case of blocking, advance one word • "Easy-first parsing" (islands of certainty) • Extensions: add features by incorporating actions into the FSAs
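The longest-match, advance-on-blocking behaviour described above can be sketched with Python regular expressions standing in for the FSAs. The two level-1 patterns below follow the chunk rules from slide 15; this is an illustrative simplification, not the CASS parser itself.

```python
import re

# Abney-style longest-match chunking sketch: each level is a list of
# (category, regular expression over POS tags). Ties go to the longest
# match; on blocking we advance one word ("easy-first").
LEVEL1 = [
    ("NP", r"(DT )?(NN )*NN"),           # NP -> D? N* N, over space-joined tags
    ("VP", r"VBZ|VBD|(AUX VBG)"),        # simplified VP pattern (assumption)
]

def chunk(tags):
    out, i = [], 0
    while i < len(tags):
        best = None
        for cat, pat in LEVEL1:
            # try progressively shorter spans; first full match is the longest
            for j in range(len(tags), i, -1):
                if re.fullmatch(pat, " ".join(tags[i:j])):
                    if best is None or j > best[1]:
                        best = (cat, j)
                    break
        if best:
            out.append((best[0], tags[i:best[1]]))
            i = best[1]
        else:                            # blocking: advance one word
            out.append((None, [tags[i]]))
            i += 1
    return out

print(chunk(["DT", "NN", "VBZ", "DT", "NN", "NN"]))
```

Running a second level over the resulting chunk categories (e.g. a PP pattern over `P NP`) would give the cascade of the following slides.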

  18. MBLP cascade: shallow parsing as classification • disambiguation: tagging • segmentation: chunking • PP attachment • relation finding

  19. Memory-Based Learning • Basis: k nearest neighbor algorithm: • store all examples in memory • to classify a new instance X, look up the k examples in memory with the smallest distance D(X,Y) to X • let each nearest neighbor vote with its class • classify instance X with the class that has the most votes in the nearest neighbor set • Choices: • similarity metric • number of nearest neighbors (k) • voting weights
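The k-nearest-neighbour procedure on this slide can be sketched directly. This minimal version uses the overlap metric (count of mismatching features) and unweighted majority voting; feature weighting, the similarity-metric choices, and voting weights mentioned above are left out.

```python
from collections import Counter

# Memory-based learning sketch: store all examples, classify by majority
# vote among the k nearest neighbours under the overlap distance.
def knn_classify(memory, x, k=3):
    """memory: list of (feature_tuple, label); x: feature tuple to classify."""
    dist = lambda a, b: sum(f != g for f, g in zip(a, b))    # overlap metric
    nearest = sorted(memory, key=lambda ex: dist(ex[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative toy memory (hypothetical feature windows of three POS tags)
memory = [
    (("DT", "NN", "VBZ"), "NP"),
    (("DT", "JJ", "NN"),  "NP"),
    (("MD", "VB", "DT"),  "VP"),
]
print(knn_classify(memory, ("DT", "NN", "NN"), k=1))
```

Note that "training" is just storing `memory`, which is why the next slide can claim fast training; the cost is paid at classification time unless an index such as IGTree is used.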

  20. Memory-Based Learning Advantages: • Easy combination of different features • Robustness against overfitting • Fast training and tagging with IGTree But: • Weighting does not look at feature correlations, and averages over all feature values • No global optimization (yet) • Trade-off between speed and accuracy

  21. POS tagging • Assigning morpho-syntactic categories (Parts-of-speech) to words in context: • Disambiguation: a combination of lexical and “local” contextual constraints.

  22. POS tagging: what for? • shallow processing (abstraction from words: higher recall) • basic disambiguation (choosing the right form: higher precision) • robustness, coverage, speed • good enough for many applications: • text mining: information retrieval/extraction • corpus queries (linguistic annotation) • terminology acquisition • text-to-speech • spelling correction

  23. POS tagging: remaining errors • the last 3–10% is hard: • long-distance dependencies • genuine ambiguities • annotation errors • unknown words • not enough information in the features • more features are needed, but this has an exponential effect on data sparseness • generalization to general text is poor: 97% → 75% • some languages: large tag sets & small corpora

  24. Memory-Based POS Tagger
     • Case base for known words. Features: tag-2, tag-1, lex(focus), word(top100)(focus), lex+1, lex+2 → POS tag
     • Case base for unknown words. Features: tag-2, tag-1, pref, cap, hyp, num, suf1, suf2, suf3, lex+1, lex+2 → POS tag
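The unknown-word features named on this slide (prefix letter, capitalization, hyphen, digit, and one- to three-letter suffixes) are easy to extract; a sketch, with the exact feature definitions assumed rather than taken from the original system:

```python
# Sketch of the unknown-word features on the slide. The precise
# definitions (e.g. how "pref" is cut) are assumptions for illustration.
def unknown_word_features(word):
    return {
        "pref": word[0].lower(),                    # first letter
        "cap":  word[0].isupper(),                  # capitalized?
        "hyp":  "-" in word,                        # contains a hyphen?
        "num":  any(c.isdigit() for c in word),     # contains a digit?
        "suf1": word[-1:],                          # last letter
        "suf2": word[-2:],                          # last two letters
        "suf3": word[-3:],                          # last three letters
    }

print(unknown_word_features("Non-executive"))
```

Together with the two disambiguated tags to the left and the two words to the right (`tag-2, tag-1, …, lex+1, lex+2`), these form the instance the memory-based classifier matches against its case base.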

  25. Memory-Based POS Tagger • Experimental results:

  26. NP Chunking as tagging [NP Pierre Vinken NP] , [NP 61 years NP] old , [VP will join VP] [NP the board NP] of [NP directors NP] as [NP a non-executive director NP] [NP Nov 29 NP] Pierre/I Vinken/I ,/O 61/I years/I old/O ,/O will/O join/O the/I board/I of/O directors/I as/O a/I non-executive/I director/I Nov/B 29/I ./O I Inside chunk O Outside chunk B Between chunks
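The bracket-to-tag conversion illustrated above can be sketched as follows. This uses the slide's scheme (sometimes called IOB1), where B marks a word that starts a chunk immediately after another chunk, as in "Nov/B"; the chunk-id representation of the input is an assumption for illustration.

```python
# Convert bracketed chunks to the slide's I/O/B tags: I = inside a chunk,
# O = outside, B = first word of a chunk that directly follows a chunk.
def to_iob(tokens):
    """tokens: list of (word, chunk_id), chunk_id None for words outside chunks."""
    tags, prev = [], None
    for word, chunk in tokens:
        if chunk is None:
            tags.append((word, "O"))
        elif prev is not None and chunk != prev:
            tags.append((word, "B"))     # new chunk adjacent to the previous one
        else:
            tags.append((word, "I"))
        prev = chunk
    return tags

print(to_iob([("director", 1), ("Nov", 2), ("29", 2), (".", None)]))
```

With this encoding, chunking reduces to predicting one of three classes per word, which is what the memory-based chunker on the next slide does.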

  27. Memory-Based XP Chunker • Assigning non-recursive phrase brackets (base XPs) to phrases in context: convert NP, VP, ADJP, ADVP, PrepP, and PP brackets to classification decisions (I/O/B tags) (Ramshaw & Marcus, 1995). Features: POS-2, IOBtag-2, word-2, POS-1, IOBtag-1, word-1, POS(focus), word(focus), POS+1, word+1, POS+2, word+2 → IOB tag

  28. Memory-Based XP Chunker • Results (WSJ corpus) • One-pass segmentation and chunking for all XP • Useful for: Information Retrieval, Information Extraction, Terminology Discovery, etc.

  29. Finding subjects and objects • Problems • One sentence can have more than one subject/object in case of more than one VP • One VP can have more than one subject/object in case of conjunctions • One NP can be linked to more than one VP • subject/verb or verb/object can be discontinuous

  30. Task Representation • From tagged and chunked sentences, extract • Distance from verb to head in chunks • Number of VPs between verb and head • Number of commas between verb and head • Verb and its POS • Two words/chunks context to left, word + POS • One word/chunk context to right • Head itself
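The distance features in the list above can be sketched from a tagged-and-chunked sentence. Everything here, including the per-token chunk-label representation and the function name, is a hypothetical illustration of the feature definitions, not the original extraction code.

```python
# Hypothetical sketch of three of the distance features listed above,
# given token indices of the verb and the candidate head. `chunks` holds
# one chunk label per token (None for tokens outside any chunk).
def distance_features(tokens, chunks, verb_i, head_i):
    lo, hi = sorted((verb_i, head_i))
    dist_chunks = sum(                       # chunks starting between them
        1 for i in range(lo + 1, hi)
        if chunks[i] is not None and chunks[i] != chunks[i - 1]
    )
    vps_between = sum(                       # VPs starting between them
        1 for i in range(lo + 1, hi)
        if chunks[i] == "VP" and chunks[i - 1] != "VP"
    )
    commas = tokens[lo + 1:hi].count(",")    # commas between them
    return {"dist_chunks": dist_chunks, "vps_between": vps_between,
            "commas_between": commas}

tokens = ["John", "said", ",", "the", "dog", "barks"]
chunks = ["NP", "VP", None, "NP", "NP", "VP"]
print(distance_features(tokens, chunks, 1, 4))   # verb "said", head "dog"
```

These numeric features, together with the word/POS/chunk context features, make up the instance for the grammatical-relation classifier on the next slide.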

  31. Memory-Based GR labeling • Assigning labeled grammatical relation links between words in a sentence: GRs of the focus with relation to verbs (subject, object, location, …, none). Features: focus: prep, adv-func, word+1, word0, word-1, word-2, POS+1, POS0, POS-1, POS-2, chunk+1, chunk0, chunk-1, chunk-2; verb: POS, word; distance: words, VPs, commas → GR type

  32. Memory-Based GR labeling • Results (WSJ corpus) • Subjects: 83%, objects: 87%, locations: 47%, time: 63% • Completes the shallow parser. Useful for e.g. question answering, IE, etc.

  33. [Diagram: TEXT → shallow parsing → ANALYZED CORPUS / ANALYZED SENTENCES → clustering and pattern matching, Q-A alignment, deletion rules, IE rules → applications: ontology extraction, question answering, summarization, bio-medical IE. More about these projects:]

  34. [Architecture diagram] Text In → Tokenizer (Perl) → MBSP (Perl), which calls a Lemmatizer (TiMBL server), a POS Tagger (MBT server, with TiMBL servers for known and unknown words), a Concept Tagger (MBT server, with TiMBL servers for known and unknown words), a Phrase Chunker (TiMBL server), and a Relation Finder (TiMBL server); built on TiMBL 5.0 and MBT 2.0

  36. What should be in? • Shallow parsing (tagging, chunking, grammatical relations) • Semantic roles • Domain semantics (NER / concept tagging) • Can negation, modality, and quantification be solved as classification tasks?

  37. Conclusions • Text Mining tasks benefit from linguistic analysis (shallow understanding). • Understanding can be formulated as a flexible heterarchy of classifiers. • These classifiers can be trained on annotated corpora.

  38. Tagging with WEKA
     • Data: The cafeteria remains closed PERIOD
     • Cases (focus word with three words of context on each side, padded with "_"; class = POS tag of the focus):
       _ _ _ The cafeteria remains closed → DT
       _ _ The cafeteria remains closed PERIOD → NN
       _ The cafeteria remains closed PERIOD _ → VBZ
       The cafeteria remains closed PERIOD _ _ → JJ
       cafeteria remains closed PERIOD _ _ _ → PERIOD
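The windowed cases on this slide can be generated mechanically: slide a window of three words on each side over the sentence, padding the edges with "_". A sketch, assuming the window width shown in the example:

```python
# Generate one training case per word: the word with n words of context
# on either side, padded with "_" at the sentence boundaries. Each case
# would be paired with the focus word's POS tag as the class for WEKA.
def window_cases(words, n=3):
    padded = ["_"] * n + words + ["_"] * n
    return [tuple(padded[i:i + 2 * n + 1]) for i in range(len(words))]

for case in window_cases(["The", "cafeteria", "remains", "closed", "PERIOD"]):
    print(" ".join(case))
```

Writing these tuples out with their class labels in ARFF format then gives a data set WEKA's decision-tree learners can train on directly.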

  39. Assignment 1

  40. An/DT/I-NP/O/NP-SBJ-1/an online/NN/I-NP/O/NP-SBJ-1/online demo/VBZ/I-VP/O/VP-1/demo can/MD/I-VP/O/VP-1/can be/VB/I-VP/O/VP-1/be found/VBN/I-VP/O/VP-1/find here/RB/I-ADVP/O/O/here ;/:/O/O/O/; the/DT/I-NP/O/NP-SBJ-2/the page/NN/I-NP/O/NP-SBJ-2/page features/VBZ/I-VP/O/VP-2/feature more/JJR/I-NP/O/NP-OBJ-2/more references/NNS/I-NP/O/NP-OBJ-2/reference and/CC/I-NP/O/NP-OBJ-2/and explanations/NNS/I-NP/O/NP-OBJ-2/explanation of/IN/I-PP/B-PNP/O/of the/DT/I-NP/I-PNP/O/the tagsets/NNS/I-NP/I-PNP/O/tagset and/CC/I-NP/I-PNP/O/and color/NN/I-NP/I-PNP/O/color codes/NNS/I-NP/I-PNP/O/code used/VBN/I-VP/O/VP-3/use to/TO/I-VP/O/VP-3/to encode/VB/I-VP/O/VP-3/encode the/DT/I-NP/O/NP-OBJ-3/the syntactic/JJ/I-NP/O/NP-OBJ-3/syntactic information/NN/I-NP/O/NP-OBJ-3/information ././O/O/O/. WSJ MBSP output
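Each token in the slash-delimited MBSP output above carries six fields: word, POS tag, chunk tag, PNP (prepositional NP) tag, relation tag, and lemma. A sketch of splitting one token, assuming that field order and that words themselves contain no "/":

```python
# Split one token of the slash-delimited MBSP output into its six fields.
# Field order (word/POS/chunk/PNP/relation/lemma) is read off the sample
# output above; words containing "/" would need escaping, ignored here.
def parse_token(tok):
    word, pos, chunk, pnp, rel, lemma = tok.split("/")
    return {"word": word, "pos": pos, "chunk": chunk,
            "pnp": pnp, "rel": rel, "lemma": lemma}

print(parse_token("An/DT/I-NP/O/NP-SBJ-1/an"))
print(parse_token("of/IN/I-PP/B-PNP/O/of"))
```

Mapping a whole sentence of such tokens back into brackets (or into the XML of the next slide) is then a matter of grouping consecutive tokens by their chunk and relation tags.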

  41. <S cnt="s3"> <NP rel="SBJ" of="s3_1"> <W pos="DT">An</W> <W pos="NN">online</W> </NP> <VP id="s3_1"> <W pos="NN">demo</W> <W pos="MD">can</W> <W pos="VB">be</W> <W pos="VBN">found</W> </VP> <ADVP> <W pos="RB">here</W> </ADVP> <W pos="colon">;</W> <NP rel="SBJ" of="s3_2"> <W pos="DT">the</W> <W pos="JJ">page</W> </NP> <VP id="s3_2"> <W pos="NNS">features</W> </VP> <NP rel="OBJ" of="s3_2"> <W pos="RBR">more</W> <W pos="NNS">references</W> <W pos="CC">and</W> <W pos="NNS">explanations</W> </NP> <PNP> <PP> <W pos="IN">of</W> </PP> <NP> <W pos="DT">the</W> <W pos="NNS">tagsets</W> <W pos="CC">and</W> <W pos="NN">color</W> <W pos="VBZ">codes</W> </NP> </PNP> <VP id="s3_3"> <W pos="VBN">used</W> <W pos="TO">to</W> <W pos="VB">encode</W> </VP> <NP rel="OBJ" of="s3_3"> <W pos="DT">the</W> <W pos="JJ">syntactic</W> <W pos="NN">information</W> </NP> <W pos="period">.</W> </S> BioMinT MBSP output