
Applications of SuperTagging



Presentation Transcript


  1. Applications of SuperTagging
     Raman Chandrasekar, Industry Day

  2. SuperTagging: Applications
     • Information Filtering: SuperTagging used to increase retrieval precision
     • Text Simplification: SuperTagging used to induce rules for text simplification
     • Word Sense Disambiguation
     • Machine Translation
     • Information Extraction
     • Noun Phrase Chunking

  3. Glean: Document Filtering
     • Problem: to access only relevant information
     • Current approaches:
       • Information retrieval (IR) systems use keywords, Boolean operators, etc.
       • Problems due to synonymy and polysemy
       • Most Web search engines tend to
         • maximize recall (coverage)
         • emphasize speed of retrieval
         • but sacrifice precision ('accuracy' of results)
     • Our approach: use syntactic information to increase precision.

  4. Glean: The Basics
     • Underlying ideas:
       • meaning of a word decided by how it is used
       • much information latent in text
       • good to use post-processing filter model
     • Use SuperTagging to get syntactic labeling
     • Part-of-speech tags are not as useful [RIAO '97]

  5. Glean: Architecture

  6. Glean: Query by Example
     • Input:
       • Search engine query expression: +work +IRCS +"natural language processing" +learning
       • Concept/word of interest: work
       • Prototypical usage:
         • She has been working on problems related to aspect.
         • He works in the area of Information Retrieval.
         • She works on statistical mechanics.
         • Recently he has been working in the area of quantum computing.
     • Interpretation:
       • get all documents satisfying the query expression,
       • check whether they contain sentences with a variant of work,
       • check that these are 'relevant', i.e. structurally similar to the context around work in the prototypical sentences.
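The three interpretation steps above can be sketched as a small filter. This is a minimal sketch under stated assumptions, not Glean's own code (which, per the slides, was written mainly in Perl): every `+` term is treated as required and matched as a case-insensitive substring, sentences are split naively on end punctuation, and the regular expression `WORK_VARIANTS` is a hypothetical stand-in for "a variant of work". The structural check of the third step is passed in as a function, `is_relevant`, which is also a hypothetical name.

```python
import re
import shlex
from typing import Callable, List

# Hypothetical stand-in for "a variant of work" (step 2); real morphology may differ.
WORK_VARIANTS = re.compile(r"\bwork(?:s|ed|ing)?\b", re.IGNORECASE)


def parse_query(expr: str) -> List[str]:
    """Split '+work +IRCS +"natural language processing"' into required terms.

    Assumption: every term carries a '+', i.e. all terms are required.
    shlex keeps a quoted phrase together as a single term.
    """
    return [tok.lstrip("+").lower() for tok in shlex.split(expr)]


def satisfies(doc: str, terms: List[str]) -> bool:
    """Step 1: the document contains every required term (substring match)."""
    text = doc.lower()
    return all(term in text for term in terms)


def candidate_sentences(doc: str) -> List[str]:
    """Step 2: sentences that contain a variant of the word of interest."""
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return [s for s in sentences if WORK_VARIANTS.search(s)]


def glean_filter(docs: List[str], expr: str,
                 is_relevant: Callable[[str], bool]) -> List[str]:
    """Keep documents passing step 1 that have a sentence passing step 3."""
    terms = parse_query(expr)
    return [d for d in docs
            if satisfies(d, terms)
            and any(is_relevant(s) for s in candidate_sentences(d))]
```

For example, `parse_query('+work +IRCS +"natural language processing" +learning')` returns `['work', 'ircs', 'natural language processing', 'learning']`. The substring test is crude (it would also accept *network* for *work*), which is the kind of keyword-level imprecision the structural check is meant to catch.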

  7. Glean: Inducing a Pattern
     • Prototypical usage:
       • She has been working on problems related to aspect.
     • Chunked, supertagged version:
       • She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU
     • Context around word of interest:
       • She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx …
     • Generalized pattern:
       • */NP*working/A_nx0V*on/B_vxPnx
     • This pattern also matches, for example: "We are also working on type systems for data and knowledge bases."
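The induction-and-matching idea on this slide can be sketched over word/supertag tokens. The slides do not spell out Glean's actual generalization procedure, so this is a sketch under assumptions: the pattern keeps the word of interest and the token right after it with their supertags, replaces the subject with a noun-phrase constraint, and lets `*` match any gap. `NP_TAGS` (holding only `A_NXN`, the one noun-phrase tag visible on the slide), `induce_pattern`, and `matches` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    word: str
    tag: str


class Constraint(NamedTuple):
    word: Optional[str]   # None means "any word"
    tags: frozenset       # supertags this position may carry

    def accepts(self, tok: Token) -> bool:
        word_ok = self.word is None or tok.word.lower() == self.word
        return word_ok and tok.tag in self.tags


# Assumption: supertags that count as a noun phrase; only A_NXN is visible on the slide.
NP_TAGS = frozenset({"A_NXN"})


def parse_tagged(text: str) -> List[Token]:
    """Parse 'word/TAG' tokens; '~' joins the members of a chunk, as on the slide."""
    tokens = []
    for chunk in text.split():
        for item in chunk.split("~"):
            word, _, tag = item.rpartition("/")
            tokens.append(Token(word, tag))
    return tokens


def induce_pattern(tokens: List[Token], i: int) -> List[Constraint]:
    """Generalize the context of tokens[i] to roughly '*/NP * word/TAG * next/TAG'.

    Hypothetical procedure: keep the word of interest and the following token
    (e.g. its preposition) with their supertags, and require an NP before them.
    Assumes tokens[i] is not the last token.
    """
    head, nxt = tokens[i], tokens[i + 1]
    return [
        Constraint(None, NP_TAGS),
        Constraint(head.word.lower(), frozenset({head.tag})),
        Constraint(nxt.word.lower(), frozenset({nxt.tag})),
    ]


def matches(pattern: List[Constraint], tokens: List[Token]) -> bool:
    """True if the constraints are met in order, with any gap ('*') between them."""
    pos = 0
    for constraint in pattern:
        while pos < len(tokens) and not constraint.accepts(tokens[pos]):
            pos += 1
        if pos == len(tokens):
            return False
        pos += 1  # consume the matched token
    return True
```

Run on the slide's own tagged sentence, the pattern induced around *working* matches, while the fragment `problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN` does not. Greedy left-to-right scanning suffices because each constraint tests a single token, so matching reduces to an ordered-subsequence check.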

  8. The Glean system
     • Implemented (mainly in Perl) with
       • HTML form interfaces, with a variety of options
       • a SuperTagger server
     • Results
       • 97% recall and 88% precision in filtering out irrelevant material in a small test
       • Large-scale evaluation in progress
     • Demo available
     Research collaboration between the National Centre for Software Technology, Bombay, and the Institute for Research in Cognitive Science & Center for Advanced Study of India, University of Pennsylvania.
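The recall and precision figures reported here follow the standard definitions, computed below for a filter's output. This is a generic sketch; the document IDs in the example are toy values for illustration, not Glean's test data.

```python
from typing import Hashable, Set, Tuple


def precision_recall(kept: Set[Hashable], relevant: Set[Hashable]) -> Tuple[float, float]:
    """Precision = |kept & relevant| / |kept|; recall = |kept & relevant| / |relevant|."""
    hits = len(kept & relevant)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: the filter keeps documents 1-4; documents 1, 2, 3, 5, 6 are relevant.
example = precision_recall({1, 2, 3, 4}, {1, 2, 3, 5, 6})
```

Here `example` is `(0.75, 0.6)`. A filter that keeps everything maximizes recall at the cost of precision, the trade-off slide 3 attributes to most Web search engines.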

  9. SuperTagging: Benefits
     • Right level of granularity
     • Rich tag set, suitable for a variety of applications
     • Accurate: over 92% accuracy
     • Fast: 31 to 57 words/sec (interpreted Perl)
     • Can be easily retrained, if required
     • Many more applications possible

  10. Automatic Text Simplification
      • Basic idea: to process complex text, either
        • create better tools, or
        • simplify the text to be processed!
      • Initial prototype of simplification system (Bombay)
        • Based on finite-state grammars
        • Rules on strings map complex sentences to simpler ones
        • To simplify sentences of the form:
          Talwinder Singh, who masterminded the Air India sabotage, was killed in a shoot-out with police ...
        • we use a rule such as:
          Segment1/NP, who Segment2, Segment3 => Segment1 Segment3. Segment1 Segment2.
        • to get:
          Talwinder Singh was killed in a shoot-out with police ...
          Talwinder Singh masterminded the Air India sabotage.
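The rule on this slide can be approximated by one regular expression over the raw string, in the spirit of the string-based prototype. This is a sketch, not the Bombay system: the name `RULE` is hypothetical, and the pattern cannot check that Segment1 is a noun phrase (it just takes everything before ", who"), one limitation the next slide's supertag-based approach addresses by making constituent spans easier to identify.

```python
import re
from typing import List

# Segment1, who Segment2, Segment3  =>  Segment1 Segment3. Segment1 Segment2.
# Assumption: Segment1 and Segment2 contain no commas; the NP check on Segment1 is not enforced.
RULE = re.compile(r"^(?P<s1>[^,]+), who (?P<s2>[^,]+), (?P<s3>.+?)\.?$")


def simplify(sentence: str) -> List[str]:
    """Split a sentence with a non-restrictive 'who' clause into two sentences."""
    m = RULE.match(sentence.strip())
    if m is None:
        return [sentence]  # rule does not apply; leave the sentence unchanged
    s1, s2, s3 = m.group("s1", "s2", "s3")
    return [f"{s1} {s3}.", f"{s1} {s2}."]
```

Applied to the slide's sentence (ended at "police." for the example), it returns `['Talwinder Singh was killed in a shoot-out with police.', 'Talwinder Singh masterminded the Air India sabotage.']`, in the same order as the slide's output.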

  11. Automatic Text Simplification
      • SuperTagging is better [Coling96]
        • Constituent spans easier to identify
        • Simplification rules more expressive
      • Rules can now be induced automatically [KBCS96, KBS]
        • Data: parallel (aligned) corpus of complex and simple text
        • Induction procedure:
          • Data tagged using SuperTagging and LDA
          • Aligned labeled trees for complex and simple sentences compared
          • Tree-to-trees transformations identified
          • Reduced to a normal form to get simplification rules

  12. Noun-Phrase Chunking
      • Variety of approaches (Hindle, Marcus & Ramshaw, Voutilainen) for noun-phrase chunking
      • Depending on the application, we may need
        • maximal noun phrases
        • basal noun phrases
        • all derivable noun phrases
      • SuperTagging provides mechanisms for application-specific noun-phrase chunking
      • Can form part of (or the basis for) a variety of tools
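The difference between basal and maximal noun phrases can be sketched from supertags alone, since a supertag records what kind of node a word attaches to (for example an NP versus a VP). This is an illustrative sketch under assumptions: the tag classes below are hypothetical XTAG-style names (only `A_NXN` and `B_vxPnx` appear on these slides), the example tags are hand-assigned, and "all derivable noun phrases" is not covered.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, supertag) pairs
Span = Tuple[int, int]          # half-open token span [start, end)

# Hypothetical XTAG-style tag classes (assumptions, not taken from the slides).
PREMOD = {"B_Dnx", "B_An", "B_Nn"}  # determiner / adjective / noun modifiers of a noun
HEAD = {"A_NXN"}                     # noun-phrase initial tree
NP_PREP = {"B_nxPnx"}                # preposition adjoining to an NP (not to a VP)


def basal_nps(toks: Tagged) -> List[Span]:
    """Basal NP: a run of pre-modifiers ending in a head noun."""
    spans: List[Span] = []
    start = None
    for i, (_, tag) in enumerate(toks):
        if tag in PREMOD:
            if start is None:
                start = i
        elif tag in HEAD:
            spans.append((i if start is None else start, i + 1))
            start = None
        else:
            start = None  # any other tag breaks a pending modifier run
    return spans


def maximal_nps(toks: Tagged) -> List[Span]:
    """Maximal NP: basal NPs chained by NP-attached prepositions (NP P NP -> NP)."""
    merged: List[Span] = []
    for s, e in basal_nps(toks):
        prev = merged[-1] if merged else None
        if prev and prev[1] + 1 == s and toks[prev[1]][1] in NP_PREP:
            merged[-1] = (prev[0], e)
        else:
            merged.append((s, e))
    return merged


def span_text(toks: Tagged, spans: List[Span]) -> List[str]:
    return [" ".join(w for w, _ in toks[s:e]) for s, e in spans]


# "He works in the area of Information Retrieval." with hand-assigned, hypothetical tags.
EXAMPLE: Tagged = [
    ("He", "A_NXN"), ("works", "A_nx0V"), ("in", "B_vxPnx"), ("the", "B_Dnx"),
    ("area", "A_NXN"), ("of", "B_nxPnx"), ("Information", "B_Nn"), ("Retrieval", "A_NXN"),
]
```

`span_text(EXAMPLE, basal_nps(EXAMPLE))` gives `['He', 'the area', 'Information Retrieval']`, while `maximal_nps` joins the last two through the NP-attached *of* to give `['He', 'the area of Information Retrieval']`. The preposition *in*, tagged as attaching to the verb phrase, never triggers a merge, so an application can choose its granularity from the same tagging.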
