
Human Language Technology


Presentation Transcript


  1. Human Language Technology Part of Speech (POS) Tagging II Rule-based Tagging

  2. Acknowledgment • Most slides taken from Bonnie Dorr’s course notes: www.umiacs.umd.edu/~bonnie/courses/cmsc723-03 • Jurafsky & Martin, Chapter 5 CLINT Lecture IV

  3. Bibliography • A. Voutilainen, “Morphological disambiguation”, in Karlsson, Voutilainen, Heikkila, Anttila (eds.), Constraint Grammar, pp. 165-284, Mouton de Gruyter, 1995. See [e-book]

  4. EngCG Rule-Based Tagger (Voutilainen 1995) • Rules based on English Constraint Grammar • Two-stage design • Uses the ENGTWOL lexicon • Hand-written disambiguation rules

  5. ENGTWOL Lexicon • Based on two-level (TWOL) morphology of English (hence the name) • 56,000 entries for English word stems • Each entry annotated with morphological and syntactic features

  6. Sample ENGTWOL Lexicon

  7. Examples of constraints (informal)
  • Discard all verb readings if to the left there is an unambiguous determiner, and between that determiner and the ambiguous word itself there are no nominals (nouns, abbreviations, etc.).
  • Discard all finite verb readings if the immediately preceding word is to.
  • Discard all subjunctive readings if to the left there are no instances of the subordinating conjunction that or lest.
  • The first constraint would discard the verb reading of tables in the “the tables” example below.
  • There are about 1,100 constraints altogether.
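The first informal constraint above can be sketched in code. This is a hypothetical illustration, not ENGCG's actual implementation: tokens carry sets of candidate tags, the nominal tag set is assumed, and the function discards a verb reading when an unambiguous determiner precedes with no intervening nominal.

```python
def discard_verb_readings(tokens):
    """tokens: list of (word, set_of_tags) pairs; modified in place.

    Illustrative sketch of the constraint: discard all verb readings
    if an unambiguous determiner appears to the left with no nominals
    in between.
    """
    NOMINALS = {"N", "ABBR", "PRON"}  # assumed nominal tag set
    for i, (word, tags) in enumerate(tokens):
        if "V" not in tags or len(tags) == 1:
            continue  # nothing to disambiguate here
        # scan leftwards for an unambiguous determiner
        for j in range(i - 1, -1, -1):
            prev_tags = tokens[j][1]
            if prev_tags == {"DET"}:
                # no nominals may intervene between DET and this word
                between = [tokens[k][1] for k in range(j + 1, i)]
                if not any(t & NOMINALS for t in between):
                    tags.discard("V")
                break
            if prev_tags & NOMINALS:
                break  # a nominal intervenes; constraint does not apply

# "the old can": "can" is ambiguous between noun and verb
tokens = [("the", {"DET"}), ("old", {"ADJ"}), ("can", {"V", "N"})]
discard_verb_readings(tokens)
# "can" loses its verb reading, leaving {"N"}
```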

  8. Actual Constraint Syntax
  Given input: “that”
  If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
  Then eliminate non-ADV tags
  Else eliminate ADV tag
  • This rule eliminates the adverbial sense of that, as in “it isn’t that odd”
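The rule above can be rendered as a small function. This is a simplified sketch with assumed data structures (real ENGCG constraints use a richer condition language): +1 checks the next word's tags, +2 SENT-LIM checks for sentence end, and NOT -1 checks the preceding word.

```python
def apply_that_rule(tokens, i):
    """tokens: list of (word, set_of_tags); i indexes an occurrence of 'that'.

    Sketch of the ENGCG rule: if the next word is A/ADV/QUANT,
    followed by a sentence limit, and the previous word is not
    SVOC/A, keep only the ADV reading; otherwise drop ADV.
    """
    word, tags = tokens[i]
    if word.lower() != "that":
        return
    nxt = tokens[i + 1][1] if i + 1 < len(tokens) else set()
    nxt2_is_sent_lim = i + 2 >= len(tokens)          # (+2 SENT-LIM)
    prev = tokens[i - 1][1] if i > 0 else set()
    if (nxt & {"A", "ADV", "QUANT"}) and nxt2_is_sent_lim \
            and "SVOC/A" not in prev:
        tags.intersection_update({"ADV"})            # eliminate non-ADV tags
    else:
        tags.discard("ADV")                          # eliminate ADV tag

# "it isn't that odd": 'odd' is an adjective at the end of the sentence
tokens = [("it", {"PRON"}), ("isn't", {"V"}),
          ("that", {"ADV", "DET", "CS", "PRON"}), ("odd", {"A"})]
apply_that_rule(tokens, 2)
# 'that' is narrowed to its adverbial reading: {"ADV"}
```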

  9. ENGCG Tagger
  • Stage 1: Run words through the morphological analyzer to get all parts of speech. E.g. for the phrase “the tables”, we get the following output:
  "<the>"
      "the" <Def> DET CENTRAL ART SG/PL
  "<tables>"
      "table" N NOM PL
      "table" <SVO> V PRES SG3 VFIN
  • Stage 2: Apply constraints to rule out incorrect POS tags
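The two-stage design can be sketched end to end. The lexicon and single constraint below are toy stand-ins for ENGTWOL and the roughly 1,100 ENGCG constraints; all names are illustrative.

```python
# Stage 1 stand-in for ENGTWOL: each word maps to all its possible readings
LEXICON = {
    "the":    {"DET"},
    "tables": {"N", "V"},   # noun ("the tables") or verb ("he tables a motion")
}

def constraint_no_verb_after_det(tokens):
    """Stage 2 stand-in: discard a verb reading immediately after an
    unambiguous determiner (a toy version of one ENGCG constraint)."""
    for i in range(1, len(tokens)):
        word, tags = tokens[i]
        if tokens[i - 1][1] == {"DET"} and len(tags) > 1:
            tags.discard("V")

def tag(sentence):
    # Stage 1: look up all readings for each word
    tokens = [(w, set(LEXICON.get(w, {"N"}))) for w in sentence.split()]
    # Stage 2: apply constraints to eliminate incorrect readings
    constraint_no_verb_after_det(tokens)
    return tokens

print(tag("the tables"))   # [('the', {'DET'}), ('tables', {'N'})]
```

Here "tables" starts with two readings from the lexicon, and the constraint leaves only the noun reading, mirroring the "the tables" example above.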

  10. Example
  WORD        TAGS
  Pavlov      PAVLOV N NOM SG PROPER
  had         HAVE V PAST VFIN SVO / HAVE PCP2 SVO
  shown       SHOW PCP2 SVOO SVO SV
  that        ADV / PRON DEM SG / DET CENTRAL DEM SG / CS (subord. conj.)
  salivation  N NOM SG

  11. Performance • Tested on examples from the Wall Street Journal, the Brown Corpus, and the Lancaster-Oslo/Bergen Corpus • After application of the rules, 93-97% of all words are fully disambiguated, and 99.7% of all words retain the correct reading. • At the time, this was superior performance to other taggers • However, one should not discount the amount of effort needed to create this system
