
Word morphology

Word morphology. Teaching computers to read. Research papers. Book cites: “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993. This is cited by: “Guessing Morphology from Terms and Corpora” by Jacquemin, SIGIR 1997. When are different words the same word?


Presentation Transcript


  1. Word morphology Teaching computers to read

  2. Research papers
     • Book cites: “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993
     • This is cited by: “Guessing Morphology from Terms and Corpora” by Jacquemin, SIGIR 1997
     • When are different words the same word?

  3. Porter stemming
     A multi-step process to remove word suffixes
     • Stem
     • Stems
     • Stemmed
     • Stemming
     • -ology
     • -ize
     • -ship
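The idea of multi-step suffix removal can be sketched in a few lines. This is a toy illustration only, not the Porter algorithm itself: the function name `toy_stem`, the three-suffix list, and the minimum stem length of 3 are all assumptions made for the example.

```python
def toy_stem(word):
    """Toy two-step suffix stripper (NOT the real Porter algorithm)."""
    word = word.lower()
    # Step 1: strip one inflectional suffix, keeping at least 3 letters.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Step 2: after -ing/-ed, undo consonant doubling
            # ("stemm" -> "stem"), leaving l, s, z and vowels alone.
            if (
                suffix != "s"
                and len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeioulsz"
            ):
                stem = stem[:-1]
            return stem
    return word


for w in ("Stem", "Stems", "Stemmed", "Stemming"):
    print(w, "->", toy_stem(w))
```

All four forms from the slide reduce to `stem`. Derivational suffixes such as -ology, -ize, and -ship are deliberately not handled here.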

  4. Stemming problems
     Derivation – Meaning
     • Doe
     • Donut
     • Paste
     • Pastafarian
     Inflection – Syntax
     • Do
     • Doing
     • Done
     • Past (n)
     • Past (v)

  5. Inflectional stemming
     Inflectional suffixes are safe to remove… usually
     • Plural: s, es, ies (57%)
     • Tense: ed (22%)
     • Aspect: ing (21%)

  6. Derivational stemming
     Words that change meaning if they are stemmed.
     • Appreciate vs. Appreciation
     • Two-thirds of derivational variants appear in the dictionary
     • Krovetz’s solution is to leave dictionary words alone
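The "leave dictionary words alone" rule can be sketched as a guard around suffix stripping. This is a minimal illustration of the idea, not Krovetz's actual stemmer: the tiny `DICTIONARY` set, the `SUFFIXES` tuple, and the function name `guarded_stem` are hypothetical.

```python
# Hypothetical miniature lexicon; a real system would use a full dictionary.
DICTIONARY = {"appreciate", "appreciation", "stem", "walk"}

# Illustrative suffix list only (one derivational, three inflectional).
SUFFIXES = ("ation", "ing", "ed", "s")


def guarded_stem(word, dictionary=DICTIONARY):
    """Strip a suffix only when the word is not itself a dictionary entry."""
    word = word.lower()
    if word in dictionary:
        return word  # dictionary word: leave it alone
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in dictionary:
                return candidate  # accept only stems the dictionary knows
    return word  # no safe stem found: keep the word unchanged


print(guarded_stem("appreciation"))  # stays "appreciation"
print(guarded_stem("walking"))       # becomes "walk"
```

Because "appreciation" has its own dictionary entry, it is never reduced to "appreciate", so the two meanings stay distinct.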

  7. Inferring morphology
     • Jacquemin asserts morphology can be derived from the corpus
     • Word truncation
     • Multi-word term conflation
     • Classification & filtering
     • Clustering

  8. Statistical weighting
     • Different segments of terms are given different statistical weights

  9. Word classification
     • Jacquemin’s algorithm allows error in conflation
     • Errors are filtered statistically
     • Rare and domain-specific terms are conflated
       • Gene rearrangement / Genetic rearrangement
       • Artificial ventilation / Artificially ventilated
       • North Africa / Northern Africa
       • Cirrhosis / Cirrhosia
       • Pulsating flow / Pulsatile flow
     • The algorithm acts like a snap-to-grid for text
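A rough sketch of how truncation-based conflation of multi-word terms might work on the pairs above. This is not Jacquemin's published algorithm: the function names and the `min_prefix` and `max_suffix` thresholds are assumptions chosen for illustration, and the statistical filtering step is left out.

```python
def common_prefix_len(a, b):
    """Number of leading characters shared by strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def may_conflate(term_a, term_b, min_prefix=4, max_suffix=5):
    """Candidate test: the terms have the same number of words, and each
    word pair shares a long prefix and differs only in a short ending."""
    words_a = term_a.lower().split()
    words_b = term_b.lower().split()
    if len(words_a) != len(words_b):
        return False
    for wa, wb in zip(words_a, words_b):
        p = common_prefix_len(wa, wb)
        if p < min_prefix:
            return False
        if len(wa) - p > max_suffix or len(wb) - p > max_suffix:
            return False
    return True


print(may_conflate("Gene rearrangement", "Genetic rearrangement"))  # True
print(may_conflate("North Africa", "South Africa"))                 # False
```

A test this loose also accepts wrong pairs such as "Gene rearrangement" / "General rearrangement", which is the kind of error the slide says is tolerated at this stage and filtered statistically afterwards.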
