
Alexander Gelbukh

Special Topics in Computer Science Advanced Topics in Information Retrieval Lecture 10: Natural Language Processing and IR. Syntax and structural disambiguation. Alexander Gelbukh www.Gelbukh.com. Previous Chapter: Conclusions.


Presentation Transcript


  1. Special Topics in Computer Science. Advanced Topics in Information Retrieval. Lecture 10: Natural Language Processing and IR. Syntax and structural disambiguation. Alexander Gelbukh, www.Gelbukh.com

  2. Previous Chapter: Conclusions • Tagging, word sense disambiguation, and anaphora resolution are cases of disambiguation of meaning • Useful in translation, information retrieval, and text understanding • Dictionary-based methods • good but expensive • Statistical methods • cheap and sometimes imperfect... but not always (if very large corpora are available)

  3. Previous Chapter: Research topics • Too many to list • New methods • Lexical resources (dictionaries) • = Computational linguistics

  4. Contents • Language levels • Syntax • Dependency approach • Constituency-based approach • Head-driven approach • Grammars and parsing • Ambiguity and disambiguation

  5. Language levels • Letters are built up into words • Words into sentences • Sentences into <...> text • Each level has its own representation • This allows for modular processing • A module describes one level or transforms from one level to another

  6. Source of language complexity: 1-D

  7. Source of language complexity: 1-D

  8. Linguistic processor translates between representations

  9. General scheme of text processing • Linguistic processor uses linguistic knowledge • Applied system uses other types of knowledge(e.g., Artificial Intelligence)

  10. Language levels • Morphological: words • Syntactic: sentences • Semantic: meaning • Pragmatic: intention • ...?

  11. Fine structure of linguistic processor

  12. Example of text “Science is important for our country. The Government pays it much attention.”

  13. Textual representation Text is a sequence of letters. S c i e n c e i s i m p o r t a n t f o r o u r c o u n t r y . T h e G o v e r n m e n t p a y s i t m u c h a t t e n t i o n .

  14. Morphological analysis

  15. Morphological representation A sequence of words.
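The following is a minimal sketch of only the first step from the textual level to the word level: splitting the example text into word and punctuation tokens, then grouping them into sentences. It assumes that words are runs of letters and digits and that ".", "!" and "?" end sentences. A real morphological analyzer would also attach a lemma and grammatical features to each word, which this sketch does not attempt.

```python
import re

def tokenize(text):
    """Split raw text (a sequence of letters) into word and punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] makes each punctuation mark its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def split_sentences(tokens, enders=(".", "!", "?")):
    """Group tokens into sentences, closing a sentence at each sentence-final mark."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in enders:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without a final mark still form a sentence
        sentences.append(current)
    return sentences

text = "Science is important for our country. The Government pays it much attention."
for sentence in split_sentences(tokenize(text)):
    print(sentence)
```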

  16. Syntactic parsing

  17. Syntactic representation A sequence of syntactic trees.

  18. Syntactic representation • What happened? • With whom did it happen? • ... their details

  19. Semantic analysis • Next lecture...

  20. Syntax • The structure describing the relationships between words in a sentence • Describes the relationships implied by grammatical characteristics • not by meaning • Often allows for simple paraphrasing • John reads the book • The book is read by John

  21. Early approach: Dependency syntax • Tree • Nodes: words • Arcs: "is modified by" • To modify means to add details, clarify, choose one of many... that is, to make more specific • Arcs are typed • Types are: subject, object, attribute, ... (Figure arc labels: Recipient, Subject, Object, Attribute)

  22. ... Dependency syntax • General situation: pay • More specifically: the one where: • who pays is the government • what is paid is attention • to whom it is paid is it • More specifically: attention that is much (Figure arc labels: Recipient, Subject, Object, Attribute)
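A dependency tree can be stored as a list of typed arcs (head, dependent, type). The sketch below encodes the arcs described for "The Government pays it much attention". It assumes lowercase type names and leaves out "The", which the slide does not attach. It performs two basic tree checks (each word has at most one head, and there is exactly one root) and answers "who pays / what is paid / to whom" by listing the typed dependents of a word.

```python
from collections import Counter

# Typed dependency arcs (head, dependent, type) for
# "The Government pays it much attention"; "The" is omitted in this sketch.
ARCS = [
    ("pays", "Government", "subject"),
    ("pays", "attention", "object"),
    ("pays", "it", "recipient"),
    ("attention", "much", "attribute"),
]

def check_tree(arcs):
    """Return the single root word after two basic checks (one head per word, one root)."""
    dep_counts = Counter(dep for _, dep, _ in arcs)
    if any(n > 1 for n in dep_counts.values()):
        raise ValueError("a word has more than one head")
    roots = {head for head, _, _ in arcs} - set(dep_counts)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots.pop()

def dependents(arcs, head):
    """Map each arc type to the word attached to `head` by that type.

    Assumes at most one dependent per type, which holds for this example."""
    return {rel: dep for h, dep, rel in arcs if h == head}

root = check_tree(ARCS)
print(root, dependents(ARCS, root))
```

Because the arcs are typed, the question "who pays?" is just a lookup of the `subject` dependent of `pays`. This is the sense in which dependency structure translates rather directly into semantics.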

  23. Advantages/disadvantages of Dependency Syntax Advantages • Solid linguistic base • Rather direct translation into semantics • Easily applicable to languages with free word order • Korean? Russian, Latin • This is why it has a solid linguistic base: it is good for the classical languages! Disadvantages • No nice mathematical base • No simple algorithms

  24. Most popular approach: Constituency (Phrase Structure grammars) • Tree • Nodes: nested segments of the phrase • Cannot intersect, only nest • Usually labeled with part-of-speech names • Arcs: nesting • In the classical approach, arcs are not labeled [[Our Government ][pays [ much attention][to it ]]]

  25. Constituency [[Our Government ][pays [ much attention][to it ]]] Our Government pays much attention to it

  26. Constituency [[Our/R Government/N]NP [pays/V [much/A attention/N]NP [to/P it/R]PP]VP]S • R: pronoun, N: noun, V: verb, A: adjective, P: preposition • NP: noun phrase, VP: verb phrase, PP: prepositional phrase, S: sentence

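  27. Constituency: graphical representation [[Our Government]NP [pays [much attention]NP [to it]PP]VP]S (Figure: this bracketing drawn as a tree with S at the root, the phrase nodes NP, VP, PP below it, and the part-of-speech leaves R N V A N P R over the words Our Government pays much attention to it.)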
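A labeled bracketing such as [[Our Government]NP [pays [much attention]NP [to it]PP]VP]S can be read into a tree with a single stack. The sketch below assumes that a label follows its closing bracket directly with no space, and that words are separated by spaces or brackets. Each constituent becomes a `(label, children)` pair; an unlabeled bracket, as in the classical notation, gets the label `None`.

```python
import re

# Tokens: "[" opens a constituent, "]" plus an optional label closes one,
# anything else is a word.
TOKEN = re.compile(r"\[|\][A-Za-z]*|[^\s\[\]]+")

def parse_brackets(s):
    """Read slide-style bracketing into nested (label, children) tuples.

    Words stay plain strings; a closing bracket without a label gives label None."""
    stack = [[]]
    for tok in TOKEN.findall(s):
        if tok == "[":
            stack.append([])  # start collecting a new constituent
        elif tok.startswith("]"):
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            children = stack.pop()
            stack[-1].append((tok[1:] or None, children))
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete top-level constituent")
    return stack[0][0]

tree = parse_brackets("[[Our Government]NP [pays [much attention]NP [to it]PP]VP]S")
print(tree)
```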

  28. Phrase structure grammar • Enumerates possible configurations at nodes • Usually recursive • S → NP VP • NP → A NP • NP → R NP • PP → P NP • NP → N • VP → VP NP PP • VP → V (Figure: these rules applied as a constituency tree over Our Government pays much attention to it.)

  29. Context-independence hypothesis • A configuration is either possible or not, regardless of where it is used • Wherever you find VP NP PP, it can be a VP • Wherever you find NP VP, it can be an S • If you can put together an S that covers the whole sentence, it is a grammatically correct description • With this, given a suitable grammar, you can • list all sentences of a language • list only correct sentences of that language • list all and only correct structures • Correctness means agreement with a native speaker's intuition
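Under the context-independence hypothesis, a tree is licensed exactly when each of its nodes, taken together with its children, matches some rule, regardless of where that node sits in the tree. The sketch below encodes a toy rule set modelled on the lecture's phrase structure rules. Two details are assumptions of this sketch: the rule PP → P NP, and an extra rule NP → R so that the lone pronoun "it" can form a noun phrase.

```python
# Toy rule set: each rule is (parent, tuple of children).
# PP -> P NP and NP -> R are assumptions of this sketch.
RULES = {
    ("S", ("NP", "VP")),
    ("NP", ("A", "NP")),
    ("NP", ("R", "NP")),
    ("NP", ("R",)),
    ("NP", ("N",)),
    ("PP", ("P", "NP")),
    ("VP", ("VP", "NP", "PP")),
    ("VP", ("V",)),
}

def licensed(node):
    """True if every phrase node in the tree matches some rule.

    A node is (tag, word) for a part-of-speech leaf or (label, [children]).
    Only a node and its children are inspected, never the node's position
    in the tree: this is the context-independence assumption."""
    label, rest = node
    if isinstance(rest, str):  # part-of-speech leaf such as ("N", "Government")
        return True
    config = (label, tuple(child[0] for child in rest))
    return config in RULES and all(licensed(child) for child in rest)

tree = ("S", [
    ("NP", [("R", "Our"), ("NP", [("N", "Government")])]),
    ("VP", [
        ("VP", [("V", "pays")]),
        ("NP", [("A", "much"), ("NP", [("N", "attention")])]),
        ("PP", [("P", "to"), ("NP", [("R", "it")])]),
    ]),
])
print(licensed(tree))
```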

  30. Generative idea • Find a grammar to list all and only correct sentences (with their structures) of a language • This is a complete description of that language! • How can this be useful in analysis? • Reverse the grammar
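To use a grammar generatively is to expand the start symbol S by its rules until only words remain. Because rules such as NP → A NP are recursive, the sketch below caps the number of expansion steps. The grammar, including its tiny lexicon of word rules, is hypothetical and loosely follows the lecture's example.

```python
from itertools import product

# Hypothetical toy grammar: phrase rules plus a tiny lexicon of word rules.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["A", "NP"], ["R", "NP"], ["N"]],
    "VP": [["V", "NP"]],
    "A": [["much"]],
    "R": [["our"]],
    "N": [["government"], ["attention"]],
    "V": [["pays"]],
}

def generate(symbol, depth):
    """Yield every word list derivable from `symbol` in at most `depth` rule steps."""
    if symbol not in GRAMMAR:  # a word: nothing left to expand
        yield [symbol]
        return
    if depth == 0:  # expansion budget used up (NP -> A NP is recursive)
        return
    for rhs in GRAMMAR[symbol]:
        expansions = [list(generate(child, depth - 1)) for child in rhs]
        for parts in product(*expansions):
            yield [word for part in parts for word in part]

sentences = {" ".join(words) for words in generate("S", 5)}
print(len(sentences), sorted(sentences)[:3])
```

The output includes "our government pays much attention", but it also includes "much our government pays attention". This toy grammar therefore lists too much. It shows why the hard part of the generative programme is the "only correct sentences" requirement.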

  31. Parsing • Given a grammar and a sentence • Find all possible structures • that describe this sentence with this grammar • Many methods. Not discussed today. A lot of research. Very fast algorithms • Complexity: cubic in the number of words in the sentence (methods based on fast matrix multiplication lower the exponent to about 2.8) • Problem: combinatorics of variants
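One standard cubic-time method is the CKY (Cocke–Kasami–Younger) algorithm; the lecture itself does not say which method it has in mind. CKY fills a table whose cell for `words[i:j]` holds every category that can cover that span. The sketch below is a recognizer only (it answers yes or no) over a hypothetical toy grammar. To keep every rule at most binary, the ternary rule VP → VP NP PP is split, as an assumption, into VP → VP NP+PP and NP+PP → NP PP.

```python
# Hypothetical toy grammar in a CKY-friendly form.
BINARY = {
    ("NP", "VP"): {"S"},
    ("R", "NP"): {"NP"},
    ("A", "NP"): {"NP"},
    ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP+PP"},
    ("VP", "NP+PP"): {"VP"},
}
UNARY = {"N": {"NP"}, "R": {"NP"}, "V": {"VP"}}
LEXICON = {
    "our": {"R"}, "government": {"N"}, "pays": {"V"}, "much": {"A"},
    "attention": {"N"}, "to": {"P"}, "it": {"R"},
}

def unary_closure(cats):
    """Add every category reachable through unary rules such as N -> NP."""
    result, todo = set(cats), list(cats)
    while todo:
        for parent in UNARY.get(todo.pop(), ()):
            if parent not in result:
                result.add(parent)
                todo.append(parent)
    return result

def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`.

    table[i][j] holds the categories covering words[i:j]; the three nested
    loops over span length, start position and split point give O(n^3) steps."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = unary_closure(LEXICON.get(word.lower(), set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            found = set()
            for k in range(i + 1, j):  # try every split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        found |= BINARY.get((left, right), set())
            table[i][j] = unary_closure(found)
    return n > 0 and start in table[0][n]

print(cky_recognize("Our Government pays much attention to it".split()))
```

A full parser would also store back-pointers in each cell so that all structures can be read out. With many ambiguous sentences, those structures are where the combinatorics of variants shows up.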

  32. Advantages and disadvantages of the constituency approach Advantages • Nice mathematics, very well understood • Efficient analysis algorithms, very well elaborated • Good for languages with fixed word order • English. Chinese? Disadvantages • Difficult translation into semantics • Bad when it comes to freer word order • Even in English! Worse in other languages

  33. Head-driven approaches • Combine some advantages of dependency-based and constituency-based approaches • Syntax is still fixed-order. But word dependency information is added • Easier translation into semantics • More linguistically-based • How? • In each constituent, the main word (head) is marked • It modifies the head of the larger constituent [[Our Government][pays[ much attention][to it]]]
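The head rule ("the head of each constituent modifies the head of the larger constituent") is enough to turn a head-marked constituency tree into word-to-word dependencies. In the sketch below each phrase stores the index of its head child. Two choices are assumptions for illustration: the head marking itself, and treating the preposition as the head of "to it".

```python
def to_dependencies(node, arcs):
    """Return the head word of `node`, appending (head, dependent) arcs to `arcs`.

    A leaf is (tag, word); a phrase is (label, children, index_of_head_child)."""
    if isinstance(node[1], str):
        return node[1]
    _label, children, head_index = node
    heads = [to_dependencies(child, arcs) for child in children]
    for i, word in enumerate(heads):
        if i != head_index:
            # the head of a non-head constituent modifies the head of the larger one
            arcs.append((heads[head_index], word))
    return heads[head_index]

# [[Our Government][pays [much attention][to it]]] with head children marked by index.
tree = ("S", [
    ("NP", [("R", "Our"), ("N", "Government")], 1),
    ("VP", [
        ("V", "pays"),
        ("NP", [("A", "much"), ("N", "attention")], 1),
        ("PP", [("P", "to"), ("R", "it")], 0),
    ], 0),
], 1)

arcs = []
root = to_dependencies(tree, arcs)
print(root, arcs)
```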

  34. Syntactic ambiguity • I see a cat with a telescope • I see [a cat][with a telescope] • I use a telescope to see a cat • I see [a cat [with a telescope]] • I see a cat that has a telescope • Nearly any preposition causes ambiguity • Dozens, thousands, millions of variants for a sentence! • Because their numbers multiply • I see a cat with a telescope in a garden at the shore of a river
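How fast the variants multiply can be checked by brute force. The sketch below attaches each prepositional phrase in "see [a cat] PP1 PP2 ..." to the verb, to the object, or to the noun of an earlier PP. It keeps only the choices whose arcs do not cross and counts them. Two simplifications are assumptions of this sketch: attachment to the subject "I" is ignored, and each PP is represented by its noun. The counts match the Catalan numbers, which grow faster than any fixed multiple per added PP.

```python
from itertools import product
from math import comb

def crossing(arc1, arc2):
    """True if two (head, dependent) arcs cross when drawn above the sentence."""
    (l1, r1), (l2, r2) = sorted([tuple(sorted(arc1)), tuple(sorted(arc2))])
    return l1 < l2 < r1 < r2

def count_attachments(num_pps):
    """Count non-crossing ways to attach num_pps PPs in 'see [a cat] PP1 PP2 ...'.

    Token 0 is the verb, token 1 the object noun, token i >= 2 the noun of PP i-1."""
    count = 0
    choices = [range(i) for i in range(2, num_pps + 2)]  # any earlier word can be the head
    for heads in product(*choices):
        arcs = [(0, 1)] + [(h, i) for i, h in enumerate(heads, start=2)]
        if not any(crossing(a, b) for idx, a in enumerate(arcs) for b in arcs[idx + 1:]):
            count += 1
    return count

def catalan(n):
    return comb(2 * n, n) // (n + 1)

for k in range(1, 6):
    print(k, count_attachments(k), catalan(k + 1))
```

For the slide's sentence with four PPs ("with a telescope in a garden at the shore of a river"), `count_attachments(4)` returns 42.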

  35. Ambiguity resolution • Syntactic means are not enough • Is telescope more related to see or to cat? • Statistical methods: is it used with see or with cat? • Dictionary-based methods: does it share more meaning with see or with cat? • Path length in a dictionary of semantic relationships • Ideally, context should be analyzed, and reasoning applied: • I see a cat with a telescope. It keeps the telescope in its left paw. • Currently there are no good methods for this.
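The statistical question "is it used with see or with cat?" can be answered by comparing co-occurrence counts. The sketch below is in the spirit of lexical-association methods, not any specific published model. The training triples are invented purely for illustration. When the full (head, preposition, object) counts tie, it backs off to (head, preposition) counts, and remaining ties go to the verb; both are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical toy evidence: (head word, preposition, object noun) triples, as if
# read off attachments in a hand-parsed corpus. Invented purely for illustration.
OBSERVED = [
    ("see", "with", "telescope"),
    ("see", "with", "telescope"),
    ("observe", "with", "telescope"),
    ("cat", "with", "tail"),
    ("cat", "with", "telescope"),
]

def choose_attachment(verb, noun, prep, pobj, observed=OBSERVED):
    """Attach 'prep pobj' to the verb or the noun, whichever co-occurs more often."""
    triples = Counter(observed)
    pairs = Counter((head, p) for head, p, _ in observed)
    verb_score, noun_score = triples[(verb, prep, pobj)], triples[(noun, prep, pobj)]
    if verb_score == noun_score:  # no decisive evidence: back off to (head, prep) counts
        verb_score, noun_score = pairs[(verb, prep)], pairs[(noun, prep)]
    return "verb" if verb_score >= noun_score else "noun"  # remaining ties go to the verb

print(choose_attachment("see", "cat", "with", "telescope"))
print(choose_attachment("see", "cat", "with", "tail"))
```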

  36. Shallow parsing • Due to the HUGE problems in resolving ambiguity • Do not resolve it! • Do what you can do well I see [a cat][with a telescope][in a garden][at the shore][of a river] • Better than nothing • Can be done well
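A shallow parser of this kind can be as simple as a greedy pass over part-of-speech tags. The sketch below brackets each noun group and each preposition plus noun group, and never decides where the groups attach. The tag names (`PRON`, `V`, `DET`, `ADJ`, `N`, `P`) and the hand-assigned tags of the example are assumptions of this sketch.

```python
def chunk(tagged):
    """Greedy chunker: bracket [DET/ADJ* N] and [P DET/ADJ* N] sequences,
    leaving everything else unbracketed and every attachment unresolved."""
    out, i, n = [], 0, len(tagged)
    while i < n:
        k = i + 1 if tagged[i][1] == "P" else i  # optional leading preposition
        while k < n and tagged[k][1] in ("DET", "ADJ"):
            k += 1
        if k < n and tagged[k][1] == "N":
            out.append("[" + " ".join(word for word, _ in tagged[i:k + 1]) + "]")
            i = k + 1
        else:
            out.append(tagged[i][0])  # not the start of a chunk: emit the word as is
            i += 1
    return out

sentence = [
    ("I", "PRON"), ("see", "V"), ("a", "DET"), ("cat", "N"),
    ("with", "P"), ("a", "DET"), ("telescope", "N"),
    ("in", "P"), ("a", "DET"), ("garden", "N"),
    ("at", "P"), ("the", "DET"), ("shore", "N"),
    ("of", "P"), ("a", "DET"), ("river", "N"),
]
print(" ".join(chunk(sentence)))
```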

  37. Evaluation • PARSEVAL international contests • A practical parser usually gives only one variant • Implies disambiguation! • Manually built corpora (treebanks) • Compare what the program did with what humans did
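A program's output can be compared with a treebank by matching labeled brackets, in the style of PARSEVAL-type scoring. Each constituent is reduced to (label, start, end). Precision is the share of the parser's brackets that also appear in the human tree, and recall is the share of the human brackets that the parser found. The trees below are hand-written for illustration. The sketch does not model the extra conventions of real scoring tools, such as how punctuation is treated.

```python
from collections import Counter

def bracket_scores(gold, system):
    """Labeled bracket precision, recall and F1 over (label, start, end) spans."""
    matched = sum((Counter(gold) & Counter(system)).values())  # multiset intersection
    precision = matched / len(system)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# "I see a cat with a telescope": 7 words, span boundaries 0..7 (end-exclusive).
# Human tree: the telescope is the instrument of seeing (PP attached to the verb).
gold = [("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)]
# Parser output: the cat has the telescope (PP attached inside the object NP).
system = [("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 2, 7), ("NP", 2, 4),
          ("PP", 4, 7), ("NP", 5, 7)]
print(bracket_scores(gold, system))
```

The wrong attachment costs only one spurious bracket here (precision 6/7, recall 1). Bracket scores are therefore a fairly forgiving measure of disambiguation errors.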

  38. One of the uses in IR: Lexical ambiguity resolution • Syntactic analysis helps in POS disambiguation: • Oil is used well in Mexico. • Oil well is used in Mexico. • Well = ? • But it does not help in WSD: • I deposited my money in an international bank. • I live on a beautiful bank of the Han River.

  39. Research topics • Faster algorithms • E.g., parallel • Handling linguistic phenomena not handled by current approaches • Ambiguity resolution! • Statistical methods • A lot can be done

  40. Conclusions • Syntactic structure is one of the intermediate representations of a text used in its processing • It helps text understanding • and thus reasoning, question answering, ... • It directly helps POS tagging • It resolves lexical ambiguity of part of speech • but not WSD-type ambiguities • A big science in itself, with 50 (2000?) years of history

  41. Thank you! Till June 8? 6 pm Semantics
