1 / 30

Natural Language Processing

Natural Language Processing. Vasile Rus http://www.cs.memphis.edu/~vrus/nlp. Outline. Announcements Syntax Context Free Grammars (CFG) English Phrases Issues in CFG. Announcements. MIDTERM after the break!. Syntax. Syntax = rules describing how words can connect to each other

jaredlopez
Download Presentation

Natural Language Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Natural Language Processing Vasile Rus http://www.cs.memphis.edu/~vrus/nlp

  2. Outline • Announcements • Syntax • Context Free Grammars (CFG) • English Phrases • Issues in CFG

  3. Announcements • MIDTERM after the break!

  4. Syntax • Syntax = rules describing how words can connect to each other * that and after year last [incorrect] I saw you yesterday colorless green ideas sleep furiously • the kind of implicit knowledge of your native language that you had mastered by the time you were 3 or 4 years old without explicit instruction • not necessarily the type of rules you were later taught in school

  5. Syntax • Why should you care? • Grammar checkers • Question answering • Information extraction • Machine translation

  6. Context-Free Grammars • Capture constituency and ordering • Ordering is easy • What are the rules that govern the ordering of words and bigger units in the language • What’s constituency? • How do words group into units and what can we say about how the various kinds of units behave

  7. CFG Examples S -> NP VP NP -> Det NOMINAL NOMINAL -> Noun VP -> Verb Det -> a Noun -> flight Verb -> left • these rules are defined independent of the context where they might occur -> CFG

  8. CFGs • S -> NP VP • This says that there are units called S, NP, and VP in this language • That an S consists of an NP followed immediately by a VP • Doesn’t say that that’s the only kind of S • Nor does it say that this is the only place that NPs and VPs occur • Recognition vs Generativity • Generate strings in the language • Reject strings not in the language • Impose structures (trees) on strings in the language

  9. Derivations • A derivation is a sequence of rules applied to a string that accounts for that string • Covers all the elements in the string • Covers only the elements in the string

  10. Derivations as Trees I prefer a morning flight

  11. Parsing • Parsing is the process of taking a string and a grammar and returning a (many?) parse tree(s) for that string

  12. Other Options • Regular languages (expressions) • Too weak (cannot deal with recursion ) • Context-sensitive or Turing equiv • Too powerful / computationally intractable

  13. Context? • The notion of context in CFGs is not the same as the ordinary meaning of the word context in language • All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself • A -> B C • Means that I can rewrite an A as a B followed by a C regardless of the context in which A is found

  14. Key Constituents (English) • Sentences • Noun phrases • Verb phrases • Prepositional phrases

  15. Sentence-Types • Declaratives: A plane left • S -> NP VP • Imperatives: Leave! • S -> VP • Yes-No Questions: Did the plane leave? • S -> Aux NP VP • WH Questions: When did the plane leave? • S -> WH Aux NP VP

  16. Recursion • We’ll have to deal with rules such as the following where the non-terminal on the left also appears somewhere on the right (directly). • NP -> NP PP [[The flight] [to Boston]] • VP -> VP PP [[departed Miami] [at noon]]

  17. Recursion • Of course, this is what makes syntax interesting • flights from Denver • Flights from Denver to Miami • Flights from Denver to Miami in February • Flights from Denver to Miami in February on a Friday • Flights from Denver to Miami in February on a Friday under $300 • Flights from Denver to Miami in February on a Friday under $300 with lunch

  18. The Point • If you have a rule like • VP -> V NP It only cares that the thing after the verb is an NP. It doesn’t have to know about the internal affairs of that NP

  19. The Point • VP -> V NP • I hate • flights from Denver • Flights from Denver to Miami • Flights from Denver to Miami in February • Flights from Denver to Miami in February on a Friday • Flights from Denver to Miami in February on a Friday under $300 • Flights from Denver to Miami in February on a Friday under $300 with lunch

  20. Conjunctive Constructions S -> S and S • John went to NY and Mary followed him NP -> NP and NP VP -> VP and VP … In fact the right rule for English is X -> X and X

  21. Potential Problems in CFG • Agreement • Subcategorization • Movement

  22. *This dogs *Those dog *This dog eat *Those dogs eats Agreement • This dog • Those dogs • This dog eats • Those dogs eat

  23. Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP[a cheaper fare]NP • Help: Can you help [me]NP[with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • … • *John sneezed the book • *I prefer United has a flight • *Give with a flight Subcat expresses the constraints that a predicate (verb for now) places on the number and type of the argument it wants to take

  24. So? • So the various rules for VPs overgenerate • They permit the presence of strings containing verbs and arguments that don’t go together • For example: • VP -> V NP therefore • Sneezed the book is a VP since “sneeze” is a verb and “the book” is a valid NP Subcategorization frames can fix this problem (“slow down” overgeneration)

  25. Movement • Core example • [[My travel agent]NP [booked [the flight]NP]VP]S i.e. “book” is a straightforward transitive verb. It expects a single NP within the VP as an argument, and a single NP arg as the subject.

  26. Movement • What about? • Which flight do you want me to have the travel agent book? • The direct object argument to “book” isn’t appearing in the right place. It is in fact a long way from where its supposed to appear • And note that it is separated from its verb by 2 other verbs

  27. Formally… • To put all previous discussions/examples in a formal definition for CFG: A context free grammar has four parameters: • A set of non-terminal symbols N • A set of terminal symbols T • A set of production rules P, each of the form A  a, where A is a non-terminal, and a is a string of symbols from the infinite set of strings (T  N)* • A designated start symbol S

  28. Grammar equivalence and normal form • Strong equivalence: • two grammars are strongly equivalent if: • they generate the same set of strings • they assign the same phrase structure to each sentence • two grammars are weakly equivalent if: • they generate the same set of strings • they do not assign the same phrase structure to each sentence • Normal form • Restrict the form of productions • Chomsky Normal Form (CNF) • Right hand side of the productions has either two non-terminals, or one terminal • e.g. A -> BC A -> a • Any grammar can be translated into a weakly equivalent CNF • A -> B C D <=> A-> B X X -> C D

  29. Summary • Syntax • Context Free Grammars (CFG) • English Phrases • Issues in CFG

  30. Next Time • Context Free Grammars (CFG) and Parsing Algorithms

More Related