
Inteligenta Artificiala (Artificial Intelligence)


Presentation Transcript


  1. Inteligenta Artificiala (Artificial Intelligence), Universitatea Politehnica Bucuresti, academic year 2003-2004. Adina Magda Florea, http://turing.cs.pub.ro/ia_2005

  2. Lecture 12 (Curs nr. 12): Natural Language Processing (Prelucrarea limbajului natural)

  3. Defining Languages with Backus-Naur Form (BNF) • A formal language is defined as a set of strings, where each string is a sequence of symbols • The languages of interest contain an infinite set of strings, so we need a concise way to characterize the set: a grammar • Terminal symbols: the symbols or words that make up the strings of the language • Example: the set of terminal symbols for the language of simple arithmetic expressions is {0,1,2,3,4,5,6,7,8,9,+,-,*,/,(,)}

  4. Components in a BNF Grammar • Nonterminal Symbols • Categorize subphrases of the language • Example • The nonterminal symbol NP (NounPhrase) denotes an infinite set of strings, including “you” and “the big dog”

  5. Components in a BNF Grammar • Start Symbol • Nonterminal symbol that denotes the complete strings of the language • Set of rewrite rules or productions of the form LHS → RHS • LHS is a nonterminal • RHS is a sequence of zero or more symbols (either terminal or nonterminal)

  6. Example: BNF Grammar for Simple Arithmetic Expressions
      Exp → Exp Operator Exp | ( Exp ) | Number
      Number → Digit | Number Digit
      Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      Operator → + | - | * | /
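      As a sketch (the predicate names exp, primary, number, digit, operator are assumptions, not from the slides), the grammar above can be checked with a Prolog DCG recognizer; the left-recursive rules are restructured so that Prolog's depth-first, top-down search terminates:

      % Sketch: recognizer for the arithmetic grammar in Prolog DCG notation.
      % An Exp is a primary optionally followed by an operator and another Exp,
      % and Number is made right-recursive; the recognized language is the same
      % as the BNF above, but there is no left recursion.
      exp --> primary, operator, exp.
      exp --> primary.

      primary --> ['('], exp, [')'].
      primary --> number.

      number --> digit, number.
      number --> digit.

      digit    --> [D],  { member(D, [0,1,2,3,4,5,6,7,8,9]) }.
      operator --> [Op], { member(Op, [+, -, *, /]) }.

      % Example query:
      % ?- phrase(exp, ['(', 1, 2, +, 3, ')', *, 4]).
      % true.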

  7. The Component Steps of Communication • A typical communication, in which the speaker S wants to transmit the proposition P to the hearer H using words W, is composed of 7 processes. • 3 take place in the speaker • 4 take place in the hearer

  8. Processes in the Speaker • Intention • S wants H to believe P (where S typically believes P) • Generation • S chooses the words W (because they express the meaning P) • Synthesis • S utters the words W (usually addressing them to H)

  9. Processes in the Hearer • Perception • H perceives W’ (ideally W’ = W, but misperception is possible) • Analysis • H infers that W’ has possible meanings P1, …, Pn (words and phrases can have several meanings)

  10. Processes in the Hearer • Disambiguation • H infers that S intended to express Pi (where ideally Pi = P, but misinterpretation is possible) • Incorporation • H decides to believe Pi (or rejects it if it is out of line with what H already believes)

  11. Observations • If perception refers to spoken utterances, this is speech recognition • If perception refers to handwritten text, this is handwriting recognition • Neural networks have been applied successfully to both speech recognition and handwriting recognition

  12. Observations • Analysis, disambiguation, and incorporation form natural language understanding; they rely on the assumption that the words of the sentence are already known • Often, recognition of individual words is guided by the sentence structure, so perception and analysis interact, as do analysis, disambiguation, and incorporation

  13. Defining a Grammar • Lexicon - list of allowable vocabulary words, grouped in categories (parts of speech): • open classes - words are added to the category all the time (natural language is dynamic, it constantly evolves) • closed classes - small number of words, generally it is not expected that other words will be added

  14. Example - A Small Lexicon
      Noun → stench | breeze | wumpus | ...
      Verb → is | see | smell | ...
      Adjective → right | left | smelly | ...
      Adverb → here | there | ahead | ...
      Pronoun → me | you | I | it
      RelPronoun → that | who
      Name → John | Mary
      Article → the | a | an
      Preposition → to | in | on
      Conjunction → and | or | but

  15. The Grammar Associated to the Lexicon • Combine the words into phrases • Use nonterminal symbols to define different kinds of phrases • sentence S • noun phrase NP • verb phrase VP • prepositional phrase PP • relative clause RelClause

  16. Example - The Grammar Associated to the Lexicon
      S → NP VP | S Conjunction S
      NP → Pronoun | Noun | Article Noun | NP PP | NP RelClause
      VP → Verb | VP NP | VP Adjective | VP PP | VP Adverb
      PP → Preposition NP
      RelClause → RelPronoun VP

  17. Syntactic Analysis (Parsing) • Parsing is the problem of constructing a derivation tree for an input string from a formal definition of a grammar. • Parsing algorithms may be divided into two classes: • top-down parsing • bottom-up parsing

  18. Top-Down Parsing • Start with the top-level sentence symbol and attempt to build a tree whose leaves match the target sentence's words (the terminals) • Better if many alternative terminal symbols for each word • Worse if many alternative rules for a phrase

  19. Example for Top-Down Parsing: "John hit the ball"
      1. S
      2. S → NP, VP
      3. S → Noun, VP
      4. S → John, Verb, NP
      5. S → John, hit, NP
      6. S → John, hit, Article, Noun
      7. S → John, hit, the, Noun
      8. S → John, hit, the, ball

  20. Bottom-Up Parsing • Start with the words in the sentence (the terminals) and attempt to find a series of reductions that yield the sentence symbol • Better if many alternative rules for a phrase • Worse if many alternative terminal symbols for each word

  21. Example for Bottom-Up Parsing: "John hit the ball"
      1. John, hit, the, ball
      2. Noun, hit, the, ball
      3. Noun, Verb, the, ball
      4. Noun, Verb, Article, ball
      5. Noun, Verb, Article, Noun
      6. NP, Verb, Article, Noun
      7. NP, Verb, NP
      8. NP, VP
      9. S
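      As a sketch, both derivations can be checked mechanically with a small fragment of the slide-16 grammar written in Prolog DCG notation; the lexicon is limited to this example and the left-recursive alternatives are omitted (VP → VP NP is replaced by vp --> verb, np), so this is an assumption-laden fragment rather than the full grammar:

      % Fragment of the grammar, enough to accept "John hit the ball".
      s  --> np, vp.
      np --> noun.
      np --> article, noun.
      vp --> verb, np.

      noun    --> [john].
      noun    --> [ball].
      article --> [the].
      verb    --> [hit].

      % ?- phrase(s, [john, hit, the, ball]).
      % true.                 (phrase/2 performs a top-down, depth-first parse)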

  22. Definite Clause Grammar (DCG) • Problems with BNF Grammar • BNF only talks about strings, not meanings • We want to describe context-sensitive grammars, but BNF is context-free • Introduce a formalism that can handle both of these problems • Use first-order logic to talk about strings and their meanings

  23. Definite Clause Grammar (DCG) • We are interested in using language for communication, so we need some way of associating a meaning with each string • Each nonterminal symbol becomes a one-place predicate that is true of strings that are phrases of that category • Example • Noun(“ball”) is a true logical sentence • Noun(“the”) is a false logical sentence

  24. Definite Clause Grammar (DCG) • A definite clause grammar (DCG) is a grammar in which every sentence must be a definite clause. • A definite clause is a type of Horn clause that, when written as an implication, has exactly one atom in the conclusion and a conjunction of zero or more atoms in the hypothesis, for example A1 ∧ A2 ∧ … ∧ An ⇒ C1

  25. Example 1
      In BNF notation, we have:
        S → NP VP
      In first-order logic notation, we have:
        NP(s1) ∧ VP(s2) ⇒ S(Append(s1, s2))
      We read: if there is a string s1 that is a noun phrase and a string s2 that is a verb phrase, then the string formed by appending them together is a sentence

  26. Example 2
      In BNF notation, we have:
        Noun → ball | book
      In first-order logic notation, we have:
        (s = “ball” ∨ s = “book”) ⇒ Noun(s)
      We read: if s is the string “ball” or the string “book”, then the string s is a noun
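      As a sketch (a Prolog rendering not given on these slides), the two logical sentences above can be written as Prolog clauses, representing strings as word lists in the append-based style used later in the course:

      % Example 1: NP(s1) ∧ VP(s2) ⇒ S(Append(s1, s2))
      s(S) :- np(S1), vp(S2), append(S1, S2, S).

      % Example 2: (s = "ball" ∨ s = "book") ⇒ Noun(s)
      noun([ball]).
      noun([book]).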

  27. Rules to Translate BNF into DCG
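      As a sketch of one such translation scheme, Prolog's built-in DCG notation writes each BNF rule with --> and the compiler expands it into an ordinary definite clause with two extra arguments, a difference list of the words still to be parsed; the predicate names below are assumptions:

      % BNF:  S -> NP VP        written in Prolog DCG notation:
      s --> np, vp.
      % expands (roughly) to the definite clause:
      %   s(S0, S) :- np(S0, S1), vp(S1, S).

      % BNF:  Noun -> ball      written as a terminal rule:
      noun --> [ball].
      % expands (roughly) to:
      %   noun(S0, S) :- S0 = [ball | S].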

  28. Augmenting the DCG • Extend the notation to incorporate grammars that can not be expressed in BNF • Nonterminal symbols can be augmented with extra arguments

  29. Augmenting the DCG Add one argument for semantics • In DCG, the nonterminal NP translates as a one-place predicate where the single argument is a string: NP(s) • In the augmented DCG, we can write NP(sem) to express “an NP with semantics sem”. This gets translated into logic as the two-place predicate NP(sem, s)

  30. Augmenting the DCG: Add One Argument for Semantics
      DCG:     S(sem) → NP(sem1) VP(sem2) {compose(sem1, sem2, sem)}
      FOPL:    NP(s1, sem1) ∧ VP(s2, sem2) ∧ compose(sem1, sem2, sem) ⇒ S(append(s1, s2), sem)
      PROLOG:  see later on

  31. Semantic Interpretation • Compositional semantics - the semantics of any phrase is a function of the semantics of its subphrases; it does not depend on any other phrase before, after, or encompassing the given phrase • But natural languages do not have a compositional semantics in the general case.

  32. sentence(S, Sem) :- np(S1, Sem1), vp(S2, Sem2), append(S1, S2, S), Sem = [Sem1 | Sem2].
      np([S1, S2], Sem) :- article(S1), noun(S2, Sem).
      vp([S], Sem) :- verb(S, Sem1), Sem = [property, Sem1].
      vp([S1, S2], Sem) :- verb(S1), adjective(S2, color, Sem1), Sem = [color, Sem1].
      vp([S1, S2], Sem) :- verb(S1), noun(S2, Sem1), Sem = [parts, Sem1].
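      As a sketch, the clauses above become directly runnable once a lexicon is supplied; the facts below are hypothetical examples (not from the slide), followed by a sample query:

      % Hypothetical lexical facts, added only so the rules above can be queried.
      article(the).
      noun(car, car).                  % noun(Word, Semantics)
      verb(is).                        % verb/1: copula used before a complement
      verb(rolls, roll).               % verb/2: verb carrying its own semantics
      adjective(red, color, red).      % adjective(Word, Kind, Semantics)

      % ?- sentence([the, car, is, red], Sem).
      % Sem = [car, color, red].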

  33. Problems with Augmented DCG • The previous grammar will generate sentences that are not grammatically correct • NL is not a context free language • Must deal with • cases • agreement between subject and main verb in the sentence (predicate) • verb subcategorization: the complements that a verb can accept

  34. Solution • Augment the existing rules of the grammar to deal with context issues • Start by parameterizing the categories NP and Pronoun so that they take a parameter indicating their case

  35. CASES
      Nominative case (subjective case) + agreement:
        I take the bus          Je prends l’autobus      Eu iau autobuzul
        You take the bus        Tu prends l’autobus      Tu iei autobuzul
        He takes the bus        Il prend l’autobus       El ia autobuzul
      Accusative case (objective case):
        He gives me the book    Il me donne le livre     El imi da cartea
      Dative case:
        He is talking to me     Il parle avec moi        El vorbeste cu mine

  36. Example - The Grammar Using Augmentations to Represent Noun Cases
      S → NP(Subjective) VP
      NP(case) → Pronoun(case) | Noun | Article Noun
      Pronoun(Subjective) → I | you | he | she
      Pronoun(Objective) → me | you | him | her

  37. sentence(S) :- np(S1, subjective), vp(S2), append(S1, S2, S).
      np([S], Case) :- pronoun(S, Case).
      np([S], _) :- noun(S).
      np([S1, S2], _) :- article(S1), noun(S2).
      pronoun(i, subjective).
      pronoun(you, _).
      pronoun(he, subjective).
      pronoun(she, subjective).
      pronoun(me, objective).
      pronoun(him, objective).
      pronoun(her, objective).
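      As a sketch, the case check can be exercised once vp/1 and a small lexicon are supplied; the clauses below are hypothetical additions, not part of the slide:

      % Hypothetical vp/1 clauses and lexicon, added only to make the rules runnable.
      vp([V]) :- verb(V).
      vp([V | S2]) :- verb(V), np(S2, objective).
      verb(take).
      noun(bus).
      article(the).

      % ?- sentence([i, take, the, bus]).     % succeeds: "I" is in the subjective case
      % ?- sentence([me, take, the, bus]).    % fails: "me" is in the objective case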

  38. Verb Subcategorization • Augment the DCG with a new parameter to describe the verb subcategorization • The grammar must state which verbs can be followed by which other categories. This is the subcategorization information for the verb • Each verb has a list of complements

  39. Integrate Verb Subcategorization into the Grammar • A subcategorization list is a list of complement categories that the verb accepts • Augment the category VP to take a subcategorization argument that indicates the complements that are needed to form a complete VP

  40. Integrate Verb Subcategorization into the Grammar • Change the rule for S to say that it requires a verb phrase that has all its complements, and thus a subcategorization list of [ ] • Rule: S → NP(Subjective) VP([ ]) • The rule can be read as “A sentence can be composed of an NP in the subjective case, followed by a VP which has a null subcategorization list”
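      As a sketch in the append-based Prolog style used elsewhere in these slides (vp/2 carrying the subcategorization list as its second argument is an assumption about the argument order), the augmented S rule becomes:

      % The empty list [] means the verb phrase already has all of its complements.
      sentence(S) :- np(S1, subjective), vp(S2, []), append(S1, S2, S).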

  41. Integrate Verb Subcategorization into the Grammar • Verb phrases can take adjuncts, which are phrases that are not licensed by the individual verb, but rather may appear in any verb phrase • Phrases representing time and place are adjuncts, because almost any action or event can have a time or a place • VP(subcat) → VP(subcat) PP | VP(subcat) Adverb • Example: I smell the wumpus now

  42. VP(subcat) → VP([NP | subcat]) NP(Objective)
                 | VP([Adjective | subcat]) Adjective
                 | VP([PP | subcat]) PP
                 | Verb(subcat)
                 | VP(subcat) PP
                 | VP(subcat) Adverb
      The first line can be read as “A VP, with a given subcategorization list, subcat, can be formed by a VP followed by an NP in the objective case, as long as that VP has a subcategorization list that starts with the symbol NP and is followed by the elements of the list subcat”

  43. Verb      Subcategorization list   Example
      give      [NP, PP]                 give the gold in the box to me
      give      [NP, NP]                 give me the gold
      smell     [NP]                     smell a wumpus
      smell     [Adjective]              smell awful
      smell     [PP]                     smell like a wumpus
      is        [Adjective]              is smelly
      is        [PP]                     is in the box
      is        [NP]                     is a pit
      died      []                       died
      believe   [S]                      believe the wumpus is dead

  44. VP(subcat) → VP([NP | subcat]) NP(Objective)
                 | VP([Adjective | subcat]) Adjective
                 | VP([PP | subcat]) PP
                 | Verb(subcat)
                 | VP(subcat) PP
                 | VP(subcat) Adverb

      vp(S, Subcat) :- vp(S1, [np | Subcat]), np(S2, objective), append(S1, S2, S).
      vp([give], [np, pp]).
      vp([give], [np, np]).
      vp([smell], [np]).
      vp([smell], [adjective]).
      vp([smell], [pp]).

  45. But it is dangerous to translate
        VP(subcat) → VP(subcat) PP
      directly: the resulting clause is left-recursive (vp would call itself as its first goal with the same arguments), so Prolog's depth-first search loops forever.
      Solution: introduce an auxiliary predicate vp1 for the non-adjunct alternatives
        vp(S, Subcat) :- vp1(S1, Subcat), pp(S2), append(S1, S2, S).

  46. Generative Capacity of Augmented Grammars • The generative capacity of augmented grammars depends on the number of values for the augmentations • If there is a finite number, then the augmented grammar is equivalent to a context-free grammar

  47. Semantic Interpretation • The semantic interpretation is responsible for getting all possible interpretations, and disambiguation is responsible for choosing the best one. • Disambiguation is done starting from the pragmatic interpretation of the sentence.

  48. Pragmatic Interpretation • Complete the semantic interpretation by adding information about the current situation • Pragmatics studies how language is used and its effect on the listener • Pragmatics explains why it is not appropriate to answer just "Yes" to the question "Do you know what time it is?"

  49. Indexicals • Indexical - a phrase that refers directly to the current situation • Example • I am in Bucharest today.

  50. Anaphora • Anaphora - the occurrence of phrases referring to objects that have been mentioned previously • Example • John was hungry. He entered a restaurant. • The ball hit the house. It broke the window.
