
  1. Computational Semantics
     http://www.coli.uni-sb.de/cl/projects/milca/esslli
     Day II: A Modular Architecture
     Aljoscha Burchardt, Alexander Koller, Stephan Walter, Universität des Saarlandes, Saarbrücken, Germany
     ESSLLI 2004, Nancy, France

  2. Computing Semantic Representations
     Yesterday:
     • λ-Calculus is a nice tool for systematic meaning construction.
     • We saw a first, sketchy implementation.
     • Some things are still to be done.
     Today:
     • Let's fix the problems.
     • Let's build nice software.

  3. Yesterday: λ-Calculus
     • Semantic representations are constructed along the syntax tree. How to get there? By using functional application.
     • λs help to guide arguments into the right place on β-reduction:
       λx.love(x,mary) @ john  ⇒  love(john,mary)

  4. Yesterday's disappointment
     Our first idea for NPs with a determiner didn't work out:
     "A man" ~> ∃z.man(z)
     "A man loves Mary" ~> *love(∃z.man(z), mary)
     But what was the idea after all? Nothing! ∃z.man(z) just isn't the meaning of "a man". If anything, it translates the complete sentence "There is a man".
     Let's try again, systematically…

  5. A solution
     What we want is:
     "A man loves Mary" ~> ∃z(man(z) ∧ love(z,mary))
     What we have is:
     "man" ~> λy.man(y)
     "loves Mary" ~> λx.love(x,mary)
     How about:
     ∃z(λy.man(y)(z) ∧ λx.love(x,mary)(z))
     ⇒ ∃z(man(z) ∧ λx.love(x,mary)(z))
     ⇒ ∃z(man(z) ∧ love(z,mary))
     Remember: We can use variables for any kind of term. So next:
     "A" ~> λP.λQ.∃z(P(z) ∧ Q(z))

  6. But…
     "A man … loves Mary":
     λP.λQ.∃z(P(z) ∧ Q(z)) @ λy.man(y) @ λx.love(x,mary)
     ⇒ λQ.∃z(man(z) ∧ Q(z)) @ λx.love(x,mary)
     ⇒ ∃z(man(z) ∧ λx.love(x,mary)(z))
     ⇒ ∃z(man(z) ∧ love(z,mary))
     Fine!
     "John … loves Mary":
     λx.love(x,mary) @ john: not systematic!
     john @ λx.love(x,mary): not reducible!
     Better: λP.P(john) @ λx.love(x,mary) ⇒ λx.love(x,mary) @ john ⇒ love(john,mary)
     So: "John" ~> λP.P(john)

  7. Transitive Verbs
     What about transitive verbs (like "love")?
     "loves" ~> λy.λx.love(x,y) ??? Won't do:
     "Mary" ~> λQ.Q(mary)
     "loves Mary" ~> λy.λx.love(x,y) @ λQ.Q(mary) ⇒ λx.love(x, λQ.Q(mary))
     How about something a little more complicated:
     "loves" ~> λR.λx.(R @ λy.love(x,y))
     The only way to understand this is to see it in action...

  8. x(P.P(mary)@y.love(x,y)) P.P(john) @ ( @ ) @ "John loves Mary" again... love(john,mary) love(john,mary) x.love(x,mary)(john) love(john,mary) x(y.love(x,y)(mary)) x.love(x,mary) Rx(R@y.love(x,y)) P.P(mary) P.P(john) John loves Mary

  9. Summing up
     • nouns: "man" ~> λx.man(x)
     • intransitive verbs: "smoke" ~> λx.smoke(x)
     • determiner: "a" ~> λP.λQ.∃z(P(z) ∧ Q(z))
     • proper names: "mary" ~> λP.P(mary)
     • transitive verbs: "love" ~> λR.λx.(R @ λy.love(x,y))
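     For later reference, here is a sketch of the same translations written as Prolog terms in the lambda(Var,Body) / @ notation used by the implementation below; the operator names exists and & are assumptions based on how they appear on later slides:
       man   ~>  lambda(X, man(X))
       smoke ~>  lambda(X, smoke(X))
       a     ~>  lambda(P, lambda(Q, exists(Z, (P@Z) & (Q@Z))))
       mary  ~>  lambda(P, P@mary)
       love  ~>  lambda(R, lambda(X, R @ lambda(Y, love(X,Y))))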

  10. Today's first success
      What we can do now (and could not do yesterday):
      • Complex NPs (with determiners)
      • Transitive verbs
      … and all in the same way.
      Key ideas:
      • Extra λs for NPs
      • Variables for predicates
      • Apply subject NP to VP

  11. Yesterday's implementation
      s(VP@NP) --> np(NP), vp(VP).
      np(john) --> [john].
      np(mary) --> [mary].
      tv(lambda(X,lambda(Y,love(Y,X)))) --> [loves], {vars2atoms(X),vars2atoms(Y)}.
      iv(lambda(X,smoke(X))) --> [smokes], {vars2atoms(X)}.
      iv(lambda(X,snore(X))) --> [snorts], {vars2atoms(X)}.
      vp(TV@NP) --> tv(TV), np(NP).
      vp(IV) --> iv(IV).
      % This doesn't work!
      np(exists(X,man(X))) --> [a,man], {vars2atoms(X)}.
      Was this a good implementation?
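      To see this grammar at work, one can call the start symbol as an ordinary DCG. A minimal sketch (the atom chosen for the bound variable depends on vars2atoms, so the output shown is only indicative):
        ?- s(Sem, [john, smokes], []).
        Sem = lambda(v1, smoke(v1)) @ john
      β-conversion of this term then yields smoke(john).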

  12. A Nice Implementation
      What is a nice implementation? It should be:
      • Scalable: If it works with five examples, upgrading to 5000 shouldn't be a great problem (e.g. new constructions in the grammar, more words...)
      • Re-usable: Small changes in our ideas about the system shouldn't lead to complex changes in the implementation (e.g. a new representation language)

  13. Solution: Modularity
      • Think about your problem in terms of interacting conceptual components
      • Encapsulate these components into modules of your implementation, with clean and abstract pre-defined interfaces to each other
      • Extend or change modules to scale / adapt the implementation

  14. Another look at yesterday's implementation
      • Okay, because it was small
      • Not modular at all: all linguistic functionality in one file, packed inside the DCG
      • E.g. scalability of the lexicon: we always have to write new rules, like:
        tv(lambda(X,lambda(Y,visit(Y,X)))) --> [visit], {vars2atoms(X),vars2atoms(Y)}.
      • Changing parts for adaptation? Change every single rule!
      Let's modularize!

  15. Semantic Construction: Conceptual Components
      [Diagram: the sentence "John smokes" goes into a Black Box, which outputs the representation smoke(j).]

  16. Semantic Construction: Inside the Black Box
      [Diagram: the Black Box split into a syntax side and a semantics side. For phrases (combinatorial): the DCG on the syntax side, the combine-rules on the semantics side. For words (lexical): the lexicon-facts on the semantics side.]

  17. DCG
      The DCG rules tell us (mainly) what phrases are acceptable. Their basic structure is:
      s(...)  --> np(...), vp(...), {...}.
      np(...) --> det(...), noun(...), {...}.
      np(...) --> pn(...), {...}.
      vp(...) --> tv(...), np(...), {...}.
      vp(...) --> iv(...), {...}.
      (The gaps will be filled in later on.)

  18. combine-rules
      The combine-rules encode the actual semantic construction process. That is, they glue representations together using @:
      combine(s:(NP@VP), [np:NP, vp:VP]).
      combine(np:(DET@N), [det:DET, n:N]).
      combine(np:PN, [pn:PN]).
      combine(vp:IV, [iv:IV]).
      combine(vp:(TV@NP), [tv:TV, np:NP]).
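      Each clause is a plain Prolog fact, so a combine step is just a call that instantiates the mother's representation. A small sketch of such a call (the argument terms are the representations for "John" and "smokes" built later; this query is an illustration, not part of the course code):
        ?- combine(s:S, [np:lambda(P,P@john), vp:lambda(X,smoke(X))]).
        S = lambda(P,P@john) @ lambda(X,smoke(X))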

  19. Lexicon
      The lexicon-facts hold the elementary information connected to words:
      lexicon(noun,bird,[bird]).
      lexicon(pn,anna,[anna]).
      lexicon(iv,purr,[purrs]).
      lexicon(tv,eat,[eats]).
      Their slots contain:
      • the syntactic category,
      • a constant / relation symbol ("core" semantics),
      • the surface form of the word.

  20. Interfaces
      [Diagram: the architecture of slide 16, now with interfaces. Syntax side: the DCG. Semantics side: combine-rules for phrases (combinatorial) and lexicon-facts for words (lexical). The DCG talks to the combine-rules via combine-calls, and to the lexicon-facts via lexicon-calls and semantic macros.]

  21. Interfaces in the DCG
      Information is transported between the three components of our system by additional calls and variables in the DCG:
      • Lexical rules are now fully abstract. We have one for each category (iv, tv, n, ...). The DCG uses lexicon-calls and semantic macros like this:
        iv(IV) --> {lexicon(iv,Sym,Word), ivSem(Sym,IV)}, Word.
        pn(PN) --> {lexicon(pn,Sym,Word), pnSem(Sym,PN)}, Word.
      • In the combinatorial rules, it uses combine-calls like this:
        vp(VP) --> iv(IV), {combine(vp:VP,[iv:IV])}.
        s(S)   --> np(NP), vp(VP), {combine(s:S,[np:NP,vp:VP])}.

  22. Interfaces: How they work
      iv(IV) --> {lexicon(iv,Sym,Word), ivSem(Sym,IV)}, Word.
      When this rule applies, the syntactic analysis component:
      • looks up the Word found in the string (e.g. "smokes"), ...
      • ... checks that its category is iv, ...
        lexicon(iv, smoke, [smokes])
      • ... and retrieves the relation symbol Sym to be used in the semantic construction.
      So we have: Word = [smokes], Sym = smoke

  23. Interfaces: How they work II
      iv(IV) --> {lexicon(iv,Sym,Word), ivSem(Sym,IV)}, Word.
      Then, the semantic construction component:
      • takes Sym ...   (Sym = smoke)
      • ... and uses the semantic macro ivSem ...
        ivSem(Sym,IV)  ⇒  ivSem(smoke,IV)  ⇒  ivSem(smoke, lambda(X, smoke(X)))
      • ... to turn it into a full semantic representation for an intransitive verb:
        IV = lambda(X, smoke(X))
      The DCG rule is now fully instantiated and looks like this:
      iv(lambda(X, smoke(X))) --> {lexicon(iv,smoke,[smokes]), ivSem(smoke, lambda(X, smoke(X)))}, [smokes].

  24. What's inside a semantic macro?
      Semantic macros simply specify how to make a valid semantic representation out of a naked symbol. The one we've just seen in action for the verb "smokes" was:
      ivSem(Sym, lambda(X,Formula)) :- compose(Formula,Sym,[X]).
      compose builds a first-order formula out of Sym and a new variable X:
      Formula = smoke(X)
      This is then embedded into a λ-abstraction over the same X:
      lambda(X, smoke(X))
      Another one, without compose:
      pnSem(Sym, lambda(P,P@Sym)).
      john  ⇒  lambda(P, P@john)
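      The macros for the remaining categories are not shown on this slide. The following is only a sketch of how they might look if they follow the same pattern as ivSem and the translations on slide 9; the clause bodies and the use of compose here are assumptions, not the course code:
        % noun: "man" ~> lambda(X, man(X))
        nSem(Sym, lambda(X,Formula)) :- compose(Formula,Sym,[X]).
        % transitive verb: "love" ~> lambda(R, lambda(X, R @ lambda(Y, love(X,Y))))
        tvSem(Sym, lambda(R,lambda(X, R@lambda(Y,Formula)))) :- compose(Formula,Sym,[X,Y]).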

  25. [Diagram: the architecture of slide 20, instantiated for "John smokes".]
      Phrases (combinatorial):
        s(S) --> np(NP), vp(VP), {combine(s:S,[np:NP,vp:VP])}.
        np(NP) --> …, pn(PN)    with NP = lambda(P,P@john)
        vp(VP) --> …, iv(IV)    with VP = lambda(X,smoke(X))
        pn(PN) --> …, [john]    with PN = lambda(P,P@john)
        iv(IV) --> …, [smokes]  with IV = lambda(X,smoke(X))
      Semantic macros:
        pnSem(Sym,PN)  with Sym = john,  Word = [john]
        ivSem(Sym,IV)  with Sym = smoke, Word = [smokes]
      Words (lexical):
        lexicon(pn,john,[john]).
        lexicon(iv,smoke,[smokes]).

  26. A look at combine
      combine(s:NP@VP, [np:NP, vp:VP]).
      S = NP@VP
      NP = lambda(P,P@john)
      VP = lambda(X,smoke(X))
      So: S = lambda(P,P@john) @ lambda(X,smoke(X))
      That's almost all, folks…
      betaConvert(lambda(P,P@john) @ lambda(X,smoke(X)), Converted)
      Converted = smoke(john)

  27. Little Cheats
      A few "special words" are dealt with in a somewhat different manner:
      • Determiners ("every man"):
        No semantic Sym in the lexicon:
        lexicon(det,_,[every],uni).
        Semantic representation generated by the macro alone:
        detSem(uni, lambda(P,lambda(Q, forall(X, (P@X) > (Q@X))))).
      • Negation ("does not walk"): same thing:
        No semantic Sym in the lexicon:
        lexicon(mod,_,[does,not],neg).
        Representation solely from the macro:
        modSem(neg, lambda(P,lambda(X, ~(P@X)))).
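      By analogy, the indefinite determiner from slide 9 would get an entry of the same shape. This is only a sketch: the tag indef and the operator names exists and & are assumptions, not taken from the course code:
        lexicon(det,_,[a],indef).
        detSem(indef, lambda(P,lambda(Q, exists(X, (P@X) & (Q@X))))).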

  28. The code that's online (http://www.coli.uni-sb.de/cl/projects/milca/esslli)
      • lexicon-facts have a fourth argument for any kind of additional information (e.g. fin/inf, gender):
        lexicon(tv,eat,[eats],fin).
      • iv/tv have an additional argument for infinitive/finite (e.g. "eat" vs. "eats"):
        iv(I,IV) --> {lexicon(iv,Sym,Word,I), …}, Word.
      • limited coordination (e.g. "talks and walks"), hence doubled categories:
        vp2(VP2) --> vp1(VP1A), coord(C), vp1(VP1B),
                     {combine(vp2:VP2,[vp1:VP1A,coord:C,vp1:VP1B])}.
        vp1(VP1) --> v2(fin,V2), {combine(vp1:VP1,[v2:V2])}.

  29. A demo
      lambda :-
         readLine(Sentence),
         parse(Sentence,Formula),
         resetVars,
         vars2atoms(Formula),
         betaConvert(Formula,Converted),
         printRepresentations([Converted]).
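      A possible interaction with this driver, shown only as a sketch since the actual prompt and output formatting of readLine and printRepresentations are not given on the slides:
        ?- lambda.
        > john smokes
        smoke(john)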

  30. Evaluation
      Our new program has become much bigger, but it's…
      • Modular: everything's in its right place:
        Syntax in englishGrammar.pl
        Semantics (macros + combine) in lambda.pl
        Lexicon in lexicon.pl
      • Scalable: e.g. extend the lexicon by adding facts to lexicon.pl
      • Re-usable: e.g. to change the semantic construction method (e.g. to CLLS on Thursday), change only lambda.pl and keep the rest

  31. What we've done today
      • Complex NPs, PNs and TVs in λ-based semantic construction
      • A clean semantic construction framework in Prolog
      • Its instantiation for λ-based semantic construction

  32. Ambiguity
      • Some sentences have more than one reading, i.e. more than one semantic representation.
      • Standard example: "Every man loves a woman":
        Reading 1 (the women may be different): ∀x(man(x) → ∃y(woman(y) ∧ love(x,y)))
        Reading 2 (there is one particular woman): ∃y(woman(y) ∧ ∀x(man(x) → love(x,y)))
      • What does our system do?
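      In the Prolog notation of the implementation, the two readings would come out roughly as follows, assuming forall, exists, > and & are the connective names used by the code (as they appear on slides 27 and 34):
        forall(X, man(X) > exists(Y, woman(Y) & love(X,Y)))
        exists(Y, woman(Y) & forall(X, man(X) > love(X,Y)))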

  33. Excursion: lambda, variables and atoms
      • Question yesterday: Why don't we use Prolog variables for FO-variables?
      • Advantage (at first sight): β-reduction as unification:
        betaReduce(lambda(X,F)@X, F).
      • Now, for "John walks":
        betaReduce(lambda(john,walk(john))@john, walk(john))
        i.e. X = john, F = walk(john)
      • Nice, but…
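      To make the idea concrete, here is a minimal self-contained sketch; the operator declaration for @ and the example query are illustrations of my own, not the course code:
        :- op(950, yfx, @).            % application operator; the priority is an assumption
        betaReduce(lambda(X,F)@X, F).  % beta-reduction by pure unification
        % ?- betaReduce(lambda(X,walk(X))@john, Result).
        % Result = walk(john).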

  34. Problem: Coordination
      "John and Mary":
      (λX.λY.λP.((X@P) ∧ (Y@P)) @ λQ.Q(john)) @ λR.R(mary)
      ⇒ λP.((λQ.Q(john)@P) ∧ (λR.R(mary)@P))
      ⇒ λP.(P(john) ∧ P(mary))
      "John and Mary walk":
      λP.(P(john) ∧ P(mary)) @ λx.walk(x)
      ⇒ λx.walk(x)@john ∧ λx.walk(x)@mary
      In Prolog: lambda(X,walk(X))@john & lambda(X,walk(X))@mary
      β-reduction as unification: X = john, X = mary ⇒ clash!
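      With the unification-based betaReduce sketched above, the clash can be reproduced directly: the same VP term, and hence the same Prolog variable X, occurs in both conjuncts, so reducing the first conjunct binds X to john and the second reduction then has to unify john with mary. Again only a sketch under the same assumptions:
        ?- VP = lambda(X,walk(X)),
           betaReduce(VP@john, R1),
           betaReduce(VP@mary, R2).
        false.    % X cannot be both john and mary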
