English Parsers: Some Information Based Observations

Presentation Transcript

1. English Parsers: Some Information Based Observations • Amba Kulkarni, Department of Sanskrit Studies, University of Hyderabad, Hyderabad, India (apksh@uohyd.ernet.in) • Akshar Bharati and Sriram Chaudhury, Language Technologies Research Center, I.I.I.T., Hyderabad, India (sriram_c@research.iiit.ac.in)

2. English Parsers • Statistical: Charniak; Lexparser; Collins; Re-ranking; Enju (prob HPSG) • Rule Based: Link (Dependency); Minipar (minimalism); LKB (HPSG); XTAG (TAG)

3. PARSER OUTPUTS
Sentence: Shankar saw two men putting a heavy bag into the jeep.
Enju Parser / Stanford-Parser:
nsubj(saw-2, Shankar-1) num(men-4, two-3) dobj(saw-2, men-4) partmod(men-4, putting-5) det(bag-8, a-6) amod(bag-8, heavy-7) dobj(putting-5, bag-8) det(jeep-11, the-10) prep_into(putting-5, jeep-11)
(ROOT (S (NP (NNP Shankar)) (VP (VBD saw) (NP (NP (CD two) (NNS men)) (VP (VBG putting) (NP (DT a) (JJ heavy) (NN bag)) (PP (IN into) (NP (DT the) (NN jeep)))))) (. .)))
(S (NP Shankar) (VP saw (NP (NP two men) (VP putting (NP a heavy bag) (PP into (NP the jeep))))) .)
Link-Parser:
[(LEFT-WALL)(Shankar)(saw.v)(two)(men.n)(putting.v)(a)(heavy.a)(bag.n)(into)(the)(jeep.n)(.)]
[[0 12 4 (Xp)][0 1 0 (Wd)][1 2 0 (Ss)][2 4 1 (Op)][3 4 0 (Dmc)][4 5 0 (Mg)][5 9 3 (Pp)][5 8 2 (Os)][6 8 1 (Ds)][7 8 0 (A)][9 11 1 (Js)][10 11 0 (Ds)]] [0]
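The typed-dependency lines on this slide have a regular shape, relation(head-index, dependent-index), so they can be read mechanically. The following is a minimal sketch in Python, not part of the original slides; the names `Dep` and `parse_typed_dependencies` are made up for illustration, and the regular expression only covers the simple form shown on the slide.

```python
import re
from typing import NamedTuple


class Dep(NamedTuple):
    """One typed dependency (hypothetical container, not a parser API)."""
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


# Matches one item such as "nsubj(saw-2, Shankar-1)".
_DEP_RE = re.compile(r"(\w+)\(([^\s(),]+)-(\d+),\s*([^\s(),]+)-(\d+)\)")


def parse_typed_dependencies(text: str) -> list[Dep]:
    """Turn a string of typed dependencies into a list of Dep tuples."""
    return [
        Dep(rel, head, int(head_idx), dep, int(dep_idx))
        for rel, head, head_idx, dep, dep_idx in _DEP_RE.findall(text)
    ]


# The dependency output copied from the slide.
output = (
    "nsubj(saw-2, Shankar-1) num(men-4, two-3) dobj(saw-2, men-4) "
    "partmod(men-4, putting-5) det(bag-8, a-6) amod(bag-8, heavy-7) "
    "dobj(putting-5, bag-8) det(jeep-11, the-10) "
    "prep_into(putting-5, jeep-11)"
)
deps = parse_typed_dependencies(output)
for d in deps:
    print(f"{d.dependent} --{d.relation}--> {d.head}")
```

Each printed line reads as "modifier, relation, modified", which is the view of a parse that the rest of the talk argues for.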

4. Problems • Special training is required to read and understand the parser output • None of the parsers has satisfactory performance o Correctness of the parse: 30-40% o Correctness of the relations: 80-90% • Difficult to compare the performance mechanically • Need to have a common representation
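The gap between the two correctness figures is easier to see once both measures are written down. The sketch below is an illustration only: it assumes each parse is a set of (relation, head index, dependent index) triples, the function names are hypothetical, and the toy data is invented for the example rather than taken from the evaluation behind the slide.

```python
def relation_accuracy(gold, predicted):
    """Fraction of gold relations that the parser also produced."""
    total = sum(len(g) for g in gold)
    correct = sum(len(g & p) for g, p in zip(gold, predicted))
    return correct / total


def parse_accuracy(gold, predicted):
    """Fraction of sentences whose whole relation set is exactly right."""
    exact = sum(1 for g, p in zip(gold, predicted) if g == p)
    return exact / len(gold)


# Toy data (invented): two sentences as sets of (relation, head, dependent).
gold = [
    {("subj", 2, 1), ("obj", 2, 4), ("det", 4, 3), ("mod", 2, 5)},
    {("subj", 2, 1), ("obj", 2, 4), ("det", 4, 3),
     ("mod", 2, 6), ("det", 6, 5), ("mod", 6, 7)},
]
predicted = [
    set(gold[0]),                                   # fully correct
    (gold[1] - {("mod", 2, 6)}) | {("mod", 4, 6)},  # one wrong attachment
]

print(relation_accuracy(gold, predicted))  # 0.9
print(parse_accuracy(gold, predicted))     # 0.5
```

One wrong attachment is enough to spoil a whole parse. As a rough reading, and assuming errors were independent, a sentence with 10 relations at 90% per-relation correctness would be fully correct only about 0.9^10 ≈ 35% of the time, which is in the range the slide reports.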

5. Current Trend • Dependency output is preferred over constituency output, because of • the evaluation point of view • suitability for a wide range of NLP tasks

6. Current Trend contd ... • However, there is no consensus concerning • names of the relations • number of relations • Number of relations used, by parser: • Lexparser: 47 • Minipar: 59 • Link parser: 106
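To make the lack of consensus concrete, here is a small sketch that maps a few of the labels visible in the slide 3 outputs onto shared names so that two parsers can be compared mechanically. The common names "subj" and "obj", the mapping tables, and the function names are assumptions made for this example; they are not the representation proposed in the talk, and only the labels that appear on slide 3 are covered.

```python
# Hypothetical mapping from parser-specific labels to shared names.
STANFORD_TO_COMMON = {"nsubj": "subj", "dobj": "obj"}
# Link-parser links are (left word, right word, label). For S links the
# subject is on the left; for O links the object is on the right.
LINK_TO_COMMON = {"Ss": ("subj", "right"), "Os": ("obj", "left"),
                  "Op": ("obj", "left")}


def normalize_stanford(deps):
    """deps: iterable of (label, head_index, dependent_index)."""
    return {(STANFORD_TO_COMMON[label], head, dep)
            for label, head, dep in deps if label in STANFORD_TO_COMMON}


def normalize_link(links):
    """links: iterable of (left_index, right_index, label)."""
    result = set()
    for left, right, label in links:
        if label not in LINK_TO_COMMON:
            continue
        common, head_side = LINK_TO_COMMON[label]
        head, dep = (right, left) if head_side == "right" else (left, right)
        result.add((common, head, dep))
    return result


# Subject/object relations copied by hand from the slide 3 outputs.
stanford = [("nsubj", 2, 1), ("dobj", 2, 4), ("dobj", 5, 8), ("det", 8, 6)]
link = [(1, 2, "Ss"), (2, 4, "Op"), (5, 8, "Os"), (6, 8, "Ds")]

common_stanford = normalize_stanford(stanford)
common_link = normalize_link(link)
print(common_stanford == common_link)  # True
```

Because the Link parser counts LEFT-WALL as word 0, its word indices line up with the 1-based indices in the typed dependencies, so the two normalized sets can be compared directly.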

7. Paninian Grammar: The First Dependency Formalism • A Dependency Relation: an asymmetric binary relation mapping a modifier to the modified. • A word can modify ONLY ONE word, but it can have MORE THAN ONE modifier.

8. Parse Tree: Modifier-Modified Tree • Example: Shankar returned home on his bicycle after a football match.
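The constraint from slide 7 (one modified word per modifier, any number of modifiers per word) can be sketched as a small data structure. This is an illustration under stated assumptions: the class `DependencyTree` is hypothetical, only content words are listed, and the attachments chosen for the example sentence are one plausible analysis, not the tree shown in the original slide.

```python
class DependencyTree:
    """Modifier-modified tree: each word modifies at most one word."""

    def __init__(self, words):
        self.words = list(words)
        self.head = {}  # modifier index -> modified index

    def add_relation(self, modifier, modified):
        if modifier == modified:
            raise ValueError("a word cannot modify itself")
        if modifier in self.head:
            raise ValueError(
                f"{self.words[modifier]!r} already modifies "
                f"{self.words[self.head[modifier]]!r}"
            )
        self.head[modifier] = modified

    def modifiers_of(self, modified):
        return sorted(m for m, h in self.head.items() if h == modified)

    def roots(self):
        return [i for i in range(len(self.words)) if i not in self.head]


# Content words of the example sentence (attachments are assumed).
words = ["Shankar", "returned", "home", "his", "bicycle", "football", "match"]
tree = DependencyTree(words)
tree.add_relation(0, 1)  # Shankar  -> returned
tree.add_relation(2, 1)  # home     -> returned
tree.add_relation(4, 1)  # bicycle  -> returned
tree.add_relation(3, 4)  # his      -> bicycle
tree.add_relation(6, 1)  # match    -> returned
tree.add_relation(5, 6)  # football -> match

print([words[i] for i in tree.modifiers_of(1)])
print([words[i] for i in tree.roots()])
```

Calling `tree.add_relation(4, 6)` afterwards raises `ValueError`, because "bicycle" already modifies "returned".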

9. We resort to Paninian Grammar to arrive at standards for representing a parse tree.

10. ISSUES INVOLVED * How to treat auxiliaries and prepositions? o As content words? o As function words? ==> Serious effect on o the number of content words o the number of relations in a sentence.

11. ISSUES INVOLVED contd ... # How to handle IMPLICITLY encoded relations? * Modifiers modifying more than one word o I saw the man you love. o Ram went home and slept. * Information encoded in the lexical item o Ram persuaded Mohan to study well. o Ram promised Mohan to study well.

12. ISSUES INVOLVED contd ... # What should be the level of analysis? * Syntactic roles? (subj, obj, ...) * Semantic / theta roles? (agent, theme, ...)

13. ISSUES INVOLVED contd ... # How to analyse sentences with copula verbs? * Ram is good. * Ram is a doctor.

14. ISSUES INVOLVED contd ... # Should the heads be decided syntactically or semantically? * cup of tea * growth of an industry

15. Paninian Grammar • Where are the relations coded? • How are the relations coded? • How much information does the relation code?

16. Paninian Grammar • Where are the relations coded? • Position o No accusative marker --> subject position is sacrosanct • Preposition

17. How are the relations coded? Explicitly: Ram dropped the melon. Implicitly: Ram dropped the melon and burst. Who/what burst?

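18. How much information is coded? • Ram opened the lock with this key. (Thematic role of the subject: Agent) • This key opened the lock. (Thematic role of the subject: Instrument) • The lock opened. (Thematic role of the subject: Goal) • KAARAKA ROLE of the subject in each case: KARTAA (syntactico-semantic relation)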

19. Pada: one which ends with a nominal / verbal suffix. Pada = nominal root + nominal suffix, or verbal root + verbal suffix. Examples: • boys = boy + s + subj • went = go + ed • are going = go + are_ing • to him = he + to

20. ISSUES INVOLVED * How to treat auxiliaries and prepositions? o As content words? o As function words?

21. Positions and prepositions mark the relations between different 'padas' ===> Prepositions mark the relations and hence are not content words. Auxiliaries are part of a 'pada'. ===> Auxiliaries are part of suffix and hence are not content words.
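A rough sketch of what this decision means for counting: if auxiliaries and prepositions are folded into the pada of the content word that follows them, the number of units to relate drops. The tag names (AUX, PREP, and so on), the function `group_padas`, and the hand-tagged input are assumptions for this illustration; a real system would also need the morphological analysis from slide 19, which is not attempted here.

```python
# Tags treated as function words, following slide 21 (tag names assumed).
FUNCTION_TAGS = {"AUX", "PREP"}


def group_padas(tagged_words):
    """Attach each run of function words to the next content word."""
    padas, pending = [], []
    for word, tag in tagged_words:
        if tag in FUNCTION_TAGS:
            pending.append(word)
        else:
            padas.append(pending + [word])
            pending = []
    if pending:
        # Function words with no following content word are kept as-is.
        padas.append(pending)
    return padas


# Hand-tagged input built from the pada examples on slide 19.
tagged = [("boys", "NOUN"), ("are", "AUX"), ("going", "VERB"),
          ("to", "PREP"), ("him", "PRON")]
padas = group_padas(tagged)
print(padas)                    # [['boys'], ['are', 'going'], ['to', 'him']]
print(len(tagged), len(padas))  # 5 3
```

Five tokens become three padas, so a parse over padas has fewer nodes and fewer relations than one that treats "are" and "to" as separate content words.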

22. ISSUES INVOLVED contd ... # How to handle IMPLICITLY encoded relations? * Modifiers modifying more than one word o I saw the man you love. o Ram went home and slept.

23. * Modifiers modifying more than one word o I saw the man you love. o Ram went home and slept. IMPLICIT relations: man: obj(love); Ram: subj(slept). These follow from language convention ===> They should be made EXPLICIT in the parsed structure.
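One way to make the shared subject of "Ram went home and slept" explicit is a post-processing pass over the relations. The sketch below is an assumption-laden illustration: the labels "subj", "conj" and "mod" and the function name are hypothetical, and it handles only the coordination case, not the relative-clause case ("the man you love").

```python
def make_shared_subject_explicit(relations):
    """Copy a verb's subject to coordinated verbs that lack one.

    relations: set of (label, head, dependent) triples.
    """
    subject_of = {head: dep for label, head, dep in relations
                  if label == "subj"}
    explicit = set(relations)
    for label, head, dep in relations:
        if label == "conj" and head in subject_of and dep not in subject_of:
            explicit.add(("subj", dep, subject_of[head]))
    return explicit


# "Ram went home and slept." with assumed labels.
parsed = {("subj", "went", "Ram"), ("mod", "went", "home"),
          ("conj", "went", "slept")}
explicit = make_shared_subject_explicit(parsed)
print(sorted(explicit - parsed))  # [('subj', 'slept', 'Ram')]
```

After this step "Ram" is related to two verbs, so the explicit structure is a graph rather than a strict tree.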

24. ISSUES INVOLVED contd ... # How to handle IMPLICITLY encoded relations? * Information encoded in the lexical item o Ram persuaded Mohan to study well. o Ram promised Mohan to study well. Make IMPLICIT information EXPLICIT.
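In the two sentences above, who is to study depends on the main verb: with "persuade" it is Mohan, with "promise" it is Ram. A minimal sketch of encoding that lexical information follows; the table `CONTROL` and the function `infinitive_subject` are hypothetical names, and the lexicon contains only the two verbs from the slide.

```python
# Which argument of the main verb is understood as the subject of the
# infinitive. Only the two verbs from the slide are listed.
CONTROL = {"persuade": "obj", "promise": "subj"}


def infinitive_subject(verb_root, subject, obj):
    """Return the understood subject of the embedded infinitive."""
    controller = CONTROL[verb_root]
    return subject if controller == "subj" else obj


print(infinitive_subject("persuade", "Ram", "Mohan"))  # Mohan
print(infinitive_subject("promise", "Ram", "Mohan"))   # Ram
```

The returned word is what would be added to the parse as the explicit subject of "study".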

25. ISSUES INVOLVED contd ... # What should be the level of analysis? * Syntactic roles? (subj, obj, ...) * Semantic / theta roles? (agent, theme, ...)

26. What should be the level of analysis? The maximum semantics one can extract is the SYNTACTICO-SEMANTIC (kaaraka) relations and NOT the thematic roles.

27. ISSUES INVOLVED contd ... # How to analyse sentences with copula verbs? * Ram is good. * Ram is a doctor.

28. # How to analyse sentences with copula verbs? * Ram is good. * Ram is a doctor. Phrase structures are DIFFERENT, but semantic content is the SAME. Hence treat them ALIKE.

29. ISSUES INVOLVED contd ... # Should the heads be decided syntactically or semantically? * cup of tea * growth of an industry

30. # Should the heads be decided syntactically or semantically? * cup of tea: Syntactic Head vs. Semantic Head * growth of an industry