
Semantic Argument Classification and Semantic Categorization of Turkish Existential Sentences Using Support Vector Learning, by Aylin Koca, December 7, 2004



Presentation Transcript


  1. Semantic Argument Classification and Semantic Categorization of Turkish Existential Sentences Using Support Vector Learning by Aylin Koca December 7, 2004

  2. OUTLINE • Introduction • On Turkish Existential Sentences • Categories Due to Senses of var and yok • Corpus and Semantic Annotation • Abstract Thematic Roles • Methodology • Shallow Semantic Parsing • Classifier: SVM • Features • Sentence Categorization • Experimentation & Results • Without Semantic Information • With Semantic Information • Concluding Remarks

  3. INTRODUCTION • Three types of sentences • Verbal sentences (e.g. She read the book.) • Copulative sentences (e.g. The book is on the table.) • Existential sentences (e.g. There is a book on the table.) • Overview of system • Shallow semantic parsing for defining the predicate-argument relationships in a Turkish existential sentence on a word-by-word basis via support vector learning • Accurately categorizing these sentences accordingly • Improving the system • Incorporating semantic information

  4. TURKISH EXISTENTIAL SENTENCES • A somewhat “overlooked” sentence type • Controversial status: meaning & category • Minimally characterized by two particles • Var: ‘there is/are’ • Yok: ‘there is/are no’ • In [Sezer, 2003]*: • “Sence aşk yok mu?” (As far as you are concerned, does love not exist?) • “İçimde bir şüphe var.” (There is a doubt in me.) • “Bizim bir şikâyetimiz yok.” (We don’t have a complaint.) • “İçeride Müdür Bey var.” (Mr. Director is inside.) • “Biz o toplantıda vardık.” (We were present at that meeting.) * On Syntactic and Semantic Properties of Turkish Existential Sentences. Harvard University.

  5. Bare Existentials • Overt subject • Sence aşk yok mu? according to you love NE Q “As far as you are concerned, does love not exist?” • Bugün su var. today water E “Today there is water.” • Hiç şüphe yok. no doubt NE “There is no doubt.”

  6. Case Existentials • Case information (i.e. locative, ablative, dative, instrumental) • Şehir-de güzel evler var. city-LOC nice houses E “There are nice houses in the city.” • Anne-m-den haber yok. mother-P1SG-ABL news NE “There is no news from my mother.” • Siz-e bir mektup var. you-DAT a letter E “There is a letter to you.” • Göz-üm-le ilgili bir derd-im yok-tu. eye-P1SG-INS about a problem-P1SG NE-APAST “I did not have a problem with my eye.”

  7. Existential Possession • Due to the lack of a verb meaning ‘to have’ in Turkish • Ad-ınız yok mu? name-P2PL NE Q “Don’t you have a name?” • Kim-in silgi-si var? who-GEN eraser-P3SG E “Who has an eraser?”

  8. Other Categories… • Existentials with initial definite subjects, which assign a <participant> role to their subject and a <scene> role to their locative NP: • O bu komite-de var. s/he this committee-LOC E “S/he is on this committee.” • Picture existentials • Ayşe bu dosya-da yok. Ayşe this file-LOC NE “Ayşe is not in this file.” • Compound tense existentials • Kimse-ye kız-dığ-ım yok. anyone-DAT angry-PASTPART-P1SG NE “I am not angry at anyone.”

  9. CORPUS • March 2004 release of the Turkish Treebank* [Oflazer et al., 2003] • 7262 sentences • var: 187 occurrences • yok: 105 occurrences • 232 of 292 sentences taken as existential sentences for manual semantic annotation *METU-Sabancı Turkish Treebank (www.ii.metu.edu.tr/~corpus/treebank.html)

  10. Semantic Annotation • Manual annotation of semantic arguments of existential sentences • [POSSESSOR Onun] [LOCATION bu evde] [POSSESSED yeri] [predicate yok] [NULL artık]. • Further annotation on a word-by-word basis using IOB representation: • [B-POSSESSOR Onun] [B-LOCATION bu] [I-LOCATION evde] [B-POSSESSED yeri] [predicate yok] [O artık]. • IOB2* representation: • I means word is inside a chunk • O means word is outside a chunk • B means word is the beginning of a chunk * E. Sang and J. Veenstra. Representing Text Chunks. Proc. of EACL, 1999.
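The step from chunk-level annotation to word-level IOB2 tags can be sketched in a few lines of Python. This is only an illustration of the encoding described on this slide, not the author's annotation tool; the function name `to_iob2` and the `(label, words)` input layout are assumptions made for the example.

```python
def to_iob2(chunks):
    """Convert chunk-level annotation into word-level IOB2 tags.

    `chunks` is a list of (label, words) pairs (a hypothetical input layout).
    NULL chunks become O, the predicate keeps its own tag, and every other
    chunk gets B- on its first word and I- on the remaining words.
    """
    tagged = []
    for label, words in chunks:
        for position, word in enumerate(words):
            if label == "NULL":
                tag = "O"
            elif label == "predicate":
                tag = "predicate"
            else:
                prefix = "B" if position == 0 else "I"
                tag = f"{prefix}-{label}"
            tagged.append((word, tag))
    return tagged


# The example sentence from the slide, in chunk form.
sentence = [
    ("POSSESSOR", ["Onun"]),
    ("LOCATION", ["bu", "evde"]),
    ("POSSESSED", ["yeri"]),
    ("predicate", ["yok"]),
    ("NULL", ["artık"]),
]

for word, tag in to_iob2(sentence):
    print(f"{tag}\t{word}")
```

Running it on the slide's sentence reproduces the word-level tags shown above, with "evde" receiving I-LOCATION because it continues the chunk opened by "bu".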

  11. Abstract Thematic Roles • Type of semantic knowledge required: • <THEME> = overt subject† of predicate var/yok • <LOCATION> = place in which subject is situated • <SOURCE> = entity from which subject originates • <GOAL> = entity towards which subject heads • <RELATION> = entity with which subject shares • <POSSESSOR> = referent of subject that possesses • <POSSESSED> = entity that is possessed † Overt subject should not be marked with possession information.

  12. “Refined” Corpus • Added semantic SEM tags • Eliminated LEM and MORPH tags

  13. SHALLOW SEMANTIC PARSING • Process of assigning a simple structure to sentences in text: • WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW, etc. • Technically: • Group sequences of words together (identification) • Assign labels to these semantic arguments (classification)

  14. Classifier • Chunking and labeling as a classification-based learning task • Support Vector Machines (SVMs) • Capable of handling a large number of features with strong generalization properties [Joachims, 1998†; Kudoh and Matsumoto, 2000‡] • Binary classifiers • But semantic parsing is a multi-class classification problem † Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proc. of ECML, 1998. ‡ Use of Support Vector Learning for Chunk Identification. Proc. of CoNLL-2000 and LLL-2000, 2000.

  15. Classifier (cont’d) • “One class vs. all others” (OVA) approach • For K classes, build K classifiers that separate one class from among all others • “Pairwise” (OVO) approach • For K classes, build K(K-1)/2 classifiers, considering all pairs of classes • Tradeoff: • Number of classifiers to be trained • Amount of data used in training each classifier
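The tradeoff between the two schemes is easier to see with concrete numbers. The sketch below counts the binary classifiers each approach needs and lists the OVO pairs. Using the seven abstract thematic roles as the class set is an assumption for illustration only; the actual label set used in the experiments (with B-/I- prefixes, O, and the predicate tag) would be larger.

```python
from itertools import combinations


def classifier_counts(num_classes):
    """Return (OVA count, OVO count) for a K-class problem."""
    ova = num_classes                            # one classifier per class
    ovo = num_classes * (num_classes - 1) // 2   # one classifier per unordered pair
    return ova, ovo


# Illustrative class set: the seven abstract thematic roles.
roles = ["THEME", "LOCATION", "SOURCE", "GOAL",
         "RELATION", "POSSESSOR", "POSSESSED"]

ova, ovo = classifier_counts(len(roles))
print(f"K = {len(roles)}: OVA trains {ova} classifiers, OVO trains {ovo}")

# Each OVO classifier is trained only on the instances of its two classes,
# whereas each OVA classifier is trained on the whole training set.
ovo_pairs = list(combinations(roles, 2))
print(ovo_pairs[:3])
```

OVO therefore trains more classifiers, but each one sees less data, which is the tradeoff named on the slide.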

  16. Features • Used in assigning semantic roles • Represent various aspects of: • Syntactic structure of sentence • Lexical information • Features: POS category of the word, the POS category of the word that this word has a relation with in the sentence, the name of this relation, and whether this word appears before or after the predicate [, predicted semantic labels of previous words within context].

  17. Features and Context • [Figure: the 5-word context and the features used to classify a word; figure labels: context, current prediction]
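A rough sketch of how such a context window might be turned into a feature set follows. Several things here are assumptions rather than details from the slides: the window is taken to be centered (two words on each side of the current word), out-of-sentence positions are padded with a `PAD` symbol, and the token fields (`pos`, `head_pos`, `relation`) and their toy values are hypothetical stand-ins for the treebank annotation.

```python
PAD = "PAD"  # placeholder for window positions outside the sentence (assumption)


def window_features(tokens, index, predicate_index, previous_labels, half_width=2):
    """Build a feature dict for tokens[index] from a (2 * half_width + 1)-word window.

    previous_labels holds the labels already predicted for tokens[0:index].
    """
    features = {}
    for offset in range(-half_width, half_width + 1):
        j = index + offset
        inside = 0 <= j < len(tokens)
        token = tokens[j] if inside else {}
        features[f"pos[{offset}]"] = token.get("pos", PAD)
        features[f"head_pos[{offset}]"] = token.get("head_pos", PAD)
        features[f"relation[{offset}]"] = token.get("relation", PAD)
        # Position of the word relative to the predicate (var/yok).
        if not inside:
            side = PAD
        elif j < predicate_index:
            side = "before"
        elif j > predicate_index:
            side = "after"
        else:
            side = "predicate"
        features[f"side[{offset}]"] = side
        # Predicted labels exist only for words to the left of the current one.
        if offset < 0:
            features[f"label[{offset}]"] = previous_labels[j] if inside else PAD
    return features


# Toy annotation for "Onun bu evde yeri yok artık" (all values hypothetical).
tokens = [
    {"word": "Onun", "pos": "Pron", "head_pos": "Noun", "relation": "POSSESSOR"},
    {"word": "bu", "pos": "Det", "head_pos": "Noun", "relation": "DETERMINER"},
    {"word": "evde", "pos": "Noun", "head_pos": "Adj", "relation": "LOCATIVE.ADJUNCT"},
    {"word": "yeri", "pos": "Noun", "head_pos": "Adj", "relation": "SUBJECT"},
    {"word": "yok", "pos": "Adj", "head_pos": "Punc", "relation": "SENTENCE"},
    {"word": "artık", "pos": "Adv", "head_pos": "Adj", "relation": "MODIFIER"},
]

# Features for "evde", given the labels already predicted for "Onun" and "bu".
features = window_features(tokens, 2, 4, ["B-POSSESSOR", "B-LOCATION"])
print(features)
```

Before training an SVM, a symbolic feature dict like this would still need to be converted to a numeric vector, for example with one binary dimension per feature-value pair.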

  18. SENTENCE CATEGORIZATION • Thematic hierarchy • <POSSESSOR>, <POSSESSED> > <LOCATION>, <SOURCE>, <GOAL>, <RELATION> > <THEME> • POSSESSION EXISTENTIALS > CASE EXISTENTIALS > BARE EXISTENTIALS
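One way to read this slide is that a sentence is assigned the category of the highest-ranked role found among its predicted word labels. The sketch below assumes the ordering possession roles > case roles > theme; that ordering is an interpretation of the slide's diagram, and the function name and category strings are hypothetical.

```python
POSSESSION_ROLES = {"POSSESSOR", "POSSESSED"}
CASE_ROLES = {"LOCATION", "SOURCE", "GOAL", "RELATION"}


def categorize(tags):
    """Map the word-level IOB2 tags of one sentence to an existential category.

    Assumed hierarchy: possession roles outrank case roles, which outrank THEME.
    """
    # Strip the B-/I- prefixes; O and the predicate tag carry no role.
    roles = {tag.split("-", 1)[1] for tag in tags if tag.startswith(("B-", "I-"))}
    if roles & POSSESSION_ROLES:
        return "possession existential"
    if roles & CASE_ROLES:
        return "case existential"
    if "THEME" in roles:
        return "bare existential"
    return "uncategorized"


# "Onun bu evde yeri yok artık": possession outranks the locative argument.
print(categorize(["B-POSSESSOR", "B-LOCATION", "I-LOCATION",
                  "B-POSSESSED", "predicate", "O"]))
# "Bugün su var" style sentence with only a theme.
print(categorize(["O", "B-THEME", "predicate"]))
```

Because the category depends only on which roles occur, a single mislabeled word changes the sentence category only when it introduces or removes the top-ranked role.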

  19. SENTENCE CATEGORIZATION (cont’d) • Performance evaluation based on: • Precision • Recall • Overall performance as the F-measure Fβ, where β = 1
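For reference, these measures follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and Fβ = (1 + β²)·P·R / (β²·P + R), which reduces to the harmonic mean of precision and recall when β = 1. A minimal sketch is given below; the counts in the usage line are made up for illustration and are not results from the presentation.

```python
def precision_recall_f(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F_beta) from raw counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_beta


# Illustrative counts only.
p, r, f1 = precision_recall_f(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```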

  20. EXPERIMENTATION & RESULTS • LIBSVM* software • Standard package for the OVO approach • One of its multi-class classification tools for the OVA approach * http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  21. Experiments • Cross Validation • In v-fold cross-validation, the training set is divided into v subsets of equal size. Each subset in turn is tested using the classifier trained on the remaining v-1 subsets. Thus, each instance of the whole training set is predicted exactly once, and the cross-validation accuracy is the percentage of instances that are correctly classified. • Classification (9 vs. 1 split) • Sentence Categorization
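The cross-validation procedure described here can be sketched independently of any particular classifier. In the example below, a majority-class baseline stands in for the SVM purely to keep the sketch self-contained; the `train` and `predict` callables, the interleaved fold assignment, and the toy data are all assumptions for illustration.

```python
from collections import Counter


def cross_validation_accuracy(instances, labels, v, train, predict):
    """v-fold cross-validation: every instance is predicted exactly once."""
    n = len(instances)
    correct = 0
    for fold in range(v):
        # Interleaved split: instance i belongs to fold i % v.
        test_indices = set(range(fold, n, v))
        train_x = [instances[i] for i in range(n) if i not in test_indices]
        train_y = [labels[i] for i in range(n) if i not in test_indices]
        model = train(train_x, train_y)
        for i in sorted(test_indices):
            correct += predict(model, instances[i]) == labels[i]
    return 100.0 * correct / n


def train_majority(train_x, train_y):
    """Stand-in learner: remember the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, instance):
    """Stand-in predictor: always answer with the remembered label."""
    return model


# Toy data: ten instances, eight labeled O and two labeled B-THEME.
toy_instances = list(range(10))
toy_labels = ["O"] * 8 + ["B-THEME"] * 2

accuracy = cross_validation_accuracy(
    toy_instances, toy_labels, 5, train_majority, predict_majority
)
print(f"cross-validation accuracy: {accuracy:.1f}%")
```

Swapping in a real learner only requires replacing the two stand-in functions with ones that train and apply the chosen classifier.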

  22. Without Semantic Information • Cross Validation

  23. Without Semantic Information • Classification • MSE = Mean Squared Error • SCC = Squared Correlation Coefficient

  24. Without Semantic Information

  25. With Semantic Information • Cross Validation

  26. With Semantic Information • Classification

  27. With Semantic Information

  28. CONCLUDING REMARKS • A novel way of utilizing Turkish Treebank to do domain-independent shallow semantic parsing of Turkish ES by recognizing their predicate-argument structures • Automatic categorization • Thematic role hierarchy • Semantic annotation and refining of Turkish ESs of the Treebank

  29. CONCLUDING REMARKS (cont’d) • Evaluation of results: • Incorporation of semantic information into the input files of the SVM • Shows promise for applications in various natural language tasks in Turkish • Results of the ES categorization task did not appear to be affected by the incorporation of semantic information • Word-level vs. sentence-level feature

  30. CONCLUDING REMARKS (cont’d) • Future work: • More consistently and accurately annotated corpus • Enhancing size of the data • Research scope: • Issues of various strands of linguistics and computer science such as natural language processing, and machine learning

  31. CONCLUDING REMARKS (cont’d) • Big picture: • Results can play a major role in tasks like Information Extraction, Question Answering, and Summarization • Also an intermediate step in machine translation • Could be extended to cover phonology and speech processing if the system were based on speech rather than text, thereby better serving the field of Artificial Intelligence

  32. Selected Bibliography • Joachims, T. 1998. “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”. In Proceedings of ECML. • Kudoh, T., Matsumoto, Y. 2000. “Use of Support Vector Learning for Chunk Identification”. In Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pp. 142-144. • Oflazer, K., Say, B., Hakkani-Tür, D. Z., Tür, G. 2003. “Building a Turkish Treebank”. In A. Abeillé (ed.), Building and Exploiting Syntactically Annotated Corpora, pp. 1-18. Kluwer Academic Publishers. • Pradhan, S., Ward, W., Hacioglu, K., Martin, J., Jurafsky, D. 2004. “Support Vector Learning for Semantic Argument Classification”. To appear in Journal of Machine Learning, Center for Spoken Language Research, Boulder, CO. • Sang, E., Veenstra, J. 1999. “Representing Text Chunks”. In Proceedings of EACL, pp. 173-179, Bergen, Norway. • Sezer, E. 2003. “On Syntactic and Semantic Properties of Turkish Existential Sentences”. Harvard University.

  33. QUESTIONS & COMMENTS
