METU Turkish Discourse Bank Browser

Utku Şirin (1), Ruket Çakıcı (1), Deniz Zeyrek (2)
(1) Computer Engineering Department, (2) Informatics Institute
Middle East Technical University, Ankara, Turkey
utkusirin@gmail.com, ruken@ceng.metu.edu.tr, dezeyrek@metu.edu.tr


Presentation Transcript


  1. METU Turkish Discourse Bank Browser

Introduction

The Middle East Technical University (METU) Turkish Discourse Bank (TDB) project extends the METU Turkish Corpus (MTC) from a sentence-level resource to a discourse-level language resource (Zeyrek and Webber, 2008). The TDB aims to capture discourse relations to the extent that they are instantiated by explicit discourse connectives. The annotations are created with the DATT (Aktaş et al., 2010), which generates a layer of annotation data in XML format by means of character indexes. The METU TDB browser uses these annotation files and the indexes created by the DATT to serve as a clear interface to the annotations in the TDB, so that various aspects of Turkish discourse can be identified and exploited effectively.

Search

Quick search filters the text file list by connective and genre only: the list then contains only the text files that include at least one relation with the specified connective and whose genre is the specified genre.

General search is performed within a selected text file. After specifying a string, the user can see all of the matching (sub)strings in the text file.

Relation filter filters the relations that are listed in the relation list. After specifying a prefix string, the user can see the connectives whose prefixes match the given prefix string exactly.

The advanced search facility provides a wider range of search options. One can perform a string search in any element of a relation, either by a regular expression or by a basic text search. The discontinuity of any element of a relation and the adjacency information for arguments can be retrieved through advanced search. In addition, the genre, author, publisher and publishing date of texts can be specified. Long advanced search queries may also be saved, either permanently or temporarily.
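The relation filter and the string search in advanced search can be illustrated with a short sketch. This is not the browser's code (the browser itself is a Java application); the `Relation` class, the function names and the sample relations are hypothetical and exist only to make the two matching rules concrete: an exact, case-sensitive prefix match on the connective, and a basic or regular-expression search inside one element of a relation.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical in-memory view of one annotated relation."""
    conn: str
    arg1: str
    arg2: str


def filter_by_prefix(relations, prefix):
    """Relation filter: keep relations whose connective starts with `prefix`."""
    return [r for r in relations if r.conn.startswith(prefix)]


def search_element(relations, element, query, use_regex=False):
    """Search one element ("conn", "arg1" or "arg2") of every relation."""
    if use_regex:
        pattern = re.compile(query)

        def matches(text):
            return pattern.search(text) is not None
    else:
        def matches(text):
            return query in text
    return [r for r in relations if matches(getattr(r, element))]


# Made-up sample relations, not taken from the TDB.
relations = [
    Relation("ama", "Hava soğuktu", "dışarı çıktık"),
    Relation("ayrıca", "Kitap uzun", "dili ağır"),
    Relation("için", "Erken kalktı", "işe yetişmek"),
]

by_prefix = filter_by_prefix(relations, "a")
by_text = search_element(relations, "arg2", "yetiş")
by_regex = search_element(relations, "arg1", r"^K\w+ uzun$", use_regex=True)
```

Keeping the basic search and the regular-expression search behind one function mirrors the description above, where either mode can be applied to any element of a relation.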
Advanced search query results are shown in a separate window, which makes it possible to run multiple advanced search queries and use their results concurrently.

Discontinuity concerns either a connective or an argument. A discontinuous connective would be either … or and its equivalent ya … ya da in Turkish. Discontinuity of an argument means that there is intervening material inside the argument. Adjacency, on the other hand, is a relationship between the ARG1 of a connective and its ARG2. When the ARG1 and ARG2 spans are consecutive, with only punctuation marks, the corresponding connective, its modifier, or shared arguments intervening, ARG1 is defined as adjacent to ARG2. In (1), ARG2 is a discontinuous argument and ARG1 is nonadjacent to ARG2 due to the phrase ‘as we explained earlier’.

(1) Bakan açıklama yapmadı. Daha önce dediğimiz gibi, bu durum aslında beklenmedik değil.
    The minister has not made an explanation. Indeed, as we explained earlier, this is not something we do not expect.

Figure 3: Quick Search

Technical Characteristics

The METU TDB browser is written in Java SE 6 with NetBeans 6.9. There are three versions of the browser for three platforms: Mac, Ubuntu, and Windows. The browser is licensed under the LGPL. Its source code is publicly available at https://sourceforge.net/projects/tdbbrowser. Figure 7 shows the software architecture of the METU TDB browser. The browser is initialized with three file paths: the annotation directory, the text files directory and the tag file. After initialization, the browser enters the Main Window component, through which the user can use the browser for browsing, searching, etc.

User Evaluations

First, one of the connectives, için ‘because/to’, has different senses depending on the suffix its ARG2 takes. The browser makes it possible to identify such differences. Secondly, discontinuous and nonadjacent arguments of a specific connective can be identified, which is particularly important in understanding what connectives do. Zeyrek et al.
(in print) find that ayrıca ‘besides’ has more nonadjacent ARG1s than oysa ‘whereas’ and fakat ‘but’, which may be used to support the hypothesis that ayrıca is a discourse adverbial (i.e. a connective whose meaning is not necessarily derived by the adjacency of its arguments).

Turkish Discourse Structure

Connectives establish discourse relations mainly by taking two arguments, i.e., text spans that the connective relates. The argument-connective-argument structure is the basis of the TDB formalism, enhanced with some other elements of Turkish discourse. In the TDB browser, we abbreviate the first argument of a discourse connective as ARG1 and the second one as ARG2. In all the examples in the paper, ARG1 is rendered in italics and ARG2 in bold. The connective is underlined. Discourse connectives are shown as CONN. In addition to these basic categories, text spans that supplement ARG1 and ARG2 are also annotated, called SUPP1 and SUPP2 respectively. Finally, modifiers of the connectives, abbreviated as MOD, and grammatical elements that are shared by two arguments, abbreviated as SHARED, are annotated.

Figure 4: Relation Filter
Figure 7: Software Architecture of the Browser

METU TDB Browser

There are two basic features of the METU TDB browser: browsing and, more importantly, searching. The browser can also explore structural aspects of the arguments and connectives, such as discontinuity and adjacency.

Browsing

There are three main parts in the browsing window: the text file list at the left, the selected text file in the middle, and the list of relations in the selected text file at the right. Selected annotations are highlighted in the middle window according to the colors in Figure 2.

Figure 5: Advanced Search

Conclusion

In this paper, we have introduced the METU TDB browser. As future work, we will add more parameters to the quick search option and use the METU TDB browser for statistical analyses over the annotated MTC corpus.

REFERENCES

Berfin Aktaş, Cem Bozşahin, and Deniz Zeyrek. 2010. Discourse Relation Configurations in Turkish and an Annotation Environment.
In Proc. of the 4th Linguistic Annotation Workshop, ACL 2010, pages 202-206.

Deniz Zeyrek and Bonnie Webber. 2008. A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus. In Proc. of the 6th Workshop on Asian Language Resources, The 3rd IJCNLP.

Deniz Zeyrek, Ümit Deniz Turan, Işın Demirşahin, and Ruket Çakıcı. (in print). Differential Properties of Three Discourse Connectives in Turkish: A Corpus-based Analysis of Fakat, Yoksa, Ayrıca. In Anton Benz, Peter Kühnlein, Manfred Stede (Eds.) Constraints in Discourse III.

Figure 6: Advanced Search Results
Figure 1: Browsing
Figure 2: Colors
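The definitions of discontinuity and adjacency given in the Search section can be made concrete with a small sketch over character-index spans. Everything here is an assumption made for illustration: spans are taken to be half-open (start, end) character offsets, the function names are hypothetical, and the assignment of spans to ARG1, ARG2 and CONN in example (1) is one possible reading rather than the TDB annotation itself. The actual DATT file layout is not reproduced.

```python
import string

# Characters that may stand between ARG1 and ARG2 without breaking adjacency.
IGNORABLE = set(string.punctuation) | set(string.whitespace)


def span_of(text, phrase):
    """Return the (start, end) character span of the first match of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def merge(spans):
    """Sort spans and merge those that overlap or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_discontinuous(spans):
    """An element is discontinuous if material intervenes between its parts."""
    return len(merge(spans)) > 1


def is_adjacent(text, arg1, arg2, allowed=()):
    """ARG1 is adjacent to ARG2 if the gap between them holds only punctuation,
    whitespace, or characters covered by `allowed` spans (CONN, MOD, SHARED).
    Interleaved arguments leave an empty gap and count as adjacent here."""
    first, second = sorted([merge(arg1), merge(arg2)])
    gap = range(first[-1][1], second[0][0])
    return all(
        text[i] in IGNORABLE or any(s <= i < e for s, e in allowed)
        for i in gap
    )


# Example (1), with an assumed span assignment.
text = ("Bakan açıklama yapmadı. Daha önce dediğimiz gibi, "
        "bu durum aslında beklenmedik değil.")
arg1 = [span_of(text, "Bakan açıklama yapmadı")]
arg2 = [span_of(text, "bu durum"), span_of(text, "beklenmedik değil")]
conn = span_of(text, "aslında")

# A made-up sentence in which only the connective intervenes.
text2 = "Hava soğuktu ama dışarı çıktık."
a1 = [span_of(text2, "Hava soğuktu")]
a2 = [span_of(text2, "dışarı çıktık")]
conn2 = span_of(text2, "ama")
```

Under these assumptions, `is_discontinuous(arg2)` holds for example (1) because the connective splits ARG2, and `is_adjacent(text, arg1, arg2, allowed=[conn])` fails because the phrase ‘Daha önce dediğimiz gibi’ lies between the two arguments. In the second sentence the arguments are adjacent only when the connective span is passed as `allowed`.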
