Czech Verbs of Communication and the Extraction of their Frames

Václava Benešová and Ondřej Bojar

Institute of Formal and Applied Linguistics


Introduction

  • 1. VALLEX, Valency Lexicon of Czech Verbs

  • 2. Automatic Identification of Verbs of Communication

  • 3. Frame Suggestion

  • 4. Conclusion


  • 1. Valency Lexicon of Czech Verbs (VALLEX 1.x) and its Verb Classes

  • Verb Classes in VALLEX

  • Verbs of Communication


VALLEX

Theoretical background:

Functional Generative Description (FGD)

Valency: “ability of lexical units to bind other lexical units”

Versions: 1.0, 1.5 (internal), 2.0 (autumn 2006; almost 4300 entries)

Corpus coverage (Czech National Corpus):

  • about 10% of verb occurrences are not covered; these belong to verbs with low corpus frequency (circa 28000 lemmas)


Verb Entry in VALLEX

Verb Entry: set of valency frame(s)

  • Valency frame: sequence of slots (functor, morphemic realization, type of complement)

  • Attributes of valency frames: gloss, example, … class
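
The entry structure described above could be modelled roughly as follows. This is only an illustrative sketch, not the VALLEX data format: the class names, field names and the "obligatory" label are assumptions made for the example, and the sample frame for říci is simplified.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slot:
    functor: str               # e.g. "ACT", "ADDR", "PAT"
    forms: tuple[str, ...]     # morphemic realizations, e.g. ("nom",)
    complement_type: str       # hypothetical label, e.g. "obligatory"


@dataclass
class ValencyFrame:
    slots: list[Slot]          # a frame is a sequence of slots
    gloss: str = ""
    example: str = ""
    verb_class: str = ""       # e.g. "communication"


@dataclass
class VerbEntry:
    lemma: str
    frames: list[ValencyFrame] = field(default_factory=list)


# Illustrative entry with a single frame (simplified, not copied from VALLEX).
rici = VerbEntry(
    lemma="říci",
    frames=[
        ValencyFrame(
            slots=[
                Slot("ACT", ("nom",), "obligatory"),
                Slot("ADDR", ("dat",), "obligatory"),
                Slot("PAT", ("že",), "obligatory"),
            ],
            gloss="say",
            verb_class="communication",
        )
    ],
)
```

Keeping `Slot` immutable makes slots comparable and hashable, which is convenient when frames are later compared slot by slot.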


Verb Classes in VALLEX

  • Classification:

    • in progress

    • built from below

    • emphasis on syntactic criteria

    • communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …


Communication verbs in VALLEX

‘a speaker conveys information to a recipient’

Functor:         ACT      ADDR             PAT/EFF

Morphemic form:  {nom}    {gen/dat/acc}    {dc, ...}

  • simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement

  • question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs

  • commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs
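
The three subclasses above are distinguished by the conjunction that introduces the dependent clause, which can be written down as a small lookup. This is a sketch for illustration only; the dictionary and function names are hypothetical, and a real verb may of course combine with conjunctions for reasons unrelated to communication.

```python
# Conjunction -> subclass of communication verbs, as listed on the slide.
SUBCLASS_BY_CONJUNCTION = {
    "že": "verbs of announcement",
    "zda": "interrogative verbs",
    "jestli": "interrogative verbs",
    "aby": "imperative verbs",
    "ať": "imperative verbs",
}


def subclasses(conjunctions):
    """Return the sorted subclasses suggested by the observed conjunctions."""
    return sorted(
        {SUBCLASS_BY_CONJUNCTION[c] for c in conjunctions if c in SUBCLASS_BY_CONJUNCTION}
    )


print(subclasses(["že", "aby"]))
```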


  • 2. Automatic Identification of Verbs of Communication

  • Evaluation against VALLEX and FrameNet


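Automatic Identification of Verbs of Communication

Search the corpus for the pattern V+N234+subord{aby,zda,že}: a verb (V) followed by a noun in the genitive, dative or accusative (N234, i.e. cases 2, 3 and 4) and a subordinate clause introduced by aby, zda or že. A verb is marked as a communication verb if enough occurrences of the pattern are found.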

Weak points:

    1. eliminates nominal structures:

       ‘He said the truth about the killer.’

       ‘He gave her many presents.’ (verb of exchange)

    2. ignores examples where a complement was not expressed on the surface layer:

       ‘He said that …’

    3. homonymy of conjunctions: že (that) and aby (in order to)

       ‘He has done it in order to make money…’
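
The pattern search and the occurrence threshold can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Token` type, the one-letter tagset, and the rule that the noun and the conjunction must follow the verb before the next verb appears are all simplifying assumptions, and the sample sentences are invented.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    pos: str       # hypothetical tagset: "V" verb, "N" noun, "J" conjunction
    case: int = 0  # 1..7 for nouns, 0 when not applicable


CONJUNCTIONS = {"aby", "zda", "že"}
CASES = {2, 3, 4}  # genitive, dative, accusative


def matching_verbs(sentence):
    """Yield each verb lemma followed by N234 and then a listed conjunction."""
    for i, tok in enumerate(sentence):
        if tok.pos != "V":
            continue
        seen_noun = False
        for later in sentence[i + 1:]:
            if later.pos == "V":
                break  # assumption: the pattern must not span another verb
            if later.pos == "N" and later.case in CASES:
                seen_noun = True
            elif seen_noun and later.pos == "J" and later.lemma in CONJUNCTIONS:
                yield tok.lemma
                break


def communication_candidates(corpus, threshold):
    """Return verbs whose number of pattern matches reaches the threshold."""
    counts = Counter(v for sentence in corpus for v in matching_verbs(sentence))
    return {verb for verb, n in counts.items() if n >= threshold}


say = [Token("říci", "V"), Token("matka", "N", 3), Token("že", "J"), Token("přijít", "V")]
give = [Token("dát", "V"), Token("dárek", "N", 4)]
print(communication_candidates([say, say, give], threshold=2))
```

The sketch shares the weak points listed above: a sentence with a purely nominal complement, or one where the addressee is left unexpressed, never matches.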


Evaluation against VALLEX and FrameNet

  • gold standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2

  • ROC curves

    TP … true positives (communication verbs according to a gold standard and above the given threshold)

    FP … false positives (non-communication verbs above the given threshold)

    TN … true negatives (non-communication verbs not above the given threshold)

    TPR = TP / P … true positive rate (P is the total number of communication verbs)

    TNR = TN / N … true negative rate (N is the total number of verbs with no sense of communication)

    40-50% of communication verbs identified correctly (for both VALLEX and FrameNet)

    20% falsely marked
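
The rates behind one point of such a ROC curve can be computed as in the following sketch. The function name, the per-verb `scores` dictionary and the toy numbers are assumptions for illustration, not data from the talk; the sketch also assumes that both classes are non-empty.

```python
def rates(scores, gold, threshold):
    """Return (TPR, TNR) for verbs whose score lies above the threshold.

    scores: verb -> number of pattern occurrences (hypothetical input)
    gold:   set of verbs labelled as communication verbs in a gold standard
    """
    flagged = {verb for verb, score in scores.items() if score > threshold}
    positives = {verb for verb in scores if verb in gold}
    negatives = set(scores) - positives
    tp = len(flagged & positives)   # communication verbs above the threshold
    tn = len(negatives - flagged)   # other verbs not above the threshold
    return tp / len(positives), tn / len(negatives)


# Invented toy counts, for illustration only.
toy_scores = {"říci": 12, "informovat": 5, "ptát se": 1, "dát": 6, "jít": 0}
toy_gold = {"říci", "informovat", "ptát se"}
tpr, tnr = rates(toy_scores, toy_gold, threshold=4)
print(tpr, tnr)
```

Sweeping the threshold over all observed scores and plotting TPR against 1 - TNR (the false positive rate, FP / N) gives the ROC curve.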


  • 3. Frame Suggestion

  • Frame Edit Distance and Verb Entry Similarity

  • Experimental Results


Frame Edit Distance and Verb Entry Similarity

  • FED (frame edit distance): the number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame into a correct frame

  • ES (entry similarity or expected saving)

    $$\mathrm{ES} = 1 - \frac{\min \mathrm{FED}(G, H)}{\mathrm{FED}(G, \text{Ø}) + \mathrm{FED}(H, \text{Ø})}$$

    G … gold-standard verb entries of this base lemma

    H … hypothesized entries

    Ø … blank verb entry

    ES = 0% when nothing is suggested (H = Ø); ES = 100% when the suggestion matches the gold-standard frames (H = G)
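
The two measures can be illustrated with the simplified sketch below. It rests on strong assumptions that are not in the talk: each entry is reduced to a single frame, a frame is a tuple of opaque slot labels, and FED is taken to be the plain Levenshtein distance over those labels, so the minimum in the formula is trivial. Returning 1.0 for two blank entries is likewise an assumption.

```python
def fed(source, target):
    """Levenshtein distance over slots (insert, delete, replace)."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            current.append(min(previous[j] + 1,          # delete s
                               current[j - 1] + 1,       # insert t
                               previous[j - 1] + cost))  # replace or keep
        previous = current
    return previous[-1]


def entry_similarity(gold, hypothesis):
    """ES = 1 - FED(G, H) / (FED(G, blank) + FED(H, blank))."""
    denominator = fed(gold, ()) + fed(hypothesis, ())
    if denominator == 0:
        return 1.0  # assumption: two blank entries count as identical
    return 1 - fed(hypothesis, gold) / denominator


gold = ("ACT:nom", "ADDR:dat", "PAT:že")
hypothesis = ("ACT:nom", "PAT:že")
print(fed(hypothesis, gold), entry_similarity(gold, hypothesis))
```

With this simplification, suggesting nothing gives ES = 0 and suggesting exactly the gold-standard frame gives ES = 1, matching the two boundary cases stated above.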


Experimental Results with ES


Conclusion

  • Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40-50% true positives against VALLEX and FrameNet, 20% false positives)

  • FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb; this calls for focusing on other verb classes, too
