


Czech Verbs of Communication and the Extraction of their Frames

Václava Benešová and Ondřej Bojar

Institute of Formal and Applied Linguistics, {benesova,[email protected]


Introduction

  • 1. VALLEX, Valency Lexicon of Czech Verbs

  • 2. Automatic Identification of Verbs of Communication

  • 3. Frame Suggestion

  • 4. Conclusion



VALLEX

Theoretical background:

Functional Generative Description (FGD)

Valency: “ability of lexical units to bind other lexical units”

Versions: 1.0, internal 1.5, 2.0 (autumn 2006) (almost 4300 entries)

Corpus coverage (Czech National Corpus):

  • about 10% of verb occurrences are not covered; these belong to verbs with low corpus frequency (approx. 28,000 lemmas)



Verb Entry in VALLEX

Verb Entry: set of valency frame(s)

  • Valency frame: sequence of slots (functor, morphemic realization, type of complement)

  • Attributes of valency frames: gloss, example, … class
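
The entry structure described on this slide could be modelled roughly as follows. This is a minimal sketch, not the VALLEX data format: the class names, the field names and the `complement_type` labels are assumptions made for illustration, and the sample entry is not copied from the lexicon.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    functor: str          # e.g. "ACT", "ADDR", "PAT"
    forms: List[str]      # morphemic realizations, e.g. ["dat"] or ["že"]
    complement_type: str  # hypothetical labels such as "obligatory" / "optional"


@dataclass
class Frame:
    slots: List[Slot]                 # a frame is a sequence of slots
    gloss: str = ""
    example: str = ""
    verb_class: Optional[str] = None  # e.g. "communication"


@dataclass
class VerbEntry:
    lemma: str
    frames: List[Frame] = field(default_factory=list)


# Illustrative entry only; slot values are assumed, not taken from VALLEX.
rici = VerbEntry(
    lemma="říci",
    frames=[
        Frame(
            slots=[
                Slot("ACT", ["nom"], "obligatory"),
                Slot("ADDR", ["dat"], "obligatory"),
                Slot("PAT", ["že"], "obligatory"),
            ],
            gloss="say",
            example="Řekl mu, že přijde.",
            verb_class="communication",
        )
    ],
)
```

Keeping the class as an attribute of the frame rather than of the entry matches the slide, where attributes such as gloss, example and class are listed per valency frame.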



Verb Classes in VALLEX

  • Classification:

    • in progress

    • built from below

    • emphasis on syntactic criteria

    • communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …



Communication Verbs in VALLEX

‘a speaker conveys information to a recipient’

ACT      ADDR             PAT/EFF

{nom}    {gen/dat/acc}    {dc,...}

  • simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement

  • question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs

  • commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs
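
The correspondence between conjunctions and subclasses listed above can be written down as a small lookup table. This is only a sketch: the dictionary name, the function name and the subclass labels are hypothetical, and a real classification would also have to deal with conjunction homonymy.

```python
# Conjunction -> subclass of communication verbs, as listed on the slide.
SUBCLASS_BY_CONJUNCTION = {
    "že": "announcement",
    "zda": "interrogative",
    "jestli": "interrogative",
    "aby": "imperative",
    "ať": "imperative",
}


def subclasses(conjunctions):
    """Return the subclasses suggested by the conjunctions seen with a verb."""
    return {
        SUBCLASS_BY_CONJUNCTION[c]
        for c in conjunctions
        if c in SUBCLASS_BY_CONJUNCTION
    }


# A verb observed with "že" and "aby" would be suggested for two subclasses.
print(subclasses(["že", "aby", "protože"]))
```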



Automatic Identification of Verbs of Communication

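Search the corpus for the pattern V+N234+subord{aby,zda,že} (a verb with a noun in case 2, 3 or 4, i.e. genitive, dative or accusative, and a subordinate clause introduced by aby, zda or že). Mark a verb as a communication verb if enough occurrences are found.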

Weak points:

1. Eliminates nominal structures:

    ‘He said the truth about the killer.’

    ‘He gave her many presents.’ (verb of exchange)

2. Ignores examples where a complement was not expressed on the surface layer:

    ‘He said that …’

3. Homonymy of conjunctions: že (that) and aby (in order to)

    ‘He has done it in order to make money…’
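
The pattern search with a frequency threshold might be sketched as below. Everything about the input is an assumption: sentences are lists of token dictionaries with hypothetical keys `lemma`, `pos` and `case`, the noun is required to follow the verb directly, and a comma is allowed before the conjunction. The slides do not specify the actual corpus query, so this is an illustration of the idea rather than the authors' implementation.

```python
from collections import Counter

CONJUNCTIONS = {"aby", "zda", "že"}
CASES = {"2", "3", "4"}  # genitive, dative, accusative


def count_pattern(sentences):
    """Count, per verb lemma, occurrences of V + N234 + subord{aby,zda,že}."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok["pos"] != "V":
                continue
            # Look at the next three tokens, skipping commas.
            rest = [t for t in sent[i + 1:i + 4] if t["lemma"] != ","]
            if (
                len(rest) >= 2
                and rest[0]["pos"] == "N"
                and rest[0].get("case") in CASES
                and rest[1]["lemma"] in CONJUNCTIONS
            ):
                counts[tok["lemma"]] += 1
    return counts


def communication_verbs(sentences, threshold=1):
    """Verbs whose pattern count reaches the threshold."""
    return {v for v, c in count_pattern(sentences).items() if c >= threshold}


# Toy data: "Řekl Petrovi, že přijde." and "Dal Marii dárek."
SENTENCES = [
    [
        {"lemma": "říci", "pos": "V"},
        {"lemma": "Petr", "pos": "N", "case": "3"},
        {"lemma": ",", "pos": "Z"},
        {"lemma": "že", "pos": "J"},
        {"lemma": "přijít", "pos": "V"},
        {"lemma": ".", "pos": "Z"},
    ],
    [
        {"lemma": "dát", "pos": "V"},
        {"lemma": "Marie", "pos": "N", "case": "3"},
        {"lemma": "dárek", "pos": "N", "case": "4"},
        {"lemma": ".", "pos": "Z"},
    ],
]

print(communication_verbs(SENTENCES, threshold=1))
```

In the toy data the verb of exchange dát is not matched because no subordinate clause follows its noun. The sketch shares the weak points listed above: it misses nominal complements and unexpressed addressees, and it cannot tell a purpose clause with aby from a reported command.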



Evaluation against VALLEX and FrameNet

  • gold standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2

  • ROC curves

    TP … true positives (communication verbs according to a gold standard that score above the threshold)

    FP … false positives (non-communication verbs that score above the given threshold)

    TPR = TP / P (P … the total number of communication verbs) … true positive rate

    FPR = FP / N (N … the total number of verbs with no sense of communication) … false positive rate

    40–50% of communication verbs identified correctly (for both VALLEX and FrameNet)

    20% falsely marked
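
One point of such a ROC curve could be computed as in the sketch below, where the false positive rate FP / N is the usual second axis of a ROC curve. The function name, the count dictionary and the toy numbers are invented for illustration and are not the reported results.

```python
def roc_point(counts, gold, all_verbs, threshold):
    """Return (FPR, TPR) for verbs whose pattern count reaches the threshold."""
    predicted = {v for v in all_verbs if counts.get(v, 0) >= threshold}
    positives = gold & all_verbs   # communication verbs per the gold standard
    negatives = all_verbs - gold   # verbs with no sense of communication
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    tpr = tp / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return fpr, tpr


# Toy counts of pattern occurrences per verb (invented numbers).
counts = {"říci": 12, "ptát se": 5, "nakázat": 1, "dát": 3, "jít": 0}
gold = {"říci", "ptát se", "nakázat"}
all_verbs = set(counts)

# Sweeping the threshold traces the curve point by point.
for threshold in (1, 3, 6):
    print(threshold, roc_point(counts, gold, all_verbs, threshold))
```

Lowering the threshold moves the point towards the upper right (more communication verbs found, more other verbs falsely marked); raising it moves the point towards the origin.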



Frame Edit Distance and Verb Entry Similarity

  • FED (frame edit distance): the number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame into a correct frame

  • ES (entry similarity or expected saving)

    $$ES = 1 - \frac{\min \mathrm{FED}(G, H)}{\mathrm{FED}(G, \emptyset) + \mathrm{FED}(H, \emptyset)}$$

    G … gold-standard verb entries of this base lemma

    H … hypothesized entries

    Ø … blank verb entry

    ES = 0% when nothing is suggested (H = Ø); ES = 100% when the suggested frames equal the gold-standard frames
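
The two measures can be illustrated with the simplified sketch below. It rests on several assumptions that the slides do not confirm: each slot is treated as one atomic string, FED between two frames is a plain Levenshtein distance over slots, the minimum in ES is taken over all pairings of gold and hypothesized frames (unpaired frames are matched with an empty frame), and two blank entries are given ES = 1 by convention. All function names are hypothetical.

```python
from itertools import permutations


def frame_edit_distance(hyp, gold):
    """Levenshtein distance over slots: insert, delete, replace cost 1 each."""
    prev = list(range(len(gold) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, g in enumerate(gold, start=1):
            cur.append(min(prev[j] + 1,              # delete slot h
                           cur[j - 1] + 1,           # insert slot g
                           prev[j - 1] + (h != g)))  # replace or keep
        prev = cur
    return prev[-1]


def entry_distance(gold_entry, hyp_entry):
    """Minimum total FED over all pairings of frames (brute force)."""
    n = max(len(gold_entry), len(hyp_entry))
    gold = list(gold_entry) + [()] * (n - len(gold_entry))
    hyp = list(hyp_entry) + [()] * (n - len(hyp_entry))
    return min(
        sum(frame_edit_distance(h, g) for h, g in zip(perm, gold))
        for perm in permutations(hyp)
    )


def entry_similarity(gold_entry, hyp_entry):
    """ES = 1 - min FED(G, H) / (FED(G, blank) + FED(H, blank))."""
    denom = entry_distance(gold_entry, []) + entry_distance(hyp_entry, [])
    if denom == 0:
        return 1.0  # assumed convention: two blank entries are identical
    return 1 - entry_distance(gold_entry, hyp_entry) / denom


gold = [("ACT:nom", "ADDR:dat", "PAT:že")]
hyp = [("ACT:nom", "PAT:že")]  # the suggestion misses the addressee slot
print(entry_similarity(gold, hyp))
```

With these assumptions the example needs one insertion to reach the gold frame, against 3 + 2 operations to build both entries from scratch, so ES is 1 - 1/5. The brute-force pairing is only practical for entries with a handful of frames.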



Experimental Results with ES



Conclusion

  • Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40–50% of communication verbs identified against VALLEX and FrameNet, with 20% false positives)

  • FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb -> need to focus on other classes, too
