
Interlingua Design for Spoken Language Translation

Interlingua Design for Spoken Language Translation. March 28, 2003. Presented by Lori Levin. Interlingua Team: Lori Levin, Donna Gates, Dorcas Wallace, Kay Peterson, Alon Lavie, Chad Langley. What is an interlingua? A representation of meaning or speaker intention.


Presentation Transcript


  1. Interlingua Design for Spoken Language Translation • March 28, 2003 • Presented by Lori Levin • Interlingua Team: Lori Levin, Donna Gates, Dorcas Wallace, Kay Peterson, Alon Lavie, Chad Langley

  2. What is an interlingua? • Representation of meaning or speaker intention. • Sentences that are equivalent for the translation task have the same interlingua representation, for example: “The room costs 100 Euros per night.” / “The room is 100 Euros per night.” / “The price of the room is 100 Euros per night.”

  3. Multilingual Translation with an Interlingua [Diagram: one analyzer per language feeds a single interlingua, which feeds one generator per language. Languages shown: Arabic, Catalan, Chinese, English, French, German, Italian, Japanese, Korean, Spanish.]
     • Chinese (input sentence): San1 tian1 qian2, wo3 kai1 shi3 jue2 de2 tong4
     • Interlingua: give-information+onset+body-state (body-state-spec=pain, time=(interval=3d, relative=before))
     • Chinese (paraphrase): wo3 yi3 jin1 tong4 le4 san1 tian1
     • English (output sentence): The pain started three days ago.

  4. Advantages of Interlingua • Add a new language easily: get all-ways translation to all previous languages by adding one grammar for analysis and one grammar for generation. • Monolingual development teams. • Paraphrase: generate a new source-language sentence from the interlingua so that the user can confirm the meaning.
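The "one analysis grammar plus one generation grammar per language" claim can be made concrete with a little arithmetic. The sketch below is illustrative only: the comparison against one directed translator per language pair is an assumed baseline that the slides do not state, and both function names are hypothetical.

```python
def interlingua_modules(n_languages: int) -> int:
    """Modules needed with an interlingua: one analyzer and one generator per language."""
    return 2 * n_languages


def pairwise_modules(n_languages: int) -> int:
    """Assumed baseline: one directed translator for every ordered pair of languages."""
    return n_languages * (n_languages - 1)


# Slide 3 shows ten languages; adding an eleventh costs two more modules
# with an interlingua, but twenty more under the pairwise baseline.
for n in (2, 10, 11):
    print(n, interlingua_modules(n), pairwise_modules(n))
```

Under these assumptions the interlingua approach grows linearly with the number of languages, while the pairwise baseline grows quadratically.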

  5. Challenges for Interlingua • “Meaning” is arbitrarily deep. • What level of detail do you stop at? • If it is too simple, meaning will be lost in translation. • If it is too complex, analysis and generation will be too difficult. • Should be applicable to all languages. • Human development time.

  6. Design Principles of the Interchange Format • Based on speaker’s intent, not literal meaning: “Can you pass the salt” is represented only as a request for the hearer to perform an action, not as a question about the hearer’s ability. • Abstract away from the peculiarities of any particular language: • Why not go to the meeting? • Kaigi ni itte mittara doo? (gloss: meeting to going try-if how)

  7. Formulaic Utterances • Good night. • tisbaH cala xEr (gloss: waking up on good) • Romanization of Arabic from CallHome Egypt.

  8. Same intention, different syntax • rigly bitiwgacny (my leg hurts) • candy wagac fE rigly (I have pain in my leg) • rigly bitiClimny (my leg hurts) • fE wagac fE rigly (there is pain in my leg) • rigly bitinqaH calya (my leg bothers on me) • Romanization of Arabic from CallHome Egypt.

  9. Domain Actions: Extended, Domain-Specific Speech Acts • give-information+existence+body-state: It hurts. • give-information+onset+body-object: The rash started three days ago. • request-information+availability+room: Are there any rooms available? • request-information+personal-data: What is your name?

  10. Language Neutrality • Comes from representing speaker intention rather than literal meaning for formulaic and task-oriented sentences. • How about …: suggestion • Why don’t you…: suggestion • Could you tell me…: request info. • I was wondering…: request info.

  11. Domain Action Interlingua and Lexical Semantic Interlingua • Utterance: “and how will you be paying for this” • Domain Action representation: a:request-information+payment (method=question) • Lexical Semantic representation: predicate: pay; time: future; agent: hearer; product: (distance: proximate, type: demonstrative); manner: question

  12. Components of the Interchange Format • speaker: a: (agent) • speech act: give-information • concept*: +availability+room • argument*: (room-type=(single & double), time=md12)

  13. Components of IF • 74 speech acts (e.g. give-information): domain independent; 20 are dialog managing • 140 concepts (e.g. availability, accommodation): mostly domain dependent • 450 arguments (e.g. room-type, time): domain dependent and independent • Thousands of values (e.g. single, double, 12th)

  14. Examples • “no that’s not necessary”: c:negate • “yes I am”: c:affirm • “my name is alex waibel”: c:give-information+personal-data (person-name=(given-name=alex, family-name=waibel)) • “and how will you be paying for this”: a:request-information+payment (method=question) • “I have a mastercard”: c:give-information+payment (method=mastercard)
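The examples above all follow the same surface pattern: a speaker tag, a speech act, zero or more concepts joined by "+", and an optional parenthesized argument list. A minimal sketch of splitting an IF string along those lines is shown below. It is not the project's own IF checker; the `IFExpression` class and the `parse_if` and `split_top_level` helpers are hypothetical names, and nested argument values are kept as raw strings rather than parsed further.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IFExpression:
    speaker: str                      # e.g. "c" or "a"
    speech_act: str                   # e.g. "give-information"
    concepts: List[str] = field(default_factory=list)
    arguments: Dict[str, str] = field(default_factory=dict)


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_if(expr: str) -> IFExpression:
    """Parse 'speaker:speech-act+concept+... (arg=value, ...)' into its parts."""
    head, _, arg_text = expr.strip().partition("(")
    speaker, _, domain_action = head.strip().partition(":")
    speech_act, *concepts = domain_action.strip().split("+")
    arguments: Dict[str, str] = {}
    if arg_text:
        inner = arg_text.rstrip()
        if not inner.endswith(")"):
            raise ValueError("unbalanced argument list: " + expr)
        for item in split_top_level(inner[:-1]):
            name, _, value = item.partition("=")
            arguments[name.strip()] = value.strip()
    return IFExpression(speaker.strip(), speech_act.strip(),
                        [c.strip() for c in concepts], arguments)


parsed = parse_if("c:give-information+payment (method=mastercard)")
print(parsed.speaker, parsed.speech_act, parsed.concepts, parsed.arguments)
```

Keeping nested values such as `(given-name=alex, family-name=waibel)` as strings is a simplification; a fuller parser would recurse into them.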

  15. The Interchange Format Database
     Record template:
       d.u.sdu olang X lang Y Prv Z sdu in language Y on one line
       d.u.sdu olang X lang Z Prv Z sdu in language Z on one line
       d.u.sdu IF Prv Z IF on-one-line
       d.u.sdu comments: your comments
       d.u.sdu comments: go here
     Example record:
       61.2.3 olang I lang I Prv IRST “telefono per prenotare delle stanze per quattro colleghi”
       61.2.3 olang I lang E Prv IRST “I’m calling to book some rooms for four colleagues”
       61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom=(associate, quantity=4))
       61.2.3 comments: dial-oo5-spkB-roca0-02-3
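The record layout on this slide is regular enough to read with a few regular expressions. The sketch below is a guess at a reader for it, based only on the example record shown. The interpretation of the fields (`olang` as original language, `lang` as the language of the line, `Prv` as the providing site) is an assumption, as are the function name `parse_record_line` and the use of plain ASCII quotes in the sample.

```python
import re
from typing import Dict, Optional

ID = r"(?P<id>\d+\.\d+\.\d+)"  # the d.u.sdu identifier, e.g. 61.2.3

SENTENCE_RE = re.compile(
    ID + r"\s+olang\s+(?P<olang>\S+)\s+lang\s+(?P<lang>\S+)"
    r"\s+Prv\s+(?P<prv>\S+)\s+(?P<text>.+)")
IF_RE = re.compile(ID + r"\s+IF\s+Prv\s+(?P<prv>\S+)\s+(?P<text>.+)")
COMMENT_RE = re.compile(ID + r"\s+comments:\s*(?P<text>.*)")


def parse_record_line(line: str) -> Optional[Dict[str, str]]:
    """Classify one database line as a sentence, an IF line, or a comment."""
    line = line.strip()
    for kind, pattern in (("sentence", SENTENCE_RE),
                          ("if", IF_RE),
                          ("comment", COMMENT_RE)):
        match = pattern.fullmatch(line)
        if match:
            record = match.groupdict()
            record["kind"] = kind
            return record
    return None  # line does not follow the assumed layout


SAMPLE = """\
61.2.3 olang I lang I Prv IRST "telefono per prenotare delle stanze per quattro colleghi"
61.2.3 olang I lang E Prv IRST "I'm calling to book some rooms for four colleagues"
61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom=(associate, quantity=4))
61.2.3 comments: dial-oo5-spkB-roca0-02-3
"""

records = [parse_record_line(line) for line in SAMPLE.splitlines()]
for record in records:
    print(record)
```

Grouping the parsed lines by their shared `id` would then give one entry per unit, with its sentences in each language, its IF, and its comments.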

  16. NESPOLE! Database • Over 12,000 tagged sentences, in English, Italian, and German

  17. Tools and Resources • IF specifications (available on the web): http://www.is.cs.cmu.edu/nespole/db/index.html • IF discussion board: http://peace.is.cs.cmu.edu/ISL/get/if.html • C-STAR and NESPOLE! databases: http://www.is.cs.cmu.edu/nespole/db/index.html • IF Checker (web interface): http://tcc.itc.it/projects/xig/xig-on-line.html • IF test suite: http://tcc.itc.it/projects/xig/xig-ts.html • IF emacs mode

  18. Measuring Coverage • No-tag rate: • Can a human expert assign an interlingua representation to each sentence? • C-STAR II no-tag rate: 7.3% • NESPOLE no-tag rate: 2.4% • 300 more sentences were covered in the C-STAR English database • End-to-end translation performance: Measures recognizer, analyzer, and generator performance in combination with interlingua coverage.
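The no-tag rate can be read as the share of sentences for which the expert could not assign any interlingua representation. A minimal sketch under that reading follows; representing an untaggable sentence as `None` and the function name `no_tag_rate` are assumptions, and the four-item list is toy data, not taken from either database.

```python
from typing import Optional, Sequence


def no_tag_rate(tags: Sequence[Optional[str]]) -> float:
    """Percentage of sentences with no interlingua tag (None marks 'no tag')."""
    if not tags:
        raise ValueError("need at least one sentence")
    untagged = sum(1 for tag in tags if tag is None)
    return 100.0 * untagged / len(tags)


# Toy data for illustration only: one of four sentences could not be tagged.
toy_tags = ["c:affirm", "c:negate", None, "c:give-information+payment"]
print(no_tag_rate(toy_tags))
```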

  19. Example of failure of reliability • Input: 3:00, right? • Interlingua: verify (time=3:00) • Poor choice of speech act name: does it mean that the speaker is confirming the time or requesting verification from the user? • Output: 3:00 is right.

  20. Measuring Reliability: Cross-site evaluations • Compare performance of analyzer → interlingua → generator: • where the analyzer and generator are built at the same site (or by the same person) • where the analyzer and generator are built at different sites (or by different people who may not know each other) • Comparable end-to-end performance within sites and across sites.

  21. Intercoder agreement: average of pairwise percent agreement
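"Average of pairwise percent agreement" can be computed by scoring every pair of coders on the same sentences and averaging the results. The sketch below assumes that two coders agree on a sentence only when their labels match exactly; the slide does not say which part of the interlingua was compared, and the coder names and labels are toy data.

```python
from itertools import combinations
from typing import Dict, List


def percent_agreement(a: List[str], b: List[str]) -> float:
    """Percentage of sentences on which two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty set of sentences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def average_pairwise_agreement(codings: Dict[str, List[str]]) -> float:
    """Mean of percent_agreement over all unordered pairs of coders."""
    pairs = list(combinations(codings.values(), 2))
    if not pairs:
        raise ValueError("need at least two coders")
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Toy data: three hypothetical coders labeling the same four sentences.
toy_codings = {
    "coder_1": ["affirm", "negate", "give-information", "request-information"],
    "coder_2": ["affirm", "negate", "give-information", "verify"],
    "coder_3": ["affirm", "negate", "verify", "verify"],
}
print(round(average_pairwise_agreement(toy_codings), 1))
```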

  22. Comparison of four databases (travel domain, role playing, spontaneous speech) • Same data, different interlingua • DB-1: C-STAR II English database tagged with IF-1: 2278 sentences • DB-2: C-STAR II English database tagged with IF-2: 2564 sentences • DB-3: NESPOLE English database tagged with IF-2: 1446 sentences; only about 50% of the vocabulary overlaps with the C-STAR database. • DB-4: Combined database tagged with IF-2: 4010 sentences • Significantly larger domain

  23. Measuring Scalability: Coverage Rate What percent of the database is covered by the top n most frequent domain actions?
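The question on this slide translates directly into a frequency count. A small sketch, assuming the database is available as a flat list with one domain action per sentence; the function name `coverage_rate` and the toy list are illustrative.

```python
from collections import Counter
from typing import Sequence


def coverage_rate(domain_actions: Sequence[str], n: int) -> float:
    """Percentage of sentences whose domain action is among the n most frequent."""
    if not domain_actions:
        raise ValueError("empty database")
    counts = Counter(domain_actions)
    covered = sum(count for _, count in counts.most_common(n))
    return 100.0 * covered / len(domain_actions)


# Toy data: six sentences tagged with three distinct domain actions.
toy_actions = ["affirm", "affirm", "affirm", "negate", "negate",
               "give-information+payment"]
for n in (1, 2, 3):
    print(n, coverage_rate(toy_actions, n))
```

Plotting this value against n gives the coverage curve the slide refers to.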

  24. Measuring Scalability: Number of domain actions as a function of database size • Sample size from 100 to 3000 sentences in increments of 25. • Average number of unique domain actions over ten random samples for each sample size. • Each sample includes a random selection of frequent and infrequent domain actions.
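The sampling procedure described here can be sketched as follows. Sampling without replacement, the fixed random seed, and the function name `unique_action_curve` are assumptions made for the illustration; the slide only specifies the sample sizes and the ten random samples per size.

```python
import random
from typing import Dict, Iterable, List


def unique_action_curve(domain_actions: List[str],
                        sizes: Iterable[int],
                        n_samples: int = 10,
                        seed: int = 0) -> Dict[int, float]:
    """Average number of distinct domain actions per sample size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    curve: Dict[int, float] = {}
    for size in sizes:
        if size > len(domain_actions):
            break  # cannot draw a larger sample without replacement
        counts = [len(set(rng.sample(domain_actions, size)))
                  for _ in range(n_samples)]
        curve[size] = sum(counts) / n_samples
    return curve


# Toy database: 100 sentences over three domain actions of unequal frequency.
toy_db = ["affirm"] * 50 + ["negate"] * 30 + ["give-information+payment"] * 20
print(unique_action_curve(toy_db, sizes=[10, 50, 100]))

# The slide's setting would correspond to sizes=range(100, 3001, 25).
```

A curve that flattens as the sample size grows is what "scalable without explosion of domain actions" would look like under this measure.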

  25. Comparison of four databases (travel domain, role playing, spontaneous speech) • Same data, different interlingua • English database 1 tagged with interlingua 1: 2278 sentences • English database 1 tagged with interlingua 2: 2564 sentences • English database 2 tagged with interlingua 2: 1446 sentences; only about 50% of the vocabulary overlaps with English database 1. • Combined databases tagged with interlingua 2: 4010 sentences • Significantly larger domain

  26. Conclusions • An interlingua based on domain actions is suitable for task-oriented dialogue: • Reliable • Good coverage • Scalable without explosion of domain actions • It is possible to evaluate an interlingua for • Reliability • Expressivity • Scalability

  27. How to have success with an interlingua in a multi-site project • Keep it simple. • Periodically check for intercoder agreement. • Good documentation • Discussion board for developers • Know your language typology.
