Cross-lingual Information Extraction System Evaluation

Presentation Transcript


  1. Cross-lingual Information Extraction System Evaluation
     Kiyoshi Sudo, Satoshi Sekine, Ralph Grishman
     New York University
     NYCNLP (COLING 2004)

  2. Outline
     • Introduction
     • Cross-lingual IE system
     • Translation-based QDIE system
     • Cross-lingual QDIE system
     • Experiment
     • Discussion
     • Conclusion

  3. Information Extraction
     • Identifying entities in the source text and mapping them into a pre-defined table (template).
     • Example: “A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
       Template (Terrorism Activity)
         Perpetrator: A … suicide bomber
         Location: downtown Jerusalem
         Date: today

  4. Local Context
     • Local context provides useful information for identifying entities.
     • Example: “A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
         Perpetrator: A … suicide bomber
         Location: downtown Jerusalem
         Date: today

  5. Extraction Patterns
     • Extraction patterns have been widely used as an effective means of extracting entities.
       • Pre-defined template (Riloff 1993): (kidnapped in <x>)
       • Predicate-argument (Yangarber et al. 2000): (<org>, appoint, <person>)
       • Dependency tree (Sudo et al. 2003): (trigger (OBJ: explosion) (ADV: <date>))
     • Because porting an IE system is costly, automatic pattern discovery techniques have become important.
       • e.g., bootstrapping methods (Riloff and Jones 1999; Yangarber et al. 2000)
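
To make the dependency-tree pattern representation concrete, here is a minimal Python sketch that matches a subtree pattern such as (trigger (OBJ: explosion) (ADV: <date>)) against a dependency tree. The `Node` and `Pattern` classes, the lemmatized tree, the SBJ/OBJ/ADV relation labels, and the convention that a slot like `<date>` binds any node tagged `DATE` are all assumptions made for illustration. This is not the matcher used by Sudo et al. 2003; in particular it takes the first matching child greedily rather than searching every combination.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Dependency-tree node: lemma, optional NE label, and (relation, child) pairs."""
    lemma: str
    ne: Optional[str] = None
    children: List[Tuple[str, Node]] = field(default_factory=list)


@dataclass
class Pattern:
    """Subtree pattern; a lemma like "<date>" is a slot for a node whose NE label is DATE."""
    lemma: str
    children: List[Tuple[str, Pattern]] = field(default_factory=list)


def match(pat: Pattern, node: Node) -> Optional[Dict[str, str]]:
    """Return slot bindings if pat matches the subtree rooted at node, else None."""
    if pat.lemma.startswith("<") and pat.lemma.endswith(">"):
        slot = pat.lemma[1:-1].upper()
        if node.ne != slot:
            return None
        bindings = {slot: node.lemma}
    elif pat.lemma == node.lemma:
        bindings = {}
    else:
        return None
    # Each pattern child must match some child of node under the same relation
    # (greedy: the first child that matches is taken).
    for rel, child_pat in pat.children:
        for r, child in node.children:
            sub = match(child_pat, child) if r == rel else None
            if sub is not None:
                bindings.update(sub)
                break
        else:
            return None
    return bindings


def find_matches(pat: Pattern, root: Node) -> List[Dict[str, str]]:
    """Try the pattern at every node of the tree, in pre-order."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        bindings = match(pat, node)
        if bindings is not None:
            found.append(bindings)
        stack.extend(child for _, child in reversed(node.children))
    return found


if __name__ == "__main__":
    # "A ... suicide bomber triggered a massive explosion ... today" (lemmatized, simplified)
    tree = Node("trigger", children=[
        ("SBJ", Node("bomber")),
        ("OBJ", Node("explosion")),
        ("ADV", Node("today", ne="DATE")),
    ])
    pattern = Pattern("trigger", [("OBJ", Pattern("explosion")),
                                  ("ADV", Pattern("<date>"))])
    print(find_matches(pattern, tree))  # [{'DATE': 'today'}]
```

The slot binding is what fills the template: the node matched by `<date>` becomes the Date entry.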

  6. Pattern Discovery (Sudo et al. 2003)
     • Preprocess the source documents (NE tagging, dependency parsing)
     • (1) Get relevant documents: an IR query (keywords, narrative) retrieves them from the source documents
     • (2) Score pattern candidates based on TF/IDF; a candidate is any subtree that contains at least one NE instance
     • (3) Use the patterns for pattern matching
     • QDIE = query-driven information extraction
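
Step (2) ranks subtree candidates with a TF/IDF-style score. The sketch below assumes a plain tf · log(N/df) weight, with tf counted over the retrieved relevant documents and df over the whole collection; the exact weighting used by Sudo et al. 2003 may differ, and the function name `score_patterns` and the toy collection are hypothetical. Candidates are represented as serialized subtree strings.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def score_patterns(relevant_docs: Sequence[Sequence[Hashable]],
                   all_docs: Sequence[Sequence[Hashable]]) -> List[Tuple[Hashable, float]]:
    """Rank pattern candidates by tf * log(N / df), highest first.

    Each document is the list of pattern candidates found in it.
    tf: occurrences in the retrieved relevant documents.
    df: number of documents in the whole collection containing the candidate.
    relevant_docs is assumed to be a subset of all_docs, so df >= 1.
    """
    tf = Counter(p for doc in relevant_docs for p in doc)
    df = Counter(p for doc in all_docs for p in set(doc))
    n = len(all_docs)
    scores = {p: tf[p] * math.log(n / df[p]) for p in tf}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    collection = [
        ["(appoint (OBJ: <person>))", "(say (SBJ: <person>))"],
        ["(appoint (OBJ: <person>))", "(resign (SBJ: <person>))", "(say (SBJ: <person>))"],
        ["(say (SBJ: <person>))"],
        ["(visit (SBJ: <person>))", "(resign (SBJ: <person>))"],
    ]
    relevant = collection[:2]  # pretend the IR step returned the first two documents
    for pattern, score in score_patterns(relevant, collection):
        print(f"{score:.3f}  {pattern}")
```

In this toy run the succession-specific (appoint (OBJ: <person>)) ranks first, while (say (SBJ: <person>)), although just as frequent in the relevant documents, ranks last because it also appears in irrelevant ones.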

  7. Cross-lingual IE
     • Assume we have
       • a machine translation system
       • basic linguistic tools for the source and target languages: morphological analyzer, parser, NE tagger, IR system
     [Figure: English query, E-QDIE (English QDIE), MT system, J-QDIE (Japanese QDIE), Japanese source documents]

  8. Outline
     • Introduction
     • Cross-lingual IE system
     • Translation-based QDIE system
     • Cross-lingual QDIE system
     • Experiment
     • Discussion
     • Conclusion

  9. Translation-based QDIE system
     • (1) Translate the source documents (Japanese → English)
     • (2) Use the English QDIE system with the query

  10. Cross-lingual QDIE system
     • (1) Translate the user’s query (English → Japanese)
     • (2) Use the Japanese QDIE system on the Japanese source documents
     • (3) Translate the extracted table (Japanese → English)
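
The two architectures differ mainly in where MT is applied. The following sketch expresses both pipelines as plain function composition; `mt_ja_en`, `mt_en_ja`, `qdie_en` and `qdie_ja` are hypothetical callables standing in for the MT system and the two QDIE systems, not real APIs, and the table is assumed to be a list of slot-to-entity dictionaries.

```python
from typing import Callable, Dict, List

Table = List[Dict[str, str]]                 # one row per extracted event: slot -> entity
Translate = Callable[[str], str]             # hypothetical MT interface
Extract = Callable[[str, List[str]], Table]  # hypothetical QDIE interface: (query, docs) -> table


def translation_based_qdie(query_en: str, docs_ja: List[str],
                           mt_ja_en: Translate, qdie_en: Extract) -> Table:
    # (1) Translate every source document into English.
    docs_en = [mt_ja_en(doc) for doc in docs_ja]
    # (2) Run the English QDIE system on the MT output.
    return qdie_en(query_en, docs_en)


def cross_lingual_qdie(query_en: str, docs_ja: List[str],
                       mt_en_ja: Translate, mt_ja_en: Translate,
                       qdie_ja: Extract) -> Table:
    # (1) Translate only the short query into Japanese.
    query_ja = mt_en_ja(query_en)
    # (2) Run the Japanese QDIE system on the original, untranslated documents.
    table_ja = qdie_ja(query_ja, docs_ja)
    # (3) Translate only the extracted entities (short phrases) back into English.
    return [{slot: mt_ja_en(entity) for slot, entity in row.items()}
            for row in table_ja]
```

The translation-based pipeline sends every full document through MT, so MT errors reach the NE tagger and parser; the cross-lingual pipeline only exposes the query and the extracted entities to MT.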

  11. Comparison of the two systems
     • Translation-based QDIE
       • No source-language-specific tools are necessary except the MT system.
       • Tools for the E-QDIE system were customized for English (not for MT output).
     • Cross-lingual QDIE
       • MT is needed only for short sentences or phrases (the query and the extracted entities).
       • Tools for the J-QDIE system were customized for Japanese.

  12. Experiment
     • Management Succession Extraction Task (simplified version of the MUC-6 task)
       • Identify the entities involved in a succession event: Person, Post, Organization
     • Test documents
       • 100 articles (61 relevant, 39 irrelevant) collected from the Yomiuri Newspaper, 1999 (Japanese)
       • Person (173/651), Post (210/626), Organization (111/709)
     • Source documents and tools
       • 130,000 articles from the Yomiuri Newspaper, 1998 (Japanese)
       • MT system: “King of Translation” (IBM)
       • NE tagger: (Sekine and Nobata 2004)
     • Extraction performance is measured by the recall and precision of the extracted entities.
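
A minimal sketch of entity-level recall and precision, assuming each entity is a hypothetical (doc_id, slot, string) triple and that an extracted entity counts as correct only when it exactly matches an answer-key entry; the slides do not state the matching criterion, and the function name is illustrative.

```python
from typing import Optional, Set, Tuple

Entity = Tuple[str, str, str]  # hypothetical representation: (doc_id, slot, entity string)


def precision_recall(extracted: Set[Entity], gold: Set[Entity],
                     slot: Optional[str] = None) -> Tuple[float, float]:
    """Entity-level precision and recall with exact matching, optionally for one slot."""
    if slot is not None:
        extracted = {e for e in extracted if e[1] == slot}
        gold = {e for e in gold if e[1] == slot}
    correct = len(extracted & gold)                               # true positives
    precision = correct / len(extracted) if extracted else 0.0    # correct / extracted
    recall = correct / len(gold) if gold else 0.0                 # correct / answer key
    return precision, recall
```

Passing `slot="Person"` (or "Post", "Organization") gives the per-slot figures, matching the three entity types evaluated in the task.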

  13. Cross-lingual QDIE does better
     • Maximum recall:
       • cross-lingual system: 60%
       • translation-based system: 41%

  14. Translation-based QDIE suffers from NE recognition errors
     • The NE tagger was customized for English (WSJ)
       • Many of the Japanese NEs do not occur in WSJ, e.g.
         [Kansai Economic Federation]ORG → [Kansai]LOC [Economic Federation]ORG
     • Translation errors
     • Result: fewer and noisier pattern candidates
       Number of pattern candidates (Translation-based / Cross-lingual)
       • Person: 4543 / 12096
       • Post: 3924 / 14986
       • Organization: 4014 / 11812
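
The candidate counts above can be compared directly. This short calculation uses only the numbers on the slide and gives how many times more pattern candidates the cross-lingual system produced, per slot and overall.

```python
# Pattern-candidate counts from the slide: (translation-based, cross-lingual)
counts = {
    "Person": (4543, 12096),
    "Post": (3924, 14986),
    "Organization": (4014, 11812),
}


def candidate_ratios(counts):
    """Cross-lingual / translation-based candidate ratio per slot, plus the overall ratio."""
    ratios = {slot: cross / trans for slot, (trans, cross) in counts.items()}
    total_trans = sum(t for t, _ in counts.values())   # 12481
    total_cross = sum(c for _, c in counts.values())   # 38894
    ratios["All"] = total_cross / total_trans
    return ratios


for slot, ratio in candidate_ratios(counts).items():
    print(f"{slot}: {ratio:.2f}x")
```

This prints ratios of 2.66 (Person), 3.82 (Post) and 2.94 (Organization), or 3.12 overall.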

  15. NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
     • Used GIZA++ (Och et al. 2003) to compute word alignments between the original Japanese sentences and the MT-generated English sentences.
     • Doubled the number of pattern candidates.
     • Example (= Yoshikuni Mizuno, professor at Juntendo Univ.):
       Japanese: 順天堂 大 の 水野 美邦 教授
       MT output: Professor Mizuno 美邦 of 順天堂 large
       大 is an abbreviation of 大学 (= Univ.) and is frequently mistranslated as “large”.
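
Cross-language projection carries NE labels from the original Japanese sentence onto the MT-generated English sentence through word alignments. A minimal sketch, assuming the alignment links are already available as (Japanese index, English index) pairs (for example from an aligner such as GIZA++; reading its output files is not shown) and assuming each projected entity is the smallest contiguous English span covering the aligned words. The function name, tokenization, and alignment links in the example are hand-written for illustration, using the sentence from the backup slide on this topic.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) over token indices, end exclusive


def project_entities(src_spans: List[Span],
                     alignment: Iterable[Tuple[int, int]]) -> List[Span]:
    """Project NE spans from source tokens to target tokens via word-alignment links.

    Each projected span is the smallest contiguous target span covering all target
    words aligned to the source span (a simplifying assumption); source spans with
    no aligned target words are dropped.
    """
    links = list(alignment)
    projected = []
    for start, end, label in src_spans:
        targets = [t for s, t in links if start <= s < end]
        if targets:
            projected.append((min(targets), max(targets) + 1, label))
    return projected


if __name__ == "__main__":
    # Japanese tokens: 秋山 社長 が 関西 経済 連合会 の 次期 会長 に 就任 する 。
    en = ("President Akiyama is inaugurated as the following chairman "
          "of Kansai Economic Federation .").split()
    # Toy alignment links written by hand (not real GIZA++ output).
    links = [(0, 1), (1, 0), (3, 9), (4, 10), (5, 11), (7, 6),
             (8, 7), (10, 3), (11, 2), (12, 12)]
    ja_spans = [(0, 1, "PERSON"), (3, 6, "ORGANIZATION")]  # 秋山 / 関西 経済 連合会
    for s, e, label in project_entities(ja_spans, links):
        print(label, " ".join(en[s:e]))
```

Here “Kansai Economic Federation” is projected as a single ORGANIZATION span, rather than the LOC + ORG split produced by the English tagger on the earlier slide.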

  16. Cross-lingual QDIE still does better
     • Maximum recall:
       • cross-lingual system: 60%
       • translation-based system with NE projection: 52%
       • translation-based system: 41%

  17. Problems in Translation
     • Incorrect dependency structure caused by translation errors in the MT output

  18. Correct Translation: On the sixth, since the financial reports for the fiscal year that ended in February, 1999 will end in a deficit, "Okajima" (Marunouchi, Kofu City), the leading department store in the prefecture, announced that six of the thirteen full-time directors, including President Hiroyuki Okajima (40), two executive directors and a managing director, submitted the resignation letter and will formally resign at the general meeting of shareholders of the company.

  19. MT Output: From Muika the term settlement of accounts ended February , 99 having become the prospect of the first deficit settlement of accounts after the war etc. , six of President Hiroyuki Okajima ( 40 ) , two managing directors , one managing directors , the full-time directors that are 13 persons submitted the resignation report , “Okajima” of Marunouchi , Kofu-shi who is the major departmentstore within the prefecture announced that he resigns formally by the fixed general meeting of shareholders of the company planned at the end of this month .

  20. Problems in Translation
     • Structural difference
       • Multiple translations of a single source-language expression make pattern discovery on MT output more difficult, e.g.
         <post>に就任する。 (= be appointed to <post>) is translated as
           • assume <post>
           • be inaugurated as <post> (translation error)

  21. Related Work
     • Riloff et al. 2002
       • showed how CLIE systems can be developed with IE learning tools, bitext alignment and an MT system.
       • conducted experiments on a relatively close language pair: English and French
       • “achieved roughly the same level of performance as the source-language IE system”
     • We expect the performance gap between translation-based IE and cross-lingual IE to be more pronounced for a more divergent language pair such as Japanese and English.

  22. Conclusion
     • We discussed the difficulties in cross-lingual information extraction caused by translating the source text.
     • Cross-lingual QDIE performs better than translation-based QDIE.
     • Translation-based QDIE suffers from NE recognition errors.
     • Structural errors and incorrect dependency analysis in MT output result in fewer and noisier pattern candidates.

  23. Further Discussion
     • Linguistic tools necessary for QDIE systems are available for major languages.
     • Speculation from the TIDES Surprise Language Exercise on developing tools for a new language:
       • Machine Translation
       • Cross-lingual Information Retrieval
       • Named Entity tagger
       • (dependency/shallow/full) parser: needs more work
     • Additional performance gains for cross-lingual QDIE may be achieved with query translation and query expansion techniques.

  25. NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
     • Used GIZA++ (Och et al. 2003) to compute word alignments between the original Japanese sentences and the MT-generated English sentences.
     • Doubled the number of pattern candidates.
     • Example:
       Japanese: 秋山社長が関西経済連合会の次期会長に就任する。
       MT output: President Akiyama is inaugurated as the following chairman of Kansai Economic Federation.
