
Ontology-Aware Information Extraction gate.ac.uk/ Hamish Cunningham, Kalina Bontcheva

Hamish Cunningham, Kalina Bontcheva, Department of Computer Science, University of Sheffield. OntoWeb 4, SIG 5, 2002. http://gate.ac.uk/


Presentation Transcript


  1. Ontology-Aware Information Extraction
     http://gate.ac.uk/
     Hamish Cunningham, Kalina Bontcheva
     Department of Computer Science, University of Sheffield
     OntoWeb 4, SIG 5, 2002

  2. GATE, a General Architecture for Text Engineering
     • GATE is…
       • An architecture: a macro-level organisational picture for LE software systems.
       • A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
       • A development environment: for language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing e.g. Information Extraction.
     • Free software (LGPL). Mature, robust software (in development since 1995). Download at http://gate.ac.uk/download
     • Comes with…
       • Some free components, and wrappers for other people's components
       • Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.

  3. Applications; languages
     • GATE has been used for a variety of applications, including:
       • MUMIS: automatic creation of semantic indexes for multimedia programme material
       • MUSE: a multi-genre IE system
       • EMILLE: a 70 million word corpus of Indic languages
       • Metadata for Medline (at Merck)
       • Creation of metadata for Semantic Web Services; documentation using NLG
       • HSE: summarisation of health and safety information from company reports
       • OldBaileyIE: NE recognition on 17th century Old Bailey Court reports
       • AKT: language technology in knowledge management
       • AMITIES: call centre automation
       • Digital libraries / e-philology for ancient languages researchers
       • Various Medical Informatics and database technology projects
     • IE in Romanian, Bulgarian, Greek, Bengali, Spanish, Swedish, German, Italian, and French (Arabic, Chinese and Russian next year)

  4. Some users…
     At the time of writing, a representative fraction of GATE users includes:
     • Longman Pearson publishing, UK;
     • BT Exact Technologies, UK;
     • Merck KGaA, Germany;
     • Canon Europe, UK;
     • Knight Ridder (the second biggest US news publisher);
     • BBN Technologies, US;
     • Sirma AI Ltd., Bulgaria;
     • Resco AB, Sweden/Finland/Germany;
     • GlaxoSmithKline plc: drug-based navigation of Medline abstracts;
     • Master Foods NV: extraction of commodities events from news;
     • the American National Corpus project, US;
     • Imperial College, London, the University of Manchester, Queen Mary College, UMIST, the University of Karlsruhe, Vassar College, ISI / the University of Southern California and a large number of other UK, US and EU universities;
     • the Perseus Digital Library project, Tufts University, US.

  5. Scientific method and HLT
     • How do we really know that this stuff works?!
     • Open source systems make experimental repeatability easier and therefore cut down on site-specific skew effects.
     • GATE's IE tools have competed in MUC, TREC (QA), ACE, and DUC. TIDES Surprise Language exercise next year.
     • GATE includes markup and automated evaluation tools: easier quantitative evaluation.
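As an illustration of the kind of quantitative evaluation mentioned on this slide, here is a minimal sketch that scores system annotations against a hand-marked key using strict-match precision, recall and F1. It is not GATE's evaluation tool; the `Annotation` type, the `evaluate` function and the sample spans are all hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A labelled text span (hypothetical type, for illustration only)."""
    start: int
    end: int
    label: str


def evaluate(key, response):
    """Strict-match scoring: a response counts only if span and label both agree."""
    key_set, response_set = set(key), set(response)
    correct = len(key_set & response_set)
    precision = correct / len(response_set) if response_set else 0.0
    recall = correct / len(key_set) if key_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-marked key versus system output (made-up offsets and labels).
key = [
    Annotation(0, 7, "Person"),
    Annotation(20, 29, "Location"),
    Annotation(40, 45, "Organization"),
]
response = [
    Annotation(0, 7, "Person"),          # correct
    Annotation(20, 29, "Organization"),  # right span, wrong label
]

precision, recall, f1 = evaluate(key, response)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Strict matching is only one possible choice; a lenient variant would give partial credit for overlapping spans.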

  6. Collaboration opportunities
     • Interoperation, integration, not re-invention: collaboration not competition
     • Take the code, do what you like with it, perhaps contribute something back
     • Involve us in your 6th Framework projects
     • Join KITShare: a network of excellence in Knowledge and Interface Tool Sharing.

  7. The Holy Grail
     • Problem: gap between many current IE tools and SemWeb needs

  8. What is needed?
     • Content, not Information Extraction
       • Identify the ontological reference, not just the class
       • Maintain referential integrity (coreference)
     • Ontology-aware IE tools
       • Use instances already in the ontology
       • React to changes in the ontology
       • Support experienced users to change the IE tools

  9. GATE and Content Extraction
     ANNIE: open-source IE system in GATE, providing modules needed for content extraction
     • Pre-processing
     • Named entity recognition
     • Coreference resolution
     • ANNIE handles proper names, pronouns, and nominals
     • Easy-to-use pattern-action rule language to enable customisation and postprocessing of the IE results
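To make the idea of a pattern-action rule concrete, the following is a minimal sketch in plain Python. It does not use GATE's rule language or API; the `Token` type, the `tag` and `apply_person_rule` functions and the title list are hypothetical. The pattern is "a title followed by one or more capitalised tokens" and the action is to emit a Person annotation over the matched text.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str  # "title", "capitalised" or "other"


def tag(words, titles):
    """Stand-in for pre-processing: give every word a coarse kind."""
    tokens = []
    for word in words:
        if word in titles:
            kind = "title"
        elif word[:1].isupper():
            kind = "capitalised"
        else:
            kind = "other"
        tokens.append(Token(word, kind))
    return tokens


def apply_person_rule(tokens):
    """Pattern: title + one or more capitalised tokens. Action: emit a Person annotation."""
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].kind == "title":
            j = i + 1
            while j < len(tokens) and tokens[j].kind == "capitalised":
                j += 1
            if j > i + 1:  # at least one capitalised token followed the title
                found.append(("Person", " ".join(t.text for t in tokens[i:j])))
                i = j
                continue
        i += 1
    return found


words = "Yesterday Dr Kalina Bontcheva spoke in Sheffield".split()
tokens = tag(words, titles={"Dr", "Prof", "Mr", "Ms"})
people = apply_person_rule(tokens)
print(people)
```

Keeping the pattern (what to match) separate from the action (what to annotate) is what lets users customise the results without touching the rest of the pipeline.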

  10. Populating Ontologies with ANNIE

  11. Ontologies as explicit IE resources
     • Reuse, not reinvention:
       • Protégé for ontology maintenance
       • Sesame/KAON for storage and reasoning
     • Ontology-aware gazetteers
       • Provide the ontological class of each entry
       • Use instances from the ontology for IE
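The gazetteer idea on this slide can be sketched as a lookup list whose entries carry an ontological class and, where known, an instance identifier. This is an illustrative sketch only, not GATE's gazetteer; the `OntologyGazetteer` class and the `ex:` identifiers are hypothetical, and tokenisation is a naive whitespace split that ignores punctuation.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    onto_class: str          # ontological class of the entry
    instance: Optional[str]  # matching ontology instance, if one is known


class OntologyGazetteer:
    """Longest-match phrase lookup that returns class and instance per match."""

    def __init__(self):
        self._entries = {}
        self._max_len = 1

    def add(self, phrase, onto_class, instance=None):
        key = tuple(phrase.lower().split())
        self._entries[key] = Entry(onto_class, instance)
        self._max_len = max(self._max_len, len(key))

    def annotate(self, text):
        words = text.split()
        lowered = [w.lower() for w in words]
        results = []
        i = 0
        while i < len(words):
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(self._max_len, len(words) - i), 0, -1):
                entry = self._entries.get(tuple(lowered[i:i + n]))
                if entry is not None:
                    phrase = " ".join(words[i:i + n])
                    results.append((phrase, entry.onto_class, entry.instance))
                    i += n
                    break
            else:
                i += 1  # no entry starts at this word
        return results


gaz = OntologyGazetteer()
gaz.add("University of Sheffield", "ex:University", "ex:UniversityOfSheffield")
gaz.add("Sheffield", "ex:City", "ex:Sheffield")

matches = gaz.annotate("GATE is developed at the University of Sheffield in Sheffield")
for match in matches:
    print(match)
```

Because each match already names a class and an instance, later processing stages can refer to the ontology directly instead of to a bare string.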

  12. Ontology-aware IE
     • The IE tools can use available formal knowledge and reasoning
     • Ontology-based anaphora resolution
       • G. Bush, G. Brown, the president
     • The correct ontological classes are assigned to the recognised entities
     • Changes in the ontology available to the IE tools
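The "G. Bush, G. Brown, the president" example can be sketched as follows: a nominal such as "the president" is resolved to the most recent earlier mention whose ontological class is the required class or one of its subclasses. This is a minimal sketch under stated assumptions, not GATE's coreference module; the class hierarchy, the classes assigned to the two names and all function names are hypothetical.

```python
# Hypothetical single-inheritance class hierarchy: child -> parent.
SUBCLASS_OF = {
    "ex:President": "ex:Politician",
    "ex:Minister": "ex:Politician",
    "ex:Politician": "ex:Person",
}

# Hypothetical mapping from nominal phrases to the class they require.
NOMINAL_CLASS = {
    "the president": "ex:President",
    "the politician": "ex:Politician",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def resolve(nominal, mentions):
    """Return the most recent mention whose class is compatible with the nominal."""
    target = NOMINAL_CLASS.get(nominal.lower())
    if target is None:
        return None
    for name, cls in reversed(mentions):
        if is_a(cls, target):
            return name
    return None


# Earlier mentions in document order, with illustrative (assumed) classes.
mentions = [("G. Bush", "ex:President"), ("G. Brown", "ex:Minister")]

antecedent = resolve("the president", mentions)
generic = resolve("the politician", mentions)
print(antecedent, "|", generic)
```

Without the class information, a recency-based resolver would pick the nearest name for both nominals; the ontology is what rules out the incompatible candidate for "the president".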
