
Survey of Semantic Annotation Platforms


Presentation Transcript


  1. SAC 2005. Survey of Semantic Annotation Platforms. Lawrence Reeve, Hyoil Han

  2. Semantic Annotation
  • Creating semantic labels within documents for the Semantic Web (a minimal example follows)
  • Used to support:
    • Advanced searching (e.g., concept search)
    • Information visualization (using an ontology)
    • Reasoning about Web resources
    • Converting syntactic structures into knowledge structures (human → machine)
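To make the idea concrete, here is a minimal sketch (not from the paper) of what a standoff semantic annotation looks like: a character span in a document linked to an ontology class and a knowledge-base instance. The URIs below are invented placeholders.

    text = "Paris hosted the conference."

    # A standoff annotation: the label lives outside the document and points
    # back into it by character offsets. The URIs are made-up examples.
    annotation = {
        "span": (0, 5),                                       # offsets of "Paris"
        "text": text[0:5],
        "concept": "http://example.org/onto#City",            # ontology class
        "instance": "http://example.org/kb#Paris_France",     # KB entity
    }

A search engine holding such annotations can answer concept queries ("find all City mentions") rather than plain keyword queries, which is the advanced-searching use listed above.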

  3. Semantic Annotation Process

  4. Semantic Annotation Concerns
  • Scale and volume
    • Existing and new documents on the Web
  • Manual annotation
    • Expensive in both money and time
    • Subject to personal motivation
  • Schema complexity
  • Storage
    • Support for multiple ontologies
    • Within or external to the source document?
  • Knowledge-base refinement
  • Access: how are annotations accessed?
    • API, custom UI, plug-ins

  5. Semantic Annotation Platforms
  • Why semantic annotation platforms ('SAPs')?
    • Reduced human involvement
    • Consistent application of ontologies
    • Reduced cost in money and time
    • Scalability
    • Multiple ontologies for a single document

  6. Semantic Annotation Platforms
  • Characteristics
    • Provide many services, not just annotation
    • Storage: ontology, KB, and annotations
    • Access APIs (query annotations)
    • Integrate information extraction methods
    • Support for IE (e.g., gazetteers)
    • Extensible

  7. SAP General Architecture

  8. SAP Classification

  9. SAP Classification
  • Pattern-based
    • Pattern discovery
      • Iterative learning (sketched below):
        • Provide an initial seed set
        • Find new entities → find new patterns
        • Repeat
    • Rules
      • Manually define rules to find entities in text
      • Simple label matching
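A minimal sketch of the iterative learning loop above, assuming a toy corpus and seed set (both invented for illustration); real systems score and filter the discovered patterns to limit semantic drift.

    import re

    corpus = ("The mayor of Boston spoke. The mayor of Austin agreed. "
              "Flights to Austin were delayed. Flights to Denver were delayed.")
    entities = {"Boston"}                   # initial seed set

    for _ in range(3):                      # repeat: entities -> patterns -> entities
        # derive a pattern from the two words preceding each known entity
        patterns = {
            re.escape(m.group(1)) + r" (\w+)"
            for e in entities
            for m in re.finditer(r"(\w+ \w+) " + re.escape(e), corpus)
        }
        # apply every pattern to harvest new candidate entities
        for p in patterns:
            entities.update(m.group(1) for m in re.finditer(p, corpus))

    print(entities)   # grows from {'Boston'} to include 'Austin' and 'Denver'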

  10. SAP Classification
  • Machine-learning based
    • Wrapper induction
      • LP2
        • Uses structural and linguistic information
        • Produces tagging and correction rules as output
    • Statistical models
      • Hidden Markov Models (a toy example follows)
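As an illustration of the statistical-model family, here is a toy Hidden Markov Model tagger. The states, probabilities, and capitalization-based emission model are all invented for the example; a real system estimates these from training data.

    STATES = ["O", "PERSON"]                # hidden tags: outside vs. person name
    START = {"O": 0.95, "PERSON": 0.05}     # P(first tag)
    TRANS = {                               # P(next tag | current tag)
        "O":      {"O": 0.85, "PERSON": 0.15},
        "PERSON": {"O": 0.50, "PERSON": 0.50},
    }

    def emit(state, token):
        """Toy emission model: capitalized tokens look like name parts."""
        cap = token[:1].isupper()
        if state == "PERSON":
            return 0.7 if cap else 0.02
        return 0.15 if cap else 0.85

    def viterbi(tokens):
        """Most likely tag sequence under the toy HMM."""
        # best[s] = (probability of the best path ending in state s, that path)
        best = {s: (START[s] * emit(s, tokens[0]), [s]) for s in STATES}
        for tok in tokens[1:]:
            best = {
                s: max(((p * TRANS[prev][s] * emit(s, tok), path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda x: x[0])
                for s in STATES
            }
        return max(best.values(), key=lambda x: x[0])[1]

    print(viterbi(["Yesterday", "Alan", "Turing", "visited", "Manchester"]))
    # -> ['O', 'PERSON', 'PERSON', 'O', 'O']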

  11. SAP Classification
  • Multistrategy
    • Combines pattern-based and machine-learning approaches
    • The survey did not find a platform that implements this approach
    • Platform extensibility is important for implementing it

  12. Semantic Annotation Platforms
  • Selection
    • The aim was a representative sample of platforms using various information extraction techniques
    • Each system needed to be a platform offering services, not just an algorithm

  13. Semantic Annotation Platforms

  14. Language Toolkits
  • GATE: a language-processing system
    • Component architecture, SDK, IDE
    • ANNIE ('A Nearly-New IE system')
      • Tokenizer, gazetteer, POS tagger, sentence splitter, etc.
    • JAPE: Java Annotations Pattern Engine
      • Provides regular-expression-based pattern/action rules (see the sketch below)
  • Amilcare
    • Adaptive IE system designed for document annotation
    • Based on LP2
    • Uses ANNIE
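JAPE rules themselves match over GATE annotation graphs; as a rough flavor of the pattern/action idea, here is a string-level sketch in Python. The rules and annotation types are invented examples, not actual JAPE syntax.

    import re

    # Each rule pairs a pattern with an annotation type to assign (the action).
    RULES = [
        (re.compile(r"\b[A-Z][a-z]+ (?:Inc|Ltd|Corp)\.?"), "Organization"),
        (re.compile(r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{4}\b"), "Date"),
    ]

    def annotate(text):
        """Return standoff annotations as (start, end, type, matched_text)."""
        anns = []
        for pattern, ann_type in RULES:
            for m in pattern.finditer(text):
                anns.append((m.start(), m.end(), ann_type, m.group()))
        return sorted(anns)

    print(annotate("Acme Corp. was founded on 1 March 1999."))
    # -> [(0, 10, 'Organization', 'Acme Corp.'), (26, 38, 'Date', '1 March 1999')]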

  15. KIM (2003)
  • Ontology, KB, semantic annotation, indexing and retrieval server, front ends (Web UI, IE plug-in)
  • KIMO ontology
    • 250 classes, 100 properties
    • 80,000 entities from a general news corpus in the KB (plus >100,000 aliases)
  • IE
    • Uses GATE, JAPE
    • Gazetteers (populated from the KB); a lookup sketch follows
  Source: http://www.ontotext.com/kim/SemWebIE.pdf
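A minimal sketch of KB-driven gazetteer lookup. The alias table here is invented; in KIM the gazetteer lists are generated from the knowledge base's entities and aliases.

    # alias -> (KB entity, class); entries are invented examples
    ALIASES = {
        "International Business Machines": ("kb:IBM", "Company"),
        "IBM": ("kb:IBM", "Company"),
        "New York": ("kb:NewYork", "City"),
    }

    def gazetteer(text):
        """Scan text, preferring the longest alias at each position."""
        by_length = sorted(ALIASES, key=len, reverse=True)
        matches, i = [], 0
        while i < len(text):
            for alias in by_length:
                if text.startswith(alias, i):
                    entity, cls = ALIASES[alias]
                    matches.append((i, i + len(alias), alias, entity, cls))
                    i += len(alias) - 1    # skip past the match
                    break
            i += 1
        return matches

    print(gazetteer("IBM opened a lab in New York."))
    # -> [(0, 3, 'IBM', 'kb:IBM', 'Company'), (20, 28, 'New York', 'kb:NewYork', 'City')]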

  16. Ont-O-Mat (2002)
  • Uses Amilcare
    • Wrapper induction (LP2)
  • Extensible
  • Adapted in 2004 for the PANKOW algorithm (sketched below)
    • Disambiguation by maximal evidence
    • Proper nouns + ontology → linguistic phrases
  Source: http://www.aifb.uni-karlsruhe.de/WBS/sha/papers/kcap2001-annotate-sub.pdf
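A sketch of the PANKOW idea under stated assumptions: candidate phrases are generated from Hearst-style templates and scored by Web hit counts, and the concept with maximal evidence wins. web_hit_count is a hypothetical stand-in for a search-engine count API.

    PATTERNS = [                      # Hearst-style phrase templates
        "{noun} is a {concept}",
        "{concept}s such as {noun}",
        "{noun} and other {concept}s",
    ]

    def web_hit_count(phrase):
        """Hypothetical stand-in for a search-engine hit-count API."""
        raise NotImplementedError

    def pankow(proper_noun, candidate_concepts):
        """Pick the ontology concept with maximal aggregated phrase evidence."""
        def evidence(concept):
            return sum(web_hit_count(p.format(noun=proper_noun, concept=concept))
                       for p in PATTERNS)
        return max(candidate_concepts, key=evidence)

    # e.g. pankow("Niger", ["country", "river"]) would compare the attestation
    # of "Niger is a country" vs. "Niger is a river" (and the other templates)
    # and keep the better-supported concept.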

  17. MUSE (2003)
  • Pipeline of processing resources (PRs)
    • PRs are called conditionally, based on text attributes (see the sketch below)
  • Makes use of JAPE
    • Adaptive rules
  • Can link multiple resources together
    • Gazetteer + part-of-speech tagger
    • Resolves entity ambiguities
  Source: http://gate.ac.uk/sale/expertupdate/muse.pdf
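A minimal sketch of conditional PR invocation, with invented document attributes and processing resources; the point is only that each PR runs when its predicate over the document's attributes holds.

    from dataclasses import dataclass, field

    @dataclass
    class Document:
        text: str
        attrs: dict = field(default_factory=dict)       # e.g. detected locale
        annotations: list = field(default_factory=list)

    def tokenize(doc):        doc.annotations.append("tokens")
    def us_date_parser(doc):  doc.annotations.append("US-style dates")
    def uk_date_parser(doc):  doc.annotations.append("UK-style dates")

    # (condition, processing resource) pairs, applied in order
    PIPELINE = [
        (lambda d: True,                           tokenize),
        (lambda d: d.attrs.get("locale") == "US",  us_date_parser),
        (lambda d: d.attrs.get("locale") == "UK",  uk_date_parser),
    ]

    def run(doc):
        for condition, pr in PIPELINE:
            if condition(doc):
                pr(doc)

    doc = Document("Meeting on 03/04/2005.", {"locale": "US"})
    run(doc)
    print(doc.annotations)    # -> ['tokens', 'US-style dates']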

  18. SemTag (2003)
  • Large-scale annotation
  • Annotations are stored separately from the source
    • "Semantic Label Bureau"
  • Uses the TAP taxonomy
  • Approach:
    • Find a match to a label in the taxonomy
    • Save a window of text before and after the match
    • Perform disambiguation (sketched below)
  • Main contribution: using the taxonomy for disambiguation
  Source: http://www.almaden.ibm.com/webfountain/resources/semtag.pdf
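A sketch of the window-based disambiguation step under stated assumptions: each taxonomy node carrying the matched label has a context profile, and the node whose profile best overlaps the saved window wins, or no annotation is made. The node names, profiles, and threshold are invented; SemTag's actual TBD algorithm works with similarity over corpus-derived representations rather than this simple set overlap.

    NODE_CONTEXTS = {   # taxonomy node -> words typical of its neighborhood
        "tap:Jaguar_Cat": {"cat", "wildlife", "jungle", "prey"},
        "tap:Jaguar_Car": {"car", "engine", "luxury", "dealer"},
    }

    def disambiguate(window_tokens, candidates, threshold=1):
        """Return the candidate node with maximal context overlap, or None."""
        def score(node):
            return len(NODE_CONTEXTS[node] & set(window_tokens))
        best = max(candidates, key=score)
        return best if score(best) >= threshold else None

    window = "the new jaguar has a quiet engine and a luxury interior".split()
    print(disambiguate(window, ["tap:Jaguar_Cat", "tap:Jaguar_Car"]))
    # -> tap:Jaguar_Car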

  19. Platform Effectiveness (*as reported by platform authors)

  20. Summary
  • Several platforms have been developed over the last few years
    • Large implementation effort; many services
  • Differentiated by
    • IE methods used
    • Services provided
  • Future
    • Tighter IE integration will likely improve annotation accuracy
    • Extending existing platforms will allow quicker research
