Technologies for (semi-) automatic metadata creation

http://gate.ac.uk/
http://nlp.shef.ac.uk/

Diana Maynard

University of Sheffield

KnowledgeWeb WP 1.3 meeting, Crete, 14 May 2004


Overview

USFD (University of Sheffield) is mainly concerned in this WP with best practices and guidelines for ontology-based web applications

State-of-the-art systems and platforms for metadata creation

Metadata is created through semantic tagging

Metadata can be represented inline (by modifying the original document) or as standoff markup (stored separately from the document)
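The following minimal Python sketch makes the inline/standoff distinction concrete. The `Annotation` type, the concept names and the offsets are illustrative assumptions and do not reflect the format of any particular tool. The sketch also assumes that annotations do not overlap, because overlapping spans cannot be nested as inline tags. That limitation is one practical reason for using standoff markup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the tagged span begins
    end: int      # character offset just past the end of the span
    concept: str  # ontology class assigned to the span (illustrative names)


def inline(text: str, anns: list[Annotation]) -> str:
    """Inline markup: return a modified copy of the document with tags around each span."""
    parts, pos = [], 0
    for ann in sorted(anns, key=lambda a: a.start):  # assumes spans do not overlap
        parts.append(text[pos:ann.start])
        parts.append(f"<{ann.concept}>{text[ann.start:ann.end]}</{ann.concept}>")
        pos = ann.end
    parts.append(text[pos:])
    return "".join(parts)


def standoff(doc_id: str, anns: list[Annotation]) -> list[dict]:
    """Standoff markup: leave the document untouched and keep offsets in separate records."""
    return [{"doc": doc_id, "start": a.start, "end": a.end, "concept": a.concept} for a in anns]


text = "Diana Maynard works at the University of Sheffield."
anns = [Annotation(0, 13, "Person"), Annotation(27, 50, "Organization")]
print(inline(text, anns))
# <Person>Diana Maynard</Person> works at the <Organization>University of Sheffield</Organization>.
print(standoff("doc1", anns))
```

The original `text` is never modified in the standoff case. Any number of independent annotation sets can therefore refer to the same document.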


Semi-automatic vs. automatic metadata creation

Semi-automatic methods are more reliable, but require human intervention, e.g.:

  • MnM: requires initial human annotation and a pre-defined ontology

  • S-CREAM

  • AERODAML

Automatic methods are less reliable, but are suitable for large volumes of text and offer a dynamic view, e.g.:

  • SemTag: semantic tagging from an ontology

  • KIM: semantic tagging and ontology population

  • h-TechSight: semantic tagging, ontology population and ontology evolution


Semi-automatic methods

MnM

S-CREAM


MnM

Semi-automatic, in that it requires initial training by the user

Uses a pre-defined set of concepts in an ontology

The user browses the web and manually annotates chosen pages

The system learns annotation rules, tests them, and then takes over annotation, populating the ontology with the instances it finds

Precision and recall are not perfect, but retraining is possible at any stage
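A deliberately simple stand-in learner can illustrate MnM's learn-then-take-over loop. This Python sketch is not MnM's actual learning algorithm. It uses a hypothetical rule form in which the previous word predicts the concept of the next token. It keeps only the rules whose precision on the user's annotations reaches a threshold, which corresponds to the "tests them" step. It then applies those rules to new text and adds the results to a dictionary that stands in for the ontology. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(annotated, min_precision=0.8):
    """Learn hypothetical 'previous word -> concept' rules from user-annotated sentences.

    Each sentence is a list of (token, concept) pairs; concept is None for untagged tokens.
    """
    fired = Counter()             # how often each left-context word precedes any token
    hits = defaultdict(Counter)   # how often it precedes a token tagged with each concept
    for sentence in annotated:
        for (prev, _), (_, concept) in zip(sentence, sentence[1:]):
            fired[prev.lower()] += 1
            if concept is not None:
                hits[prev.lower()][concept] += 1
    rules = {}
    for word, counts in hits.items():
        concept, n = counts.most_common(1)[0]
        if n / fired[word] >= min_precision:  # test each candidate rule on the training data
            rules[word] = concept
    return rules


def apply_rules(tokens, rules):
    """Take over annotation: tag each token whose left neighbour triggers a learnt rule."""
    return [(tok, rules[prev.lower()]) for prev, tok in zip(tokens, tokens[1:]) if prev.lower() in rules]


def populate(ontology, found):
    """Add each newly found instance under its concept (a dict stands in for the ontology)."""
    for instance, concept in found:
        ontology.setdefault(concept, set()).add(instance)
    return ontology


train = [
    [("Professor", None), ("Smith", "Person"), ("leads", None), ("the", None), ("project", None)],
    [("Professor", None), ("Jones", "Person"), ("visited", None), ("Sheffield", None)],
]
rules = learn_rules(train)                                   # {'professor': 'Person'}
found = apply_rules("Yesterday Professor Brown gave a talk".split(), rules)
print(populate({}, found))                                   # {'Person': {'Brown'}}
```

In this toy setting, retraining means calling `learn_rules` again on the enlarged set of user annotations.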


S-CREAM

Semi-automatic CREAtion of Metadata

Uses Ont-O-Mat + Amilcare

Trainable for different domains

Aligns conceptual markup (which defines relational metadata), provided e.g. by Ont-O-Mat, with semantic markup provided by Amilcare
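One way to picture the alignment step is as turning flat IE tags into relational metadata. The Python sketch below is a hypothetical simplification, not S-CREAM's actual alignment procedure. It assumes a mapping from tag types to ontology classes and properties. It attaches each property value to the most recent instance of the property's domain class in the text. The tags, classes and properties are illustrative only.

```python
# Hypothetical mapping from IE tag types (semantic markup) to the ontology.
CLASS_OF_TAG = {"hotel": "Hotel"}
PROPERTY_OF_TAG = {"city": ("Hotel", "locatedIn"), "price": ("Hotel", "hasPrice")}


def align(tags, class_of_tag=CLASS_OF_TAG, property_of_tag=PROPERTY_OF_TAG):
    """Turn flat (offset, tag, text) annotations into instances plus relational triples."""
    instances, triples = [], []
    current = {}  # most recent instance seen for each class
    for _, tag, text in sorted(tags):  # process tags in document order
        if tag in class_of_tag:
            cls = class_of_tag[tag]
            instances.append((text, cls))
            current[cls] = text
        elif tag in property_of_tag:
            domain, prop = property_of_tag[tag]
            if domain in current:  # values with no preceding subject are left unattached
                triples.append((current[domain], prop, text))
    return instances, triples


tags = [(0, "hotel", "Grand Hotel"), (25, "city", "Heraklion"), (40, "price", "80 EUR"),
        (60, "hotel", "Sea View"), (85, "price", "65 EUR")]
instances, triples = align(tags)
print(triples)
```

The "most recent instance" heuristic is the simplest possible choice. A real system would need a more robust way to decide which instance a value belongs to.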


Annotated data in S-CREAM

[Figure: annotated data in S-CREAM]


Amilcare

Amilcare learns IE rules from pre-annotated data (e.g. data annotated using Ont-O-Mat)

Uses GATE (ANNIE) for pre-processing, then applies the rules learnt in the training phase to new documents

Concepts need to be pre-defined, but the system can be trained for a new domain

Can be tuned towards precision or recall
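Tuning towards precision or recall can be illustrated by thresholding a confidence score attached to each candidate annotation. A high threshold keeps only the most reliable annotations and raises precision. A low threshold accepts more annotations and raises recall. The Python sketch below shows this trade-off using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The scores, labels and thresholds are made up, and the sketch does not describe Amilcare's own tuning parameters.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def select(scored: dict, threshold: float) -> set:
    """Keep only candidate annotations whose confidence reaches the threshold."""
    return {ann for ann, score in scored.items() if score >= threshold}


# Hypothetical candidate annotations with confidences, and a hypothetical gold standard.
scored = {"Sheffield/Location": 0.9, "Crete/Location": 0.8, "WP/Person": 0.6,
          "Maynard/Person": 0.5, "May/Person": 0.3}
gold = {"Sheffield/Location", "Crete/Location", "Maynard/Person", "KnowledgeWeb/Project"}

for threshold in (0.7, 0.4, 0.2):
    p, r = precision_recall(select(scored, threshold), gold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

With this toy data, lowering the threshold from 0.7 to 0.2 raises recall from 0.50 to 0.75 while precision falls from 1.00 to 0.60.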


Automatic methods

SemTag

KIM

h-TechSight


SemTag and KIM

SemTag and KIM both annotate web pages using instances from an ontology

The main problem is disambiguating labels that match instances in multiple parts of the ontology

SemTag aims for accuracy of classification, whereas KIM aims more for recall (finding all instances)

KIM also uses IE to find new instances that are not present in the ontology
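One common way to disambiguate a label that matches several ontology instances is to compare the words around the mention with a reference context for each candidate. SemTag's disambiguation is context-based in this general spirit. However, the Python sketch below is only a simplified bag-of-words cosine comparison and not SemTag's actual algorithm. The candidate instances and their reference contexts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def context(tokens: list[str], i: int, window: int = 5) -> Counter:
    """Words within `window` positions of the mention at index i (the mention itself excluded)."""
    around = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return Counter(t.lower() for t in around)


def disambiguate(tokens: list[str], i: int, candidates: dict[str, Counter]) -> str:
    """Choose the candidate instance whose reference context best matches the mention's context."""
    ctx = context(tokens, i)
    return max(candidates, key=lambda inst: cosine(ctx, candidates[inst]))


# Hypothetical ontology instances sharing the label "Jaguar", with reference contexts.
refs = {
    "Jaguar (animal)": Counter("cat species habitat forest prey".split()),
    "Jaguar (car)": Counter("car engine model drive speed".split()),
}
print(disambiguate("The new Jaguar model has a powerful engine".split(), 2, refs))  # Jaguar (car)
print(disambiguate("A jaguar stalks its prey in the forest".split(), 1, refs))       # Jaguar (animal)
```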


SemTag

Automated semantic tagging of large corpora, using the TAP ontology (which contains 65K instances)

Largest-scale semantic tagging effort to date

Uses the concept of a Semantic Label Bureau

Annotations are stored separately from web pages (standoff markup)

Uses corpus-wide statistics to improve the quality of tagging, e.g. automated alias discovery

Tags can be extracted using a variety of mechanisms, e.g. a search for all tags matching a particular object
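Because SemTag keeps its annotations apart from the pages, they can be held in a separate store and queried in different ways. For instance, they can be queried by page or by the ontology object they refer to. The Python class below is a toy in-memory sketch of such a standoff store. The class, its methods, the URLs and the object identifiers are all hypothetical and do not represent SemTag's interface.

```python
from collections import defaultdict


class LabelStore:
    """Toy standoff store: annotations live here, never inside the web pages themselves."""

    def __init__(self):
        self._by_url = defaultdict(list)     # index for "all labels on this page"
        self._by_object = defaultdict(list)  # index for "all tags matching this object"

    def add(self, url, start, end, obj):
        label = (url, start, end, obj)
        self._by_url[url].append(label)
        self._by_object[obj].append(label)

    def labels_for_page(self, url):
        return list(self._by_url.get(url, []))

    def tags_matching(self, obj):
        return list(self._by_object.get(obj, []))


store = LabelStore()
store.add("http://example.org/a", 0, 9, "City:Sheffield")
store.add("http://example.org/b", 42, 51, "City:Sheffield")
store.add("http://example.org/b", 60, 65, "Island:Crete")
print(store.tags_matching("City:Sheffield"))
```

The two indexes trade some storage for fast lookups in both directions. The pages themselves are never modified.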


SemTag Architecture

[Figure: SemTag architecture]


KIM

  • Uses an ontology (KIMO) with 86K/200K instances

  • Lookup phase marks instances from the ontology

  • High ambiguity of instances with the same label (e.g. locations belonging to different countries)

  • Disambiguation uses an Entity Ranking algorithm, i.e. a priority ordering of entities with the same label based on corpus statistics

  • Lookup is combined with a rule-based IE system (from GATE) to recognise new instances of concepts and relations

  • Special knowledge base (KB) enrichment stage, in which some of these new instances are added to the KB
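The lookup-then-rank idea can be sketched in a few lines of Python. This is a simplified stand-in, not KIM's implementation. The label table and corpus counts are hypothetical, and only single-token labels are matched. "Entity ranking" is reduced here to ordering same-label candidates by how often each entity has been seen in a corpus.

```python
from collections import Counter

# Hypothetical ontology fragment: one label can refer to several instances.
LABELS = {
    "paris": ["loc:Paris_Texas", "loc:Paris_France"],
    "crete": ["loc:Crete_Greece"],
}


def lookup(tokens):
    """Lookup phase: mark each token whose label matches one or more ontology instances."""
    return [(i, LABELS[t.lower()]) for i, t in enumerate(tokens) if t.lower() in LABELS]


def rank(candidates, corpus_counts):
    """Entity ranking: order same-label candidates by their frequency in a reference corpus."""
    return sorted(candidates, key=lambda e: corpus_counts[e], reverse=True)


corpus_counts = Counter({"loc:Paris_France": 120, "loc:Paris_Texas": 3, "loc:Crete_Greece": 15})
tokens = "Flights from Paris to Crete".split()
resolved = [(i, rank(cands, corpus_counts)[0]) for i, cands in lookup(tokens)]
print(resolved)  # [(2, 'loc:Paris_France'), (4, 'loc:Crete_Greece')]
```

A frequency prior like this favours recall-oriented tagging, because every ambiguous mention still receives its most likely entity. The cost is that some rarer referents are mislabelled.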


KIM (2)

[Figure: KIM (2)]


h-TechSight KMP

Knowledge management platform for fully automatic metadata creation and ontology population, and semi-automatic ontology evolution, powered by GATE and ToolBox

Data-driven analysis of ontologies enables trends in instances to be monitored

Uses GATE to support instance-based evolution of ontologies in the chemical engineering domain

Analyses unrestricted text to extract instances of concepts from such ontologies

Instances are populated into a domain-specific ontology and/or exported to an Access / Oracle database
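Monitoring trends in instances amounts to counting how often each extracted instance occurs over time. The Python sketch below assumes that each extracted instance is stored with its concept and with the period of its source document. It uses SQLite from the standard library purely as a stand-in for the Access / Oracle export. The table layout, the concept "Skill" and the data are hypothetical. Instances that first appear in a recent period are the kind of evidence a human could review during semi-automatic ontology evolution.

```python
import sqlite3

# Hypothetical extracted instances: (concept, instance, period of the source document).
rows = [
    ("Skill", "process simulation", "2003"),
    ("Skill", "process simulation", "2004"),
    ("Skill", "HYSYS", "2004"),
    ("Skill", "HYSYS", "2004"),
    ("Skill", "Fortran", "2003"),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Access / Oracle database
conn.execute("CREATE TABLE mention (concept TEXT, instance TEXT, period TEXT)")
conn.executemany("INSERT INTO mention VALUES (?, ?, ?)", rows)


def trend(conn, concept):
    """Number of mentions of each instance of `concept`, per period."""
    return conn.execute(
        "SELECT instance, period, COUNT(*) FROM mention WHERE concept = ? "
        "GROUP BY instance, period ORDER BY instance, period",
        (concept,),
    ).fetchall()


def first_seen_in(conn, concept, period):
    """Instances whose earliest mention falls in `period`: candidates to review for ontology evolution."""
    return [row[0] for row in conn.execute(
        "SELECT instance FROM mention WHERE concept = ? "
        "GROUP BY instance HAVING MIN(period) = ? ORDER BY instance",
        (concept, period),
    )]


print(trend(conn, "Skill"))
print(first_seen_in(conn, "Skill", "2004"))  # ['HYSYS']
```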


[Figure: diagram with the labels "Ontology in Employment", "Web site URL", "Visualisation of New Instances", "Analysis of Results", "DB" and "Evolution of Ontologies", numbered 1 to 4]


Ontology-based IE in h-TechSight

Ontology-based IE for semantic tagging of job adverts, news and reports in the chemical engineering domain

Semantic tagging is used as input for ontological analysis

Fundamental to the application is a domain-specific ontology

Terminological gazetteer lists are linked to classes in the ontology

Rules classify the mentions in the text with respect to the domain ontology

Annotations are output into a database or as an ontology
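The combination of gazetteer lists linked to ontology classes and rules that classify mentions can be sketched as follows. In GATE, such rules would normally be written as JAPE grammars, but here plain Python stands in. The sketch performs a longest-match lookup over hypothetical chemical-engineering term lists. It reports each mention together with its class and superclasses in an equally hypothetical ontology. Tokenisation is a bare whitespace split, so any punctuation attached to a term would prevent a match.

```python
# Hypothetical gazetteer lists, each linked to a class in a small domain ontology.
GAZETTEER = {
    "Separation": ["distillation", "membrane filtration"],
    "Reaction": ["polymerisation", "catalytic cracking"],
}
SUPERCLASS = {"Separation": "UnitOperation", "Reaction": "UnitOperation", "UnitOperation": None}

# Index each term (as a tuple of lowercase tokens) by the class its list is linked to.
TERMS = {tuple(term.split()): cls for cls, terms in GAZETTEER.items() for term in terms}
MAX_LEN = max(len(t) for t in TERMS)


def ancestors(cls):
    """The class followed by all its superclasses, up to the ontology root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS[cls]
    return chain


def tag(text):
    """Longest-match gazetteer lookup; each mention is classified w.r.t. the ontology hierarchy."""
    tokens = text.lower().split()
    i, out = 0, []
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):  # try the longest span first
            cls = TERMS.get(tuple(tokens[i:i + n]))
            if cls:
                out.append((" ".join(tokens[i:i + n]), ancestors(cls)))
                i += n
                break
        else:  # no term starts at this token
            i += 1
    return out


print(tag("Experience in catalytic cracking and distillation required"))
```

Each output pair could then be written to a database row or added as an ontology instance, matching the two output routes listed on the slide.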


Limitations

h-TechSight uses a rule-based IE system

Requires a human expert to write the rules

Accurate on restricted domains with small ontologies

Adaptation to a new domain / ontology may require some effort


Summary

Trade-off between semi-automatic and fully automatic systems, depending on the application, corpus size, etc.

Trade-off between rule-based and ML techniques for IE

Trade-off between dynamic and static systems

