
Mini-Ontology Generation from Canonicalized Tables

Stephen Lynn, Data Extraction Research Group, Department of Computer Science, Brigham Young University


Presentation Transcript


  1. Mini-Ontology Generation from Canonicalized Tables
     Stephen Lynn
     Data Extraction Research Group
     Department of Computer Science
     Brigham Young University
     Supported by the

  2. TANGO Overview
     TANGO: Table ANalysis for Generating Ontologies
     The project consists of the following three components:
     • Transform tables into a canonicalized form
     • Generate mini-ontologies
     • Merge into a growing ontology

  3. Sample Input and Sample Output (figures not captured in the transcript)

  4. Mini-Ontology GeneratOr (MOGO)
     • Concept/Value Recognition
     • Relationship Discovery
     • Constraint Discovery

  5. Concept/Value Recognition
     • Lexical Clues
       • Labels as data values
       • Data value assignment
     • Data Frame Clues
       • Labels as data values
       • Data value assignment
     • Default
       • Classifies any unclassified elements according to a simple heuristic.
     [Figure: "Concepts and Value Assignments". Labels shown, in transcript order: Year, 2002, 2003, Region, State, Location, Population, Latitude, Longitude, Northeast, Northwest, Delaware, Maine, Oregon, Washington. Values shown: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118; 45, 44, 45, 43; -90, -93, -120, -120.]
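The slide names three kinds of clues but does not show how MOGO applies them. As a rough illustration only, the sketch below treats a "data frame" as a regular expression describing what a concept's values look like, and falls back to a default rule for anything unmatched. The `DATA_FRAMES` patterns, the `classify` function, and the default rule (numeric-looking cells are values, everything else is a concept label) are all assumptions made for this example, not the tool's actual recognizers.

```python
import re
from typing import Optional, Tuple

# Hypothetical data frames: one pattern per concept. Illustrative only.
DATA_FRAMES = {
    "Year": re.compile(r"(19|20)\d{2}"),
    "Population": re.compile(r"\d{1,3}(,\d{3})+"),
    "Longitude": re.compile(r"-\d{1,3}"),
}

# Assumed default heuristic: digits, commas, periods, optional leading minus.
NUMERIC = re.compile(r"-?[\d,.]+")


def classify(cell: str) -> Tuple[str, Optional[str]]:
    """Classify a table cell as a concept label or a data value.

    Returns ("value", concept) when a data frame matches,
    ("value", None) for an unmatched numeric-looking cell,
    and ("concept", None) otherwise.
    """
    text = cell.strip()
    for concept, pattern in DATA_FRAMES.items():
        if pattern.fullmatch(text):
            return ("value", concept)
    if NUMERIC.fullmatch(text):
        return ("value", None)
    return ("concept", None)


if __name__ == "__main__":
    for cell in ["Region", "2002", "2,122,869", "-120", "45"]:
        print(cell, classify(cell))
```

A cell such as `45` matches no data frame here, so it is still recognized as a value but left unassigned; deciding whether it belongs to Latitude would need context from the table's labels, which this sketch does not model.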

  6. Relationship Discovery
     • Dimension Tree Mappings
     • Lexical Clues
     • Generalization/Specialization
     • Aggregation
     • Data Frames
     • Ontology Fragment Merge

  7. Constraint Discovery
     • Generalization/Specialization
     • Computed Values
     • Functional Relationships
     • Optional Participation
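Two of the constraint types listed above lend themselves to a small worked check. The sketch below is an assumption-laden illustration, not MOGO's algorithm: `is_functional` tests whether each left-hand value maps to a single right-hand value, and `computed_value_constraints` flags a parent label whose value equals the sum of its children's values. The population figures come from the slide 5 sample table; the pairing of figures to labels and the region-to-state nesting are inferred from their order on that slide and are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def is_functional(pairs: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Return True if every left value is paired with exactly one right value."""
    seen: Dict[Hashable, Hashable] = {}
    for left, right in pairs:
        # setdefault stores the first right value seen for this left value.
        if seen.setdefault(left, right) != right:
            return False
    return True


def computed_value_constraints(values: Dict[str, int],
                               nesting: Dict[str, List[str]]) -> List[str]:
    """Return parent labels whose value equals the sum of their children's values."""
    return [
        parent
        for parent, children in nesting.items()
        if values[parent] == sum(values[child] for child in children)
    ]


# Assumed pairing of the slide 5 population figures with their labels.
population = {
    "Northeast": 2_122_869, "Delaware": 817_376, "Maine": 1_305_493,
    "Northwest": 9_690_665, "Oregon": 3_559_547, "Washington": 6_131_118,
}
# Assumed label nesting (regions contain states).
nesting = {
    "Northeast": ["Delaware", "Maine"],
    "Northwest": ["Oregon", "Washington"],
}

if __name__ == "__main__":
    print(computed_value_constraints(population, nesting))
    print(is_functional(population.items()))
```

Under this pairing both regions come out as computed totals, which is the "rows/columns with totals" case the results slides single out as hard to detect automatically.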

  8. Validation
     • Concept/Value Recognition
       • Correctly identified concepts
       • Missed concepts
       • False positives
       • Data value assignments
     • Relationship Discovery
       • Valid relationship sets
       • Invalid relationship sets
       • Missed relationship sets
     • Constraint Discovery
       • Valid constraints
       • Invalid constraints
       • Missed constraints

  9. Concept Recognition
     • What we counted:
       • Correct/Incorrect/Missing Concepts
       • Correct/Incorrect/Missing Labels
       • Data value assignments

  10. Relationship Discovery
      • What we counted:
        • Correct/incorrect/missing relationship sets
        • Correct/incorrect/missing aggregations and generalization/specializations

  11. Constraint Discovery
      • What we counted:
        • Correct/Incorrect/Missing:
          • Generalization/Specialization constraints
          • Computed value constraints
          • Functional constraints
          • Optional constraints

  12. Concept Recognition
      • Successes
        • 98% of concepts identified
        • Missing label identification
        • 97% of values assigned to correct concept
      • Common problems
        • Finding an appropriate label
        • Duplicate concepts

  13. Relationship Discovery
      • Recall of 92% for relationship sets
      • Missing aggregations and generalizations/specializations
        • Only found in label nesting

  14. Constraint Discovery
      • F-measure of 98% for functional relationship sets
      • Poor computed value discovery
        • Rows/Columns with totals
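The results slides report recall and F-measure, and the earlier "what we counted" slides tally correct, incorrect, and missing items. The slides do not state the formulas used, so the sketch below assumes the standard definitions: precision is correct over correct plus incorrect, recall is correct over correct plus missing, and F-measure is their balanced harmonic mean. The function name `precision_recall_f` and the counts in the usage lines are hypothetical, not the study's data.

```python
from typing import Tuple


def precision_recall_f(correct: int, incorrect: int, missing: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and balanced F-measure from tallies.

    correct   -- items found that are right (true positives)
    incorrect -- items found that are wrong (false positives)
    missing   -- items that should have been found but were not (false negatives)
    """
    found = correct + incorrect
    expected = correct + missing
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical tallies, chosen only to exercise the function.
    p, r, f = precision_recall_f(correct=9, incorrect=1, missing=3)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Reporting recall alone, as on the relationship slide, says nothing about how many incorrect relationship sets were produced; the F-measure penalizes both incorrect and missing items.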

  15. Conclusions
      • Tool to generate mini-ontologies
      • Assessment of accuracy of automatic generation

  16. Future Work
      • Tool Enhancements
        • Linguistic processing
        • Data frame library
        • Domain-specific heuristics
      • Alternate Uses
        • Annotation for the Semantic Web
