WP4: Conceptual Mining from Text for Knowledge Engineering

Presentation Transcript


  1. WP4: Conceptual Mining from Text for Knowledge Engineering. State of the Art. WP Coordinators: Alfonso Valencia, Carlos Rodriguez

  2. Why Concept/Semantic Mining?
     • Knowledge Acquisition Bottleneck
     • Top-down, manually designed ontologies are:
       • sparse (non-exhaustive)
       • shallow (not fine-grained)
       • not mappable (to terms or other ontologies)
       • not easily updated or customized
     • Text-based ontologies better reflect the diversity of knowledge found in the literature and domain terminology

  3. Information for Ontology Learning

  4. State of the Art Methods
     • Implicit relations
       • Corpus distribution
       • Machine learning algorithms
     • Explicit relations
       • Symbolic (rule- and syntax-based)
     • Hybrid, combining some or all
     • Bootstrap the ontology-learning process using existing resources
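The "corpus distribution" approach to implicit relations can be illustrated with a minimal sketch: terms that occur in similar contexts receive similar co-occurrence vectors. The function names `context_vectors` and `cosine`, the sentence-level context window, and the tokenizer are assumptions made for illustration only; they are not taken from the presentation. The two input sentences are the ones quoted on slide 5.

```python
import math
import re
from collections import Counter, defaultdict

def context_vectors(sentences):
    """Map each token to a Counter of the other tokens in its sentence.

    Assumption: the whole sentence is the context window.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9()]+", sentence.lower())
        for i, target in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[target][context] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

sentences = [
    "The binding of PCNA to MSH2 may reflect linkage between "
    "mismatch repair and replication.",
    "LBR undergoes mitotic phosphorylation mediated by p34(cdc2) "
    "protein kinase.",
]
vectors = context_vectors(sentences)
print(cosine(vectors["pcna"], vectors["msh2"]))  # terms sharing a sentence
print(cosine(vectors["pcna"], vectors["lbr"]))   # terms with disjoint contexts
```

With only two sentences the scores merely separate terms that share a sentence from terms that do not; a realistic setting would need a large corpus, a narrower context window, and some weighting scheme.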

  5. An example (Blaschke et al., Funct. Integr. Genomics 2001)
     [Figure: gene clusters annotated with information extracted from text]
     • Cell cycle cluster: 17 genes (PCNA, CDC2, MSH2, LBR, TOP2A, ...)
       • Words: Meiosis, Cyclin, Checkpoint, Interphase, Nucleoplasma, Division, Histone, Replication, Chromatid
       • GO codes: DNA replication, DNA metabolism, Cell Cycle control
       • Sentences:
         • PCNA-MSH2: The binding of PCNA to MSH2 may reflect linkage between mismatch repair and replication.
         • LBR-CDC2: LBR undergoes mitotic phosphorylation mediated by p34(cdc2) protein kinase.
     • Unknown cluster: 24 genes (ABCA5, CAT, ELF2, PIM1, WNT2, ...)
       • Words: Dipeptidyl, Prolyl, nmr, Collagen-binding

  6. Induce rules at different linguistic levels

  7. Lexical- and syntax-derived relationships from text
     • Complex relationships in CCO
       • degradates
       • participate_in
       • catalyses
       • adjacent_to
       • agent_in
     • What new ones can be learnt?
       • Sentence: LBR undergoes mitotic phosphorylation mediated by p34(cdc2) protein kinase.
       • Relation: mitotic phosphorylation mediated_by protein kinase
     • Can it be subsumed by others?
     • Are there other subcategories?
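As a rough illustration of how a symbolic (rule-based) method might turn the example sentence into candidate relations, the sketch below applies two surface patterns with regular expressions. The pattern inventory `PATTERNS` and the function `extract_relations` are hypothetical and hand-written for this one sentence; the presentation does not specify the actual rules, and a real system would more likely operate on syntactic parses than on raw strings.

```python
import re

# Hypothetical pattern inventory: (surface pattern, relation label).
# Both patterns assume two-word noun phrases, which fits this example only.
PATTERNS = [
    (re.compile(r"(?P<arg1>\w+) undergoes (?P<arg2>\w+ \w+)"), "undergoes"),
    (re.compile(r"(?P<arg1>\w+ \w+) mediated by (?P<arg2>[^.]+)"), "mediated_by"),
]

def extract_relations(sentence):
    """Return (arg1, relation, arg2) triples for every pattern match."""
    triples = []
    for pattern, label in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append(
                (match.group("arg1"), label, match.group("arg2").strip())
            )
    return triples

sentence = (
    "LBR undergoes mitotic phosphorylation mediated by "
    "p34(cdc2) protein kinase."
)
for triple in extract_relations(sentence):
    print(triple)
```

Here the second argument of `mediated_by` is kept as the full phrase "p34(cdc2) protein kinase"; reducing it to the head "protein kinase", as on the slide, would need an extra normalization step. Whether `mediated_by` is subsumed by an existing CCO relation remains the curation question the slide raises.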

  8. Beyond the State of the Art
     • Optimal hybrid methodology for:
       • Extracting entities
       • Discovering relations
       • Providing ontology-relevant information (but what, and how?)
     • Comparing top-down with bottom-up ontologies
     • Providing definitional information
     • Application to CC-cancer domains (and possibly to gene regulation)

  9. In the context of the project and other WPs…
     • Reasoning with text-generated ontologies: competing or complementing?
     • Reduction of lexical and semantic relationships to an ontological relation inventory
     • How to present and use text-mined information for ontology design (especially for database annotation)?
     • How to curate, evaluate and compare ontologies?

  10. Information for Ontology Engineers
      • New Classes (ontology) and Instances (KB)
      • Definitions and glosses
      • Concept usage and entity examples
      • Terms and synonyms
      • Hierarchical and non-hierarchical relations
      • Possible reasoning rules

  11. To and Fro other WPs