The Harmony of Music and Computing

The Harmony of Music and Computing: Expanding a Domain-Specific Database. Jantine Trapman.


Presentation Transcript


  1. The Harmony of Music and Computing Expanding a Domain-Specific Database Jantine Trapman

  2. Overview • Components • LT4eL • Cornetto • Creation / expansion of Music Ontology • Automatic Creation • Watson • Prompt • Mapping • Music Ontology • Cornetto

  3. Components • LT4eL • Cornetto

  4. Components: LT4eL Language Technology for eLearning www.lt4el.eu • Development of search and management facilities in the LMS: • Keyword Extractor • Glossary Candidate Finder • Semantic Search

  5. Semantic Search • Based on: • (multilingual) documents (LOs) for eight languages • semantic annotation of LOs • ontology • lexicon for each language involved • Corpus and ontology are restricted to Computing domain

  6. Computing Ontology (1) • Creation: • Manually annotated keywords in eight languages extracted from LOs • Translated into (English) concepts • Definitions collected on the WWW and added to concepts • Extension with additional concepts from: • Restrictions on existing concepts • Superconcepts of existing concepts • Missing subconcepts • Annotation of LOs

  7. Computing Ontology (2) [Diagram labels: DOLCE, WordNet, Computing; LT4eL lexicons: German, Polish, Romanian, Maltese, Portuguese, English, Bulgarian, Czech, Dutch] • Domain ontology: • Domain: Computing • Manually created • 1406 concepts • 50 from DOLCE • 250 intermediate concepts from OntoWordNet • Use: • Lexicon development for 8 languages • Semantic annotation of LOs • LO indexing

  8. Computing Ontology Part

  9. Computing Lexicon • Concepts were translated in all languages • Each entry contains three types of information: • Concept (and superconcept): CDDrive (is-a Drive) • Definition: a drive that reads a compact disc and that is connected to an audio system • Set of terms in a given language: CD-speler, CD drive
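
The three kinds of information in a lexicon entry can be pictured as a small record. The following is only a sketch: the class name `LexiconEntry`, its field names, and the language code `"nl"` are assumptions made for illustration, not the actual LT4eL format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexiconEntry:
    """Hypothetical record for one lexicon entry (not the real LT4eL schema)."""
    concept: str                      # ontology concept, e.g. "CDDrive"
    superconcept: Optional[str]       # is-a parent, e.g. "Drive"
    definition: str                   # natural-language definition
    # Terms per language code; the codes are an assumption of this sketch.
    terms: Dict[str, List[str]] = field(default_factory=dict)

    def terms_for(self, lang: str) -> List[str]:
        """Return the terms for one language, or an empty list if none exist."""
        return self.terms.get(lang, [])


# The example entry from the slide, with the terms filed under "nl" (assumed).
cd_drive = LexiconEntry(
    concept="CDDrive",
    superconcept="Drive",
    definition="a drive that reads a compact disc and that is connected to an audio system",
    terms={"nl": ["CD-speler", "CD drive"]},
)
```

Keeping the terms in a per-language mapping means one concept entry can serve all lexicons, which matches the idea that concepts were translated into every language.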

  10. Expansion of the LT4eL KB • Future: more domains needed • Task: • Expansion of ontology and lexicons • Preferably semi-automatic • Three options: • Top-down • Bottom-up • Both, ingredients: • Cornetto, WordNet • Music ontology • Watson, Prompt

  11. Cornetto: Combinatorial and Relational Network as Toolkit for Dutch Language Technology [Diagram labels: SUMO/MILO, WordNet, Dutch WordNet (DWN), Referentie Bestand Nederlands (RBN), Cornetto Database] • Referentie Bestand Nederlands (RBN) → lexical units • Dutch part of EuroWordNet: Dutch WordNet (DWN) → synsets • SUMO/MILO plus extensions → terms and axioms • Core: table of Cornetto Identifiers (CIDs) • http://www.let.vu.nl/onderzoek/projectsites/cornetto/index.html

  12. Example Lexical Entry Cornetto (1)

  13. [noun] zanger:1c_n-42316 • Morphology: type:derivation; structure:zingen[*er]; plurforms:zangers • Syntax: gender:m/f; article:de • Semantics: reference:common; countability:count; type:human; subclass:beroepsnaam/beoefenaar; resume:iemand die zingt • Pragmatics: domain:muz

  14. Example Lexical Entry Cornetto (2) • Combinatorics zanger1: • De redacteur van het woordenboek was ook een zanger • De zanger van de band • SUMO: (+, , hasSkill) • Synonyms: zanger, zangeres • HAS_HYPERONYM: musicus, musicienne, muzikant • HAS_HYPONYM: baszanger, sopraan, blueszanger, charmezanger, ... • Equivalence relations: EQ_SYNONYM singer, vocalist, vocalizer, vocaliser /ENG20-09908715-n → link with WordNet 2.0! • WordNet Domains: music

  15. Goal:

  16. Tasks • Extract music related terms from Cornetto • Create a domain ontology for Music • Map between terms from lexicon and concepts in ontology • Map music ontology to OntoWN and DOLCE • Adjust Cornetto data to LT4eL format

  17. Questions (1) • How can we automate the process of ontology building, and to what extent? • How can we profit from existing Semantic Web resources to enrich ontologies? • To what extent do Watson and PROMPT support the reuse of existing resources?

  18. Music Ontology • Automatic Creation • Expansion with: • Watson • Prompt

  19. Automatic Creation (1) • (Basili et al. 2007): automatic ontology extraction from open-domain corpus (BNC) • Designed for three tasks: • lexical ambiguity resolution within a specific domain • restricting a set of terms to a subset relevant for an ontology to be constructed • expanding this new ontology with other, novel and relevant concepts, relations and instances.

  20. Automatic Creation (2) • Preprocessing: • Corpus split into 40-sentence text segments • PoS tagging • Filtering of noun phrases • General steps: • Term extraction through Latent Semantic Analysis (Deerwester et al. 1990) • Ontology extraction from WordNet based on Conceptual Density (Agirre and Rigau 1996)
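
The first preprocessing step, cutting the corpus into 40-sentence segments, might look roughly like this. It is a sketch under stated assumptions: the regex sentence splitter and the function names `split_sentences` and `segment_corpus` are illustrative stand-ins, and the tooling actually used by Basili et al. is not described in the slides.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def segment_corpus(text: str, size: int = 40) -> List[List[str]]:
    """Group consecutive sentences into segments of `size`; the last may be shorter."""
    sentences = split_sentences(text)
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


# Toy corpus of 95 short sentences.
toy_corpus = " ".join(f"Sentence {i}." for i in range(95))
segments = segment_corpus(toy_corpus)
```

Each segment would then serve as one "document" for PoS tagging, noun-phrase filtering and the later Latent Semantic Analysis step.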

  21. Music Ontology Part

  22. Music Ontology (Basili et al. '07) • 46 primitive classes • Leaf concepts have a synset ID from WordNet • No properties, only super-/subconcept relations • So: a rather small and shallow ontology → expansion by exploiting Semantic Web techniques

  23. Watson (1) http://watson.kmi.open.ac.uk/WatsonWUI/ • Every URI is clickable: all resources are available • Information about: • Size • Representation language • Number of classes, properties, individuals etc. • Review rating • Interface for SPARQL queries • Possibility of (upwards) navigation

  24. Watson (2) • Also available as • Protégé plug-in (under development) • API • New concepts can be added • Manually • One by one • Much human action required • Faster than creation from scratch, but still a tedious exercise

  25. Watson (3) • Watson provides: • a list of URIs of available semantic databases • a list of candidate concepts • What is still lacking: • a (semi-)automatic way to merge or align new concepts or ontologies with an existing one • Possible solution: Prompt

  26. PROMPT (1) http://protege.stanford.edu/plugins/prompt/prompt.html • Protégé plug-in • Functionalities: • Comparison • Inclusion • Merging • Alignment • Requirement: ontologies to be merged, aligned etc. must be available offline • Prompt goes beyond purely syntactic matching • Evaluation shows that experts followed 90% of Prompt’s suggestions

  27. Prompt (2) • Saves time and effort: • linguistically similar classes are found quickly • inherited properties and subclasses can be added automatically • similar structures are automatically detected • automatic consistency check • Resources must use exactly the same markup language • Merging: • faster but more complex • requires good insight into the resources

  28. Mapping • Music Ontology • Cornetto

  29. Resources • Music Ontology: • Some nodes have a WordNet ID (from the automatic process) • Many do not, especially those added with Watson • Cornetto entries: • have a synset ID from Dutch WN • have a mapping to a WordNet entry through equivalence or near-equivalence, e.g.

  30. Questions (2) • To what extent does WordNet support a mapping between: • The Cornetto lexicon and a newly created ontology partly based on WordNet; • The existing ontology and lexicon from LT4eL, and Cornetto + ontology

  31. Procedures • A concept either has a WN synset ID or does not • Mapping via WordNet synset ID: • Look up synset ID in Cornetto • Establish related DWN synset(s) • Results: no problems so far, although near-equivalence relations are expected to give mismatches • Mapping without synset ID: • Syntactic matching of concept name with terms from WordNet synsets • Compare definitions and glosses

  32. Examples “easy match” • zanger:1 d_n-20810 (iemand die zingt) is [EQ_SYNONYM] of: singer, vocalist, vocalizer, vocaliser /ENG20-09908715-n (a person who sings) • strijkkwartet:1 d_n-14287 (ensemble van vier strijkers) and strijkkwartet:2 d_n-19905 (ensemble voor vier strijkers) are [EQ_NEAR_SYNONYM] of: soloist:1/ENG20-09931035 • Note: Cornetto contains a mismatch between WN and DWN
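
Mapping via synset ID amounts to a lookup from a WordNet 2.0 ID to the related DWN synsets. The sketch below reuses the identifiers quoted in the examples above. The index layout and the function name `map_via_synset_id` are assumptions, and sending near-synonym links to a review list is one possible way of handling the expected mismatches, not a documented part of the procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical index: WordNet 2.0 synset ID -> [(DWN synset ID, relation name)].
# A real index would be derived from the Cornetto data; this one is typed in by hand.
CornettoIndex = Dict[str, List[Tuple[str, str]]]

cornetto_index: CornettoIndex = {
    "ENG20-09908715-n": [("d_n-20810", "EQ_SYNONYM")],
    "ENG20-09931035": [
        ("d_n-14287", "EQ_NEAR_SYNONYM"),
        ("d_n-19905", "EQ_NEAR_SYNONYM"),
    ],
}


def map_via_synset_id(wn_id: str, index: CornettoIndex) -> Tuple[List[str], List[str]]:
    """Return (exact matches, matches to review) for one WordNet synset ID."""
    exact: List[str] = []
    review: List[str] = []
    for dwn_id, relation in index.get(wn_id, []):
        if relation == "EQ_SYNONYM":
            exact.append(dwn_id)
        else:
            # Near-equivalence may hide a mismatch, so keep it for manual checking.
            review.append(dwn_id)
    return exact, review
```

With this split, the zanger entry comes back as an exact match, while both strijkkwartet synsets end up in the review list.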

  33. Matching without ID (1) • For each owl:Class in the Music ontology, try to match with the target attribute in the relation element of the Cornetto XML structure, where the attribute relation_name is (EQ_)NEAR_SYNONYM • Add synset ID to concept (for mapping to OntoWordNet) • E.g. <owl:Class rdf:about="http:///myOntos/music.owl#orchestra"/> <relation relation_name="EQ_NEAR_SYNONYM" target20-previewtext="symphony orchestra:1, symphony:2" version="pwn_1_6" target20="ENG20-07750308-n" target="ENG16-06123240-n">
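
A rough sketch of this name-based lookup, using Python's standard `xml.etree.ElementTree`. Several things here are assumptions: the `<entries>` wrapper element, the function names, and above all the matching rule (a class name matches a preview term if it equals the term or its last word), since the slide does not spell out how names are compared. Only the `relation` attributes are taken from the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# The <entries> wrapper is hypothetical; the relation attributes follow the slide.
SAMPLE = """<entries>
  <relation relation_name="EQ_NEAR_SYNONYM"
            target20-previewtext="symphony orchestra:1, symphony:2"
            version="pwn_1_6" target20="ENG20-07750308-n"
            target="ENG16-06123240-n"/>
</entries>"""

NEAR_RELATIONS = {"EQ_NEAR_SYNONYM", "NEAR_SYNONYM"}


def preview_terms(preview: str) -> List[str]:
    """Turn 'symphony orchestra:1, symphony:2' into ['symphony orchestra', 'symphony']."""
    terms = [item.rsplit(":", 1)[0].strip().lower() for item in preview.split(",")]
    return [t for t in terms if t]


def find_synset_id(class_uri: str, root: ET.Element) -> Optional[str]:
    """Return the WordNet 2.0 ID of the first near-synonym relation matching the class."""
    name = class_uri.rsplit("#", 1)[-1].replace("_", " ").lower()
    for rel in root.iter("relation"):
        if rel.get("relation_name") not in NEAR_RELATIONS:
            continue
        for term in preview_terms(rel.get("target20-previewtext", "")):
            # Assumed rule: exact match, or match on the head (last) word.
            if term == name or term.split()[-1] == name:
                return rel.get("target20")
    return None


root = ET.fromstring(SAMPLE)
orchestra_id = find_synset_id("http:///myOntos/music.owl#orchestra", root)
```

The head-word rule is deliberately loose and would over-generate on a full lexicon, so any ID found this way is a candidate to be checked rather than a confirmed mapping.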

  34. Matching without ID (2) • Compare definitions and glosses: • many ontology classes have a definition • each WN synset has a gloss • preprocess: stemming and filtering nouns • Consider percentage of nouns in concept definition that match with a certain gloss • Evaluate results • Note: some definitions are equal to WN glosses
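
The definition/gloss comparison can be sketched as an overlap percentage. Two simplifications are assumptions of this sketch and not the method used in the work: noun filtering is approximated by a small stop-word list instead of a PoS tagger, and stemming is reduced to stripping a plural "s".

```python
import re
from typing import Set

# Stand-in for noun filtering (assumption): drop a few common function words.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "who", "which",
             "is", "are", "to", "in", "for", "with", "by", "on"}


def stem(word: str) -> str:
    """Very crude stemmer (assumption): strip one trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def content_stems(text: str) -> Set[str]:
    """Lower-case, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(t) for t in tokens if t not in STOPWORDS}


def overlap_percentage(definition: str, gloss: str) -> float:
    """Percentage of content stems in the definition that also occur in the gloss."""
    def_stems = content_stems(definition)
    if not def_stems:
        return 0.0
    return 100.0 * len(def_stems & content_stems(gloss)) / len(def_stems)
```

A class would then be linked to the synset whose gloss scores highest, possibly only above some threshold. Choosing that threshold is part of the evaluation step listed above.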

  35. Current work • Matching without ID on class name and definitions/glosses • Manually check results for precision and recall • Problem: MWEs, e.g. class Brass_Instrument: • has no precise WN counterpart, but • Brass does exist, but • it has multiple senses → how can we disambiguate? • Question: ID allows easy and reliable match, but can we do the task without?

  36. Remaining and Future work • Adapting the lexicon format to the LT4eL format • Mapping to OntoWordNet (semi-automatic) • Mapping to DOLCE (manual task) • Ontology evaluation • Experiments with WordNets from different languages • Involve additional lexical info to improve the LT4eL search engine, e.g. use morphological info about plural forms
