
Automatic Assignment of Domain Labels to WordNet






  1. Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004

  2. Outline • Introduction • WordNet • WN Domains • Experimentation • Evaluation and results • Discussion • Conclusions

  3. Introduction • Goal: to semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains • WN is a standard resource for semantic processing • Word Domain Disambiguation has proved effective • The work presented explores the automatic and systematic assignment of domain labels to glosses • The proposed method can be used to correct and verify the suggested labeling

  4. WordNet • Version WN1.6 was used because WordNet Domains is available for it

  5. WN Domains • WordNet Domains hierarchy developed at IRST (Magnini and Cavaglià, 2000) • Example branch of the hierarchy: TOP → pure_science → mathematics (geometry, statistics), biology (botany, zoology, entomology, anatomy), ...

  6. WN Domains • The synsets have been annotated semi-automatically with one or more labels • Most synsets have a single label • Average number of domain labels per synset: noun = 1.170, verb = 1.078, adj = 1.076, adv = 1.033
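These per-POS averages can be recomputed directly from the WN Domains mapping. The sketch below is a minimal script assuming the usual one-line-per-synset format ("offset-pos<TAB>domain1 domain2 ..."), which is an assumption about the distribution file rather than something stated on the slide.

```python
# Minimal sketch (not from the paper): recompute the average number of domain
# labels per synset, per POS, from a WordNet Domains mapping file.
# Assumed line format: "00001740-n<TAB>factotum" (offset-pos, then the labels).
from collections import defaultdict

def average_labels_per_synset(path):
    synsets = defaultdict(int)  # POS -> number of synsets
    labels = defaultdict(int)   # POS -> total number of domain labels
    with open(path, encoding="utf-8") as f:
        for line in f:
            offset_pos, domains = line.rstrip("\n").split("\t")
            pos = offset_pos.split("-")[1]   # 'n', 'v', 'a' or 'r'
            synsets[pos] += 1
            labels[pos] += len(domains.split())
    return {pos: labels[pos] / synsets[pos] for pos in synsets}

# Per the slide, this should give roughly n=1.170, v=1.078, a=1.076, r=1.033.
```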

  7. WN Domains • A domain may include synsets of different syntactic categories, e.g. MEDICINE: doctor#1 (n), operate#7 (v), medical#1 (a), clinically#1 (r) • A domain label may also contain senses from different WN sub-hierarchies, e.g. SPORT: athlete#1 → life-form#1, game-equipment#1 → physical-object#1, sport#1 → act#2, playing-field#1 → location#1

  8. WN Domains • Synsets that have more than one label do not seem to follow any pattern: • sultana#n#1 (pale yellow seedless grape used for raisins and wine) → Botany, Gastronomy • morocco#n#2 (a soft pebble-grained leather made from goatskin; used for shoes and book bindings etc.) → Anatomy, Zoology • canicola_fever#n#1 (an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice) → Medicine, Physiology, Zoology • blue#n#1, blueness#n#1 (the color of the clear sky in the daytime; "he had eyes of bright blue") → Color, Quality

  9. Applications of WN Domains • Word Sense Disambiguation • Word Domain Disambiguation • Text Categorization, etc. • Special labels in WN Domains: FACTOTUM is used to mark the WN senses that do not have a specific domain; STOP senses are synsets that appear frequently in different contexts, for instance numbers, colours, etc.

  10. Experimentation • Process to automatically assign domain labels to WN1.6 glosses • Validation procedures for the consistency of the domain assignment in WN1.6 and, in particular, for the automatic assignment of the factotum label • [Table: distribution of synsets with and without the factotum domain label in WN1.6]

  11. Experimentation • The test set was randomly selected (around 1% of the synsets) and the remaining synsets were used as the training set • [Table: test corpus for nouns and verbs]

  12. Experimentation: calculation of frequency • Example training synset: castle#n#4, castling#n#1, with domains CHESS and SPORT; variants and gloss: castle, castling | interchanging the positions of the king and a rook • Each content word is paired with each domain label of the synset: (castle, chess), (castle, sport), (castling, chess), (castling, sport), (interchanging, chess), (interchanging, sport), (king, chess), (king, sport), (rook, chess), (rook, sport) • The pair frequencies are accumulated over the whole training set, e.g. for castle: chess 68, architecture 57, sport 27, tourism 24, history 18, law 12, ...
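A minimal sketch of this frequency step, under simplifying assumptions not stated on the slide (a naive tokeniser and a tiny stop-word list): every content word of a training synset's variants and gloss is paired with every one of its domain labels, and the pair counts c(w,D), together with c(w) and c(D), are accumulated over the training set.

```python
# Sketch of the co-occurrence counting step. Tokenisation and stop-word
# filtering are simplified assumptions, not the authors' exact preprocessing.
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "for", "to", "at"}

def content_words(text):
    """Lowercase alphabetic tokens of the variants + gloss, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def count_cooccurrences(training):
    """training: iterable of (variants_and_gloss, [domain labels]) pairs."""
    c_wd = Counter()  # c(w, D): word-domain pair frequency
    c_w = Counter()   # c(w): word frequency
    c_d = Counter()   # c(D): word tokens seen under domain D
    for text, domains in training:
        for w in content_words(text):
            c_w[w] += 1
            for d in domains:
                c_wd[(w, d)] += 1
                c_d[d] += 1
    return c_wd, c_w, c_d

# The slide's example synset: castle#n#4, castling#n#1, labelled CHESS and SPORT.
example = [("castle castling | interchanging the positions of the king and a rook",
            ["chess", "sport"])]
c_wd, c_w, c_d = count_cooccurrences(example)
print(c_wd[("king", "chess")], c_wd[("rook", "sport")])  # 1 1
```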

  13. Experimentation: measures • M1, square root formula: (c(w,D) - (1/N)*c(w)*c(D)) / sqrt(c(w,D)) • M2, association ratio: Ar(w,D) = Pr(w|D) * log2(Pr(w|D) / Pr(w)) • M3, logarithm formula: log2(N*c(w,D) / (c(w)*c(D)))
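The three measures written out from the reconstructed formulas above, over the counts from the previous sketch; taking N as the total number of word tokens in the training glosses is an assumption, since the slide leaves N implicit.

```python
# The three weighting measures of this slide, over the counts c(w,D), c(w), c(D).
import math

def m1_square_root(w, d, c_wd, c_w, c_d, N):
    """M1: (c(w,D) - (1/N)*c(w)*c(D)) / sqrt(c(w,D))"""
    cwd = c_wd[(w, d)]
    if cwd == 0:
        return 0.0
    return (cwd - c_w[w] * c_d[d] / N) / math.sqrt(cwd)

def m2_association_ratio(w, d, c_wd, c_w, c_d, N):
    """M2: Ar(w,D) = Pr(w|D) * log2(Pr(w|D) / Pr(w))"""
    if c_wd[(w, d)] == 0:
        return 0.0
    p_w_given_d = c_wd[(w, d)] / c_d[d]
    p_w = c_w[w] / N
    return p_w_given_d * math.log2(p_w_given_d / p_w)

def m3_logarithm(w, d, c_wd, c_w, c_d, N):
    """M3: log2(N*c(w,D) / (c(w)*c(D))), i.e. pointwise mutual information."""
    if c_wd[(w, d)] == 0:
        return float("-inf")
    return math.log2(N * c_wd[(w, d)] / (c_w[w] * c_d[d]))
```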

  14. Experimentation • Pipeline: training, matrix of weights, calculation, validation • Example weights learned for the word orange: botany 10.17, gastronomy 4.98, color 3.28, jewellery 1.49, entomology 1.23, quality 1.18, hunting 0.41, geology 0.29, chemistry 0.17, biology 0.11

  15. Experimentation • Example: synset 06950891 leader#n#1 PERSON, gloss: leader | a person who rules or guides or inspires others • A weighted domain list is computed for each information source (variant, gloss, person), e.g. from one source: person 19.94, law 8.01, economy 4.74, religion 4.24, anthropology 3.74, ... • The lists are combined as VD = Σ weight(wi, dj) * percentage • Resulting ranking: position 1, person = 30.23; position 2, politics = 13.40; position 3, law = 11.08; ...
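A sketch of the combination step VD = Σ weight(wi, dj) * percentage: each information source proposes a weighted domain list, and the final vote per domain is the percentage-weighted sum over sources. The function and variable names, the mapping of lists to sources, and the percentage values are illustrative assumptions; only the individual weights in the usage example are taken from the slide.

```python
# Combine per-source weighted domain lists into one ranked proposal per synset.
from collections import defaultdict

def combine_votes(source_weights, source_percentages):
    """source_weights: {source: {domain: weight}}; source_percentages: {source: float}."""
    votes = defaultdict(float)
    for source, domain_weights in source_weights.items():
        pct = source_percentages.get(source, 0.0)
        for domain, weight in domain_weights.items():
            votes[domain] += weight * pct  # VD = sum of weight(w_i, d_j) * percentage
    # domains ranked by accumulated vote, best first
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: weights drawn from the slide's lists; source split and percentages assumed.
ranked = combine_votes(
    {"gloss":   {"person": 19.94, "law": 8.01, "economy": 4.74, "politics": 3.49},
     "variant": {"law": 2.70, "factotum": 2.09, "politics": 1.35}},
    {"gloss": 1.0, "variant": 1.0},
)
print(ranked[:3])  # top-ranked domains for the synset
```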

  16. Evaluation and results: nouns • AP: accuracy of the first label • AT: accuracy over all labels • P: precision • R: recall • F1: 2PR/(P+R) • MiA: success of each formula (M1, M2 or M3) when the first proposed label is correct • MiD: success of each formula (M1, M2 or M3) when the first proposed label is correct or is subsumed by a correct one in the domain hierarchy • [Tables: results for nouns without factotum (SF) and with factotum (CF)]
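A sketch of how the evaluation measures on this and the next slide can be computed from proposed and gold label lists; micro-averaging precision and recall over the test synsets is an assumption about how the reported figures were aggregated.

```python
# AP: first proposed label is correct; AT: all gold labels were proposed;
# P, R, F1: overlap between proposed and gold label sets (micro-averaged).
def evaluate(predictions, gold):
    """predictions / gold: lists of label lists, one entry per test synset."""
    ap_hits = at_hits = tp = n_proposed = n_gold = 0
    for pred, ref in zip(predictions, gold):
        if pred:
            ap_hits += pred[0] in ref
        at_hits += all(label in pred for label in ref)
        tp += len(set(pred) & set(ref))
        n_proposed += len(pred)
        n_gold += len(ref)
    n = len(gold)
    p = tp / n_proposed if n_proposed else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"AP": ap_hits / n, "AT": at_hits / n, "P": p, "R": r, "F1": f1}
```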

  17. Evaluation and results: verbs • AP: accuracy of the first label • AT: accuracy over all labels • P: precision • R: recall • F1: 2PR/(P+R) • MiA: success of each formula (M1, M2 or M3) when the first proposed label is correct • MiD: success of each formula (M1, M2 or M3) when the first proposed label is correct or is subsumed by a correct one in the domain hierarchy • [Tables: results for verbs without factotum (SF) and with factotum (CF)]

  18. Evaluation and results • On average, the method assigns 1.23 domain labels per noun synset (vs. 1.170 in WN Domains) and 1.20 per verb synset (vs. 1.078) • We obtain better results for nouns • The best average results were obtained with the M1 measure • For nouns, the first proposed label reaches 70% accuracy • The results for verbs are worse than for nouns; one of the reasons may be the high number of verbal synsets labeled with the factotum domain

  19. Discussion • Monosemous words: credit application#n#1 (an application for a line of credit) • Domains: SCHOOL • Proposal 1: Banking; Proposal 2: Economy • (Domain hierarchy: economy → banking)

  20. Discussion • Relation between labels: academic_program#n#1 (a program of education in liberal arts and sciences (usually in preparation for higher education)) • Domains: PEDAGOGY • Proposal 1: School; Proposal 2: University • (Domain hierarchy: pedagogy → school, university)

  21. Discussion • Relation between labels: shopping#n#1 (searching for or buying goods or services: "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town") • Domains: ECONOMY • Proposal 1: Commerce • (Domain hierarchy: social_science → commerce, economy)

  22. Discussion • Relation between labels: fire_control_radar#n#1 (radar that controls the delivery of fire on a military target) • Domains: MERCHANT_NAVY • Proposal 1: Military • (Domain-hierarchy nodes shown: social_science, transport, military, merchant_navy)

  23. Discussion • Uncertain cases: • birthmark#n#1 (a blemish on the skin formed before birth): Domains: QUALITY; Proposal 1: Medicine • bardolatry#n#1 (idolization of William Shakespeare): Domains: RELIGION; Proposals: History, Literature

  24. Conclusions • Automatically assigning domain labels to WN glosses is a difficult task • The proposed process is very reliable for the first proposed labels • The proposed labels are ordered by priority • It is possible to add new correct labels or to validate existing ones

  25. Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004

  26. Discussion • WN relations: bowling#n#2 (a game in which balls are rolled at an object or group of objects with the aim of knocking them over) • Domains: BOWLING • Proposal 1: Play • (Diagram: WN relations hol and hyp link bowling#n#2 to play#n#16 and game#n#2; related domain-hierarchy nodes: free_time, play, sport, bowling)

  27. WN Domains • Example (Magnini et al., 2001)

  28. WN Domains • Sense-by-sense comparison of semantic files (SF), WN Domains, SUMO and the Top Ontology:
  #1: SF Group; Domains Economy; SUMO Corporation; Top Ontology Function, Group, Human
  #2: SF Object; Domains Geography, Geology; SUMO Land-area; Top Ontology Natural, Place, Substance
  #3: SF Possession; Domains Economy; SUMO Keeping; Top Ontology Function, Moneyrepresentation, Part
  #4: SF Artifact; Domains Architecture, Economy; SUMO Building; Top Ontology Artifact, Function, Object
  #5: SF Group; Domains Factotum; SUMO Collection; Top Ontology Group
  #6: SF Artifact; Domains Economy; SUMO Artifact; Top Ontology Artifact, Container, Instrument, Object
  #7: SF Object; Domains Geography, Geology; SUMO Land-area; Top Ontology Natural, Place, Solid, Substance
  #8: SF Possession; Domains Economy, Play; SUMO Currency-measure; Top Ontology Function
  #9: SF Object; Domains Architecture; SUMO Land-area; Top Ontology Natural, Place, Substance
  #10: SF Act; Domains Transport; SUMO Motion; Top Ontology Agentive, Boundedevent, Cause, Condition, Dynamic, Purpose
