
HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS

HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS. Electronic dictionaries, WordNet. Electronic dictionaries: computational tools are no longer used only to produce paper dictionaries, but also to develop new kinds of dictionaries that support new forms of search.

lara

Presentation Transcript


  1. HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS • Electronic dictionaries • WordNet

  2. ELECTRONIC DICTIONARIES • Computational tools are no longer used only to produce paper dictionaries, but also to develop new kinds of dictionaries that support new forms of search

  3. ELECTRONIC DICTIONARIES FOR ENGLISH • Oxford English Dictionary, second edition • Oxford Talking Dictionary • Concise Oxford Dictionary • Learner dictionaries: • Longman Dictionary of Contemporary English (LDOCE) • Collins COBUILD English Dictionary

  4. CONCISE OXFORD DICTIONARY • SEARCH: • Headword search (with *) • Hypertext search • Full text search (also of phrases / groups) • FILTERS: • etymology, phrasal verbs, suffixes

  5. COLLINS: COBUILD • Available from: • http://www.biblio.unitn.it/BancheDati/BancheDati.asp

  6. ELECTRONIC DICTIONARIES FOR ITALIAN • Il VELI • Zanichelli: CD-ROM Multilingue, Scaffale Elettronico • Devoto-Oli • Garzanti: IPA, 'speaks'

  7. DEVOTO-OLI

  8. EXAMPLE: DEVOTO-OLI • Basic search • Citation forms (incremental) • Hyperlinks • Definition / inflection • Synonyms / antonyms • Advanced search • Not available: pronunciation; citations? • Limited: historical information

  9. DEVOTO-OLI: SYNONYMS AND ANTONYMS

  10. EXAMPLE: ZINGARELLI INTERATTIVO

  11. MRDs • An important distinction: • Dictionaries that can be consulted electronically • MACHINE READABLE dictionaries • MACHINE TRACTABLE dictionaries • Particularly useful: dictionaries created for EFL: • LDOCE • COBUILD • A more ambitious project: ODE in XML

  12. EXAMPLE: ODE ON CD-ROM (IN XML) • An example of a lexicographic database in XML (= extremely machine tractable)

  13. ODE IN XML: OVERVIEW

  14. ODE IN XML: ENTRY FORMAT
      <se>
        <cn>815750</cn>
        <hg> <hw>stock</hw> </hg>
        <s1>
          <ps>noun</ps>
          <s2 num="1">
            <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
            <ex>the store has a very low turnover of stock</ex>
          </s2>
          <s2 num="2"> …… </s2>
        </s1>
        <s1>
          <ps>adjective</ps>
          …..
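To illustrate why an XML entry of this kind is "machine tractable", here is a minimal Python sketch that reads the headword and senses from an entry shaped like the one on the slide. The sample is a hypothetical, shortened entry: it keeps only the first noun sense, because the rest is elided on the slide. The reading of the tags (`hw` as headword, `ps` as part of speech, `df` as definition, `ex` as example) is inferred from their content, and the names `ENTRY_XML` and `read_entry` are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened entry: only the first noun sense shown on the
# slide is included; the parts elided on the slide are simply left out.
ENTRY_XML = """
<se>
  <cn>815750</cn>
  <hg><hw>stock</hw></hg>
  <s1>
    <ps>noun</ps>
    <s2 num="1">
      <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
      <ex>the store has a very low turnover of stock</ex>
    </s2>
  </s1>
</se>
"""


def read_entry(xml_text):
    """Return (headword, list of sense dicts) for one <se> entry."""
    entry = ET.fromstring(xml_text)
    headword = entry.findtext("hg/hw")
    senses = []
    for block in entry.findall("s1"):        # assumed: one <s1> per part of speech
        pos = block.findtext("ps")
        for sense in block.findall("s2"):    # assumed: one <s2> per numbered sense
            senses.append({
                "pos": pos,
                "num": sense.get("num"),
                "definition": sense.findtext("df"),
                "examples": [ex.text for ex in sense.findall("ex")],
            })
    return headword, senses


headword, senses = read_entry(ENTRY_XML)
print(headword, senses[0]["pos"], senses[0]["num"])
```

Because every piece of information sits in its own element, a query such as "all example sentences for noun senses" becomes a simple loop instead of a text-parsing problem.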

  15. ODE IN XML: NLP INFORMATION
      <nlp>
        <sup>merchandise</sup>
        <ss>Commerce</ss>
        <morph id="01">
          <mu sy="NN"> <inf>stock</inf> <ph>stQk</ph> </mu>
          <mu sy="NNS"> <ph>stQks</ph> </mu>
        </morph>
      </nlp>
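The NLP block can be read in the same way. The sketch below lists the morphological units with their `sy` attribute and `ph` content. The tag names are taken literally from the slide; what they stand for (for example, `sy` as a part-of-speech tag and `ph` as a phonetic transcription) is a guess, and `NLP_XML` and `morph_units` are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, with the XML-viewer markers removed.
NLP_XML = """
<nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN"><inf>stock</inf><ph>stQk</ph></mu>
    <mu sy="NNS"><ph>stQks</ph></mu>
  </morph>
</nlp>
"""


def morph_units(xml_text):
    """Return (<sup> text, <ss> text, list of dicts, one per <mu> element)."""
    nlp = ET.fromstring(xml_text)
    units = []
    for mu in nlp.findall("morph/mu"):
        units.append({
            "tag": mu.get("sy"),             # value of the sy attribute
            "form": mu.findtext("inf"),      # None when <inf> is absent
            "phonetic": mu.findtext("ph"),
        })
    return nlp.findtext("sup"), nlp.findtext("ss"), units


sup, ss, units = morph_units(NLP_XML)
for unit in units:
    print(unit["tag"], unit["form"], unit["phonetic"])
```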

  16. ELDIT • (Elektronisches Lern(er)wörterbuch Deutsch-Italienisch / Dizionario elettronico per apprendenti italiano-tedesco) • An example of a dictionary • designed for language learning • created in electronic form from the start • Lecture on ELDIT: 14 May

  17. WordNet

  18. SEMANTICS & THE LEXICON: A SUMMARY • [Diagram with three levels] • WORD-FORMS: “eat”, “eats”, “ate”, “eaten” • LEXEMES: EAT-LEX-1 • SENSES: eat0600, eat0700
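The three levels of the diagram (word forms, lexemes, senses) can be sketched as a small data structure. This is only an illustration: the `Lexeme` class and the helper functions are invented here, and the code assumes that both sense identifiers on the slide belong to the single lexeme EAT-LEX-1.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """One lexeme, linking its word forms to its senses."""
    name: str
    word_forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)


# Data from the slide; attaching both senses to EAT-LEX-1 is an assumption.
EAT = Lexeme(
    name="EAT-LEX-1",
    word_forms=["eat", "eats", "ate", "eaten"],
    senses=["eat0600", "eat0700"],
)


def build_form_index(lexemes):
    """Map each word form to the lexemes it can realize."""
    index = {}
    for lexeme in lexemes:
        for form in lexeme.word_forms:
            index.setdefault(form, []).append(lexeme)
    return index


def senses_of(form, index):
    """All sense identifiers reachable from a word form via its lexemes."""
    return [sense for lexeme in index.get(form, []) for sense in lexeme.senses]


INDEX = build_form_index([EAT])
print(senses_of("ate", INDEX))
```

A form that realizes several lexemes, such as “stock”, would simply appear under more than one `Lexeme` in the index.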

  19. THE ORGANIZATION OF THE LEXICON • [Diagram with three levels] • WORD-FORMS: “stock” • LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3 • SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000

  20. SYNONYMY • [Diagram with three levels] • WORD-FORMS: “cheap”, “inexpensive” • LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3 • SENSES: cheap0100, …., ……, cheapXXXX, inexp0900, inexpYYYY

  21. WORDNET • A lexical database created at Princeton • Freely available for research from the Princeton site • http://www.cogsci.princeton.edu/~wn/ • Information about a variety of SEMANTIC RELATIONS • Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965) • NOUNS • VERBS • ADJECTIVES and ADVERBS • Each database organized around SYNSETS

  22. SYNSETS • Senses (or 'lexicalized concepts') are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET • E.g., {chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)
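As a rough illustration of the synset idea, the sketch below stores the example synset as a set of words with its gloss and builds an index from words to the synsets they belong to. This is not how WordNet stores its data on disk; the names `SUCKER`, `GLOSSES`, `build_word_index` and `synonyms` are invented for this example.

```python
# The synset from the slide, as an immutable set of words.
SUCKER = frozenset({
    "chump", "fish", "fool", "gull", "mark", "patsy",
    "fall guy", "sucker", "shlemiel", "soft touch", "mug",
})

# One gloss per synset.
GLOSSES = {SUCKER: "person who is gullible and easy to take advantage of"}


def build_word_index(synsets):
    """Map each word to the set of synsets it is a member of."""
    index = {}
    for synset in synsets:
        for word in synset:
            index.setdefault(word, set()).add(synset)
    return index


def synonyms(word, index):
    """Words sharing at least one synset with `word`, excluding `word` itself."""
    result = set()
    for synset in index.get(word, ()):
        result |= synset
    result.discard(word)
    return result


WORD_INDEX = build_word_index(GLOSSES)
print(sorted(synonyms("fool", WORD_INDEX)))
```

A polysemous word such as "fish" or "mark" would appear in several synsets, one per sense, which is why the index maps a word to a set of synsets and not to a single one.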

  23. THE NOUN DATABASE • About 90,000 forms, 116,000 senses • Relations:

  24. HYPERNYMY
      2 senses of robin
      Sense 1
      robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
        => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
          => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
            => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
              => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
                => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
                  => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
                    => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                      => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                        => living thing, animate thing -- (a living (or once living) entity)
                          => object, physical object --
                            => entity, physical thing --
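The hypernym chain for robin can be sketched as a simple parent table. This is a simplification under stated assumptions: each synset is represented by its first word only, and every synset is given a single hypernym, whereas WordNet allows a synset to have more than one. `HYPERNYM`, `hypernym_chain` and `is_a` are names invented for this illustration.

```python
# Child -> parent links copied from the output on the slide,
# with each synset reduced to its first word (a simplification).
HYPERNYM = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word):
    """Follow the parent links upward and return the ancestors in order."""
    chain = []
    seen = {word}
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word in seen:      # guard against accidental cycles in the table
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_a(word, ancestor):
    """True if `ancestor` appears somewhere above `word` in the hierarchy."""
    return ancestor in hypernym_chain(word)


print(" => ".join(["robin"] + hypernym_chain("robin")))
print(is_a("robin", "bird"), is_a("bird", "robin"))
```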

  25. MERONYMY
      wn beak -holon
      Holonyms of noun beak
      1 of 3 senses of beak
      Sense 2
      beak, bill, neb, nib
        PART OF: bird

  26. VERBS • About 10,000 forms, 20,000 senses • Relations between verb meanings:

  27. RELATIONS BETWEEN VERB MEANINGS • ENTAILMENT: V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2", e.g., snore entails sleep • TROPONYMY: V1 is a troponym of V2 when "to do V1" is "to do V2 in some manner", e.g., limp is a troponym of walk

  28. ADJECTIVES & ADVERBS • About 20,000 adjective forms, 30,000 senses • 4,000 adverbs, 5,600 senses • Relations:

  29. HOW TO USE IT • Online: http://cogsci.princeton.edu/cgi-bin/webwn • Download it, then from the command line: • Get synonyms: • wn bank -synsn • Get hypernyms: • wn robin -hypen • (also for adjectives and verbs): get antonyms • wn right -antsa
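The command-line calls can also be driven from a script. The sketch below only wraps the three searches listed on this slide and assumes the word-first argument order seen in the `wn beak` example on the meronymy slide. The lookup table and the functions `build_wn_command` and `run_wn` are invented for this illustration, and `run_wn` returns `None` when no `wn` program is found on the PATH.

```python
import shutil
import subprocess

# Search options taken from the slide: (relation, part of speech) -> wn option.
SEARCHES = {
    ("synonyms", "n"): "-synsn",
    ("hypernyms", "n"): "-hypen",
    ("antonyms", "a"): "-antsa",
}


def build_wn_command(word, relation, pos):
    """Build the argument list for one wn call (word first, then the option)."""
    try:
        option = SEARCHES[(relation, pos)]
    except KeyError:
        raise ValueError(f"search not covered by this sketch: {relation}/{pos}")
    return ["wn", word, option]


def run_wn(word, relation, pos):
    """Run wn if it is installed and return its text output, else None."""
    if shutil.which("wn") is None:
        return None
    completed = subprocess.run(
        build_wn_command(word, relation, pos),
        capture_output=True,
        text=True,
    )
    # The exit status is deliberately not checked in this sketch.
    return completed.stdout


print(build_wn_command("robin", "hypernyms", "n"))
```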

  30. THE LIMITS OF WORDNET • Coverage • Words not in WordNet: • crocidolite, spinoff (spin-off) • Missing information: MERONYMY • Context-dependent senses: • slump, crash, bust are all synonyms in the WSJ corpus • The structure of WordNet • Some information is encoded in complex ways (room, wall, floor) • But: MOVING TARGET!!

  31. MERONYMY IN WORDNET: AN EXPERIMENT • 100 bridging descriptions (BDs) in a mereological relation • Ran a script trying to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs • Results: in only 6 cases is there a direct lexical relation in WordNet between a BD and one of the CFs

  32. [Diagram: fragment of the WordNet hierarchy with the nodes ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL, FLOOR connected by IS-A and PART-OF links] • John looked at the HOUSE. The WALL was crumbling.
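The difference between a direct link and an indirect path can be made concrete with a toy graph. The edge list below is an assumed reconstruction of the diagram, whose layout is lost in this transcript, so the individual links should be treated as hypothetical; `EDGES`, `has_direct_link` and `find_path` are names invented for this illustration and are not the script used in the experiment.

```python
from collections import deque

# Assumed reconstruction of the diagram: (source, relation, target).
EDGES = [
    ("HOUSING", "IS-A", "ARTIFACT"),
    ("BUILDING", "IS-A", "ARTIFACT"),
    ("HOME", "IS-A", "HOUSING"),
    ("HOUSE", "IS-A", "BUILDING"),
    ("ROOM", "PART-OF", "BUILDING"),
    ("WALL", "PART-OF", "ROOM"),
    ("FLOOR", "PART-OF", "ROOM"),
]


def has_direct_link(a, b):
    """True if a single edge connects a and b, in either direction."""
    return any({a, b} == {src, dst} for src, _, dst in EDGES)


def find_path(start, goal):
    """Breadth-first search over the links, followed in either direction."""
    neighbours = {}
    for src, rel, dst in EDGES:
        neighbours.setdefault(src, []).append((dst, rel))
        neighbours.setdefault(dst, []).append((src, rel + "^-1"))  # inverse link
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, rel in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None


print(has_direct_link("WALL", "HOUSE"))
print(find_path("WALL", "HOUSE"))
```

Under these assumed links, WALL and HOUSE have no direct relation: the connection exists only through ROOM and BUILDING, which is the kind of case a direct-link lookup misses.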

  33. SOLUTION: LEXICAL ACQUISITION • Partial (add information to WordNet, especially for specialized domains) • Total (build a new lexicon from scratch)

  34. READINGS • Jackson, ch. 6.7 • Marello, ch. 5.5 • C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998 • ch. 1
