NOOJ Conference Inalco, Saarbruecken June 5th, 2013

Russian Module for NooJ: Semantic annotation. Conception and realisation of semantic tags for the Russian language for Max Silberztein’s Nooj software. NOOJ Conference Inalco, Saarbruecken June 5th, 2013. Vincent BÉNET INALCO CREE Recherche assistée par ordinateur.


Presentation Transcript


  1. Russian Module for NooJ: Semantic annotation. Conception and realisation of semantic tags for the Russian language for Max Silberztein’s NooJ software. NooJ Conference, INALCO, Saarbruecken, June 5th, 2013. Vincent BÉNET, INALCO, CREE, Recherche assistée par ordinateur

  2. Russian Module for NooJ: design and implementation of lexical and grammatical resources
     • one main dictionary (95,000 entries)
     • two annex dictionaries
       • one for proper nouns
       • one for noun-adjectives

  3. Russian Module for NooJ: design and implementation of basic semantic resources
     How?
     • by adding tags to the general dictionary
     • by writing grammars
     • Semantic tagging or annotation?

  4. Writing semantic resources for the Russian language
     The semantic tags of the Russian National Corpus:
     • Taxonomy (a lexeme's thematic class) – for nouns, verbs, adjectives and adverbs
     • Mereology (“part – whole” and “element – aggregate” relationships) – for concrete and abstract nouns
     • Topology – for concrete nouns
     • Causation – for verbs
     • Evaluation – for abstract and concrete nouns, adjectives and adverbs

  5. Writing semantic resources for the Russian language
     27 semantic taxonomic tags for verbs, including:
     • t:move – movement (бежать, дергаться, бросить, нести)
     • t:be – sphere of existence (жить, возникнуть, убить)
     • t:loc – location (лежать, стоять, положить)
     • t:poss – sphere of possession (иметь, дать, подарить, приобрести)
     • t:ment – mental sphere (знать, верить, догадаться, помнить)
     • t:perc – perception (смотреть, слышать, нюхать, чуять)
     • t:speech – speech (говорить, советовать, спорить, каламбурить)
     • t:sound – sounds (гудеть, шелестеть)
     • t:light – light (гаснуть, лучиться)

  6. Semantic information in the Russian National Corpus (Verbs)

  7. Writing semantic resources for the Russian language
     khodit’,V+Mvt+Indet+ipf+intr+FLX=ходить
     Idti,V+Mvt+Det+ipf+intr+FLX=идти
     Vkhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить
     Vojti’,V+Mvt+Pvb+pf+intr+FLX=идти
     Vykhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить
     Priezzhat’,V+Mvt+Pvb+ipf+intr+FLX=акать
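
The entries on this slide follow a "lemma,CATEGORY+feature+...+NAME=value" pattern, so a small script can pick out the verbs carrying the +Mvt tag. The following Python sketch is only an illustration of that pattern: the names `Entry`, `parse_entry` and `motion_verbs` are hypothetical, it is not part of NooJ, and it assumes one entry per line with no escaped commas in the lemma.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str
    category: str          # e.g. "V", "N", "A"
    features: frozenset    # bare tags such as "Mvt", "ipf"
    properties: dict       # NAME=value pairs such as FLX=ходить


def parse_entry(line: str) -> Entry:
    """Split one simplified dictionary line into its parts (assumed format)."""
    lemma, _, info = line.partition(",")
    fields = [f.strip() for f in info.split("+") if f.strip()]
    category, rest = fields[0], fields[1:]
    features = frozenset(f for f in rest if "=" not in f)
    properties = dict(f.split("=", 1) for f in rest if "=" in f)
    return Entry(lemma.strip(), category, features, properties)


def motion_verbs(lines):
    """Return the lemmas of verb entries tagged +Mvt."""
    entries = map(parse_entry, lines)
    return [e.lemma for e in entries if e.category == "V" and "Mvt" in e.features]


# Three entries from the slide, plus one noun entry from a later slide.
ENTRIES = [
    "khodit’,V+Mvt+Indet+ipf+intr+FLX=ходить",
    "Idti,V+Mvt+Det+ipf+intr+FLX=идти",
    "Vkhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить",
    "mal’chik,N+an+Hum+FLX=bul’dog",
]

print(motion_verbs(ENTRIES))
```

Inside NooJ itself this selection is done with a grammar or a query on the tag rather than with external code; the script only mirrors the idea outside the tool.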

  8. Grammar to locate the verbs of motion

  9. Searching for « verbs of motion » with NooJ

  10. Searching for « verbs of motion » with NooJ

  11. Writing semantic resources for the Russian language
     • concrete nouns (девочка, стол, молоко)
     • abstract nouns (вождение, яркость, время)
     • proper names (Иван, Эйнштейн, Петроград)
     • person (человек, учитель)
     • ethnonyms (эфиоп, итальянка)
     • kinship terms (брат, бабушка)
     • supernatural creatures (русалка, инопланетянин)
     • animals (корова, жираф, сорока, ящерица, муравей)
     • plants (береза, роза, трава)
     • and so on

  12. Semantic information in the Russian National Corpus (Nouns)

  13. Semantic information in the Russian National Corpus (Adjectives)

  14. Semantic information in the Russian National Corpus (Adverbs)

  15. Writing basic semantic resources for the Russian language
     NooJ properties.def file
     N_Genre = m | f | n ;
     N_SGenr = an | inan ;
     N_Nombre = s | p ;
     N_Cas = Im | Vi | Ro | R2 | Da | Tv | Pr | P2 | Zv ;
     …
     V_Type = Mvt ;
     V_Morph = Pref | Suff ;
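
The declarations on this slide all have the shape "Name = value | value ;". As a rough illustration of how such a file could be read outside NooJ, the Python sketch below turns the statements into a mapping from property name to allowed values. The function name `parse_properties` is hypothetical, and the sketch assumes there are no comments or other syntax beyond what the slide shows.

```python
def parse_properties(text: str) -> dict:
    """Parse 'Name = v1 | v2 ;' statements into {name: [values]}."""
    props = {}
    for statement in text.split(";"):
        if "=" not in statement:
            continue  # skip blank fragments between or after statements
        name, _, values = statement.partition("=")
        props[name.strip()] = [v.strip() for v in values.split("|") if v.strip()]
    return props


# Declarations copied from the slide (the elided part is left out).
SAMPLE = """
N_Genre = m | f | n ;
N_SGenr = an | inan ;
N_Nombre = s | p;
N_Cas = Im | Vi | Ro | R2 | Da | Tv | Pr | P2 | Zv ;
V_Type = Mvt;
V_Morph = Pref | Suff;
"""

PROPS = parse_properties(SAMPLE)
for name, values in PROPS.items():
    print(name, values)
```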

  16. Writing basic semantic resources for the Russian language
     NooJ properties.def file
     A_Sem = Animal; Color ( Hum = App)
     N_Sem = Hum | Prof | Parents | Body |
             Conc | Abstr | Org | Text |
             Animal | Food | Health |
             Arts | Lit | Music | Sports |
             Topo | Country | River | City | Mount | Lake | Posit |
             Time | Color ;
     ADV_Sem = Time | Topo | Modal ;
     V_Sem = Color | Topo | Posit | Modal ;
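
One possible use of these declarations is as a consistency check on dictionary entries: a semantic tag attached to a verb or adverb should be one declared for that category. The Python sketch below illustrates the idea under a stated assumption, namely that the declared values act as a whitelist per category; this reading, and the names `ALLOWED` and `undeclared_sem_tags`, are hypothetical and not taken from NooJ. Only the ADV_Sem and V_Sem lists are used, and the second test entry ("x") is invented for the demonstration.

```python
# Values copied from the ADV_Sem and V_Sem declarations on the slide.
ALLOWED = {
    "ADV": {"Time", "Topo", "Modal"},
    "V": {"Color", "Topo", "Posit", "Modal"},
}
# Every semantic value known from the two declarations above.
KNOWN_SEM = set().union(*ALLOWED.values())


def undeclared_sem_tags(line: str) -> list:
    """Return semantic tags on an entry that are not declared for its category.

    Categories without a declaration in ALLOWED are not checked.
    """
    _, _, info = line.partition(",")
    fields = [f.strip() for f in info.split("+") if f.strip()]
    category, feats = fields[0], fields[1:]
    if category not in ALLOWED:
        return []
    return [f for f in feats if f in KNOWN_SEM and f not in ALLOWED[category]]


# A verb entry from the deck: Color is declared for V_Sem.
print(undeclared_sem_tags("zelenet’,V+intr+ipf+Color+FLX=belet’"))
# An invented adverb entry: Color is not declared for ADV_Sem.
print(undeclared_sem_tags("x,ADV+Color"))
```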

  17. Writing semantic resources for the Russian language
     mal’chik,N+an+Hum+FLX=bul’dog
     pered tem kak,CONJ+UNAMB+Time
     Moskva,N+f+inan+City+FLX=Москва
     Don,N+m+inan+River+FLX=Дон
     Katar,N+Country+m+s+FLX=Ленинград
     Nora,N+Forename+Hum+f+an+FLX=Лена

  18. Writing semantic resources for the Russian language
     zelënyj,A+Color+FLX=novyj
     zelenovatyj,A+Color+FLX=
     zelënen’kij,A+Color+FLX=novyj
     temno-zelënyj,A+Color+FLX=novyj
     zelen’,N+f+inan+Color+FLX=smes’
     zelenet’,V+intr+ipf+Color+FLX=belet’
     zazelenet’,V+intr+pf+Color+FLX=belet’
     zazelenet’sja,V+sja+pf+Color+FLX=….
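
The same +Color tag is attached here to adjectives, a noun and verbs, which is what lets a single query find all "green" words regardless of part of speech. As a hedged illustration of that cross-category grouping, the Python sketch below collects the lemmas carrying a given tag and groups them by category. The function name `lemmas_by_category` is hypothetical, and the simplified "lemma,CATEGORY+tags" line format is assumed.

```python
from collections import defaultdict


def lemmas_by_category(lines, tag):
    """Group the lemmas that carry `tag` by their grammatical category."""
    groups = defaultdict(list)
    for line in lines:
        lemma, _, info = line.partition(",")
        fields = [f.strip() for f in info.split("+")]
        # fields[0] is the category; the tag must appear among the features.
        if tag in fields[1:]:
            groups[fields[0]].append(lemma.strip())
    return dict(groups)


# Four entries from this slide, plus a city entry from the previous slide.
ENTRIES = [
    "zelënyj,A+Color+FLX=novyj",
    "zelen’,N+f+inan+Color+FLX=smes’",
    "zelenet’,V+intr+ipf+Color+FLX=belet’",
    "zazelenet’,V+intr+pf+Color+FLX=belet’",
    "Moskva,N+f+inan+City+FLX=Москва",
]

COLOR = lemmas_by_category(ENTRIES, "Color")
print(COLOR)
```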

  19. Writing basic semantic resources for the Russian language
     Number of items per tag:
     Topo = 40, Country = 180, River = 15, City = 175, Mount = 5, Lake = 5, Posit = 25
     Time = 135, Modal = 15, Color = 275
     Prof = 900, Parent = 160, Forenames = 2280
     Animal = 370, Food = 280 (Liquid = 25), Body = 285, Health = 175
     Arts = 65, Lit = 40, Music = 155, Sport = 65

  20. Searching for « colors » with NooJ

  21. Searching for « body parts » with NooJ

  22. Searching for « parents (relatives) » words with NooJ

  23. Writing basic semantic resources for the Russian language
     NEXT WORK TO BE DONE…
     • Completion of the dictionary for concrete nouns using thematic dictionaries
     • A new dictionary parameter, +Translation=, so that NooJ can be used as a resource for building basic dictionaries for parallel corpora

  24. Russian Module for NooJ: Semantic annotation. Thank you for your attention. vincent.benet@inalco.fr NooJ Conference, INALCO, Saarbruecken, June 5th, 2013