BabelNet and beyond: a huge multilingual semantic network and its potential for interconnecting migration routes Roberto Navigli http://lcl.uniroma1.it 16th June 2016 – Rome
Roberto Navigli • Associate Professor in the Department of Computer Science (Sapienza, Rome) • Principal investigator of several projects: • ERC Starting Grant (MultiJEDI) • FP7 CSA (LIDER) • Google Focused Research Award (co-PI) • Managing a team of 10 researchers, 6 of whom are Ph.D. students BabelNet, Babelfy and Beyond! Roberto Navigli
Outline of the talk • BabelNet: a huge multilingual semantic network • Babelfy: a state-of-the-art multilingual disambiguation system • What's next: how to help interconnect/detect migration flows with the aid of our technologies
INTEGRATING KNOWLEDGE
The resource diaspora • There are many online dictionaries and encyclopedias • Each covers one or a limited number of languages • The knowledge found in different resources is often complementary, so combining them helps: • To get coverage of more languages • To get additional information about the entry • To obtain links to geographical information • However, each resource provides a different meaning inventory
BabelNet: Unifying Lexical Knowledge Resources into a Single Semantic Network • Key Objective 1: create knowledge for all languages • Resources shown: MultiWordNet, WOLF, BalkaNet, MCR, GermaNet, WordNet
WordNet [Miller et al., 1990; Fellbaum, 1998] • Figure labels: concepts, semantic relation
Wikipedia [The Web Community, 2001-today] • Figure labels: concepts, (unspecified) semantic relation • Example page shown: "Playing with senses"
Merging entries from different resources into BabelNet • We collect lexicalizations, definitions, translations, images, etc. from each of the merged resources
BabelNet: concepts and semantic relations • We encode knowledge as a labeled directed graph: • Each vertex is a Babel synset (= synonym set) • Each edge is a semantic relation between synsets: • is-a (balloon is-a aircraft) • part-of (gasbag part-of balloon) • instance-of (Einstein instance-of physicist) • … • unspecified/relatedness (balloon related-to flight) • Example Babel synset: balloon (EN), Ballon (DE), aerostato (ES), aerostato (IT), pallone aerostatico (IT), montgolfière (FR)
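The graph encoding described on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not BabelNet's actual data model or API: the class name `SemanticNetwork`, its methods, and the plain-string synset identifiers are all assumptions made for the example.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy labeled directed graph: vertices are synsets, edges carry a relation label."""

    def __init__(self):
        # synset id -> {language code -> list of lexicalizations}
        self.synsets = {}
        # source synset id -> list of (relation label, target synset id)
        self.edges = defaultdict(list)

    def add_synset(self, synset_id, lexicalizations):
        self.synsets[synset_id] = lexicalizations

    def add_relation(self, source, label, target):
        self.edges[source].append((label, target))

    def related(self, source, label):
        """Return the targets reachable from `source` via edges labeled `label`."""
        return [target for (lbl, target) in self.edges[source] if lbl == label]


# Hypothetical identifiers; the lexicalizations and relations come from the slide.
net = SemanticNetwork()
net.add_synset("balloon", {
    "EN": ["balloon"],
    "DE": ["Ballon"],
    "IT": ["aerostato", "pallone aerostatico"],
})
net.add_synset("aircraft", {"EN": ["aircraft"]})
net.add_synset("gasbag", {"EN": ["gasbag"]})
net.add_synset("flight", {"EN": ["flight"]})

net.add_relation("balloon", "is-a", "aircraft")
net.add_relation("gasbag", "part-of", "balloon")
net.add_relation("balloon", "related-to", "flight")
```

Because one synset holds the lexicalizations for every language, a relation such as balloon is-a aircraft is stored once and applies to all of them.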
What is BabelNet? • A merger of resources of different kinds: • WordNet: the most popular computational lexicon of English • Open Multilingual WordNet: a collection of open wordnets • WoNeF: a French WordNet • Wikipedia: the largest collaborative encyclopedia • Wikidata: the largest collaborative knowledge base • Wiktionary: the largest collaborative dictionary • OmegaWiki: a medium-size collaborative multilingual dictionary • GeoNames: a worldwide geographical database • Microsoft Terminology: a computer science thesaurus • High-quality automatic sense-based translations
Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • 6M concepts and 7.7M named entities • 119M word senses • 378M semantic relations (27 relations per concept on avg.) • 11M images associated with concepts • 41M textual definitions • 2M concepts with associated domains
Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets • Media coverage and prestigious prizes
ADDRESSING LEXICAL AMBIGUITY
Lexical ambiguity! • Thomas and Mario played as strikers in Munich.
Word Sense Disambiguation and Entity Linking • Thomas and Mario are strikers playing in Munich
Multilingual Joint Word Sense Disambiguation (MultiJEDI) • Key Objective 2: use all languages to disambiguate one
So what?
Step 1: Find all possible meanings of words • "Thomas and Mario are strikers playing in Munich" • Ambiguity! Candidate meanings: Munich (City), Seth Thomas, Mario (Character), striker (Sport), Mario (Album), Striker (Video Game), Thomas Müller, FC Bayern Munich, Mario Gómez, Striker (Movie), Thomas (novel), Munich (Song)
Step 2: Connect all the candidate meanings • Thomas and Mario are strikers playing in Munich
Step 3: Extract a dense subgraph • Thomas and Mario are strikers playing in Munich
Step 4: Select the most reliable meanings • "Thomas and Mario are strikers playing in Munich" • Candidate meanings: Munich (City), Seth Thomas, Mario (Character), striker (Sport), Mario (Album), Striker (Video Game), Thomas Müller, FC Bayern Munich, Mario Gómez, Striker (Movie), Thomas (novel), Munich (Song)
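The four steps can be illustrated with a small Python sketch on the slide's example sentence. This is a heavily simplified stand-in, not the Babelfy algorithm itself: the `links` set is a hypothetical hand-written set of semantic connections, the greedy pruning by vertex degree is an assumed substitute for the real dense-subgraph heuristic, and all function names are invented for the example.

```python
from itertools import combinations

# Step 1: candidate meanings for each mention (taken from the slide).
candidates = {
    "Thomas": ["Thomas Müller", "Seth Thomas", "Thomas (novel)"],
    "Mario": ["Mario Gómez", "Mario (Character)", "Mario (Album)"],
    "strikers": ["striker (Sport)", "Striker (Video Game)", "Striker (Movie)"],
    "Munich": ["FC Bayern Munich", "Munich (City)", "Munich (Song)"],
}

# Hypothetical semantic links between meanings (undirected, illustration only).
links = {
    frozenset(pair) for pair in [
        ("Thomas Müller", "FC Bayern Munich"),
        ("Thomas Müller", "striker (Sport)"),
        ("Mario Gómez", "FC Bayern Munich"),
        ("Mario Gómez", "striker (Sport)"),
        ("striker (Sport)", "FC Bayern Munich"),
        ("Mario (Character)", "Striker (Video Game)"),
    ]
}


def build_graph(candidates, links):
    """Step 2: connect linked candidate meanings that belong to different mentions."""
    nodes = {(m, s) for m, senses in candidates.items() for s in senses}
    edges = set()
    for a, b in combinations(sorted(nodes), 2):
        if a[0] != b[0] and frozenset((a[1], b[1])) in links:
            edges.add(frozenset((a, b)))
    return nodes, edges


def degree(node, nodes, edges):
    """Number of edges touching `node` whose endpoints are both still in `nodes`."""
    return sum(1 for e in edges if node in e and e <= nodes)


def extract_dense_subgraph(nodes, edges, max_per_mention=2):
    """Step 3 (simplified): greedily drop the least-connected candidate of any
    mention that still has more than `max_per_mention` candidates."""
    nodes = set(nodes)
    while True:
        counts = {}
        for mention, _ in nodes:
            counts[mention] = counts.get(mention, 0) + 1
        removable = [n for n in nodes if counts[n[0]] > max_per_mention]
        if not removable:
            return nodes
        weakest = min(sorted(removable), key=lambda n: degree(n, nodes, edges))
        nodes.remove(weakest)


def select_meanings(nodes, edges):
    """Step 4: keep the best-connected remaining meaning for each mention."""
    best = {}
    for node in sorted(nodes):
        d = degree(node, nodes, edges)
        if node[0] not in best or d > best[node[0]][1]:
            best[node[0]] = (node[1], d)
    return {mention: sense for mention, (sense, _) in best.items()}


nodes, edges = build_graph(candidates, links)
dense = extract_dense_subgraph(nodes, edges)
result = select_meanings(dense, edges)
```

With these toy links the football-related meanings support one another, so they survive the pruning and are chosen together, while isolated readings such as Munich (Song) are discarded.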
Experimental Results: Fine-grained (Multilingual) Disambiguation • Datasets: Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12
The Crazy Polyglot!
Live demo – Crazy polyglot! • EN: In today's knowledge and information society • FR: le paysage lexicographique est plus hétérogène que jamais. • IT: Possono le risorse stand-alone competere • ES: con múltiples funciones, portale lexicográficas multilingüe y servicios web, • ZH: Web服务,定制的喜好和个人用户的个人资料?
1) Geographical named entities are interlinked • Each geographical entity comes with: • geolocation information • translations in dozens of languages • connections to other concepts and named entities (e.g. politicians, important places, concepts, events, etc.)
2) Named entities, events and actions are expressed in any language • We can process tweets and Facebook/Instagram/blog posts, identify these entities, and interconnect them independently of the language they are expressed in • Example posts: • Οδεύουμε προς τη #Μακεδονία (We are moving to #Macedonia) • Greek police started phase 2 of #Idomeni evacuation, emptying camp near Polykastro-1,828 people 2B moved • إخلاء! حاصرت شرطة مكافحة الشغب محطة الغاز EKO! اللاجئين رافضا تركه! أي إشعار مسبق (EKO Evacuation! Riot police have surrounded EKO gas station! Refugees refusing to leave! No prior notice given) • سيتم نقل سكان Idomeni إلى مخيمات جديدة، بما في ذلك في ثاني أكبر مدينة، سالونيك. (Idomeni residents will be moved to new camps, including in the second-largest city, Thessaloniki.)
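As a rough illustration of language-independent interconnection, the sketch below groups posts by the entity they mention, whatever the language of the mention. It assumes a hypothetical `entities` table with per-language names and approximate coordinates, and it uses naive substring matching where a real system would use a disambiguation step such as Babelfy; none of the names here come from an actual API.

```python
from collections import defaultdict

# Hypothetical entity records: names per language and approximate (lat, lon).
entities = {
    "Idomeni": {
        "names": {"EN": "Idomeni", "EL": "Ειδομένη"},
        "lat_lon": (41.12, 22.51),  # approximate, for illustration
    },
    "Thessaloniki": {
        "names": {"EN": "Thessaloniki", "EL": "Θεσσαλονίκη", "AR": "سالونيك"},
        "lat_lon": (40.64, 22.94),  # approximate, for illustration
    },
}

# Two of the example posts from the slide (English and Arabic).
posts = [
    "Greek police started phase 2 of #Idomeni evacuation, emptying camp near Polykastro",
    "سيتم نقل سكان Idomeni إلى مخيمات جديدة، بما في ذلك في ثاني أكبر مدينة، سالونيك.",
]


def link_entities(text, entities):
    """Return the ids of entities whose name, in any language, occurs in `text`.
    Naive substring matching stands in for real entity linking."""
    lowered = text.casefold()
    found = set()
    for entity_id, record in entities.items():
        if any(name.casefold() in lowered for name in record["names"].values()):
            found.add(entity_id)
    return found


def group_posts_by_entity(posts, entities):
    """Map each entity id to the indices of the posts that mention it."""
    grouped = defaultdict(list)
    for index, text in enumerate(posts):
        for entity_id in link_entities(text, entities):
            grouped[entity_id].append(index)
    return dict(grouped)


grouped = group_posts_by_entity(posts, entities)
```

Since both posts resolve to the same entity identifier for Idomeni, they can be placed on a map through that entity's coordinates regardless of the language each was written in.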
3) Predicting where the migration flows are moving next • Intentions can be automatically identified and extracted from text • Including the next most popular actions and events (e.g. moving, evacuating, going back, etc.) • Integrated with GPS and satellite views of the places
Summarizing… + helping detect migration flows with our technologies
Roberto Navigli Linguistic Computing Laboratory http://lcl.uniroma1.it @RNavigli