    1. Presented By Asmita Rahman. The YAGO-NAGA Approach to Knowledge Discovery: Gjergji Kasneci, Maya Ramanath, Fabian Suchanek, Gerhard Weikum, Max Planck Institute for Informatics, D-66123 Saarbruecken, Germany. MING: Mining Informative Entity Relationship Subgraphs: Gjergji Kasneci, Shady Elbassuoni, Gerhard Weikum.

    2. Overview: The YAGO-NAGA Approach to Knowledge Discovery (Introduction, System Architecture, YAGO Core Extractors, YAGO Consistency Checkers, Growing YAGO, Querying YAGO by NAGA); MING: Mining Informative Entity Relationship Subgraphs (Introduction, ER-Based Informativeness, User Study).

    3. Introduction: Universal, comprehensive knowledge bases have been an elusive AI goal for many years. Ontologies and thesauri such as OpenCyc, SUMO, WordNet, or UMLS (for the biomedical domain) are achievements along this route. But they are typically focused on intensional knowledge about semantic classes and disregard the extensional knowledge about individual entities. For example: they would know that mathematicians are scientists, that scientists are humans (and mammals and vertebrates, etc.); and they may also know that humans are either male or female, cannot fly (without tools) but can compose and play music, and so on. However, none of the above-mentioned ontologies knows more than a handful of concrete mathematicians (or famous biologists, etc.).

    4. A comprehensive knowledge base should come with logical reasoning capabilities and rich support for querying. Potential applications include, but are not limited to: a machine-readable, formalized encyclopedia that can be queried with high precision like a semantic database; an enabler for semantic search on the Web; a backbone for natural-language question answering; a key asset for machine translation (e.g., English to German) and interpretation of spoken dialogs; and a catalyst for acquisition of further knowledge and largely automated maintenance and growth of the knowledge base.

    5. Example: Q1: Which Grammy winners were born in Europe? Q2: Which French politicians are married to singers? Q3: Which Nobel prize winners had an academic advisor who graduated from the same university? Q4: Give me a comprehensive list of HIV drugs that inhibit proteases. This paper gives an overview of the YAGO-NAGA approach to automatically building and maintaining a conveniently searchable, large, and highly accurate knowledge base by applying information-extraction (IE) methods to Wikipedia and other sources of latent knowledge.

    6. Current YAGO knowledge base:

    7. System Architecture:

    8. YAGO Core Extractors: Wikipedia Infoboxes. For example, the infobox for Nicolas Sarkozy gives us data such as birth date = 28 January 1955, birth place = Paris, occupation = lawyer, and alma mater = University of Paris X: Nanterre. YAGO uses a suite of rules for frequently used infobox attributes to extract and normalize the corresponding values. Wikipedia Categories. As for the category system, the Wikipedia community has manually placed (the article about) Nicolas Sarkozy into categories such as: Presidents of France, Legion d'honneur recipients, or Alumni of Sciences Po (the Paris Institute of Political Studies). These give YAGO clues about instanceOf relations, and we can infer that the entity Nicolas Sarkozy is an instance of the classes PresidentsOfFrance, LegionD'HonneurRecipients, and AlumniOfSciencesPo.
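
A minimal sketch of the rule-based infobox extraction described above, assuming "attribute = value" lines and a small hand-written rule table; the relation names and the date normalizer are illustrative assumptions, not YAGO's actual rule suite.

```python
import re
from datetime import datetime

def parse_date(value):
    """Normalize a date string such as '28 January 1955' to ISO format."""
    return datetime.strptime(value.strip(), "%d %B %Y").date().isoformat()

# Hypothetical rule table: infobox attribute -> (target relation, normalizer).
RULES = {
    "birth date":  ("bornOnDate", parse_date),
    "birth place": ("bornIn", str.strip),
    "occupation":  ("hasOccupation", str.strip),
    "alma mater":  ("graduatedFrom", str.strip),
}

def extract_infobox_facts(entity, infobox_text):
    """Turn 'attribute = value' infobox lines into (subject, relation, object) facts."""
    facts = []
    for line in infobox_text.splitlines():
        match = re.match(r"\s*\|?\s*([\w ]+?)\s*=\s*(.+)", line)
        if not match:
            continue
        attribute, raw_value = match.group(1).lower(), match.group(2)
        if attribute in RULES:
            relation, normalize = RULES[attribute]
            facts.append((entity, relation, normalize(raw_value)))
    return facts

infobox = """
| birth date = 28 January 1955
| birth place = Paris
| occupation = lawyer
"""
print(extract_infobox_facts("Nicolas_Sarkozy", infobox))
# [('Nicolas_Sarkozy', 'bornOnDate', '1955-01-28'),
#  ('Nicolas_Sarkozy', 'bornIn', 'Paris'),
#  ('Nicolas_Sarkozy', 'hasOccupation', 'lawyer')]
```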

    9. Ongoing Work: Temporal Validity. Temporal annotations for facts. Example: 1995 to 2007: Jacques Chirac presidentOf France; currently: Nicolas Sarkozy presidentOf France. Both of the above facts have identifiers, say Id1 (for the fact about Chirac) and Id2 (for the fact about Sarkozy). Then they create additional facts like Id1 ValidSince 17 May 1995, Id1 ValidUntil 16 May 2007, and Id2 ValidSince 17 May 2007.
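
To illustrate the reification scheme above, here is a minimal sketch of temporal facts attached to fact identifiers, assuming a toy in-memory store; the helper names and ISO date strings are illustrative, not YAGO's storage format.

```python
facts = {}          # fact id -> (subject, relation, object)
meta_facts = []     # facts whose subject is another fact's identifier

def add_fact(fact_id, subject, relation, obj):
    facts[fact_id] = (subject, relation, obj)

def add_meta(fact_id, relation, value):
    # Temporal annotations are ordinary facts about a fact identifier.
    meta_facts.append((fact_id, relation, value))

add_fact("Id1", "Jacques_Chirac", "presidentOf", "France")
add_fact("Id2", "Nicolas_Sarkozy", "presidentOf", "France")
add_meta("Id1", "ValidSince", "1995-05-17")
add_meta("Id1", "ValidUntil", "2007-05-16")
add_meta("Id2", "ValidSince", "2007-05-17")

def valid_at(fact_id, date):
    """Check whether a fact's validity interval covers the given ISO date."""
    since = next((v for f, r, v in meta_facts if f == fact_id and r == "ValidSince"), None)
    until = next((v for f, r, v in meta_facts if f == fact_id and r == "ValidUntil"), None)
    return (since is None or since <= date) and (until is None or date <= until)

print(valid_at("Id1", "2000-01-01"))  # True
print(valid_at("Id2", "2000-01-01"))  # False
```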

    10. Growing YAGO: The knowledge base is maintained by periodically re-running the extractors on Wikipedia and WordNet. For adding other natural-language text sources there is a tool called LEILA. It uses a dependency-grammar parser for deep parsing of natural-language sentences, with heuristics for anaphora resolution (e.g., pronouns referring to subjects or objects in a preceding sentence). This produces a tagged graph representation whose properties can be encoded as features for a statistical learner (e.g., an SVM) that classifies fact candidates into acceptable facts vs. false hypotheses.
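
The classification step at the end of the paragraph can be sketched as follows, assuming a few hand-crafted features per fact candidate (dependency-path length, presence of the relation verb, a pattern-frequency score); the features and the tiny training set are made up for illustration and are not LEILA's actual pipeline.

```python
# Illustrative features per candidate: [dependency-path length between the two
# entity mentions, relation verb on the path (1/0), pattern-frequency score].
from sklearn.svm import SVC

X_train = [
    [2, 1, 0.9],   # short path, relation verb present, frequent pattern
    [6, 0, 0.1],   # long path, no relation verb, rare pattern
    [3, 1, 0.7],
    [7, 0, 0.2],
]
y_train = [1, 0, 1, 0]  # 1 = acceptable fact, 0 = false hypothesis

classifier = SVC(kernel="linear").fit(X_train, y_train)

candidate = [[2, 1, 0.8]]                       # features of a new fact candidate
print(classifier.predict(candidate))            # e.g. [1] -> accept as a fact
print(classifier.decision_function(candidate))  # signed margin as a rough confidence
```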

    11. QUERYING YAGO BY NAGA: For querying the YAGO knowledge base, we have designed a query language that builds on the concepts of SPARQL (the W3C standard for querying RDF data) but extends it with more expressive pattern matching. NAGA implements this query language and provides a statistical ranking model for query results. A query is a conjunction of fact templates, where each template has to be matched by an edge and its incident nodes in the knowledge graph.

    12. Example: Q1: Which Grammy winners were born in Europe? Q2: Which French politicians are married to singers? These can be expressed as follows: Q1: $x hasWonPrize GrammyAward, $x bornIn $y, $y locatedIn Europe. Q2: $x isa politician, $x citizenOf France, $x marriedTo $y, $y isa singer, where $x and $y are variables for which we are seeking bindings so that all query patterns are matched together.
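
A minimal sketch of how such a conjunction of fact templates can be matched by backtracking over variable bindings; the toy facts and the match_query helper are illustrative, not NAGA's query processor.

```python
FACTS = {
    ("Ennio_Morricone", "hasWonPrize", "GrammyAward"),
    ("Ennio_Morricone", "bornIn", "Rome"),
    ("Rome", "locatedIn", "Europe"),
}

def is_var(term):
    return term.startswith("$")

def match_query(templates, bindings=None):
    """Yield variable bindings that satisfy all fact templates."""
    bindings = bindings or {}
    if not templates:
        yield dict(bindings)
        return
    s, p, o = (bindings.get(t, t) for t in templates[0])
    for fs, fp, fo in FACTS:
        trial = dict(bindings)
        ok = True
        for term, value in ((s, fs), (p, fp), (o, fo)):
            if is_var(term):
                if trial.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_query(templates[1:], trial)

# Q1: Which Grammy winners were born in Europe (via one locatedIn hop)?
q1 = [("$x", "hasWonPrize", "GrammyAward"),
      ("$x", "bornIn", "$y"),
      ("$y", "locatedIn", "Europe")]
print(list(match_query(q1)))
# [{'$x': 'Ennio_Morricone', '$y': 'Rome'}]
```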

    13. The relation names in the query can also be regular expressions, which have to be matched by an entire path in the knowledge graph. If the bornIn relation actually refers to cities and the locatedIn relation captures a city-county-state-country hierarchy, we should replace the last condition in Q1 ($y locatedIn Europe) by the fact template $y (locatedIn)* Europe. And if we do not care whether the persons that we are looking for are born in Europe or are citizens of a European country, we may use the template $y (citizenOf | bornIn | originatesFrom).(locatedIn)* Europe instead of the last two conditions of Q1.
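
A path template such as $y (locatedIn)* Europe can be evaluated as a reachability check over locatedIn edges; a minimal sketch under that assumption, with toy edges:

```python
from collections import deque

EDGES = {
    ("Rome", "locatedIn", "Lazio"),
    ("Lazio", "locatedIn", "Italy"),
    ("Italy", "locatedIn", "Europe"),
}

def reachable_via(start, relation, target):
    """Is target reachable from start over zero or more `relation` edges?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for s, p, o in EDGES:
            if s == node and p == relation and o not in seen:
                seen.add(o)
                queue.append(o)
    return False

# $y (locatedIn)* Europe with $y bound to Rome:
print(reachable_via("Rome", "locatedIn", "Europe"))  # True
```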

    14. NAGA has further advanced features, most notably, for specifying relatedness queries among a set of entities For example, the query: connect (Nicolas Sarkozy, Zinedine Zidane, Gerard Depardieu, Miles Davis) asks for commonalities or other relationships among Sarkozy, the soccer player Zidane, the actor Depardieu, and the trumpet player Miles Davis. A possible answer (technically, a Steiner tree in the underlying knowledge graph) could be that all four are recipients of the French Legion d’honneur order.
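
A rough sketch of how a connect(...) query could be answered by joining shortest paths through a common center node, a simple stand-in for the Steiner-tree computation mentioned above; the toy graph and the networkx-based helper are illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Nicolas_Sarkozy", "Legion_dHonneur"),
    ("Zinedine_Zidane", "Legion_dHonneur"),
    ("Gerard_Depardieu", "Legion_dHonneur"),
    ("Miles_Davis", "Legion_dHonneur"),
    ("Nicolas_Sarkozy", "France"),
])

def connect(graph, terminals):
    """Return an approximate connecting tree: pick the center node with minimal
    total distance to all terminals and union the shortest paths to it."""
    best_center, best_cost = None, float("inf")
    for center in graph.nodes:
        try:
            cost = sum(nx.shortest_path_length(graph, center, t) for t in terminals)
        except nx.NetworkXNoPath:
            continue
        if cost < best_cost:
            best_center, best_cost = center, cost
    tree_edges = set()
    for t in terminals:
        path = nx.shortest_path(graph, best_center, t)
        tree_edges.update(zip(path, path[1:]))
    return best_center, tree_edges

center, edges = connect(G, ["Nicolas_Sarkozy", "Zinedine_Zidane",
                            "Gerard_Depardieu", "Miles_Davis"])
print(center)  # Legion_dHonneur
print(edges)   # the connecting edges of the answer tree
```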

    15. RANKING: NAGA employs a novel kind of statistical language model (LM). It captures the informativeness of a query result: users prefer salient or otherwise interesting facts. It also factors in the confidence that the result facts are indeed correct: the IE methods assign a confidence weight to each fact f in the knowledge base.

    16. For informativeness, NAGA employs an LM for graph-structured data. Conceptually, we construct a statistical model for each possible result graph g with connected edges (facts) g_i, and consider the probability that the query q, consisting of fact templates q_i, was generated from g:
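
The generation probability itself did not survive the transcript; the LaTeX below sketches the general shape of such a query-likelihood model. The product decomposition over fact templates follows the sentence above, while the smoothing mixture with weight \(\beta\) and a background model \(P[q_i]\) is a standard language-model ingredient assumed here, not necessarily NAGA's exact formula.

$$
P[q \mid g] \;=\; \prod_i P[q_i \mid g_i],
\qquad
P[q_i \mid g_i] \;=\; \beta\,\tilde{P}[q_i \mid g_i] \;+\; (1-\beta)\,P[q_i].
$$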

    17. Ongoing Work: Personalization. The notion of informativeness is, strictly speaking, a subjective measure: an individual user wants to see a salient result that is also interesting to her. An elegant property of the LM approach pursued in NAGA is that we can easily compose multiple LMs using a probabilistic mixture model. For the personalized LM, we monitor the history of queries and browsing interactions on the online knowledge base. A click on a fact is interpreted as positive feedback that the fact is interesting to the user, and this evidence is spread to the graph neighborhood, with exponential decay and attention to the edge types along which propagation is meaningful.
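
A minimal sketch of the two ingredients just described: spreading click feedback into the graph neighborhood with exponential decay, and mixing a user-specific LM with the global LM. The decay factor, mixture weight, and toy neighborhood are illustrative parameters, not the system's actual settings.

```python
from collections import defaultdict, deque

NEIGHBORS = {
    "Ennio_Morricone": ["The_Good_the_Bad_and_the_Ugly", "Rome"],
    "The_Good_the_Bad_and_the_Ugly": ["Ennio_Morricone"],
    "Rome": ["Ennio_Morricone"],
}

def propagate_click(clicked, decay=0.5, max_hops=2):
    """Assign interest weights that decay exponentially with graph distance."""
    interest = defaultdict(float)
    interest[clicked] = 1.0
    frontier = deque([(clicked, 0)])
    seen = {clicked}
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nb in NEIGHBORS.get(node, []):
            if nb not in seen:
                seen.add(nb)
                interest[nb] = decay ** (hops + 1)
                frontier.append((nb, hops + 1))
    return interest

def personalized_score(fact, global_lm, user_lm, mix=0.3):
    """Probabilistic mixture of the user-specific and global language models."""
    return mix * user_lm.get(fact, 0.0) + (1 - mix) * global_lm.get(fact, 0.0)

user_lm = propagate_click("Ennio_Morricone")
print(dict(user_lm))  # clicked fact gets 1.0, neighbors get decayed weights
```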

    18. As an example, assume that a user has intensively explored epic movies and orchestral music, and then poses query Q1. The personalized ranking would prioritize European film-music composers such as Ennio Morricone, Hans Zimmer, or Javier Navarrete.

    19. MING: Mining Informative Entity Relationship Subgraphs. Many modern applications exploit information organized in entity-relationship (ER) graphs, such as domain-specific knowledge bases (e.g., metabolic or regulatory networks in biology, criminalistic networks for crime investigation, etc.) or social networks (such as data-sharing or business-customer networks). Examples of ER graphs are GeneOntology or UMLS (in the biomedical domain) and the graph represented by IMDB (in the domain of movies and actors). A knowledge discovery task on such graphs is to determine an "informative" subgraph that can explain the relations between k (≥ 2) entities of interest. The paper presents MING, a principled method for extracting an informative subgraph for given query nodes.

    20. For example, consider the query that asks for the relation between Max Planck, Albert Einstein, and Niels Bohr.

    21. Two Sub-problems: What is a good measure for representing the informativeness of relations between entities in ER graphs? How to determine the most informative subgraph for the given query nodes?

    22. ER Based Informativeness We believe that in order to compute the informativeness of nodes in ER graphs, the link structure has to be taken into account. But, as a matter of fact, edge directions in ER graphs do not always reflect a “clear” endorsement. For example, the fact Albert Einstein isA Physicist can be represented as Physicist hasInstance Albert Einstein. Our informativeness measure for nodes overcomes these problems by building on edge weights that are based on co-occurrence statistics for entities and relationships.

    23. ER-Based Informativeness: Statistics-based Edge Weights. For each fact represented by an edge, we compute two weights, one for each direction of the edge. Each of these weights represents a special kind of endorsement, obtained from co-occurrence statistics for entities and relationships. Consider the fact pattern x isA Physicist, x ∈ X. The facts Albert Einstein isA Physicist and Bob Unknown isA Physicist are matches to the above fact pattern. In our example, the fact Albert Einstein isA Physicist should have a higher informativeness than Bob Unknown isA Physicist, since Einstein is an important individual among the scientists.
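
A minimal sketch of such statistics-based edge weights, normalizing an entity's co-occurrence with the pattern's fixed entity over all bindings of the pattern; the counts and the table of bindings are made up for illustration.

```python
COOCCURRENCE = {            # document co-occurrence counts of (entity, class/entity)
    ("Albert_Einstein", "Physicist"): 12000,
    ("Bob_Unknown", "Physicist"): 3,
}

BINDINGS = {                # matches of the fact pattern (x, isA, Physicist)
    ("isA", "Physicist"): ["Albert_Einstein", "Bob_Unknown"],
}

def edge_weight(a, beta, gamma):
    """Relative co-occurrence of binding a among all bindings of (x, beta, gamma)."""
    total = sum(COOCCURRENCE.get((x, gamma), 0) for x in BINDINGS[(beta, gamma)])
    return COOCCURRENCE.get((a, gamma), 0) / total if total else 0.0

print(edge_weight("Albert_Einstein", "isA", "Physicist"))  # close to 1.0
print(edge_weight("Bob_Unknown", "isA", "Physicist"))      # close to 0.0
```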

    24. Let (α, β, γ) be a fact pattern, where α ∈ X. Let a be a binding of α. We estimate the informativeness of a given the relationship β and the entity γ as:
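
The estimate itself is missing from the transcript; a plausible reconstruction, consistent with the co-occurrence-based weights of the previous slide, is given below, where \(\#(\cdot,\cdot)\) denotes a document co-occurrence count and the normalization runs over all bindings of the pattern. The notation is an assumption, not necessarily the paper's exact formula.

$$
w(a \mid \beta, \gamma) \;=\; \frac{\#(a,\gamma)}{\sum_{a' \,:\, (a',\beta,\gamma)\ \text{holds}} \#(a',\gamma)}
$$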

    25. IRank for Node-based Informativeness: Our aim is an informativeness measure for nodes based on random walks on the (now weighted) ER graph. Our measure, coined IRank (Informativeness Rank), is related to PageRank.
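
As a rough stand-in for IRank, the sketch below runs a PageRank-style iteration in which rank is distributed proportionally to the statistics-based edge weights; the damping factor, iteration count, and toy weights are illustrative, not the paper's definition.

```python
WEIGHTED_EDGES = {           # node -> {neighbor: edge weight}
    "Albert_Einstein": {"Physicist": 0.99},
    "Bob_Unknown": {"Physicist": 0.01},
    "Physicist": {"Albert_Einstein": 0.7, "Bob_Unknown": 0.3},
}

def weighted_rank(edges, damping=0.85, iterations=50):
    """PageRank-style random-walk scores over a weighted directed graph."""
    nodes = set(edges) | {n for nbrs in edges.values() for n in nbrs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, nbrs in edges.items():
            total = sum(nbrs.values())
            for nbr, w in nbrs.items():
                # Distribute rank proportionally to the statistics-based weights.
                new_rank[nbr] += damping * rank[node] * w / total
        rank = new_rank
    return rank

print(weighted_rank(WEIGHTED_EDGES))
```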

    26. User Study: Setting. The focus of our evaluation has been on the user-perceived quality of MING's answers. Therefore, in a user evaluation, we compared the answers of MING to those returned by CEPS. For each of the 60 queries, we presented the results produced by CEPS and MING (on the same subgraph C) to human judges (not familiar with the project) on a graph-visualization Web interface, without telling them which method produced which graph.
