
Web Semantics: KB vs. DB

Web Semantics: KB vs. DB. Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 13, 2005. Administrivia. Next readings and summaries: Bernstein on Model Management Dong and Halevy on Personal Info Management


Presentation Transcript


  1. Web Semantics: KB vs. DB Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 13, 2005

  2. Administrivia • Next readings and summaries: • Bernstein on Model Management • Dong and Halevy on Personal Info Management • Two-paragraph summary of the problems they focus on and their key contributions

  3. Today’s Trivia Question

  4. Last Time… • The Semantic Web vision and goals • Core ideas: • RDF as “semantic” format • Also RDFS schema format • Ontologies as the standard way of defining concepts • Description logics are the way most ontologies are defined (OWL language)

  5. Description Logics (Borgida survey) • A class of languages based on FOL, like Datalog, Prolog • Key questions: subsumption of classes, recognition of members of classes • Prolog allows us to reason about instances: • ParentOf(liz,andy). Male(andy). • Child(_x) :- ParentOf(_z, _x) • Son(_y) :- Male(_y), ParentOf(_w, _y) • DLs allow us to make further inferences – that andy is a Child, i.e., they realize: • Child(x) ⇔ (∃z) ParentOf(z,x) • Son(y) ⇔ (∃w) Male(y) ∧ ParentOf(w,y)
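The Prolog rules on this slide can be mimicked with plain Python sets. This is only a sketch of the instance-level (left-to-right) reasoning; the individual `bob` and the `asserted_child` set are assumptions added to illustrate the reverse direction that a DL reasoner would also draw.

```python
# Facts from the slide: ParentOf(liz, andy). Male(andy).
parent_of = {("liz", "andy")}
male = {"andy"}

# Child(x) :- ParentOf(_z, x)
child = {x for (_z, x) in parent_of}

# Son(y) :- Male(y), ParentOf(_w, y)
son = {y for (_w, y) in parent_of if y in male}

# Reverse direction (hypothetical data): if someone is only *asserted* to be
# a Child, the equivalence Child(x) <=> exists z. ParentOf(z, x) implies an
# unnamed parent must exist. We merely list such individuals here.
asserted_child = {"bob"}
has_unnamed_parent = {x for x in asserted_child if x not in child}

print(sorted(child), sorted(son), sorted(has_unnamed_parent))
```

The rule-based part only ever derives `Child` and `Son` from `ParentOf`; the last set marks where the bidirectional reading adds information that the rules alone do not.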

  6. Syntax and Semantics • Build variable-free composite terms from atoms using term constructors (e.g., at-most, all) • COURSE and at-most(10, takers) and all(takers, GRADS) • (:and COURSE (:at-most 10 takers) (:all takers GRADS)) • COURSE ⊓ ≤ 10 takers ⊓ ∀takers:GRADS • Can be expressed in FOPC: • COURSE(a) ∧ (∃ x1 … x10) takers(a,x1) ∧ … ∧ takers(a,x10) ∧ (x1 ≠ x2 ∧ x2 ≠ x3 ∧ … ∧ x9 ≠ x10) ∧ takers ⊆ GRADS
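One way to read the term constructors is as functions that, given one relational structure, decide whether an individual belongs to the description. The sketch below assumes a tiny hand-made structure (`interp`, with course and student names invented for illustration) and treats `at_most`, `all_fillers`, `atom`, and `and_` as hypothetical helper names, not a real DL library.

```python
def atom(concept):
    # Membership in a primitive class.
    return lambda interp, a: a in interp["concepts"][concept]

def at_most(n, role):
    # At most n fillers for the role.
    return lambda interp, a: len(interp["roles"][role].get(a, set())) <= n

def all_fillers(role, concept):
    # Every filler of the role belongs to the concept (subset test).
    return lambda interp, a: interp["roles"][role].get(a, set()) <= interp["concepts"][concept]

def and_(*descriptions):
    return lambda interp, a: all(d(interp, a) for d in descriptions)

# COURSE and at-most(10, takers) and all(takers, GRADS)
description = and_(atom("COURSE"), at_most(10, "takers"), all_fillers("takers", "GRADS"))

# Hypothetical relational structure.
interp = {
    "concepts": {"COURSE": {"cis650", "cis120"}, "GRADS": {"anna", "bo"}},
    "roles": {"takers": {"cis650": {"anna", "bo"}, "cis120": {"anna", "cy"}}},
}
domain = {"cis650", "cis120", "anna", "bo", "cy"}

extension = {a for a in domain if description(interp, a)}
print(extension)
```

Here `cis120` is excluded because one of its takers is not in `GRADS`, and the students are excluded because they are not in `COURSE`.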

  7. Questions for DLs • Is a description D consistent and coherent? • Not if the instance is empty for every possible relational structure • Are D and D’ mutually disjoint? • Yes if D^I ∩ D’^I = ∅ for every I • Are D and D’ equivalent? • Yes if D^I = D’^I for every I • Does D subsume some other description D’? • Yes if for every relational structure I, D^I ⊇ D’^I • Inconsistency: and(C,D) ≡ NOTHING • Equivalence: D subsumes D’, D’ subsumes D
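These questions quantify over every relational structure, so a single finite structure can only refute disjointness, equivalence, or subsumption, never establish them. A minimal sketch of that refutation check, using invented extensions for PERSON, STUDENT, and COURSE:

```python
def refutes_subsumption(ext_d, ext_d_prime):
    # Counterexample to "D subsumes D'": some member of D' is not in D.
    return not ext_d_prime <= ext_d

def refutes_disjointness(ext_d, ext_d_prime):
    # Counterexample to "D and D' are disjoint": a shared member.
    return bool(ext_d & ext_d_prime)

def refutes_equivalence(ext_d, ext_d_prime):
    return ext_d != ext_d_prime

# Hypothetical extensions in one relational structure I.
PERSON = {"anna", "bo", "cy"}
STUDENT = {"anna", "bo"}
COURSE = {"cis650"}

print(refutes_subsumption(PERSON, STUDENT))   # PERSON subsumes STUDENT here
print(refutes_subsumption(STUDENT, PERSON))   # cy is a counterexample
print(refutes_disjointness(STUDENT, COURSE))
```

A real DL reasoner decides these questions symbolically from the descriptions themselves rather than by enumerating structures.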

  8. DL Example • class STUDENT is-a PERSON with • studNumber: int, key; level: {1,2,3,4} • and(PERSON, all(studNumber, INTEGER), at-least(1,studNumber), at-most(1,studNumber), all(level, one-of(1,2,3,4)), at-least(1,level), at-most(1,level)) • at-most(1, compose(studNumber, inverse(studNumber))) • ENROLLMENT := and(all(st,STUDENT), at-least(1,st), at-most(1,st), all(crs,COURSE), at-least(1,crs), at-most(1,crs), all(when,DATE), at-least(1,when), at-most(1,when)) • STUDENT := and(all(inverse(st), ENROLLMENT), at-least(1, inverse(st)), at-most(6, inverse(st))) • COURSE := and(all(inverse(crs), ENROLLMENT), at-least(1, inverse(crs)), at-most(300, inverse(crs))) • INSERT-IN(Cs431, COURSE). FILL-WITH(Cs431,taughtBy,Einstein). FILL-WITH(Cs431,takers,Anna)

  9. More on DLs • We can have both primitive classes (equivalent to extensional relations) and virtual ones • But we can make assertions over virtual classes that directly impact the primitive ones • Contrast with updates to views in databases • Many different levels of expressiveness in different DLs • Comparison with Datalog: • Both are subsets of FOL, with some limitations • DLs allow bidirectional inference; Datalog is unidirectional • DLs are equivalent to at most FOL with <= 3 variables; Datalog has an unbounded number of existential variables

  10. Coming Back to the SW • Lots of work on OWL, the Web Ontology Language • Based on different levels of DLs: • OWL Lite – classification hierarchy, simple constraints (cardinalities 0 or 1) • OWL DL – maximum expressiveness, computational completeness (always decidable and terminating) • OWL Full – no computational guarantees, allows classes as instances of other classes • Goal: each community builds an ontology • But how to relate ontologies? • “equivalentClass”, “equivalentProperty”, “sameAs” • Is this enough???

  11. The Data Management Argument • The Semantic Web is all about integration and translation • But there’s no notion of translation in the SW, except for equivalences • “Semantic normalization”??? • Does DB research have something to add? If so, what needs to change?

  12. Database Approaches to Semantic Integration Data warehouse: • Design a single schema • Do physical DB design • Map data into warehouse schema • Periodically update warehouse Virtual data integration (EII): • Design mediated schema • Map sources to mediated schema • Queries are rewritten and answered on demand from sources

  13. A Single Centralized Schema is a Bottleneck! Challenging to form a single schema for all domain data • People don’t agree on how concepts should be represented • Data warehouse: physical design is a strong consideration • Mediated schema very different from original users’ schemas Mappings may be challenging to create, and do not leverage work of previous source mappings • Each source gets mapped to mediated schema separately Difficult to evolve this single schema as needs change • May “break” existing queries • Must build consensus for any schema changes

  14. Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility (Figure: peers labeled DB Projects, Stanford, IIT Mumbai, UPenn, UW.) Data integration: 1 mediated schema, m mappings to sources Peer data management system (PDMS): • n mediated “peer schemas,” as few as (n - 1) mappings between them – evaluated transitively • m mappings to sources
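The "(n - 1) mappings, evaluated transitively" point can be illustrated with a graph search: if the mappings form a connected chain, every peer is reachable from every other. The sketch below assumes a hypothetical chain of mappings among four of the peers named on the slide and assumes each mapping can be followed in both directions; it shows reachability only, not query reformulation.

```python
from collections import deque

# Hypothetical peer-to-peer mappings: n = 4 peers, n - 1 = 3 mappings.
mappings = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]

def reachable_peers(start, mappings):
    # Build an undirected adjacency map, then run breadth-first search.
    neighbors = {}
    for a, b in mappings:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in sorted(neighbors.get(peer, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable_peers("UPenn", mappings)))
```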

  15. Peer-to-Peer at both Logical and Architectural Levels A “logical” peer-to-peer model: Every participant can contribute: • Extensional data • Mappings between schemas • Computation (query answering) and caching Can we do a database (say, XML) version of the SW?

  16. RDF vs. XML • RDF explicitly names relationships: (book, title, “ABC”), (book, writtenBy, author), (author, name, “John Smith”) • XML does not always: • <book> <title>ABC</title> <writtenBy> <author> <name>John Smith</name> </author> </writtenBy> </book> • <book> <title>ABC</title> <author>John Smith</author> </book> (Figure: graph with labels book, title, writtenBy, author, name.)

  17. RDF vs. XML 2 • RDF is subject-neutral (a graph) • XML centers around a subject (a tree): • <book> <title>ABC</title> <author>John Smith</author> </book> • <author> <name>John Smith</name> <book>ABC</book> </author> • This may result in duplication of contained objects
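The duplication point can be made concrete by serializing one small triple set as a book-centered tree. This is a sketch using Python's standard `xml.etree.ElementTree`; the second book (`book2`, "XYZ"), the node identifiers, and the `<books>` wrapper element are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# RDF-style triples: one author node shared by two books.
triples = [
    ("book1", "title", "ABC"),
    ("book2", "title", "XYZ"),
    ("book1", "writtenBy", "author1"),
    ("book2", "writtenBy", "author1"),
    ("author1", "name", "John Smith"),
]

def objects(subject, predicate):
    return [o for (s, p, o) in triples if s == subject and p == predicate]

def book_centered_xml():
    # Nest each author inside every book that references it.
    root = ET.Element("books")
    book_ids = sorted({s for (s, p, _o) in triples if p == "title"})
    for book_id in book_ids:
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = objects(book_id, "title")[0]
        for author_id in objects(book_id, "writtenBy"):
            ET.SubElement(book, "author").text = objects(author_id, "name")[0]
    return root

root = book_centered_xml()
author_nodes_in_graph = {o for (_s, p, o) in triples if p == "writtenBy"}
author_elements_in_tree = root.findall("./book/author")

print(ET.tostring(root, encoding="unicode"))
print(len(author_nodes_in_graph), len(author_elements_in_tree))
```

The graph holds a single author node, while the book-centered tree repeats the author under each book.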

  18. An XML Version of the Semantic Web Data model: XML + Schema • Vast volumes of data already in XML (or exported as XML) • CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”) Concepts: Views ≈ classes; schemas ≈ ontologies • Views define membership via queries; can reason about containment • CAVEAT: less expressive than OWL classes Schema mappings: target schema as query over source Sophisticated reasoning about mappings is possible by extending existing data integration techniques • Can use mappings in “forward” and “reverse” directions • Allows for “chaining” of mappings to answer queries

  19. Let’s Start with the Relational Model and then Extend GAV: mediated relations as views over sources • Easy to rewrite queries: unfold them using view definitions • Example, with mediated schema T1, … over sources S1(X), S2(Y): MST1(X’) :- S1(X), … MST2(Y’) :- S2(Y), … LAV: sources as views over mediated relations • More challenging to rewrite queries: answering queries using views (e.g., MiniCon [Pottinger & Levy 00]) • More flexible in representing source properties • Example, with mediated schema T1, … over sources S1(X), S2(Y): S1(X’) ⊆ MST1(X), … S2(Y’) ⊆ MST1(Y), …
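GAV unfolding amounts to replacing each query atom by the body of its view definition. The sketch below uses the r0 rule that appears later in the deck (SameProject defined over ProjMember) as the view; the tuple encoding of atoms, the `unfold` helper, and the query variable names are assumptions for illustration. It assumes one definition per predicate and head arguments that are distinct variables.

```python
def unfold(query_body, views):
    """Replace each atom defined by a GAV view with the view's body."""
    unfolded = []
    for n, (pred, args) in enumerate(query_body):
        if pred not in views:
            unfolded.append((pred, args))
            continue
        head_vars, view_body = views[pred]
        # Head variables map to the query atom's arguments; any other view
        # variable gets a fresh name derived from the atom's position.
        subst = dict(zip(head_vars, args))
        for body_pred, body_args in view_body:
            new_args = tuple(subst.setdefault(v, f"{v}_{n}") for v in body_args)
            unfolded.append((body_pred, new_args))
    return unfolded

# r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
views = {
    "SameProject": (("a1", "a2", "p"),
                    [("ProjMember", ("a1", "p")), ("ProjMember", ("a2", "p"))]),
}

# Q(x,y) :- SameProject(x,y,proj), Author(x,w), Author(y,w)
query_body = [("SameProject", ("x", "y", "proj")),
              ("Author", ("x", "w")), ("Author", ("y", "w"))]

rewritten = unfold(query_body, views)
print(rewritten)
```

LAV rewriting does not reduce to this kind of substitution, which is why the slide points to answering queries using views instead.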

  20. Answering Queries in a PDMS: Transitively Evaluating Mappings Start with schema being queried • Look up mappings to neighbors; expand • Continue iteratively until queries are only over sources Mappings in a PDMS may be a combination of LAV, GAV techniques: • General form p1a(X, Y), p1b(Y,Z), … = p2a(Y, X), p2b(X, Z), … (see paper for examination of what is actually tractable) • Requires unfolding and answering queries using views (AQUV) We use a rule-goal “tree” to expand the mappings • Extend some of the ideas of MiniCon to avoid unnecessary expansions • Challenges to avoid redundancy – see paper for optimizations

  21. Example of Query Answering Query: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) (Figure: peer relations Author(a,w), SameProject(a1,a2,p), ProjMember(a1,p), CoAuthor(a1,a2), Sched(f,s,e) and sources S1, S2, connected by mappings r0, r1, r2, r3.) Mappings between peers’ schemas: r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p) r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w) Mappings to data sources: r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(f,s,end) r3: CoAuthor(f1,f2) :- S2(f1,f2)

  22. Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q

  23. Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w)

  24. Mappings between peers’ schemas: r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p) r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w) Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w)

  25. Mappings between peers’ schemas: r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p) r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w) Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w) r0 r1 r1

  26. Mappings to data sources: r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end) r3: CoAuthor(f1,f2) = S2(f1,f2) Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w) r0 r1 r1 ProjMember(a1,p) ProjMember(a2,p) CoAuthor(a1,a2) CoAuthor(a2,a1)

  27. Mappings to data sources: r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end) r3: CoAuthor(f1,f2) = S2(f1,f2) Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w) r0 r1 r1 ProjMember(a1,p) ProjMember(a2,p) CoAuthor(a1,a2) CoAuthor(a2,a1) r3 r3 r2 r2

  28. Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w) r0 r1 r1 ProjMember(a1,p) ProjMember(a2,p) CoAuthor(a1,a2) CoAuthor(a2,a1) r3 r3 r2 r2 S1(a1,p,_) S1(a2,p,_) S2(a2,a1) S2(a1,a2)

  29. Example Rule-Goal Tree Expansion q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w) q SameProject(a1,a2,p) Author(a1,w) Author(a2,w) r0 r1 r1 ProjMember(a1,p) ProjMember(a2,p) CoAuthor(a1,a2) CoAuthor(a2,a1) r3 r3 r2 r2 S1(a1,p,_) S1(a2,p,_) S2(a2,a1) S2(a1,a2) Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
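The reformulated query Q’ refers only to the sources S1 and S2, so it can be evaluated directly as a union of two joins. A minimal sketch over invented source tuples (the names, projects, and schedule values are all hypothetical):

```python
# Hypothetical source data: S1(author, project, schedule), S2(author, author).
S1 = {("ann", "piazza", "mon"), ("bob", "piazza", "tue"), ("cat", "orchestra", "wed")}
S2 = {("ann", "bob"), ("ann", "cat")}

def q_prime(S1, S2):
    # S1(a1,p,_), S1(a2,p,_): pairs of authors sharing a project.
    same_project = {(a1, a2)
                    for (a1, p1, _s1) in S1
                    for (a2, p2, _s2) in S1
                    if p1 == p2}
    # S2(a1,a2) union S2(a2,a1): the two branches of the rewriting.
    coauthor_either_way = S2 | {(b, a) for (a, b) in S2}
    return same_project & coauthor_either_way

answers = q_prime(S1, S2)
print(sorted(answers))
```

With this data, ann and cat are co-authors in S2 but share no project in S1, so only the ann/bob pair survives, in both orders.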

  30. Stepping up to XML (WWW03) Goals: • Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02]) • Remain computationally inexpensive • Capture the common mapping types Directional mapping language based on templates <output> {: $var IN document(“doc”)/path WHERE condition :} <tag>$var</tag></output> • Translates between parts of data instances • Restricted subset of XQuery that’s decidable to reason about • Supports special annotations and object fusion Can map XML-XML, XML-RDF, RDF-XML (at data level)

  31. Mapping Example between XML Schemas • Source: authors / author* / (full-name, publication* / (title, pub-type)) • Target: pubs / book* / (title, author* / name) (Figure labels: writtenBy, author, publication, title, pub-type, name.)

  32. Example Piazza Mapping <pubs> <book piazza:id={$t}> {: $a IN document(“…”)/authors/author, $an IN $a/full-name, $t IN $a/publication/title, $typ IN $a/publication/pub-type WHERE $typ = “book” PROPERTY $t >= ‘A’ AND $t < ‘B’ :} <title>{$t}</title> <author><name>{$an}</name></author> </book> </pubs>
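The effect of this mapping on a data instance can be approximated in ordinary Python. The sketch below is not the Piazza engine: it hand-codes the template's bindings and WHERE clause with `xml.etree.ElementTree`, reads `piazza:id={$t}` as "fuse books with the same title into one element" (an assumption about the annotation's meaning), uses a plain `id` attribute in place of `piazza:id`, and ignores the PROPERTY clause. The sample source document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical source instance following the authors/author/publication schema.
SOURCE = """
<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>Notes</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""

def map_authors_to_pubs(source_root):
    pubs = ET.Element("pubs")
    books_by_title = {}  # assumed fusion key: the book title ($t)
    for author in source_root.findall("author"):            # $a
        name = author.findtext("full-name")                  # $an
        for pub in author.findall("publication"):
            if pub.findtext("pub-type") != "book":           # WHERE $typ = "book"
                continue
            title = pub.findtext("title")                    # $t
            book = books_by_title.get(title)
            if book is None:
                book = ET.SubElement(pubs, "book", {"id": title})
                ET.SubElement(book, "title").text = title
                books_by_title[title] = book
            author_out = ET.SubElement(book, "author")
            ET.SubElement(author_out, "name").text = name
    return pubs

pubs = map_authors_to_pubs(ET.fromstring(SOURCE.strip()))
print(ET.tostring(pubs, encoding="unicode"))
```

The article is filtered out, and the two source entries for "ABC" end up as one book element with two authors.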

  33. Challenges • Query reformulation for XML is significantly harder • Hierarchy, 1:n schema constraints, ability to map from values to tags, … • Can only do ~ the XML equivalent of conjunctive queries • See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details

  34. What about Values? • Thus far, we’ve focused on schema mappings • Almost as important in the real world: mappings of values to values • Proteins to binding sites • SSNs to customer IDs • etc. • The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings • In many cases, we only have partial transitive mappings • Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
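The partitioning idea can be sketched with a generic technique: group value-to-value mapping pairs into connected components, then compute a transitive closure inside each component independently. This uses union-find and a naive closure loop as stand-ins and is not a description of Hyperion's actual algorithm; the identifiers in `pairs` are invented.

```python
from collections import defaultdict

def partitions(pairs):
    # Union-find over the values; pairs sharing any value land in one group.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for a, b in pairs:
        groups[find(a)].add((a, b))
    return list(groups.values())

def transitive_closure(pairs):
    closure = set(pairs)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure

# Hypothetical value mappings (e.g., SSNs to customer IDs, proteins to sites).
pairs = {("ssn:123", "cust:A"), ("cust:A", "acct:9"), ("prot:P1", "site:S1")}

parts = partitions(pairs)
per_partition = set().union(*(transitive_closure(p) for p in parts))
print(len(parts), sorted(per_partition))
```

Because no pair crosses a partition boundary, closing each partition separately gives the same result as closing the whole set at once.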

  35. Assessment: The Semantic Web • The KB world focuses on expressively capturing concepts • The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways) • Do either of these seem likely to change the world? • What barriers need to be removed?
