
Keys for XML

Keys for XML. Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, Wang-Chiew Tan. University of Pennsylvania, Temple University, Universidade Federal do Parana, Brazil. Jonathan Mamou.


Presentation Transcript


  1. Keys for XML. Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, Wang-Chiew Tan. University of Pennsylvania, Temple University, Universidade Federal do Parana, Brazil. Jonathan Mamou

  2. Keys in DB design
     • Essential part of DB design
     • Invariant connection between the tuple and the real-world entity
     • Important in updates
     • Guarantee that an update will affect precisely one tuple
     • …

  3. Keys in XML
     • XML documents are expected to do at least double duty as databases
     • Examination of existing DTDs reveals a number of cases in which some element or attribute is specified as a “unique identifier” in comments
     • Various key specifications exist in XML Standard, XML Data, XML Schema

  4. Components: XML vs. relational DB
     <db>
       <student> <name> Smith </name> <course> Math </course> <grade> B </grade> </student>
       <student> <name> Jones </name> <course> Math </course> <grade> A+ </grade> </student>
       <student> <name> Smith </name> <course> CS </course> <grade> A- </grade> </student>
     </db>

  5. Components: XML vs. relational DB (cont’d)
     • DB: if 2 tuples agree on their name and course attributes, they agree everywhere
     • XML: if 2 elements agree on the name and course subelements, then they are the same element
     • Node identification? Equality?

  6. Nodes - Value Equality
     • name: key for person nodes
     • name may have a complex structure: first-name, last-name
     [Figure: tree of a db document with company, government, university, dept and employee nodes; employees carry @id and name, where a name is either the text “Bill Clinton” or a structure with firstName “Bill” and lastName “Clinton”.]

  7. Hierarchical structure
     • Hierarchically structured databases, e.g. scientific data formats
     • Top-level key to identify components of a document
     • Secondary key to identify sub-components
     • Book/chapter/section
     • Bible/book/chapter/verse

  8. Absolute and relative keys
     In an XML document, how to identify
     • a book?
     • a chapter?
     • a section?
     [Figure: tree with root db; book nodes have a title (“XML”, “SGML”) and chapter children; chapters have a number and section children; sections have a number and text.]

  9. XML standard - ID attribute
     <!ATTLIST book title ID #REQUIRED>
     <!ATTLIST chapter number ID #REQUIRED>
     <!ATTLIST section number ID #REQUIRED>
     • Internal “pointers” rather than keys
     • Scoping: an ID attribute is unique within the entire document rather than among a designated set of elements
     • Cannot express relative keys, e.g. for chapters/sections
     • Limited to attributes rather than elements
     • Unary: at most one “key” can be defined, in terms of a single attribute
     • Value equality: on text (string)
     • Defined as an attribute type: keys must come with a DTD

  10. XML Data
      • Introduces a notion of keys explicitly
        <elementType id="booktable">
          <element id="titleID" type="#title"/>
          <element type="#author"/>
          <element type="#pages"/>
          <key id="bookkey"> <keyPart href="#titleID"/> </key>
        </elementType>
      • BUT: keys can only be defined for element types rather than for certain collections of elements, e.g. books, articles, …

  11. XPath
      • Possible to specify interesting fragments of a document
      • Syntax similar to navigating directories in a file system
        //   arbitrary path
        .    empty path
        /    document root, path concatenator
        *    any single node name

  12. XPath example
      • Select BBB elements which have any attribute
        <AAA>
          <BBB id="b1"/>
          <BBB id="b2"/>
          <BBB name="bbb"/>
          <BBB/>
        </AAA>
      • //BBB[@*]
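The selection on this slide can be tried out with Python's standard library. This is a minimal sketch, not part of the original deck: it filters on each element's attribute dictionary instead of passing `[@*]` to ElementTree, whose XPath support is only a subset of the full language. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The document from the slide.
doc = ET.fromstring(
    '<AAA><BBB id="b1"/><BBB id="b2"/><BBB name="bbb"/><BBB/></AAA>'
)

# Equivalent of //BBB[@*]: every BBB element that carries at least one attribute.
selected = [elem for elem in doc.iter("BBB") if elem.attrib]

for elem in selected:
    print(elem.tag, dict(elem.attrib))
```

The fourth `BBB` element has no attributes, so it is the only one left out.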

  13. XPath example (cont’d)
      <AAA>
        <BBB> </BBB>
        <XXX>
          <DDD> <FFF> <GGG> </GGG> </FFF> </DDD>
        </XXX>
        <CCC> </CCC>
      </AAA>
      • //GGG/ancestor::*
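The `ancestor` axis moves up the tree, which is one reason the deck later restricts key paths to downward steps. Below is a hedged sketch of the same query in Python. ElementTree elements do not store a parent pointer, so the sketch builds a child-to-parent map first; the helper name `ancestors` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The document from the slide.
doc = ET.fromstring(
    "<AAA><BBB/><XXX><DDD><FFF><GGG/></FFF></DDD></XXX><CCC/></AAA>"
)

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestors(node):
    """Return the tags of all ancestors of node, nearest ancestor first."""
    tags = []
    while node in parent_of:
        node = parent_of[node]
        tags.append(node.tag)
    return tags

ggg = next(doc.iter("GGG"))
print(ancestors(ggg))
```

An XPath engine would return the same four elements in document order (outermost first); the sketch lists them from the nearest ancestor outward.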

  14. XML-Schema
      <element name="book">
        <complexType>
          <sequence>
            <element name="title" type="string"/>
            <element name="chapters" maxOccurs="unbounded">
              <complexType> ... </complexType>
            </element>
          </sequence>
        </complexType>
        <key name="k">
          <selector xpath="."/>
          <field xpath="title"/>
        </key>
      </element>

  15. XML Schema (cont’d)
      • Allows keys to be specified in terms of XPath expressions
      • BUT
      • XPath is a relatively complex language (move down, sideways, upwards; predicates and functions can be embedded)
      • Equivalence/containment of XPath expressions is unresolved, so there is no efficient way to tell whether two keys are equivalent
      • Value equality: restricted to text
      • Relative keys are not addressed
      • Structural requirement: key paths must exist and be unique

  16. A new key constraint language for XML
      • Powerful enough to express absolute and relative keys
      • Simple enough to be reasoned about efficiently
        • Equivalence/containment
        • Consistency (satisfiability)
        • Implication (keys derived from others)
      • Capturing the semistructured nature of XML data:
        • independent of any types/schema
        • no structural requirements: tolerating missing/multiple key paths

  17. Outline
      • Node addresses: testing whether 2 nodes are the same node
      • Value equality: testing whether 2 nodes have the same value
      • Path expression language
      • Absolute key
      • Key inference
      • Relative key
      • Strong key
      • Some issues

  18. Tree representation
      • DOM (Document Object Model)
      • Document is a hierarchical structure of nodes
        • Element nodes
        • Attribute nodes
        • Text nodes

  19. Tree representation (cont’d)
      <db>
        <composer>
          <name> J.S. Bach </name>
          <born> 1685 </born>
          <work num="BWV82"> <title> Ich habe genug </title> </work>
          <work num="BWV552"> </work>
        </composer>
        <composer period="baroque">
          <name> G.F. Handel </name>
          <work num="HWV19"> <title> Art Thou Troubled? </title> </work>
        </composer>
      </db>

  20. Tree representation (cont’d)
      [Figure: tree of the composer document from the previous slide. Element and text children are labelled by position (1, 2, 3, …), attribute children by @num and @period; leaves hold the texts “J.S. Bach”, “1685”, “BWV82”, “BWV552”, “Ich habe genug”, “G.F. Handel”, “HWV19”, “Art Thou Troubled?” and “baroque”.]

  21. Tree representation (cont’d)
      • Attribute node: name + text, terminal
      • Text node: text, terminal
      • Element node: name, may have children
        • Text and element children are held in an array
        • Index in the array is determined by the order of the subelement in the document
        • Attribute children are held in a dictionary
        • Name of the attribute is used as the index
      • Edge labels uniquely identify children

  22. Node Address
      • A path of edge labels from the root uniquely identifies a node: <l1#…#ln>
      • Examples: <1#2#1>, <1#3#@num>
      • An attribute node can only occur at the end of a node address
      • Order of attributes is unimportant
      • Order of subelements is specified by their indexes
      • Address of a subnode relative to a node: any subnode of a node with address <a> will have a node address of the form <a#b>, where <b> is the address of the subnode relative to <a>
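The addressing scheme can be sketched in a few lines of Python. This is an illustrative reading of the slides, not the authors' code: it assumes that whitespace-only text is ignored, that element and text children are numbered from 1 in document order, and that attribute children are labelled `@name`. The functions `addresses` and `show` are hypothetical names.

```python
import xml.etree.ElementTree as ET

DOC = """<db>
  <composer>
    <name>J.S. Bach</name>
    <born>1685</born>
    <work num="BWV82"><title>Ich habe genug</title></work>
    <work num="BWV552"></work>
  </composer>
  <composer period="baroque">
    <name>G.F. Handel</name>
    <work num="HWV19"><title>Art Thou Troubled?</title></work>
  </composer>
</db>"""

def addresses(elem, prefix=()):
    """Yield (address, (kind, label)) for elem and every node below it."""
    yield prefix, ("element", elem.tag)
    # Attribute children live in a dictionary keyed by attribute name.
    for attr, text in elem.attrib.items():
        yield prefix + ("@" + attr,), ("attribute", text)
    # Text and element children share one array, indexed from 1.
    index = 0
    if elem.text and elem.text.strip():
        index += 1
        yield prefix + (index,), ("text", elem.text.strip())
    for child in elem:
        index += 1
        yield from addresses(child, prefix + (index,))
        if child.tail and child.tail.strip():
            index += 1
            yield prefix + (index,), ("text", child.tail.strip())

def show(address):
    """Format an address tuple in the <l1#...#ln> notation of the slide."""
    return "<" + "#".join(str(label) for label in address) + ">"

table = dict(addresses(ET.fromstring(DOC)))
for address, node in table.items():
    print(show(address), node)
```

Under these assumptions `<1#2#1>` is the text node “1685” and `<1#3#@num>` is the `num` attribute of the first work, matching the two example addresses on the slide.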

  23. Value Equality
      • Value of a node:
        1. A set S of relative addresses of its subnodes
        2. A partial function from S to names
        3. A partial function from S to texts
      • 2 nodes are value-equal if they agree on 1, 2 and 3
      • Notation: a =v b
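One way to make this definition concrete is to compute, for a node, a dictionary from relative addresses to (name, text) pairs: its key set plays the role of S, and the two components of each pair play the roles of the two partial functions. The sketch below assumes whitespace around text is insignificant and that the node's own name is part of its value; `value` and `value_equal` are names chosen here for illustration.

```python
import xml.etree.ElementTree as ET

def value(elem, prefix=()):
    """Map each relative address under elem to a (name, text) pair."""
    val = {prefix: (elem.tag, None)}                     # element: name, no text
    for attr, text in elem.attrib.items():
        val[prefix + ("@" + attr,)] = (attr, text)       # attribute: name and text
    index = 0
    if elem.text and elem.text.strip():
        index += 1
        val[prefix + (index,)] = (None, elem.text.strip())   # text node: text only
    for child in elem:
        index += 1
        val.update(value(child, prefix + (index,)))
        if child.tail and child.tail.strip():
            index += 1
            val[prefix + (index,)] = (None, child.tail.strip())
    return val

def value_equal(a, b):
    """a =v b: same relative addresses, same names, same texts."""
    return value(a) == value(b)

doc = ET.fromstring(
    '<db>'
    '<person phone="234-5678"><name><lastName>Bush</lastName>'
    '<firstName>George</firstName></name></person>'
    '<person phone="123-4567"><name><lastName>Bush</lastName>'
    '<firstName>George</firstName></name></person>'
    '</db>'
)
first, second = doc.findall("person")
print(value_equal(first.find("name"), second.find("name")))
print(value_equal(first, second))
```

The two `name` nodes are distinct nodes with equal values, while the two `person` nodes differ in value because their phone attributes differ.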

  24. Value Equality (example)
      S = {., <1>, <2>, <1#1>, <2#1>}
      [Figure: db with several person nodes; two of them have different @phone values (“234-5678”, “123-4567”) but name subtrees with the same lastName “Bush” and firstName “George”.]

  25. Path expressions
      • How to identify nodes in a tree?
      • Expression involving node names (tags + attributes) that describes a set of paths in the document tree
        • XPath (XML-Schema)
        • Regular expressions (semistructured data)

  26. Regular Path Expressions
      In the normal syntax of regular expressions:
      • db.emps.emp
      • db.(depts.dept.mgr | emps.emp)
      • db._*.name
      [Figure: tree over db, depts, dept, mgr, emps, emp and name nodes, with name values “Mary”, “Bill”, “John”.]

  27. Language for path expression
      • 2 necessary properties
      • A concatenation operation, which has no uniform presentation in XPath
        • Concatenating a/b with /c/d gives a/b//c/d
      • A path should only move down the tree
        • XPath's navigation axes also allow other directions

  28. Language for path expression
      • Empty path “ε” (“.”)
      • Node name (tag/attribute name)
      • Wild card “_”, single node name (“*”)
      • Arbitrary path “_*” (“//”)
      • Concatenation of paths P, Q is P.Q (“/”)
      • Notation
        • n[P]: set of nodes (node addresses) reached by starting at node n and following a path that conforms to P
        • [P] := root[P]

  29. Examples
      • Simple path
        • <2#2>[title] = {<2#2#1>}
        • [composer.work] = {<1#3>, <1#4>, <2#2>}
      • Complex path
        • <2#2>[_*] = {<2#2>, <2#2#1>, <2#2#1#1>, <2#2#@num>}
        • [composer._] = {<1#1>, <1#2>, <1#3>, <1#4>, <2#1>, <2#2>}
        • [_*.num] = {<1#3#@num>, <1#4#@num>, <2#2#@num>}
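A small evaluator for n[P] helps check examples like these. The sketch below is an assumption-laden illustration, not the authors' implementation: paths are strings with `.` as concatenation, the empty string stands for ε, `_*` reaches a node and everything below it, and `_` is taken to match any single child edge (element, text, or attribute). The helper names `build`, `node_at`, `descendants` and `evaluate` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<db>
  <composer>
    <name>J.S. Bach</name>
    <born>1685</born>
    <work num="BWV82"><title>Ich habe genug</title></work>
    <work num="BWV552"></work>
  </composer>
  <composer period="baroque">
    <name>G.F. Handel</name>
    <work num="HWV19"><title>Art Thou Troubled?</title></work>
  </composer>
</db>"""

def build(elem):
    """Turn an Element into a dict node; 'kids' maps edge label -> node."""
    def leaf(name, text):
        return {"name": name, "text": text, "kids": {}}
    kids = {"@" + k: leaf(k, v) for k, v in elem.attrib.items()}
    ordered = []
    if elem.text and elem.text.strip():
        ordered.append(leaf(None, elem.text.strip()))
    for child in elem:
        ordered.append(build(child))
        if child.tail and child.tail.strip():
            ordered.append(leaf(None, child.tail.strip()))
    kids.update(enumerate(ordered, start=1))
    return {"name": elem.tag, "text": None, "kids": kids}

def node_at(tree, address):
    """Follow the edge labels in address down from the root."""
    for label in address:
        tree = tree["kids"][label]
    return tree

def descendants(tree, address):
    """Addresses of the node at address and of every node below it."""
    found = {address}
    for label in node_at(tree, address)["kids"]:
        found |= descendants(tree, address + (label,))
    return found

def evaluate(tree, start, path):
    """Return n[P] as a set of addresses, where n is the node at start."""
    current = {start}
    for token in (path.split(".") if path else []):
        reached = set()
        for address in current:
            if token == "_*":
                reached |= descendants(tree, address)
            else:
                for label, kid in node_at(tree, address)["kids"].items():
                    if token == "_" or kid["name"] == token:
                        reached.add(address + (label,))
        current = reached
    return current

tree = build(ET.fromstring(DOC))
print(evaluate(tree, (), "composer.work"))
print(evaluate(tree, (2, 2), "_*"))
```

With addresses written as tuples, `(1, 3)` corresponds to `<1#3>`. Because `_` is assumed to match attribute edges too, `composer._` under this sketch also reaches the `period` attribute of the second composer.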

  30. Absolute key

  31. Key specification
      Necessary to specify
      • Set on which we are defining the key (relation)
      • “Attributes” (set of column names)
      • Pair (Q, {P1, …, Pn})
        • Target path Q: path expression giving the target set on which the key constraint is to hold
        • Key paths {P1, …, Pn}: set of simple path expressions

  32. Key specification (cont’d)
      • Target path Q
      • Key paths {P1, …, Pn}
      • For any node n in [Q], there is a set of nodes n[Pi] found by following Pi from n (may be empty)
      • Examples
        • (person.employees, {name.firstname, name.lastname})
        • (composer, {name})
        • (composer, {born})

  33. Formal Definition
      A node n satisfies a key specification (Q, {P1, …, Pk}) iff for any n1, n2 in n[Q], if for all i, 1 <= i <= k, there exist z1 in n1[Pi] and z2 in n2[Pi] such that z1 =v z2, then n1 = n2.
      • Value equality: z1 =v z2
      • Node equality: 2 nodes are equal if they have the same node address, n1 = n2
      • The values associated with key paths uniquely identify a node in the target set
      • Not part of the schema, data

  34. Remarks
      • For any n1, n2 in [Q], if Pi is missing at either n1 or n2, then n1[Pi] and n2[Pi] are by definition disjoint
      • Multiple nodes
        <db>
          <A> <B> 1 </B> </A>
          <A> <B> 1 </B> <B> 2 </B> </A>
        </db>
        Key (A, {B}) with respect to the root. The document does not satisfy the key.
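The formal definition and the two remarks above can be checked mechanically. The following is a minimal sketch under stated assumptions: only simple paths (dot-separated names, with the empty string for ε) are supported, whitespace around text is ignored, and value equality is implemented as deep equality of the dict nodes. All function names are illustrative and not from the original work.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def build(elem):
    """Turn an Element into a dict node; 'kids' maps edge label -> node."""
    def leaf(name, text):
        return {"name": name, "text": text, "kids": {}}
    kids = {"@" + k: leaf(k, v) for k, v in elem.attrib.items()}
    ordered = []
    if elem.text and elem.text.strip():
        ordered.append(leaf(None, elem.text.strip()))
    for child in elem:
        ordered.append(build(child))
        if child.tail and child.tail.strip():
            ordered.append(leaf(None, child.tail.strip()))
    kids.update(enumerate(ordered, start=1))
    return {"name": elem.tag, "text": None, "kids": kids}

def node_at(tree, address):
    for label in address:
        tree = tree["kids"][label]
    return tree

def evaluate(tree, start, path):
    """n[P] for a simple path P (dot-separated names, '' for the empty path)."""
    current = {start}
    for name in (path.split(".") if path else []):
        current = {
            address + (label,)
            for address in current
            for label, kid in node_at(tree, address)["kids"].items()
            if kid["name"] == name
        }
    return current

def shares_value(tree, n1, n2, path):
    """True if some z1 in n1[P] and z2 in n2[P] are value-equal."""
    values1 = [node_at(tree, a) for a in evaluate(tree, n1, path)]
    values2 = [node_at(tree, a) for a in evaluate(tree, n2, path)]
    return any(z1 == z2 for z1 in values1 for z2 in values2)

def satisfies(tree, context, target, key_paths):
    """Does the node at `context` satisfy the key (target, key_paths)?"""
    for n1, n2 in combinations(evaluate(tree, context, target), 2):
        # Two distinct target nodes that agree on every key path violate the key.
        if all(shares_value(tree, n1, n2, p) for p in key_paths):
            return False
    return True

slide_doc = build(ET.fromstring("<db><A><B>1</B></A><A><B>1</B><B>2</B></A></db>"))
other_doc = build(ET.fromstring("<db><A><B>1</B></A><A><B>2</B></A><A/></db>"))
print(satisfies(slide_doc, (), "A", ["B"]))
print(satisfies(other_doc, (), "A", ["B"]))
```

The slide's document fails because the two `A` nodes share a `B` with value 1, even though the second `A` has another `B`. In the second document the third `A` has no `B` at all, so it cannot agree with any other `A` on that key path, and the key holds. With an empty set of key paths the condition is vacuously met by any pair, so such a key holds only when there is at most one target node.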

  35. Examples of keys
      • (_*.person, {id})
        • Any 2 person elements are disjoint on their id fields
      • (person, {ε})
        • Any 2 person nodes immediately under the root have different values
      • (employee, {})
        • Empty key: there is at most one employee under the root
      • (_*, {id})
        • Any 2 nodes are disjoint on their id fields up to value equality
        • Semantics of the ID attribute in the XML standard

  36. XML vs. relational
      • XML: paths that define keys
        • Need not exist (null-valued keys)
        • Do not have to be unique
        • Key paths specify a set of addresses within a document
      • Relational DB: key values
        • Cannot be null, must exist
        • Have to be unique
        • 1NF requires each component of every tuple to be an atomic value, not a set

  37. Remarks
      • Equivalence of 2 path expressions is decidable
      • Given a definition of equality on trees, do we need more than one key path in a key specification?
        • All key attributes would have to be represented as subnodes of some node
        • This node would have to be constrained to contain only those subnodes
        • Too restrictive: unnecessary interference between key specifications and data models
      • Allow a (possibly empty) set of nodes at the end of each key path
        • How to require each of the key paths to exist and to be unique?

  38. Remarks (cont’d)
      • Language of path expressions
        • Need something more powerful to express Q, e.g. (person.(mother | father)*, {id}): a person element followed by zero or more father or mother elements
      • The language of path expressions is provisional
        • Changing it does not change much in the way of the theory

  39. Key inference
      • In relational DB
        • Infer some keys from the presence of others
      • If (Q, S) is a key and S ⊆ S’, then so is (Q, S’)
        • Counterpart of the relational inference rule
      • If (Q.Q’, {P}) is a key, then so is (Q, {Q’.P})
        • Tree-like structure: if a node is identified in a tree, then its ancestors are also determined, i.e. if a key path P uniquely identifies a node n in [Q.Q’], then Q’.P is a key path for the ancestor of n in [Q]

  40. Key Inference (cont’d)
      • If (Q, S) is a key and Q’ ⊆ Q, then (Q’, S) is also a key
        • Any key of the set [Q] is also a key for any subset of [Q]
      • For any finite set Σ of keys, there exists a (finite) XML document satisfying Σ
        • Key paths may be missing, e.g. (_*, {id})
        • If the key path were required to exist at all nodes specified by the target path, the XML document would have to be infinite to satisfy the key
        • Only holds in the absence of DTDs

  41. Key Inference
      • Key K = (X, {})
      • DTD D: <!ELEMENT foo (X, X)>
      • No XML document both conforms to D and satisfies K
      • DTDs interact with XML key constraints
      [Figure: foo nodes with X children.]

  42. Relative Key

  43. Relative key - Motivation
      • Motivated by scientific data formats: hierarchical structure, large set of entries at the top level
      • Protein sequence database Swiss-Prot
        • Accession number (key) for each entry
        • Within each entry, a sequence of citations, each identified by a number 1, 2, 3, …
      • Linguistic database: recordings of speech
        • Data sets held in files
        • Metadata provided by the directory structure
        • /timit/train/dr1/fcjf0/sa1.wav
        • TIMIT corpus, training set, dialect region 1, female speaker, speaker-ID "cjf0", sentence text "sa1", speech waveform file

  44. An absolute key for books
      [Figure: tree with root db; book nodes have a title (“XML”, “SGML”) and chapter children; chapters have a number and section children; sections have a number and text.]
      An absolute key to identify a book: (book, {title})
      • Target path: book, starting from the root and identifying a collection of books
      • Key path: title; its value uniquely identifies a book
      • Absolute: defined on the entire document

  45. Relative key - definition
      • Like the key of a weak entity set in DB: Studios(name, address), Crews(number)
      • A document satisfies a relative key specification (Q, (Q’, S)) iff for all nodes n in [Q], n satisfies the key (Q’, S)
      • Absolute keys are a special case of relative keys: (Q’, S) is equivalent to (ε, (Q’, S))

  46. A relative key for chapters
      [Figure: the same db/book/chapter/section tree as on the absolute-key slide.]
      A relative key: (book, (chapter, {number}))
      A chapter number uniquely identifies a chapter within a book!
      • Context path: book
      • Target path: chapter, starting at a book
      • Key path: number
      • Relative: defined on sub-documents, relative to the context

  47. Absolute/Relative Key
      [Figure: the same db/book/chapter/section tree as on the absolute-key slide.]
      • What is the difference between
        • Absolute key (book.chapter, {number})
        • Relative key (book, (chapter, {number}))
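The two keys can be compared on a small document. The sketch below is illustrative only and rests on assumptions: the sample document is made up for the purpose, paths are simple dot-separated names (empty string for ε), and value equality is deep equality of dict nodes. `satisfies_relative` follows the definition of a relative key (Q, (Q’, S)) by checking the key (Q’, S) at every node in [Q].

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def build(elem):
    """Turn an Element into a dict node; 'kids' maps edge label -> node."""
    def leaf(name, text):
        return {"name": name, "text": text, "kids": {}}
    kids = {"@" + k: leaf(k, v) for k, v in elem.attrib.items()}
    ordered = []
    if elem.text and elem.text.strip():
        ordered.append(leaf(None, elem.text.strip()))
    for child in elem:
        ordered.append(build(child))
        if child.tail and child.tail.strip():
            ordered.append(leaf(None, child.tail.strip()))
    kids.update(enumerate(ordered, start=1))
    return {"name": elem.tag, "text": None, "kids": kids}

def node_at(tree, address):
    for label in address:
        tree = tree["kids"][label]
    return tree

def evaluate(tree, start, path):
    """n[P] for a simple path P (dot-separated names, '' for the empty path)."""
    current = {start}
    for name in (path.split(".") if path else []):
        current = {
            address + (label,)
            for address in current
            for label, kid in node_at(tree, address)["kids"].items()
            if kid["name"] == name
        }
    return current

def satisfies(tree, context, target, key_paths):
    """Does the node at `context` satisfy the key (target, key_paths)?"""
    def shares(n1, n2, path):
        v1 = [node_at(tree, a) for a in evaluate(tree, n1, path)]
        v2 = [node_at(tree, a) for a in evaluate(tree, n2, path)]
        return any(z1 == z2 for z1 in v1 for z2 in v2)
    return not any(
        all(shares(n1, n2, p) for p in key_paths)
        for n1, n2 in combinations(evaluate(tree, context, target), 2)
    )

def satisfies_relative(tree, context_path, target, key_paths):
    """Relative key (Q, (Q', S)): every node in [Q] satisfies (Q', S)."""
    return all(
        satisfies(tree, n, target, key_paths)
        for n in evaluate(tree, (), context_path)
    )

# Hypothetical sample: both books have a chapter numbered 1.
tree = build(ET.fromstring(
    "<db>"
    "<book><title>XML</title>"
    "<chapter><number>1</number></chapter>"
    "<chapter><number>10</number></chapter></book>"
    "<book><title>SGML</title>"
    "<chapter><number>1</number></chapter></book>"
    "</db>"
))
print(satisfies_relative(tree, "book", "chapter", ["number"]))
print(satisfies_relative(tree, "", "book.chapter", ["number"]))
```

The relative key holds because chapter numbers are unique inside each book. The absolute key, written here as a relative key with the empty context path, fails because chapter number 1 occurs in both books.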

  48. A relative key for sections
      [Figure: the same db/book/chapter/section tree as on the absolute-key slide.]
      Key: (book.chapter, (section, {number}))
      A section number uniquely identifies a section within a particular chapter of a particular book!
      • Relative to the chapter containing the section, and to the book containing the chapter

  49. Transitivity of relative keys
      • A relative key such as (bible.book.chapter, (verse, {number})) does not uniquely identify a particular verse in the bible
      • Book name, chapter number, verse number → verse

  50. “immediately precedes” relation
      (Q1, (Q’1, S1)) immediately precedes (Q2, (Q’2, S2)) if Q2 = Q1.Q’1
      • (bible, (book, {name})) immediately precedes (bible.book, (chapter, {number}))
      • Any absolute key immediately precedes itself
