Representing and Querying XML with Incomplete Information

Representing and Querying XML with Incomplete Information. Serge Abiteboul INRIA. Luc Segoufin INRIA. Victor Vianu UCSD. Organization. Motivations Simplifying assumptions Model of incompleteness Answering queries Results Discussion Conclusion. Motivations.

Presentation Transcript


  1. Representing and Querying XML with Incomplete Information. Serge Abiteboul (INRIA), Luc Segoufin (INRIA), Victor Vianu (UCSD)

  2. Organization • Motivations • Simplifying assumptions • Model of incompleteness • Answering queries • Results • Discussion • Conclusion Abiteboul-Segoufin-Vianu

  3. Motivations

  4. The Web is a world of incompleteness • Information you get from the Web is seldom complete: • Queries return some, but not all, of the data • Limited storage capability • Documents change on the Web: expiration • Sites are unavailable… • Context: Xyleme, a warehouse of XML documents from the Web

  5. This work • A simple, practically appealing approach to managing incomplete information • Sequence of queries to the Web: (q1,A1)+(q2,A2)+… • Answers are cached • Process a new query without access to the Web • Give an incomplete answer • Explain the incompleteness to the user • Seek additional information, i.e., find a minimal set of queries that fully answers the new query

  6. Related work • Semantic caching • Answering queries using views • keep (Qi,Ai) • try to rewrite query Q into Q’(A1,...,An) • reject if you cannot • Incomplete databases • (Qi,Ai) is some incomplete knowledge of the DB • Related to querying incomplete information, e.g. Lipski-Imielinski

  7. Challenge: balance expressiveness and tractability • Choice of data model • Choice of the query language • Choice of a representation of incompleteness • Results • Simple, practical solution • Extra features lead to serious problems

  8. Simplifying Assumptions

  9. Data is XML: trees • <dealer> <UsedCars> <ad> <model>Honda</model> <year>96</year> </ad> </UsedCars> <NewCars> <ad> <model>Acura</model> </ad> </NewCars> </dealer> [Figure: the same document drawn as a tree: dealer has children UsedCars and NewCars; the ad under UsedCars has model Honda and year 96; the ad under NewCars has model Acura]
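The tree view of the dealer document can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard `xml.etree.ElementTree` module; the helper names `to_tree` and `paths` are made up here and are not part of the talk.

```python
import xml.etree.ElementTree as ET

# The document from the slide, reindented.
DOC = """<dealer>
  <UsedCars><ad><model>Honda</model><year>96</year></ad></UsedCars>
  <NewCars><ad><model>Acura</model></ad></NewCars>
</dealer>"""

def to_tree(elem):
    """Turn an element into (label, value, children); only leaves carry a value."""
    children = [to_tree(child) for child in elem]
    value = (elem.text or "").strip() if not children else None
    return (elem.tag, value, children)

def paths(tree, prefix=()):
    """Yield (root-to-leaf label path, value) for every leaf of the tree."""
    label, value, children = tree
    here = prefix + (label,)
    if not children:
        yield ("/".join(here), value)
    for child in children:
        yield from paths(child, here)

tree = to_tree(ET.fromstring(DOC))
leaf_paths = sorted(paths(tree))
for path, value in leaf_paths:
    print(path, "=", value)
```

Sorting the leaves is harmless here because the talk treats the trees as unordered.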

  10. Simplified XML • Unordered trees • Labelling function • Value function [Figure: a catalog tree with two product nodes; their children name, price, category (cat), subcategory and picture carry the values =nik, =234, =electronic, =camera and =can, =444, =electronique, =camera, =c.jpg]

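  11. Simple XML types • 1 : 1 child (default) • * : 0 or more • + : 1 or more • ? : 0 or 1 [Figure: a type tree rooted at catalog with a starred product child; product has children name, price, cat and picture, one of which is starred; cat has child subcategory]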
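A minimal sketch of checking an unordered tree against such a type is shown below. The encoding (`CATALOG_TYPE`, `BOUNDS`, `conforms`) is hypothetical, and placing the second star on picture is an assumption, since the flattened figure does not say which child it belongs to.

```python
from collections import Counter

# Multiplicity symbol -> (minimum, maximum) number of children; None means unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

# Hypothetical encoding of the slide's type: label -> {child label: multiplicity}.
# The star on picture is an assumption.
CATALOG_TYPE = {
    "catalog": {"product": "*"},
    "product": {"name": "1", "price": "1", "cat": "1", "picture": "*"},
    "cat": {"subcategory": "1"},
}

def conforms(tree, types):
    """Check a (label, children) tree against the type, ignoring child order."""
    label, children = tree
    allowed = types.get(label, {})  # labels without a rule must be leaves
    counts = Counter(child[0] for child in children)
    if any(child_label not in allowed for child_label in counts):
        return False
    for child_label, mult in allowed.items():
        low, high = BOUNDS[mult]
        n = counts[child_label]
        if n < low or (high is not None and n > high):
            return False
    return all(conforms(child, types) for child in children)

good = ("catalog", [("product", [("name", []), ("price", []),
                                 ("cat", [("subcategory", [])]),
                                 ("picture", []), ("picture", [])])])
bad = ("catalog", [("product", [("name", []), ("name", []), ("price", []),
                                ("cat", [("subcategory", [])])])])
print(conforms(good, CATALOG_TYPE), conforms(bad, CATALOG_TYPE))
```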

  12. Prefix Selection Queries (ps-queries) [Figure: two query trees. Query1: catalog, product, with children name, price <200 and cat =elec, plus subcategory. Query2: catalog, product, with children name and picture]
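One way to read a ps-query is as a tree pattern with conditions on values whose answer is the matching prefix of the data tree. The sketch below follows that reading; the classes `Node` and `QNode`, the functions `matches` and `answer`, and the sample products are all hypothetical, and the code relies on the talk's restriction that a query node has no two children with the same label.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    label: str
    value: Optional[object] = None
    children: List["Node"] = field(default_factory=list)

@dataclass
class QNode:
    label: str
    cond: Callable[[object], bool] = lambda v: True  # condition on the node's value
    children: List["QNode"] = field(default_factory=list)

def matches(n, q):
    """n matches q if labels agree, the condition holds, and every query child
    is matched by at least one data child."""
    if n.label != q.label or not q.cond(n.value):
        return False
    return all(any(matches(c, qc) for c in n.children) for qc in q.children)

def answer(n, q):
    """Return the prefix of the data tree selected by q, or None if nothing matches."""
    if not matches(n, q):
        return None
    kept = []
    for qc in q.children:
        for c in n.children:
            sub = answer(c, qc)
            if sub is not None:
                kept.append(sub)
    return Node(n.label, n.value, kept)

def product(name, price, cat, sub):
    return Node("product", children=[
        Node("name", name), Node("price", price),
        Node("cat", cat, [Node("subcategory", sub)])])

# Made-up sample data, not the catalog from the slides.
catalog = Node("catalog", children=[
    product("canon", 120, "elec", "camera"),
    product("nikon", 250, "elec", "camera"),
    product("teddy", 20, "toy", "plush")])

# Query1 as read from the figure: electronic products cheaper than 200.
query1 = QNode("catalog", children=[QNode("product", children=[
    QNode("name"),
    QNode("price", lambda v: v < 200),
    QNode("cat", lambda v: v == "elec", [QNode("subcategory")])])])

result = answer(catalog, query1)
names = [c.value for p in result.children for c in p.children if c.label == "name"]
print(names)
```

Children of a data node that the query does not mention (for instance a picture) are simply dropped from the answer, which is what makes the result a prefix rather than a full subtree.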

  13. Simplifications • Data: no order; no distinction attribute/element; no recursion; no links • Query: no complex path expressions; no join; no repeated child [Figure: a disallowed query, marked NO, in which product has children name, cat=elec and cat=toy]

  14. Crucial assumption: XID • URLs • ID/IDrefs [Figure: two partial views of the same node prod &245, one carrying canon, 120, elec, camera and one carrying c.jpg, are added (+) to give a single prod &245 tree with canon, 120, elec, camera and c.jpg]
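The role of persistent node identifiers can be sketched as follows: two answers that mention the same node id are merged child by child. The dictionary layout and the helpers `node` and `merge` are assumptions made for illustration, and every id other than &245 is invented.

```python
def node(node_id, label, value=None, children=()):
    """Build a partial view of a node; children are indexed by their ids."""
    return {"id": node_id, "label": label, "value": value,
            "children": {c["id"]: c for c in children}}

def merge(a, b):
    """Merge two partial views of the same node, matching children by node id."""
    if a["id"] != b["id"]:
        raise ValueError("can only merge views of the same node")
    merged = {"id": a["id"], "label": a["label"], "value": a["value"], "children": {}}
    for view in (a, b):
        for cid, child in view["children"].items():
            if cid in merged["children"]:
                merged["children"][cid] = merge(merged["children"][cid], child)
            else:
                merged["children"][cid] = child
    return merged

# Ids &246 to &250 are hypothetical; only &245 appears on the slide.
view1 = node("&245", "prod", children=[
    node("&246", "name", "canon"),
    node("&247", "price", 120),
    node("&248", "cat", "elec", [node("&249", "subcategory", "camera")])])
view2 = node("&245", "prod", children=[
    node("&246", "name", "canon"),
    node("&250", "picture", "c.jpg")])

combined = merge(view1, view2)
labels = sorted(child["label"] for child in combined["children"].values())
print(labels)
```

Without ids, the name node seen in both views could not be recognised as the same node, so the merge would have to guess.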

  15. Representation of incomplete information: Incomplete trees

  16. Document Type Definitions (DTDs) are used to represent incompleteness • Set of rules e → r, where e is an element name and r a regular expression • Set of trees satisfying a DTD d: tree(d) • Shortcoming of DTDs: an element has a single definition, independently of the context • Here the type of ad depends on the context [Figure: dealer with children usedcar and newcar, each with an ad child; one ad has model and year, the other only model]

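  17. Solution: specialization (decoupled tags) • Two specialized tags ad_used and ad_new, with h(ad_used) = h(ad_new) = ad [Figure: on the left, the dealer type with the specialized tags ad_used (model, year) under usedcar and ad_new (model) under newcar; the mapping h sends it to the tree on the right, where both are labelled ad]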
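Specialization can be illustrated with a small checker in which types are specialized names and a mapping h sends each of them to the tag seen in the document. This is a sketch under stated assumptions: the tables `H` and `RULES`, the function `has_type`, and all multiplicities are invented for the example, and it assumes that the child types of a rule have pairwise distinct tags.

```python
# Multiplicity symbol -> (minimum, maximum); None means unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

# h maps each specialized type name to the tag that appears in documents.
H = {"dealer": "dealer", "usedcar": "usedcar", "newcar": "newcar",
     "ad_used": "ad", "ad_new": "ad", "model": "model", "year": "year"}

# Rules over specialized names; the multiplicities are assumptions.
RULES = {
    "dealer": {"usedcar": "1", "newcar": "1"},
    "usedcar": {"ad_used": "*"},
    "newcar": {"ad_new": "*"},
    "ad_used": {"model": "1", "year": "1"},
    "ad_new": {"model": "1"},
    "model": {},
    "year": {},
}

def has_type(tree, t):
    """Check whether a (tag, children) tree can be given the specialized type t."""
    tag, children = tree
    if H[t] != tag:
        return False
    # Within one rule the child tags are distinct, so the tag picks the child type.
    by_tag = {H[ct]: (ct, mult) for ct, mult in RULES[t].items()}
    counts = {}
    for child in children:
        child_tag = child[0]
        if child_tag not in by_tag or not has_type(child, by_tag[child_tag][0]):
            return False
        counts[child_tag] = counts.get(child_tag, 0) + 1
    for child_tag, (_, mult) in by_tag.items():
        low, high = BOUNDS[mult]
        n = counts.get(child_tag, 0)
        if n < low or (high is not None and n > high):
            return False
    return True

used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
ok_doc = ("dealer", [("usedcar", [used_ad]), ("newcar", [new_ad])])
bad_doc = ("dealer", [("usedcar", [new_ad]), ("newcar", [used_ad])])
print(has_type(ok_doc, "dealer"), has_type(bad_doc, "dealer"))
```

Both ads carry the tag ad in the document; only the context decides whether ad_used or ad_new applies, which a plain DTD cannot express.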

  18. DTDs + Specialization • The sets of trees that can be specified: the regular unranked tree languages [Brüggemann-Klein, Murata, Wood] • Same closure properties: intersection, union, complement • Same complexity

  19. Example • Q1: name, subcat, price of electronic products with price less than $200 • Q2: name, pictures of cameras with at least one picture • Q3: name, price, pictures of cameras costing less than $100 and with at least one picture: can be completely answered using A1, A2 • Q4: list all cameras: can be partially answered using A1, A2

  20. Q1: name, subcat, price of electronic products with price less than 200 [Figure: the incomplete tree after Q1. Known: catalog with three product nodes carrying canon 120 elec, nikon 199 elec, sony 175 elec and the subcategories camera, camera, cdplayer. Missing: starred product1 and product2 children of catalog]

  21. Missing data after Q1 [Figure: two types for the missing products, product1 and product2, each with children name, price, cat (with subcategory) and a starred picture; the conditions shown are cat =elec with price >200, and cat !=elec]

  22. Q2: name, pictures of cameras at least pictured once [Figure: the incomplete tree after Q2. Known under catalog: the products canon 120 elec, nikon 199 elec, sony 175 elec (subcategories camera, camera, cdplayer), together with c.jpg and a new product akai, a.jpg, elec, camera. Missing: product1*, product2a, product2b*, product2c]

  23. Incomplete information • Known information: a prefix of the real data tree • Missing information: extended tree type; conditions on data values; specializations, disjunctions

  24. Known data and missing data [Figure: the types after Q1 and Q2. Known data: product, refined as product + product2a. Missing data: product1, product2b, product2c and product3, each over name, price, cat (with subcategory) and picture, combining the conditions cat =elec, cat !=elec, price >200, subcategory !=camera and no picture]

  25. Answering Queries

  26. Complete answer to Q3 • Q3: name, price, pictures of cameras costing less than $150 and having at least one picture • Can be fully answered using available information • Need to check whether answer is complete [Figure: the answer tree catalog, prod, with canon, 120, c.jpg]

  27. Incomplete answer to Q4 • Provide known cameras • Explain incompleteness [Figure: the known names akai, canon, nikon, sony, plus a placeholder for more products annotated with price>200 and no picture]

  28. Completing answer to Q4 • It suffices to ask: [Figure: a query on product with children name, price >200, cat =elec with sub=camera, and picture, carrying a 0 annotation]

  29. Revisit the types • DTD • Conditions • Specialization: same element name may have several types • Not sufficient • Need to extend the types again: disjunctions [Figure: the type product2b with children name, price >200, cat =elec with subcategory !=camera, and a starred picture]

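  30. Disjunction [Figure: a vehicle type with optional (?) children engine and sail and a data child with description; Query1’ selects vehicle with engine and data, Query2’ selects vehicle with sail and data and description; for the node &322 the answers are Empty!, with data=“….” and description=“….”]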

  31. Disjunction continued • Type of &322: vehicle1 + vehicle2 • The type of &322 cannot be described independently of that of the data below [Figure: the types vehicle1 and vehicle2 over engine, sail, data and description]

  32. Results

  33. Representation System: Lipski’s + Imielinski’s • T: representation of information; rep(T): set of possible worlds • q(T): representation of the result; rep(q(T)): set of possible answers • Requirement: q(rep(T)) = rep(q(T))
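Written out, the condition on this slide reads as follows, assuming the usual convention that a query applied to a set of possible worlds is applied to each world separately (the transcript does not spell this convention out):

```latex
\[
  q(\mathrm{rep}(T)) \;=\; \{\, q(t) \mid t \in \mathrm{rep}(T) \,\} \;=\; \mathrm{rep}(q(T))
\]
```

In words: querying every possible world and collecting the results gives exactly the set of worlds represented by the answer computed directly on the representation.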

  34. Representation System for PS-queries • Incomplete tree T to represent q1^{-1}(A1) ∩ … ∩ qk^{-1}(Ak) • PS-query q • q(T) can be computed in ptime (the representation of the answer can be computed in ptime)

  35. Querying Incomplete Trees • Given T and a query q, one can • Give in ptime the sure answers up to our current knowledge • Check in ptime whether query q can be fully answered • Generate in ptime queries to complete the answer

  36. Comparison with IL • IL: relational model; relational calculus/algebra; conditional tables; closed or open world; representation system • This work: XML tree model; weaker language (no join); weaker system (no variable); closed and open world; representation system

  37. Drawback: exponential blowup • Incomplete information may become exponential w.r.t. the sequence of query/answer pairs q1/A1; q2/A2; … [Figure: the type is database with exactly one a child and one b child; the query qi asks for database with a=i and b=i; all answers are empty]
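The blowup can be made concrete under one reading of the figure, which is an assumption: each empty answer to qi tells us that a≠i or b≠i, and a representation without disjunction has to list one conjunctive case per way of choosing which of the two is excluded. The functions `conjunctive_cases`, `possible` and `covered` below are illustrative names, not taken from the talk.

```python
from itertools import product

def conjunctive_cases(n):
    """One case per way of explaining the n empty answers: for each i, either
    a is known to differ from i or b is known to differ from i."""
    cases = set()
    for blame in product("ab", repeat=n):
        a_excluded = frozenset(i for i, who in enumerate(blame, 1) if who == "a")
        b_excluded = frozenset(i for i, who in enumerate(blame, 1) if who == "b")
        cases.add((a_excluded, b_excluded))
    return cases

def possible(a, b, n):
    """A world (a, b) is consistent if no query qi would have returned it."""
    return not any(a == i and b == i for i in range(1, n + 1))

def covered(a, b, cases):
    """A world is represented if at least one conjunctive case admits it."""
    return any(a not in a_excluded and b not in b_excluded
               for a_excluded, b_excluded in cases)

for n in (1, 2, 3, 10):
    print(n, len(conjunctive_cases(n)))
```

The count doubles with every additional query/answer pair, while the set of cases still describes exactly the consistent worlds.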

  38. Dealing with exponential blowup • Make the representation more complex using disjunctions of types • Size of representation stays polynomial • Manipulations much more complex • Restrict tree types and PS-queries • Already very (too?) simple • Accept losing some information • Ask extra queries to simplify the representation

  39. Discussion

  40. Discussion: extend language • Some results in paper • Extensions often lead to intractability • E.g.: k-pebble transducers [Milo, Suciu, Vianu], which somehow subsume XML-QL and XSL • No (known) representation system • Testing whether rep(T) is empty is non-elementary

  41. Discussion: node Ids • Without node Ids: • much less information to integrate results • more complex • tedious case analysis

  42. Discussion: ordering • Ordering in XML, DTD, queries • Problem is totally different and very complex • Example: • Q1/A1: list of males; Q2/A2: list of females; Q3: list all • Depending on the type of input: • (Male)*(Female)*: A3 = A1 || A2 • (Male Female)*: A3 = shuffle(A1,A2) • (Male + Female)*: we cannot answer A3 • Regular expression processing
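The three cases of the ordering example can be sketched in Python. The function names and the idea of passing the input type as a string are illustrative assumptions; for the type (Male Female)* the shuffle is taken to be strict alternation, which is the only interleaving that type allows.

```python
def concat(males, females):
    """(Male)*(Female)*: every male precedes every female, so A3 = A1 || A2."""
    return list(males) + list(females)

def alternate(males, females):
    """(Male Female)*: the lists interleave strictly, one male then one female."""
    if len(males) != len(females):
        raise ValueError("(Male Female)* needs as many males as females")
    merged = []
    for m, f in zip(males, females):
        merged.extend([m, f])
    return merged

def answer_q3(input_type, males, females):
    """Rebuild the ordered answer to Q3 from A1 and A2, or return None when the
    input type does not determine the relative order."""
    if input_type == "(Male)*(Female)*":
        return concat(males, females)
    if input_type == "(Male Female)*":
        return alternate(males, females)
    if input_type == "(Male + Female)*":
        return None  # any interleaving is possible, so A3 is not determined
    raise ValueError("unknown input type: " + input_type)

a1 = ["m1", "m2"]
a2 = ["f1", "f2"]
print(answer_q3("(Male)*(Female)*", a1, a2))
print(answer_q3("(Male Female)*", a1, a2))
print(answer_q3("(Male + Female)*", a1, a2))
```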

  43. Conclusion • Framework for acquiring, maintaining, querying incomplete XML data • Limitations: • simple queries • no order and Id assumption • small extensions lead to problems • Possible to represent the incompleteness • Possible to answer with incompleteness • Possible to obtain queries to provide full answer
