
Self Maintenance of materialized XML views with non-cooperative data sources

DBDBD 2006. Virginie Sans, ETIS/CNRS Laboratory, MIDI Team.


Presentation Transcript


  1. Self Maintenance of materialized XML views with non-cooperative data sources. DBDBD 2006. Virginie Sans, ETIS/CNRS Laboratory, MIDI Team.

  2. Summary
     • 1. Issue and context: 1.1 Pre-requisite, 1.2 The issue, 1.3 Context, 1.4 State of the art
     • 2. Contributions: 2.1 View computation with the XAlgebra, 2.2 Detection and identification of source updates, 2.3 View maintenance, 2.4 Applications and performances
     • Conclusion

  3. Mediation architecture (1.1 Pre-requisite)
     • Introduced by Wiederhold
     • The architecture: mediator, wrappers, sources
     • Query language

  4. Mediation architecture (1.1 Pre-requisite)
     • Mediator
       • Handles the user request: canonization, atomization
       • Sends atomic requests to a source via its wrapper
     • Wrappers
       • Translate a query coming from the mediator into a query in the native language of the web source
       • Give the mediator an answer in XML
     • Data sources
       • Heterogeneous
       • Distributed
       • In a web context: partially unavailable
     • [Diagram labels: Mediator, atomic request, XML, Wrapper, SQL, tuples, SQL source]

  5. What about views? (1.1 Pre-requisite)
     • Views
       • Data integration
       • Access control, security
       • Data warehouses
     • Why?
       • Interoperability
       • Heterogeneous data
     • Materializing views
       • Fast access to complex queries
       • Better availability
       • Request optimization
     • [Diagram labels: views, mediator, materialized views, three wrappers, RDB, SQL, HTML]

  6. Issue: view maintenance (1.2 Issue)
     • When data sources are updated, the view consistency should be kept
     • Maintenance process
       • Recomputation: recompute the whole view from scratch
       • Incremental maintenance: compute changes to the view in response to changes to the base sources
     • [Diagram labels: Source t, update, Source t+1, view computation, View t, recomputation, incremental maintenance, View t+1]
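
The two strategies above can be contrasted with a minimal Python sketch. It assumes a view that is a plain selection over a list of records; the function names and the sample books are hypothetical and are not taken from the talk.

```python
def compute_view(source, predicate):
    """Recomputation: rebuild the whole view from the full source."""
    return [item for item in source if predicate(item)]


def maintain_view(view, inserted, deleted, predicate):
    """Incremental maintenance: apply only the source changes to the view."""
    kept = [item for item in view if item not in deleted]
    return kept + [item for item in inserted if predicate(item)]


def is_cheap(book):
    return book["price"] < 60


# Hypothetical source at time t and the view computed from it.
source_t = [
    {"title": "Book A", "price": 39.95},
    {"title": "Book B", "price": 65.95},
]
view_t = compute_view(source_t, is_cheap)

# One record is appended to the source between t and t+1.
new_book = {"title": "Book C", "price": 20.00}
source_t1 = source_t + [new_book]

# The incremental path only looks at the change, never at source_t1 itself.
view_t1 = maintain_view(view_t, inserted=[new_book], deleted=[], predicate=is_cheap)
assert view_t1 == compute_view(source_t1, is_cheap)
```

The incremental path needs the list of changes as input, which is exactly what a non-cooperative source does not provide; the rest of the talk is about reconstructing it.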

  7. Context: semi-structured XML data (1.3 Context)
     • XML views are materialized at the mediator level
     • Hierarchical data
     • No schema, except the query schema
     • Example document:
         <bib>
           <book>
             <price>65.95</price>
             <title>Advanced Programming in the Unix environment</title>
           </book>
           <book>
             <title>TCP/IP Illustrated</title>
           </book>
           <book>
             <price>65.95</price>
             <title>Advanced Programming in the Unix environment</title>
           </book>
           <book>
             <price>39.95</price>
             <title>Data on the Web</title>
             <title>Données sur le Web</title>
           </book>
         </bib>

  8. Context: XQuery (1.3 Context)
     • XQuery
       • Dedicated to XML data
       • Relational operators (projection, selection, join, union, …)
       • XML operators (tagging, unnesting, aggregation, …)
       • FLWOR syntax (pronounced "flower")
     • FLWOR syntax:
         for $var in forest [$var in forest]*
         let $var := subtree
         where condition
         return result
     • Example:
         <result>{
           for $b in doc("bib.xml")/bib/book
           let $a := $b/author
           where $b/price/text() < 60
           order by $b/year
           return <cheap_book>{ $b/title }</cheap_book>
         }</result>

  9. Context: other specificities (1.3 Context)
     • Views are computed using the XAlgebra (cf. View computation)
     • Wrappers have limited resources
       • Few computation possibilities
       • A component named the logger stores the last modification date and a checksum of the sources
     • Non-cooperative web sources
       • No information about their updates
       • Not always available
       • Not enough granularity

  10. State of the art (1/2) (1.4 State of the art)
     • Relational views
       • Not fit for semi-structured data
     • Abiteboul et al.
       • OEM (Object Exchange Model)
       • LOREL language
       • Some operators are missing
     • VOX, Rainbow team
       • Needs to know the exact position in the XML tree where the update has been done

  11. State of the art (2/2) (1.4 State of the art)
     • Cobena et al.
       • XDiff, an algorithm for XML file comparison
       • Needs a copy of the source at the wrapper level
     • Bonnet et al. / Papadimos et al.
       • Parachute queries
       • Mutant query plans
     • What about when sources are really unavailable?
     • Our goal
       • Reduce source accesses to the minimum
       • Use the information that is stored in the view

  12. View maintenance: the process (2.1 View computation)
     • View computation
       • An algebraic approach using the XAlgebra
       • Extension of the XAlgebra (identifiers)
     • Update detection
       • Comparison of the information of the source with that stored in the logger
     • Update identification
       • Recovering process
       • Diff algorithm
     • View maintenance
       • Propagation rules for each operator

  13. View computation: steps (2.1 View computation)

  14. The XAlgebra data model (2.1 View computation)
     • Operators: XSource, XConstruct, XUnion, …
     • Data structures: XRelation, XTuple, XAttributes

  15. XSource operator, step 1 (2.1 View computation)
     • XQuery analysis:
         for $f in doc("informations.xml")/personnes/personne
         let $a := $f/nom
         where $f/age < 27 and $a = "Durand"
         return (<nom>{$a}</nom>, <prenom>{$f/prenom}</prenom>)
     • Path extraction: optional, mandatory, hidden
     • We obtain: a context and a set of patterns

  16. XSource operator, steps 2 and 3 (2.1 View computation)
     • From XML subtrees to the tabular structure
     • 1 subtree => 1 XTuple
     • XRelation = set of XTuples
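
As an illustration of the subtree-to-XTuple mapping, here is a small Python sketch. It is not the XAlgebra implementation: the `xsource` function, the dictionary representation of an XTuple, and the sample content of `informations.xml` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content for informations.xml.
DOCUMENT = """
<personnes>
  <personne><nom>Durand</nom><prenom>Marcel</prenom><age>25</age></personne>
  <personne><nom>Leclerc</nom><prenom>John</prenom><prenom>Eric</prenom><age>40</age></personne>
</personnes>
"""


def xsource(root, context, patterns):
    """Build an XRelation: one XTuple (a dict) per subtree matching the context."""
    xrelation = []
    for subtree in root.findall(context):
        # Each pattern maps to a list because an XML element may repeat.
        xtuple = {p: [node.text for node in subtree.findall(p)] for p in patterns}
        xrelation.append(xtuple)
    return xrelation


root = ET.fromstring(DOCUMENT)
xrelation = xsource(root, context="personne", patterns=["nom", "prenom", "age"])
for xtuple in xrelation:
    print(xtuple)
```

Keeping a list per pattern is one way to cope with repeated elements such as the two `prenom` entries; the talk does not say how the XAlgebra represents them.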

  17. XSource operator, extending the algebra (2.1 View computation)
     • Adding identifiers: XTids
     • An XTid is a set of pairs: {(idsource, idfragment), …}
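
A possible in-memory shape for XTids is sketched below in Python. The `XValue` class and the `merge_duplicates` helper are hypothetical; merging equal values by taking the union of their XTids is an assumption about how an identifier with several pairs can arise, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XValue:
    value: str
    xtid: frozenset  # set of (source_id, fragment_id) pairs


def merge_duplicates(xvalues):
    """Collapse equal values into one XValue whose XTid is the union of theirs."""
    merged = {}
    for xv in xvalues:
        merged[xv.value] = merged.get(xv.value, frozenset()) | xv.xtid
    return [XValue(value, xtid) for value, xtid in merged.items()]


# Two fragments of source 1 (fragments 3 and 4) both carry the value "Alain".
values = [
    XValue("Alain", frozenset({(1, 3)})),
    XValue("Alain", frozenset({(1, 4)})),
    XValue("Marcel", frozenset({(1, 11)})),
]
merged = merge_duplicates(values)
```

After merging, the single "Alain" value remembers both fragments it came from, which is what later allows a deletion in the source to be traced back to the view.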

  18. View computation, XOperator: XProject (2.1 View computation)

  19. View computation, XOperator: XJoin (2.1 View computation)
     • XTids propagation: card(XTID) > 1 for some nodes
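
The following Python sketch shows one way XTids could propagate through a join. It assumes that the join node receives the union of the XTids from both operands while the other nodes keep their own; the `xjoin` function, the tuple layout, and the sample data are hypothetical.

```python
def xjoin(xr1, xr2, key):
    """Join two XRelations on `key`; each attribute maps to (value, xtid)."""
    result = []
    for left in xr1:
        for right in xr2:
            lvalue, lxtid = left[key]
            rvalue, rxtid = right[key]
            if lvalue == rvalue:
                joined = {**left, **right}
                # The join node comes from both sources, so its XTid holds two pairs.
                joined[key] = (lvalue, lxtid | rxtid)
                result.append(joined)
    return result


# Hypothetical XTuples: source 1 holds people, source 2 holds cities.
xr1 = [{"nom": ("Durand", frozenset({(1, 11)})),
        "prenom": ("Marcel", frozenset({(1, 11)}))}]
xr2 = [{"nom": ("Durand", frozenset({(2, 5)})),
        "ville": ("Cergy", frozenset({(2, 5)}))}]

joined = xjoin(xr1, xr2, key="nom")
```

Only the `nom` node ends up with an XTid of cardinality 2; `prenom` and `ville` still point to a single fragment each.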

  20. Update detection and identification (2.2 Update detection and identification)
     • Detection: comparison of the information of the source with that stored in the logger
       • The last modification date
       • The checksum of the source
     • Identification
       • Partial recovery of the source information based on XTids
       • Comparison of the recovered XRelation with the updated source
       • Δ computation
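
The detection step can be sketched in Python as follows. The slides do not name a checksum algorithm or say how the two signals are combined, so SHA-256, the `LoggerEntry` class, and the rule "check the date first, then confirm with the checksum" are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LoggerEntry:
    last_modified: str  # last modification date recorded for the source
    checksum: str       # checksum of the source content at that date


def checksum_of(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def source_updated(entry: LoggerEntry, last_modified: str, content: bytes) -> bool:
    """Return True when the source differs from what the logger recorded."""
    if last_modified == entry.last_modified:
        return False  # assumption: an unchanged date means no update
    # The date moved; the checksum tells a real change from a mere rewrite.
    return checksum_of(content) != entry.checksum


old = b"<personnes><personne><nom>Durand</nom></personne></personnes>"
entry = LoggerEntry("2006-01-10T08:00:00", checksum_of(old))
changed = source_updated(entry, "2006-02-01T09:30:00", old.replace(b"Durand", b"Leclerc"))
```

Detection only answers whether something changed; finding out what changed is the job of the identification step.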

  21. XRecover (2.2 Update detection and identification)
     • Step 1: project XRv on the XR1 patterns

  22. XRecover (2.2 Update detection and identification)
     • Step 2: filtering XTuple values

  23. XRecover (2.2 Update detection and identification)
     • Step 3: re-ordering XTuples
     • XTidUnnest: XTuples are unnested depending on their XTids

  24. XRecover (2.2 Update detection and identification)
     • Step 3: re-ordering XTuples
     • XTidNest: XTuples are nested by their XTids
     • XTuples are re-ordered
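
The recovery steps above might look like the following Python sketch. It is an interpretation, not the actual XRecover operator: the view layout (each attribute maps to a list of `(value, xtid)` pairs), the reading of step 2 as "keep only the pairs of the requested source", and the assumption that fragment identifiers follow source order are all hypothetical.

```python
def xrecover(view, source_id, patterns):
    """Rebuild an approximation of one source's XRelation from the view."""
    fragments = {}
    for xtuple in view:
        for pattern in patterns:                      # step 1: projection
            for value, xtid in xtuple.get(pattern, []):
                for src, frag in sorted(xtid):        # XTidUnnest: one entry per pair
                    if src != source_id:              # step 2: keep this source only
                        continue
                    values = fragments.setdefault(frag, {}).setdefault(pattern, [])
                    if value not in values:
                        values.append(value)
    # XTidNest and re-ordering: one XTuple per fragment, in fragment-id order.
    return [(frag, fragments[frag]) for frag in sorted(fragments)]


# Hypothetical view built from source 1 (people) and source 2 (cities).
view = [
    {"nom": [("Leclerc", frozenset({(1, 3)}))],
     "prenom": [("Alain", frozenset({(1, 3), (1, 4)}))],
     "ville": [("Cergy", frozenset({(2, 7)}))]},
    {"nom": [("Durand", frozenset({(1, 1)}))],
     "prenom": [("Marcel", frozenset({(1, 1)}))],
     "ville": [("Paris", frozenset({(2, 2)}))]},
]
recovered = xrecover(view, source_id=1, patterns=["nom", "prenom"])
```

Fragment 4 comes back with a `prenom` but no `nom`, which illustrates why the recovery is only partial: the view holds just the data that the query kept.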

  25. Update identification: comparison algorithm (2.2 Update detection and identification)
     • Comparison of XR1t+1 with XRt’
       • XR1t+1 is the XRelation obtained by applying XSource to source 1 at t+1
       • XRt’ is the partial recovery of the XRelation of source 1 at t
     • Remark: XR1t+1 can also be filtered using predicates before comparison
     • The Diff algorithm is based on Unix diff (Hunt & McIlroy); the compared symbol is the XTuple instead of the line

  26. Update identification: Diff algorithm (2.2 Update detection and identification)
     • Delta with hunks:
       • Insert(pos; XTuple)
       • Delete(pos; XTuple)
       • Replace(pos; XTupleold, XTuplenew)
     • Example:
         Insert(2, {Leclerc, Avide, {(1,3)}} {John, Avide, {(1,3)}})
         Delete(4, {Durand, Avide, {(1,11)}} {Marcel, Avide, {(1,11)}} {Eric, Avide, {(1,11)}})
         etc.
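
A delta made of insert, delete, and replace hunks can be produced with a sequence diff whose symbols are XTuples. The sketch below uses Python's standard `difflib.SequenceMatcher`, which relies on a different matching algorithm from the Hunt and McIlroy one cited in the talk; the `xdiff` name, the 0-based positions, and the sample tuples are assumptions.

```python
from difflib import SequenceMatcher


def xdiff(old, new):
    """Compare two XRelations tuple by tuple and return the delta as hunks."""
    hunks = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            hunks.append(("insert", i1, new[j1:j2]))
        elif tag == "delete":
            hunks.append(("delete", i1, old[i1:i2]))
        elif tag == "replace":
            hunks.append(("replace", i1, old[i1:i2], new[j1:j2]))
    return hunks


# XTuples are plain hashable tuples here; positions refer to the old relation.
recovered_t = [("Durand", "Marcel"), ("Martin", "Alain")]
source_t1 = [("Durand", "Marcel"), ("Leclerc", "John"), ("Martin", "Alain")]

delta = xdiff(recovered_t, source_t1)
print(delta)
```

Because the comparison works on whole XTuples, a change to a single value shows up as a replace hunk for the entire tuple.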

  27. Maintenance rules: from delta to view maintenance (2.3 View maintenance)
     • Case of a deletion: delete(pos, xtuple)
     • An XTuple is associated with an XTid {(x)} such that card = 1; each XValue of the view has XTids, noted XTID
     • 1) We delete from the XValues each pair of the XTid such that x ∈ XTID
       • Example: the XTuple whose XTid is x = (1,3) has been deleted; the XValue {Alain}1,3;1,4 becomes the XValue {Alain}1,4
     • 2) We delete each XValue such that card(XTID) = 0
       • Example: if the XValue {Alain}1,3 becomes the XValue {Alain}, we delete the XValue entirely
     • 3) If the XValue was concerned by the predicate, we delete the XTuple
       • Join and restriction case
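
The three deletion rules can be sketched in Python as below. The view layout (each attribute maps to a list of `(value, xtid)` pairs), the `apply_delete` name, and the `predicate_attrs` parameter used to mark the attributes tested by a join or restriction predicate are assumptions made for the example.

```python
def apply_delete(view, deleted_pair, predicate_attrs=()):
    """Propagate the deletion of one source fragment to the view."""
    result = []
    for xtuple in view:
        new_tuple = {}
        drop_tuple = False
        for attr, xvalues in xtuple.items():
            kept = []
            for value, xtid in xvalues:
                remaining = xtid - {deleted_pair}      # rule 1: remove the pair
                if remaining:
                    kept.append((value, remaining))
                elif attr in predicate_attrs:          # rule 3: predicate value lost
                    drop_tuple = True
                # rule 2: a value whose XTid is now empty is not kept
            new_tuple[attr] = kept
        if not drop_tuple:
            result.append(new_tuple)
    return result


view = [{
    "prenom": [("Alain", frozenset({(1, 3), (1, 4)}))],
    "nom": [("Durand", frozenset({(1, 3)}))],
}]

after_projection = apply_delete(view, (1, 3))
after_selection = apply_delete(view, (1, 3), predicate_attrs=("nom",))
```

In the first call "Alain" survives with the XTid {(1,4)} and "Durand" disappears; in the second call the whole XTuple is removed because the lost value was needed by the predicate.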

  28. Maintenance rules: from delta to view maintenance (2.3 View maintenance)
     • Case of an insertion: insert(pos; xtuple)
     • 1) A new XTid is created
       • Goal: preserve the XTuple order for a later recovery
     • 2) Depending on the operator, we obtain various maintenance instructions
       • Projection: insertion of the projection of the xtuple
       • Select: if the xtuple satisfies the predicate ⇒ insertion
       • Join XR1 * XR2: computation of XT = xtuple * XR2; if XT ≠ ∅ ⇒ insertion of XT
       • Union and Intersect: duplicates are kept
         • Union is handled like a Select where the predicate is always true
         • Intersect is handled like a Join
     • Depending on the predicate, we can request either XR2 or its recovery
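
The per-operator insertion rules can be sketched in Python as follows. For brevity the sketch leaves out positions and XTids and treats an XTuple as a flat dictionary; the function names and the sample data are hypothetical.

```python
def insert_into_projection(view, xtuple, attrs):
    """Projection: insert the projection of the new xtuple."""
    view.append({a: xtuple[a] for a in attrs})


def insert_into_selection(view, xtuple, predicate):
    """Select: insert the new xtuple only if it satisfies the predicate."""
    if predicate(xtuple):
        view.append(xtuple)


def insert_into_join(view, xtuple, xr2, key):
    """Join: compute XT = xtuple * XR2 and insert it when it is not empty."""
    xt = [{**xtuple, **other} for other in xr2 if other[key] == xtuple[key]]
    view.extend(xt)
    return xt


new_xtuple = {"nom": "Leclerc", "prenom": "John", "age": 30}

projection_view = []
insert_into_projection(projection_view, new_xtuple, ["nom"])

selection_view = []
insert_into_selection(selection_view, new_xtuple, lambda t: t["age"] < 27)

# xr2 stands for XR2 or its recovery, whichever can be obtained.
xr2 = [{"nom": "Leclerc", "ville": "Cergy"}, {"nom": "Durand", "ville": "Paris"}]
join_view = []
insert_into_join(join_view, new_xtuple, xr2, key="nom")
```

Only the join rule needs data beyond the inserted xtuple, which is why it is the case where a recovery of XR2 or a request to the source comes into play.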

  29. Maintenance rules: from delta to view maintenance (2.3 View maintenance)
     • Case of a modification: Replace(pos; XTupleold, XTuplenew)
     • XTuple modification = XValue modification, or XValue deletion followed by insertion
     • Project and Union: modification of the concerned XValues
     • Select and Intersect: if the modification is applied to an XValue that must verify the condition, deletion of the XTuple; else modification of the XValues (Intersect is handled like Select)
     • Join: deletion followed by insertion

  30. Maintenance rules: from delta to view maintenance (2.3 View maintenance)

  31. Maintenance rules: missing information (2.3 View maintenance)
     • Missing information (join?)
       • Source recovery
       • Multi-view strategy
       • Source request
     • Goal: limited access to the sources
     • Example: View = S1 * S2
       • The xtuple x is inserted in S1
       • Insertion: x * S2'
       • Computation of S2'
     • [Diagram labels: mediator, materialized views, two wrappers, HTML, SQL]

  32. Applications (2.4 Applications and performances)
     • On the web
       • When the necessary sources are unavailable
       • Goal: limited access to them
     • With sensors (ANR project)
       • With wireless sensors
       • Goal: preserve power resources

  33. Performance (2.4 Applications and performances)
     • Comparison between XRecover and recomputation

  34. Performance (2.4 Applications and performances)
     • Comparison between XRecover and recomputation

  35. Conclusion
     • Contributions
       • Maintenance process in the context of non-cooperative web sources
       • Contribution to the XAlgebra
         • New operators: XRecover, XTidUnnest, XTidNest
         • New data structure: XTids
     • Future work
       • Order-sensitive view maintenance
       • A better Diff algorithm

  36. Thanks for your attention! Any questions?
