1 / 22

A Transactional Model for Data Warehouse Maintenance

A Transactional Model for Data Warehouse Maintenance. Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland D atabase S ystems R esearch G roup Worcester Polytechnic Institute. DWMS. Wrapper. Wrapper. Wrapper. Base. Base. Base.

candra
Download Presentation

A Transactional Model for Data Warehouse Maintenance

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Transactional Model for Data Warehouse Maintenance Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland Database Systems Research Group Worcester Polytechnic Institute

  2. DWMS Wrapper Wrapper Wrapper . . . Base Base Base Data Warehousing • Data Integration from Remote Base Sources • Difficult and Labor-Intensive • Better Do it only ONCE and Materialize the Results • Share Materialized Data by Many Applications Data Warehouse

  3. Data Warehouse Maintenance • Motivation: Keep Data Warehouse (DW) Update-to-Date • Base Changes over Time • Source Data Updates • insert, delete, update • Source Schema Changes • add, drop, rename • Basic Idea: Incremental instead of Re-computation • Re-computation may take weeks

  4. General Maintenance Algorithms • View Maintenance (VM) • Incrementally incorporate source data updates • [BLT86], [GMS93], [ZGH+95], [SBC+00] • View Synchronization (VS) • Rewrite data warehouse view definition after the schema of the source changed • [NLR98], [LNR02] • View Adaptation (VA) • Adapt view extent after the view definition changed • [NR99], [GMR+01]

  5. Insert ( ‘Steve’, ‘Boston’) DW Maintenance Example CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer C, FlightRes F WHERE C.Name = F.Name AND F.Dest = ‘Asia’; View: Asia_Traveller Name Address FlightNo Dave WPI AA8384 Select FlightNo from FlightRes where Name=‘Steve’ Customer FlightRes Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe

  6. View: Asia_Traveller Broken Query! Name Address FlightNo Dave WPI AA8384 3. Select FlightNo from FlightRes where Name=‘Steve’ Customer FlightRes Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe 2. Rename (FlightRes, FlightReservation) 1. Insert ( ‘Steve’, ‘Boston’) Maintenance Anomaly Problem

  7. Inside Broken Query • Two Transactions • Base Update Transaction • w(Bi)c(Bi) • DW Maintenance Transaction • r(B1)r(B2)…r(Bn)w(DW)c(DW) • Read-write conflicts between two transactions • Two Independent Transactions • w(Bi) / r(Bi) • Data Update w(Bi): Incorrect Query Results [ZGH+95] • Schema Change w(Bi): Broken Query

  8. A Transactional Approach • A Global Transaction Model • DWMS_Transaction • Integrates both base update transaction and its corresponding DW maintenance transaction • w(Bi)c(Bi)r(B1)r(B2)…r(Bn)w(DW)c(DW) • Maintenance Anomaly • Rephrased to read-write conflicts of DWMS_Transactions • w(Bi)c(Bi)r(B1)r(B2)…r(Bj)…r(Bn)w(DW)c(DW) • w(Bj)c(Bj)r(B1)r(B2)…r(Bn)w(DW)c(DW)

  9. Serializability of DWMS_Transaction • Theorem • A history of DWMS_Transactions S is serializable iff it is equivalent to some serial schedule S’ of the same DWMS_Transactions. • Basis for Solving Anomaly Problems • To solve the anomaly problem, we need all DWMS_Transactions serializable.

  10. Traditional Serializability Algorithms • Lock-based • Reads / writes acquire locks for access to shared resources • Transactions block each other • Multiversion-based • Write on a version, read on another version • Transactions do not block each other

  11. Traditional Serializability Algorithms • Lock-based • Read / write would need to lock data in sources? • Not desirable in DW environment • Data sources are autonomous • Not realistic to impose locking on them • Multiversion-based • Do not block each other • Desirable in DW environment • DW and data sources do not block each other • Need to maintain versions somewhere

  12. CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer’ C,FlightRes’ F WHERE C.Name = F.Name AND F.Dest = ‘Asia’; View: Asia_Traveller Name Address FlightNo Customer’ Dave WPI AA8384 FlightRes’ Wrapper Meta Relation Wrapper Meta Relation Name Address Rel Attr N. A. F. D. Rel Attr Dave WPI Cust’ Name … … … … Fli’ Name Customer FlightRes Ellen MA Cust’ Address … … … … … … Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe TxnWrap: A Multiversion Algorithm CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer C, FlightRes F WHERE C.Name = F.Name AND F.Dest = ‘Asia’;

  13. Versioned Wrapper Wrapper for Customer Relation Customer’ Meta Relation • Semantics: life time of a tuple is • #born <= time < #dead

  14. Name Address #born #dead Dave WPI 0  Relation Customer’ (state 2 ) Ellen MA 0  Name Address #born #dead Dave WPI 0 1 Ellen MA 0  Relation Customer’ (state 1 ) Stove Boston 1  Name Address #born #dead Meta Relation (state 2 ) Rel Attr Rel’ Attr’ #born #dead Dave WPI 0 1 C’ Name - - 0  Ellen MA 0  C’ Addr. - - 0 2 Steve Boston 1  Source Updates on Versioned Wrapper Relation Customer’ (Init) Transaction1: 1. DELETE FROM Customer C WHERE C.Name = ‘Dave’; 2. INSERT (‘Steve’, ‘Boston’); Transcation 2: Drop Customer.Address;

  15. Relation Customer’ (State 1 ) Name Address #born #dead Dave WPI 0 1 Ellen MA 0  Stove Boston 1  DW Maintenance Query Rewritten for Versioned Wrapper The maintenance query issued in Transaction2: SELECT Name, Address FROM Customer WHERE condition; Rewritten versioned maintenance query: SELECT Name, Address FROM Customer’ WHERE condition and #born <= 2 and #dead > 2;

  16. Performance Evaluation • Implementation • In Java • Platform: Oracle, JDBC on Windows NT • Embedded in DyDa [CCZ+01] System at WPI • Testbed • 6 data sources with one relation each • Each relation has 4 attributes and 100,000 tuples • One materialized joined view over these data sources • TxnWrap VS. compensation (SWEEP [AAS+97] & DyDa)

  17. Data Update Processing

  18. Schema Change Processing

  19. Related Work • View Maintenance • View Maintenance / Synchronization / Adaptation • Maintenance Anomaly • ECA [ZGH+95], SWEEP [AAS+97] handles only concurrent data updates • Compensation-based • Performance degrades at a high load • Multi-version Algorithms • 2-version, n-version, unlimited-version algorithms [MPL92]

  20. Conclusions • Identify the Maintenance Anomaly Problem in mixed model environment • Design a global Transaction DWMS_Transaction model that integrates both source update transaction and maintenance transaction. • Rephrase the maintenance anomaly in terms of serializability of DWMS_Transactions • Propose multiversion algorithm to achieve serializability • Implemented the maintenance solution in Dyda • Achieve stable performance under various workloads

  21. Other Activities and Future Work • Batching of updates into more complex maintenance plans • Parallelism of maintenance processes • Support more complex views, e.g., aggregation • Generalize to more change types • Provide alternate view synchronization algorithms • Discovery of changes by non-cooperating sources • Discovery of meta data in terms of source relationships of distributed sources • Move beyond relational middle-layer model

  22. Questions?

More Related