a transactional model for data warehouse maintenance
Download
Skip this Video
Download Presentation
A Transactional Model for Data Warehouse Maintenance

Loading in 2 Seconds...

play fullscreen
1 / 22

A Transactional Model for Data Warehouse Maintenance - PowerPoint PPT Presentation


  • 88 Views
  • Uploaded on

A Transactional Model for Data Warehouse Maintenance. Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland D atabase S ystems R esearch G roup Worcester Polytechnic Institute. DWMS. Wrapper. Wrapper. Wrapper. Base. Base. Base.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'A Transactional Model for Data Warehouse Maintenance' - candra


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
a transactional model for data warehouse maintenance
A Transactional Model for Data Warehouse Maintenance

Authored by:

Jun Chen, Songting Chen, Elke A. Rundensteiner

Published in ER’2002, Finland

Database Systems Research Group

Worcester Polytechnic Institute

data warehousing
DWMS

Wrapper

Wrapper

Wrapper

. . .

Base

Base

Base

Data Warehousing
  • Data Integration from Remote Base Sources
    • Difficult and Labor-Intensive
    • Better Do it only ONCE and Materialize the Results
    • Share Materialized Data by Many Applications

Data Warehouse

slide3
Data Warehouse Maintenance
  • Motivation: Keep Data Warehouse (DW) Update-to-Date
    • Base Changes over Time
      • Source Data Updates
        • insert, delete, update
      • Source Schema Changes
        • add, drop, rename
    • Basic Idea: Incremental instead of Re-computation
      • Re-computation may take weeks
slide4
General Maintenance Algorithms
  • View Maintenance (VM)
    • Incrementally incorporate source data updates
    • [BLT86], [GMS93], [ZGH+95], [SBC+00]
  • View Synchronization (VS)
    • Rewrite data warehouse view definition after the schema

of the source changed

    • [NLR98], [LNR02]
  • View Adaptation (VA)
    • Adapt view extent after the view definition changed
    • [NR99], [GMR+01]
slide5
Insert ( ‘Steve’, ‘Boston’)

DW Maintenance Example

CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer C, FlightRes F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;

View: Asia_Traveller

Name

Address

FlightNo

Dave

WPI

AA8384

Select FlightNo from FlightRes

where Name=‘Steve’

Customer

FlightRes

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe

slide6
View: Asia_Traveller

Broken Query!

Name

Address

FlightNo

Dave

WPI

AA8384

3. Select FlightNo from FlightRes

where Name=‘Steve’

Customer

FlightRes

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe

2. Rename (FlightRes, FlightReservation)

1. Insert ( ‘Steve’, ‘Boston’)

Maintenance Anomaly Problem

inside broken query
Inside Broken Query
  • Two Transactions
    • Base Update Transaction
      • w(Bi)c(Bi)
    • DW Maintenance Transaction
      • r(B1)r(B2)…r(Bn)w(DW)c(DW)
  • Read-write conflicts between two transactions
    • Two Independent Transactions
    • w(Bi) / r(Bi)
      • Data Update w(Bi): Incorrect Query Results [ZGH+95]
      • Schema Change w(Bi): Broken Query
a transactional approach
A Transactional Approach
  • A Global Transaction Model
    • DWMS_Transaction
      • Integrates both base update transaction and its corresponding DW maintenance transaction
      • w(Bi)c(Bi)r(B1)r(B2)…r(Bn)w(DW)c(DW)
  • Maintenance Anomaly
    • Rephrased to read-write conflicts of DWMS_Transactions
      • w(Bi)c(Bi)r(B1)r(B2)…r(Bj)…r(Bn)w(DW)c(DW)
      • w(Bj)c(Bj)r(B1)r(B2)…r(Bn)w(DW)c(DW)
serializability of dwms transaction
Serializability of DWMS_Transaction
  • Theorem
    • A history of DWMS_Transactions S is serializable

iff it is equivalent to some serial schedule S’ of the

same DWMS_Transactions.

  • Basis for Solving Anomaly Problems
    • To solve the anomaly problem, we need all DWMS_Transactions serializable.
traditional serializability algorithms
Traditional Serializability Algorithms
  • Lock-based
    • Reads / writes acquire locks for access to shared resources
    • Transactions block each other
  • Multiversion-based
    • Write on a version, read on another version
    • Transactions do not block each other
traditional serializability algorithms1
Traditional Serializability Algorithms
  • Lock-based
    • Read / write would need to lock data in sources?
    • Not desirable in DW environment
      • Data sources are autonomous
      • Not realistic to impose locking on them
  • Multiversion-based
    • Do not block each other
    • Desirable in DW environment
      • DW and data sources do not block each other
    • Need to maintain versions somewhere
txnwrap a multiversion algorithm
CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer’ C,FlightRes’ F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;

View: Asia_Traveller

Name

Address

FlightNo

Customer’

Dave

WPI

AA8384

FlightRes’

Wrapper

Meta Relation

Wrapper

Meta Relation

Name

Address

Rel

Attr

N.

A.

F.

D.

Rel

Attr

Dave

WPI

Cust’

Name

Fli’

Name

Customer

FlightRes

Ellen

MA

Cust’

Address

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe

TxnWrap: A Multiversion Algorithm

CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer C, FlightRes F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;

versioned wrapper
Versioned Wrapper

Wrapper for Customer

Relation Customer’

Meta Relation

  • Semantics: life time of a tuple is
    • #born <= time < #dead
source updates on versioned wrapper
Name

Address

#born

#dead

Dave

WPI

0

Relation Customer’ (state 2 )

Ellen

MA

0

Name

Address

#born

#dead

Dave

WPI

0

1

Ellen

MA

0

Relation Customer’ (state 1 )

Stove

Boston

1

Name

Address

#born

#dead

Meta Relation (state 2 )

Rel

Attr

Rel’

Attr’

#born

#dead

Dave

WPI

0

1

C’

Name

-

-

0

Ellen

MA

0

C’

Addr.

-

-

0

2

Steve

Boston

1

Source Updates on Versioned Wrapper

Relation Customer’ (Init)

Transaction1:

1. DELETE FROM Customer C

WHERE C.Name = ‘Dave’;

2. INSERT (‘Steve’, ‘Boston’);

Transcation 2:

Drop Customer.Address;

dw maintenance query rewritten for versioned wrapper
Relation Customer’ (State 1 )

Name

Address

#born

#dead

Dave

WPI

0

1

Ellen

MA

0

Stove

Boston

1

DW Maintenance Query Rewritten for Versioned Wrapper

The maintenance query issued in Transaction2:

SELECT Name, Address

FROM Customer

WHERE condition;

Rewritten versioned maintenance query:

SELECT Name, Address

FROM Customer’

WHERE condition and

#born <= 2 and #dead > 2;

performance evaluation
Performance Evaluation
  • Implementation
    • In Java
    • Platform: Oracle, JDBC on Windows NT
    • Embedded in DyDa [CCZ+01] System at WPI
  • Testbed
    • 6 data sources with one relation each
    • Each relation has 4 attributes and 100,000 tuples
    • One materialized joined view over these data sources
    • TxnWrap VS. compensation (SWEEP [AAS+97] & DyDa)
related work
Related Work
  • View Maintenance
    • View Maintenance / Synchronization / Adaptation
  • Maintenance Anomaly
    • ECA [ZGH+95], SWEEP [AAS+97] handles only

concurrent data updates

      • Compensation-based
      • Performance degrades at a high load
  • Multi-version Algorithms
    • 2-version, n-version, unlimited-version algorithms [MPL92]
conclusions
Conclusions
  • Identify the Maintenance Anomaly Problem in mixed model environment
  • Design a global Transaction DWMS_Transaction model that integrates both source update transaction and maintenance transaction.
  • Rephrase the maintenance anomaly in terms of

serializability of DWMS_Transactions

  • Propose multiversion algorithm to achieve serializability
  • Implemented the maintenance solution in Dyda
  • Achieve stable performance under various workloads
other activities and future work
Other Activities and Future Work
  • Batching of updates into more complex maintenance plans
  • Parallelism of maintenance processes
  • Support more complex views, e.g., aggregation
  • Generalize to more change types
  • Provide alternate view synchronization algorithms
  • Discovery of changes by non-cooperating sources
  • Discovery of meta data in terms of source relationships of distributed sources
  • Move beyond relational middle-layer model
ad